Human reinforcement learning subdivides structured action spaces by learning effector-specific values

Samuel J. Gershman, Bijan Pesaran, Nathaniel D. Daw

Research output: Contribution to journal › Article

Abstract

Humans and animals are endowed with a large number of effectors. Although this enables great behavioral flexibility, it presents an equally formidable reinforcement learning problem of discovering which actions are most valuable because of the high dimensionality of the action space. An unresolved question is how neural systems for reinforcement learning - such as prediction error signals for action valuation associated with dopamine and the striatum - can cope with this "curse of dimensionality." We propose a reinforcement learning framework that allows for learned action valuations to be decomposed into effector-specific components when appropriate to a task, and test it by studying to what extent human behavior and blood oxygen level-dependent (BOLD) activity can exploit such a decomposition in a multieffector choice task. Subjects made simultaneous decisions with their left and right hands and received separate reward feedback for each hand movement. We found that choice behavior was better described by a learning model that decomposed the values of bimanual movements into separate values for each effector, rather than a traditional model that treated the bimanual actions as unitary with a single value. A decomposition of value into effector-specific components was also observed in value-related BOLD signaling, in the form of lateralized biases in striatal correlates of prediction error and anticipatory value correlates in the intraparietal sulcus. These results suggest that the human brain can use decomposed value representations to "divide and conquer" reinforcement learning over high-dimensional action spaces.
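The abstract contrasts two learning models: a "unitary" learner that assigns a single value to each joint bimanual action, and a "decomposed" learner that maintains a separate value per effector and updates each with its own reward prediction error. As a rough illustration of that contrast (this is a minimal sketch, not the authors' code; the learning rate, softmax temperature, number of targets, and reward probabilities below are all illustrative assumptions), a simple Q-learning simulation might look like this:

```python
# Sketch of the modeling contrast from the abstract: a decomposed learner
# with one Q-table per hand vs. a unitary learner with one Q-value per
# joint (left, right) action pair. All parameters are assumed for illustration.

import numpy as np

rng = np.random.default_rng(0)
ALPHA, BETA = 0.3, 5.0   # learning rate and softmax inverse temperature (assumed)
N_TARGETS = 2            # targets available to each hand (assumed)

def softmax(q, beta):
    e = np.exp(beta * (q - q.max()))
    return e / e.sum()

# Decomposed model: one value table per effector.
q_left = np.zeros(N_TARGETS)
q_right = np.zeros(N_TARGETS)

# Unitary model: one value per joint bimanual action (N_TARGETS**2 of them).
q_joint = np.zeros((N_TARGETS, N_TARGETS))

# Hypothetical reward probabilities, independent per hand, mirroring the
# task's separate reward feedback for each hand movement.
p_reward = {"left": np.array([0.8, 0.2]), "right": np.array([0.3, 0.7])}

for trial in range(200):
    # Decomposed learner: choose and update each effector separately.
    a_l = rng.choice(N_TARGETS, p=softmax(q_left, BETA))
    a_r = rng.choice(N_TARGETS, p=softmax(q_right, BETA))
    r_l = float(rng.random() < p_reward["left"][a_l])
    r_r = float(rng.random() < p_reward["right"][a_r])
    q_left[a_l] += ALPHA * (r_l - q_left[a_l])    # effector-specific prediction error
    q_right[a_r] += ALPHA * (r_r - q_right[a_r])

    # Unitary learner: treat the (left, right) pair as a single action
    # and update its single value with the summed reward.
    joint = rng.choice(N_TARGETS**2, p=softmax(q_joint.ravel(), BETA))
    jl, jr = divmod(joint, N_TARGETS)
    r = float(rng.random() < p_reward["left"][jl]) \
        + float(rng.random() < p_reward["right"][jr])
    q_joint[jl, jr] += ALPHA * (r - q_joint[jl, jr])

print("decomposed:", q_left.round(2), q_right.round(2))
print("unitary:\n", q_joint.round(2))
```

The design point the sketch makes concrete: the decomposed learner's tables grow linearly with the number of effectors, whereas the joint table grows multiplicatively, which is why a factored value representation can "divide and conquer" the curse of dimensionality described in the abstract.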

Original language: English (US)
Pages (from-to): 13524-13531
Number of pages: 8
Journal: Journal of Neuroscience
Volume: 29
Issue number: 43
DOIs: 10.1523/JNEUROSCI.2469-09.2009
State: Published - Oct 28 2009

Fingerprint

  • Learning
  • Hand
  • Choice Behavior
  • Oxygen
  • Corpus Striatum
  • Parietal Lobe
  • Reward
  • Dopamine
  • Reinforcement (Psychology)
  • Brain

ASJC Scopus subject areas

  • Neuroscience (all)

Cite this

Human reinforcement learning subdivides structured action spaces by learning effector-specific values. / Gershman, Samuel J.; Pesaran, Bijan; Daw, Nathaniel D.

In: Journal of Neuroscience, Vol. 29, No. 43, 28.10.2009, p. 13524-13531.

Research output: Contribution to journal › Article

@article{34f9426f576e48cca599638d9c277752,
title = "Human reinforcement learning subdivides structured action spaces by learning effector-specific values",
abstract = "Humans and animals are endowed with a large number of effectors. Although this enables great behavioral flexibility, it presents an equally formidable reinforcement learning problem of discovering which actions are most valuable because of the high dimensionality of the action space. An unresolved question is how neural systems for reinforcement learning - such as prediction error signals for action valuation associated with dopamine and the striatum - can cope with this {"}curse of dimensionality.{"} We propose a reinforcement learning framework that allows for learned action valuations to be decomposed into effector-specific components when appropriate to a task, and test it by studying to what extent human behavior and blood oxygen level-dependent (BOLD) activity can exploit such a decomposition in a multieffector choice task. Subjects made simultaneous decisions with their left and right hands and received separate reward feedback for each hand movement. We found that choice behavior was better described by a learning model that decomposed the values of bimanual movements into separate values for each effector, rather than a traditional model that treated the bimanual actions as unitary with a single value. A decomposition of value into effector-specific components was also observed in value-related BOLD signaling, in the form of lateralized biases in striatal correlates of prediction error and anticipatory value correlates in the intraparietal sulcus. These results suggest that the human brain can use decomposed value representations to {"}divide and conquer{"} reinforcement learning over high-dimensional action spaces.",
author = "Gershman, {Samuel J.} and Bijan Pesaran and Daw, {Nathaniel D.}",
year = "2009",
month = "10",
day = "28",
doi = "10.1523/JNEUROSCI.2469-09.2009",
language = "English (US)",
volume = "29",
pages = "13524--13531",
journal = "Journal of Neuroscience",
issn = "0270-6474",
publisher = "Society for Neuroscience",
number = "43",
}

TY - JOUR
T1 - Human reinforcement learning subdivides structured action spaces by learning effector-specific values
AU - Gershman, Samuel J.
AU - Pesaran, Bijan
AU - Daw, Nathaniel D.
PY - 2009/10/28
Y1 - 2009/10/28
N2 - Humans and animals are endowed with a large number of effectors. Although this enables great behavioral flexibility, it presents an equally formidable reinforcement learning problem of discovering which actions are most valuable because of the high dimensionality of the action space. An unresolved question is how neural systems for reinforcement learning - such as prediction error signals for action valuation associated with dopamine and the striatum - can cope with this "curse of dimensionality." We propose a reinforcement learning framework that allows for learned action valuations to be decomposed into effector-specific components when appropriate to a task, and test it by studying to what extent human behavior and blood oxygen level-dependent (BOLD) activity can exploit such a decomposition in a multieffector choice task. Subjects made simultaneous decisions with their left and right hands and received separate reward feedback for each hand movement. We found that choice behavior was better described by a learning model that decomposed the values of bimanual movements into separate values for each effector, rather than a traditional model that treated the bimanual actions as unitary with a single value. A decomposition of value into effector-specific components was also observed in value-related BOLD signaling, in the form of lateralized biases in striatal correlates of prediction error and anticipatory value correlates in the intraparietal sulcus. These results suggest that the human brain can use decomposed value representations to "divide and conquer" reinforcement learning over high-dimensional action spaces.
AB - Humans and animals are endowed with a large number of effectors. Although this enables great behavioral flexibility, it presents an equally formidable reinforcement learning problem of discovering which actions are most valuable because of the high dimensionality of the action space. An unresolved question is how neural systems for reinforcement learning - such as prediction error signals for action valuation associated with dopamine and the striatum - can cope with this "curse of dimensionality." We propose a reinforcement learning framework that allows for learned action valuations to be decomposed into effector-specific components when appropriate to a task, and test it by studying to what extent human behavior and blood oxygen level-dependent (BOLD) activity can exploit such a decomposition in a multieffector choice task. Subjects made simultaneous decisions with their left and right hands and received separate reward feedback for each hand movement. We found that choice behavior was better described by a learning model that decomposed the values of bimanual movements into separate values for each effector, rather than a traditional model that treated the bimanual actions as unitary with a single value. A decomposition of value into effector-specific components was also observed in value-related BOLD signaling, in the form of lateralized biases in striatal correlates of prediction error and anticipatory value correlates in the intraparietal sulcus. These results suggest that the human brain can use decomposed value representations to "divide and conquer" reinforcement learning over high-dimensional action spaces.
UR - http://www.scopus.com/inward/record.url?scp=70350521769&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=70350521769&partnerID=8YFLogxK
U2 - 10.1523/JNEUROSCI.2469-09.2009
DO - 10.1523/JNEUROSCI.2469-09.2009
M3 - Article
VL - 29
SP - 13524
EP - 13531
JO - Journal of Neuroscience
JF - Journal of Neuroscience
SN - 0270-6474
IS - 43
ER -