Signals in human striatum are appropriate for policy update rather than value prediction

Jian Li, Nathaniel D. Daw

Research output: Contribution to journal › Article

Abstract

Influential reinforcement learning theories propose that prediction error signals in the brain's nigrostriatal system guide learning for trial-and-error decision-making. However, since different decision variables can be learned from quantitatively similar error signals, a critical question is: what is the content of the decision representations trained by those error signals? We used fMRI to monitor neural activity during a two-armed bandit counterfactual decision task that provided human subjects with information about both forgone and obtained monetary outcomes, so as to dissociate teaching signals that update expected values for each action from signals that train relative preferences between actions (a policy). The reward probabilities of the two choices varied independently of each other. This design allowed us to test whether subjects' choice behavior was guided by policy-based methods, which directly map states to advantageous actions, or by value-based methods such as Q-learning, in which choice policies are instead generated by learning an intermediate representation (reward expectancy). Behaviorally, we found that human participants' choices were significantly influenced by both the obtained and the forgone rewards from the previous trial. We also found that subjects' blood oxygen level-dependent responses in striatum were modulated in opposite directions by the obtained and forgone rewards, but not by reward expectancy. This neural pattern, like subjects' choice behavior, is consistent with a teaching signal for developing habits or relative action preferences, rather than with prediction errors for updating separate action values.
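
The contrast at stake is algorithmic, and a small simulation makes it concrete. The sketch below is a minimal illustration in Python/NumPy, not the authors' fitted model: the learning rate, softmax temperature, and the exact preference-update rule are assumptions chosen for clarity. It contrasts a Q-learning update, which revises a stored reward expectancy for the chosen arm only, with a direct policy update, in which the obtained and forgone rewards push the chosen arm's relative preference in opposite directions and no expectancy is stored.

    import numpy as np

    rng = np.random.default_rng(0)

    def softmax(x, beta=3.0):
        """Map values or preferences to choice probabilities."""
        z = np.exp(beta * (x - x.max()))
        return z / z.sum()

    def q_update(Q, c, r_obtained, alpha=0.3):
        """Value-based: update only the chosen arm's reward expectancy
        from its own prediction error, delta = r - Q[c]."""
        Q[c] += alpha * (r_obtained - Q[c])
        return Q

    def policy_update(W, c, r_obtained, r_forgone, alpha=0.3):
        """Policy-based: the obtained and forgone rewards move the chosen
        arm's relative preference in opposite directions; no reward
        expectancy is stored."""
        delta = r_obtained - r_forgone  # counterfactual teaching signal
        W[c] += alpha * delta
        W[1 - c] -= alpha * delta
        return W

    # Two-armed bandit with counterfactual feedback: both outcomes are
    # revealed on every trial, and the two arms' reward probabilities are
    # independent of each other (held fixed here for brevity; they drifted
    # over trials in the actual task).
    p = np.array([0.7, 0.4])
    Q, W = np.zeros(2), np.zeros(2)
    for t in range(200):
        c = rng.choice(2, p=softmax(W))        # sample a choice from the policy
        r = (rng.random(2) < p).astype(float)  # outcomes for both arms
        Q = q_update(Q, c, r[c])
        W = policy_update(W, c, r[c], r[1 - c])
    print("Q values:", Q, "relative preferences:", W)

The opposite-signed dependence on the forgone outcome in policy_update is the signature the paper reports for striatal responses, whereas the Q-learning error depends only on the obtained reward and the stored expectancy.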

Original language: English (US)
Pages (from-to): 5504-5511
Number of pages: 8
Journal: Journal of Neuroscience
Volume: 31
Issue number: 14
DOIs: 10.1523/JNEUROSCI.6316-10.2011
State: Published - Apr 6 2011

Fingerprint

  • Reward
  • Learning
  • Choice Behavior
  • Teaching
  • Habits
  • Decision Making
  • Magnetic Resonance Imaging
  • Oxygen
  • Brain

ASJC Scopus subject areas

  • Neuroscience (all)

Cite this

Li, Jian; Daw, Nathaniel D. Signals in human striatum are appropriate for policy update rather than value prediction. In: Journal of Neuroscience, Vol. 31, No. 14, 06.04.2011, p. 5504-5511.

@article{8499ea41e9734bfeb4348e05626551a1,
title = "Signals in human striatum are appropriate for policy update rather than value prediction",
abstract = "Influential reinforcement learning theories propose that prediction error signals in the brain's nigrostriatal system guide learning for trial-and-error decision-making. However, since different decision variables can be learned from quantitatively similar error signals, a critical question is: what is the content of decision representations trained by the error signals? We used fMRI to monitor neural activity in a two-armed bandit counterfactual decision task that provided human subjects with information about forgone and obtained monetary outcomes so as to dissociate teaching signals that update expected values for each action, versus signals that train relative preferences between actions (a policy). The reward probabilities of both choices varied independently from each other. This specific design allowed us to test whether subjects' choice behavior was guided by policy-based methods, which directly map states to advantageous actions, or value-based methods such as Q-learning, where choice policies are instead generated by learning an intermediate representation (reward expectancy). Behaviorally, we found human participants' choices were significantly influenced by obtained as well as forgone rewards from the previous trial. We also found subjects' blood oxygen level-dependent responses in striatum were modulated in opposite directions by the experienced and forgone rewards but not by reward expectancy. This neural pattern, as well as subjects' choice behavior, is consistent with a teaching signal for developing habits or relative action preferences, rather than prediction errors for updating separate action values.",
author = "Jian Li and Daw, {Nathaniel D.}",
year = "2011",
month = "4",
day = "6",
doi = "10.1523/JNEUROSCI.6316-10.2011",
language = "English (US)",
volume = "31",
pages = "5504--5511",
journal = "Journal of Neuroscience",
issn = "0270-6474",
publisher = "Society for Neuroscience",
number = "14",

}

TY  - JOUR
T1  - Signals in human striatum are appropriate for policy update rather than value prediction
AU  - Li, Jian
AU  - Daw, Nathaniel D.
PY  - 2011/4/6
Y1  - 2011/4/6
UR  - http://www.scopus.com/inward/record.url?scp=79955721719&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=79955721719&partnerID=8YFLogxK
U2  - 10.1523/JNEUROSCI.6316-10.2011
DO  - 10.1523/JNEUROSCI.6316-10.2011
M3  - Article
VL  - 31
SP  - 5504
EP  - 5511
JO  - Journal of Neuroscience
JF  - Journal of Neuroscience
SN  - 0270-6474
IS  - 14
ER  -