Neural correlates of forward planning in a spatial decision task in humans

Dylan Alexander Simon, Nathaniel D. Daw

Research output: Contribution to journal › Article

Abstract

Although reinforcement learning (RL) theories have been influential in characterizing the mechanisms for reward-guided choice in the brain, the predominant temporal difference (TD) algorithm cannot explain many flexible or goal-directed actions that have been demonstrated behaviorally. We investigate such actions by contrasting an RL algorithm that is model based, in that it relies on learning a map or model of the task and planning within it, to traditional model-free TD learning. To distinguish these approaches in humans, we used functional magnetic resonance imaging in a continuous spatial navigation task, in which frequent changes to the layout of the maze forced subjects continually to relearn their favored routes, thereby exposing the RL mechanisms used. We sought evidence for the neural substrates of such mechanisms by comparing choice behavior and blood oxygen level-dependent (BOLD) signals to decision variables extracted from simulations of either algorithm. Both choices and value-related BOLD signals in striatum, although most often associated with TD learning, were better explained by the model-based theory. Furthermore, predecessor quantities for the model-based value computation were correlated with BOLD signals in the medial temporal lobe and frontal cortex. These results point to a significant extension of both the computational and anatomical substrates for RL in the brain.
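The two value-computation strategies the abstract contrasts can be illustrated with a minimal sketch. This is not the paper's implementation; the states, rewards, and parameters below are hypothetical, and the "maze" is a toy deterministic transition table. The point is the structural difference: model-free TD(0) updates a cached value from sampled experience, whereas a model-based learner replans over its transition map, so a change to the map (as with the task's layout changes) propagates to values immediately.

```python
# Illustrative sketch only (hypothetical states/parameters), contrasting
# model-free TD(0) with model-based planning over a learned transition map.

def td_update(V, s, r, s_next, alpha=0.1, gamma=0.9):
    """Model-free TD(0): nudge V(s) toward the sampled target r + gamma*V(s')."""
    V[s] += alpha * (r + gamma * V[s_next] - V[s])
    return V

def model_based_values(T, R, gamma=0.9, iters=100):
    """Model-based planning: value iteration over a learned model.
    T[s][a] -> next state (deterministic moves), R[s] -> reward at s."""
    V = {s: 0.0 for s in T}
    for _ in range(iters):
        V = {s: R[s] + gamma * max(V[T[s][a]] for a in T[s]) for s in T}
    return V

# Toy deterministic "maze": three states, two actions each.
T = {0: {"a": 1, "b": 2}, 1: {"a": 2, "b": 0}, 2: {"a": 2, "b": 2}}
R = {0: 0.0, 1: 0.0, 2: 1.0}

V_mb = model_based_values(T, R)
# If the layout changes, only T needs editing; replanning yields updated
# values at once, while a TD learner must relearn them from new experience.
```

With gamma = 0.9, value iteration converges so that the rewarded state approaches 1 / (1 - 0.9) = 10, and upstream states inherit discounted shares of that value through the map.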

Original language: English (US)
Pages (from-to): 5526-5539
Number of pages: 14
Journal: Journal of Neuroscience
Volume: 31
Issue number: 14
ISSN: 0270-6474
Publisher: Society for Neuroscience
DOI: 10.1523/JNEUROSCI.4647-10.2011
State: Published - Apr 6 2011


ASJC Scopus subject areas

  • Neuroscience (all)

