Representation and timing in theories of the dopamine system

Nathaniel D. Daw, Aaron C. Courville, David S. Touretzky

Research output: Contribution to journal, Article

Abstract

Although the responses of dopamine neurons in the primate midbrain are well characterized as carrying a temporal difference (TD) error signal for reward prediction, existing theories do not offer a credible account of how the brain keeps track of past sensory events that may be relevant to predicting future reward. Empirically, these shortcomings of previous theories are particularly evident in their account of experiments in which animals were exposed to variation in the timing of events. The original theories mispredicted the results of such experiments due to their use of a representational device called a tapped delay line. Here we propose that a richer understanding of history representation and a better account of these experiments can be given by considering TD algorithms for a formal setting that incorporates two features not originally considered in theories of the dopaminergic response: partial observability (a distinction between the animal's sensory experience and the true underlying state of the world) and semi-Markov dynamics (an explicit account of variation in the intervals between events). The new theory situates the dopaminergic system in a richer functional and anatomical context, since it assumes (in accord with recent computational theories of cortex) that problems of partial observability and stimulus history are solved in sensory cortex using statistical modeling and inference and that the TD system predicts reward using the results of this inference rather than raw sensory data. It also accounts for a range of experimental data, including the experiments involving programmed temporal variability and other previously unmodeled dopaminergic response phenomena, which we suggest are related to subjective noise in animals' interval timing. Finally, it offers new experimental predictions and a rich theoretical framework for designing future experiments.
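The abstract's reference to a TD error signal computed over a tapped delay line can be made concrete with a small simulation. What follows is a minimal sketch under stated assumptions, not the authors' model: it implements only the older delay-line account that the abstract criticizes, using plain TD(0). The function name `run_trial`, the time steps, the learning rate `alpha`, the discount `gamma`, and the number of training trials are all hypothetical choices made for illustration.

```python
# Illustrative sketch only: TD(0) learning over a tapped delay line.
# All names and parameter values are assumptions, not taken from the paper.

def run_trial(weights, cue_time, reward_time, n_steps,
              alpha=0.1, gamma=1.0, learn=True):
    """Simulate one trial and return the TD error at every time step.

    weights[d] is the learned value of the delay-line tap that is active
    d steps after the cue. Before the cue, and once the taps run out,
    the predicted value is 0.
    """
    def value(t):
        delay = t - cue_time
        return weights[delay] if 0 <= delay < len(weights) else 0.0

    deltas = [0.0] * n_steps
    for t in range(1, n_steps):
        reward = 1.0 if t == reward_time else 0.0
        # TD error: reward received now plus the discounted new prediction,
        # minus the prediction made one step earlier.
        delta = reward + gamma * value(t) - value(t - 1)
        deltas[t] = delta
        # Credit the tap that was active on the previous step, if any.
        prev_delay = t - 1 - cue_time
        if learn and 0 <= prev_delay < len(weights):
            weights[prev_delay] += alpha * delta
    return deltas


CUE_TIME, REWARD_TIME, N_STEPS = 2, 7, 10
# One tap per time step of the cue-reward interval.
weights = [0.0] * (REWARD_TIME - CUE_TIME)

# First trial: the reward is entirely unpredicted.
first = run_trial(weights, CUE_TIME, REWARD_TIME, N_STEPS)

# Train on a fixed cue-reward interval.
for _ in range(500):
    run_trial(weights, CUE_TIME, REWARD_TIME, N_STEPS)
trained = run_trial(weights, CUE_TIME, REWARD_TIME, N_STEPS, learn=False)

# Probe trial: the reward arrives two steps early, learning switched off.
EARLY_TIME = REWARD_TIME - 2
early = run_trial(weights, CUE_TIME, EARLY_TIME, N_STEPS, learn=False)

print("first trial, error at reward:     ", round(first[REWARD_TIME], 3))
print("trained, error at cue:            ", round(trained[CUE_TIME], 3))
print("trained, error at reward:         ", round(trained[REWARD_TIME], 3))
print("early probe, error at early time: ", round(early[EARLY_TIME], 3))
print("early probe, error at usual time: ", round(early[REWARD_TIME], 3))
```

In this sketch the error starts at the reward and, after training, moves to the cue. On the early-reward probe the delay line keeps counting from the cue regardless of what has happened since, so the sketch produces a positive error at the early reward and a negative error at the habitual reward time. That rigidity is, plausibly, the kind of behavior the abstract has in mind when it attributes the mispredictions under timing variation to the tapped delay line.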

Original language: English (US)
Pages (from-to): 1637-1677
Number of pages: 41
Journal: Neural computation
Volume: 18
Issue number: 7
DOI: https://doi.org/10.1162/neco.2006.18.7.1637
State: Published - 2006

Fingerprint

Dopamine
Reward
Animals
Observability
History
Experiments
Dopaminergic Neurons
Signal filtering and prediction
Mesencephalon
Primates
Electric delay lines
Noise
Neurons
Brain
Equipment and Supplies

ASJC Scopus subject areas

  • Control and Systems Engineering
  • Artificial Intelligence
  • Neuroscience (all)

Cite this

Daw, ND, Courville, AC & Touretzky, DS 2006, 'Representation and timing in theories of the dopamine system', Neural computation, vol. 18, no. 7, pp. 1637-1677. https://doi.org/10.1162/neco.2006.18.7.1637
@article{0b43232c21b04a39990a1692606c8091,
title = "Representation and timing in theories of the dopamine system",
author = "Daw, {Nathaniel D.} and Courville, {Aaron C.} and Touretzky, {David S.}",
year = "2006",
doi = "10.1162/neco.2006.18.7.1637",
language = "English (US)",
volume = "18",
pages = "1637--1677",
journal = "Neural computation",
issn = "0899-7667",
number = "7",

}

TY - JOUR

T1 - Representation and timing in theories of the dopamine system

AU - Daw, Nathaniel D.

AU - Courville, Aaron C.

AU - Touretzky, David S.

PY - 2006

Y1 - 2006

UR - http://www.scopus.com/inward/record.url?scp=33745787929&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33745787929&partnerID=8YFLogxK

U2 - 10.1162/neco.2006.18.7.1637

DO - 10.1162/neco.2006.18.7.1637

M3 - Article

VL - 18

SP - 1637

EP - 1677

JO - Neural computation

JF - Neural computation

SN - 0899-7667

IS - 7

ER -