Timing and partial observability in the dopamine system

Nathaniel D. Daw, Aaron C. Courville, David S. Touretzky

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

According to a series of influential models, dopamine (DA) neurons signal reward prediction error using a temporal-difference (TD) algorithm. We address a problem not convincingly solved in these accounts: how to maintain a representation of cues that predict delayed consequences. Our new model uses a TD rule grounded in partially observable semi-Markov processes, a formalism that captures two largely neglected features of DA experiments: hidden state and temporal variability. Previous models predicted rewards using a tapped delay line representation of sensory inputs; we replace this with a more active process of inference about the underlying state of the world. The DA system can then learn to map these inferred states to reward predictions using TD. The new model can explain previously vexing data on the responses of DA neurons in the face of temporal variability. By combining statistical model-based learning with a physiologically grounded TD theory, it also brings into contact with physiology some insights about behavior that had previously been confined to more abstract psychological models.
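
The abstract describes value learning with a temporal-difference (TD) rule applied to inferred hidden states rather than to a tapped delay line of raw stimuli. The sketch below is a simplified, discrete-time illustration of that idea (a Bayesian belief filter feeding a linear TD(0) value update); it is not the authors' code, and the paper's actual formulation uses partially observable semi-Markov processes. All state spaces, matrices, and parameter values here are illustrative assumptions.

import numpy as np

# Minimal sketch (illustrative assumptions, not the authors' implementation):
# TD(0) value learning over an inferred belief state instead of a
# tapped-delay-line stimulus representation.

n_states = 3                      # hypothetical hidden states: ITI, cue, reward
T = np.array([[0.9, 0.1, 0.0],    # assumed hidden-state transition matrix
              [0.0, 0.5, 0.5],
              [1.0, 0.0, 0.0]])
O = np.array([[0.9, 0.1],         # assumed observation model P(obs | state);
              [0.2, 0.8],         # obs 0 = nothing, obs 1 = salient event
              [0.2, 0.8]])

w = np.zeros(n_states)            # linear value weights: V(belief) = belief @ w
gamma, alpha = 0.98, 0.1          # discount factor and learning rate (assumed)

def belief_update(belief, obs):
    # Bayesian filter: predict through T, then reweight by the observation likelihood.
    pred = belief @ T
    post = pred * O[:, obs]
    return post / post.sum()

belief = np.ones(n_states) / n_states
for obs, reward in [(1, 0.0), (0, 0.0), (1, 1.0)]:   # toy observation/reward stream
    new_belief = belief_update(belief, obs)
    # TD error computed on values of inferred states; in the model this error
    # corresponds to the dopaminergic prediction-error signal.
    delta = reward + gamma * new_belief @ w - belief @ w
    w += alpha * delta * belief   # update value weights toward the TD target
    belief = new_belief

In the paper's full semi-Markov treatment, the inference step would also track how long the process has dwelt in each hidden state, which is what lets the model account for temporal variability in the dopamine responses.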

Original language: English (US)
Title of host publication: Advances in Neural Information Processing Systems 15 - Proceedings of the 2002 Conference, NIPS 2002
Publisher: Neural information processing systems foundation
ISBN (Print): 0262025507, 9780262025508
State: Published - 2003
Event: 16th Annual Neural Information Processing Systems Conference, NIPS 2002 - Vancouver, BC, Canada
Duration: Dec 9 2002 - Dec 14 2002

Other

Other: 16th Annual Neural Information Processing Systems Conference, NIPS 2002
Country: Canada
City: Vancouver, BC
Period: 12/9/02 - 12/14/02


ASJC Scopus subject areas

  • Computer Networks and Communications
  • Information Systems
  • Signal Processing

Cite this

Daw, N. D., Courville, A. C., & Touretzky, D. S. (2003). Timing and partial observability in the dopamine system. In Advances in Neural Information Processing Systems 15 - Proceedings of the 2002 Conference, NIPS 2002 Neural information processing systems foundation.

