Long-term reward prediction in TD models of the dopamine system

Nathaniel D. Daw, David S. Touretzky

Research output: Contribution to journal › Article

Abstract

This article addresses the relationship between long-term reward predictions and slow-timescale neural activity in temporal difference (TD) models of the dopamine system. Such models attempt to explain how the activity of dopamine (DA) neurons relates to errors in the prediction of future rewards. Previous models have been mostly restricted to short-term predictions of rewards expected during a single, somewhat artificially defined trial. Also, the models focused exclusively on the phasic pause-and-burst activity of primate DA neurons; the neurons' slower, tonic background activity was assumed to be constant. This has led to difficulty in explaining the results of neurochemical experiments that measure indications of DA release on a slow timescale, results that seem at first glance inconsistent with a reward prediction model. In this article, we investigate a TD model of DA activity modified so as to enable it to make longer-term predictions about rewards expected far in the future. We show that these predictions manifest themselves as slow changes in the baseline error signal, which we associate with tonic DA activity. Using this model, we make new predictions about the behavior of the DA system in a number of experimental situations. Some of these predictions suggest new computational explanations for previously puzzling data, such as indications from microdialysis studies of elevated DA activity triggered by aversive events.
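
As a concrete illustration of the mechanism summarized above, the following minimal sketch shows a generic average-reward TD(0) learner in Python. It is not the authors' published implementation; the function average_reward_td, its learning-rate parameters, and the toy two-state task are hypothetical choices made here for illustration. The point is only that a slowly updated reward-rate estimate (rho below) enters the prediction error as a baseline that drifts on a slow timescale, while the remaining part of the error changes phasically from step to step.

import numpy as np

# Minimal sketch (hypothetical, not the article's implementation):
# average-reward TD(0), in which the prediction error contains a slowly
# varying baseline term. The running reward-rate estimate `rho` plays the
# role of a slow, tonic component; the rest of the error is phasic.

def average_reward_td(rewards, states, n_states, alpha_v=0.1, alpha_rho=0.01):
    """Run average-reward TD(0) over a trajectory.

    rewards[t] is the reward received on the transition from states[t]
    to states[t + 1]. Returns the learned state values, the final
    reward-rate estimate, and the per-step prediction errors.
    """
    v = np.zeros(n_states)   # differential (long-term) value estimates
    rho = 0.0                # slow estimate of the average reward per step
    deltas = []
    for t in range(len(rewards)):
        s, s_next, r = states[t], states[t + 1], rewards[t]
        # TD error with the average reward subtracted at every step;
        # because rho changes slowly, it acts like a tonic baseline.
        delta = r - rho + v[s_next] - v[s]
        v[s] += alpha_v * delta
        rho += alpha_rho * delta
        deltas.append(delta)
    return v, rho, np.array(deltas)

# Toy usage: two states visited in alternation, reward delivered only on
# the transition from state 1 back to state 0 (every other step).
states = [0, 1] * 200 + [0]
rewards = [0.0, 1.0] * 200
v, rho, deltas = average_reward_td(rewards, states, n_states=2)
print(rho)   # drifts toward the long-run reward rate (about 0.5 per step)

With a smaller alpha_rho the baseline term changes on a slower timescale, which is the kind of separation between slow and fast components of the error signal that the abstract associates with tonic versus phasic dopamine activity.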

Original language: English (US)
Pages (from-to): 2567-2583
Number of pages: 17
Journal: Neural Computation
Volume: 14
Issue number: 11
DOI: 10.1162/089976602760407973
State: Published - Nov 2002

Fingerprint

  • Reward
  • Dopamine
  • Dopaminergic Neurons
  • Neurons
  • Microdialysis
  • Primates

ASJC Scopus subject areas

  • Artificial Intelligence
  • Control and Systems Engineering
  • Neuroscience (all)

Cite this

Long-term reward prediction in TD models of the dopamine system. / Daw, Nathaniel D.; Touretzky, David S.

In: Neural Computation, Vol. 14, No. 11, 11.2002, p. 2567-2583.

Daw, Nathaniel D. ; Touretzky, David S. / Long-term reward prediction in TD models of the dopamine system. In: Neural Computation. 2002 ; Vol. 14, No. 11. pp. 2567-2583.
@article{88764f449123417eb0390055f305ea5b,
title = "Long-term reward prediction in TD models of the dopamine system",
author = "Daw, {Nathaniel D.} and Touretzky, {David S.}",
year = "2002",
month = "11",
doi = "10.1162/089976602760407973",
language = "English (US)",
volume = "14",
pages = "2567--2583",
journal = "Neural Computation",
issn = "0899-7667",
publisher = "MIT Press Journals",
number = "11",

}

TY - JOUR

T1 - Long-term reward prediction in TD models of the dopamine system

AU - Daw, Nathaniel D.

AU - Touretzky, David S.

PY - 2002/11

Y1 - 2002/11

UR - http://www.scopus.com/inward/record.url?scp=0036835734&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0036835734&partnerID=8YFLogxK

U2 - 10.1162/089976602760407973

DO - 10.1162/089976602760407973

M3 - Article

VL - 14

SP - 2567

EP - 2583

JO - Neural Computation

JF - Neural Computation

SN - 0899-7667

IS - 11

ER -