Environmental statistics and the trade-off between model-based and TD learning in humans

Dylan A. Simon, Nathaniel D. Daw

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

There is much evidence that humans and other animals utilize a combination of model-based and model-free RL methods. Although it has been proposed that these systems may dominate according to their relative statistical efficiency in different circumstances, there is little specific evidence, especially in humans, as to the details of this trade-off. Accordingly, we examine the relative performance of different RL approaches under situations in which the statistics of reward are differentially noisy and volatile. Using theory and simulation, we show that model-free TD learning is relatively most disadvantaged in cases of high volatility and low noise. We present data from a decision-making experiment manipulating these parameters, showing that humans shift learning strategies in accord with these predictions. The statistical circumstances favoring model-based RL are also those that promote a high learning rate, which helps explain why, in psychology, the distinction between these strategies is traditionally conceived in terms of rule-based vs. incremental learning.
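The volatility/noise trade-off described in the abstract can be illustrated with a minimal delta-rule (TD-style) tracking simulation. This is a sketch only, not the authors' task or model: the function, parameter values, and environment (a reward mean that occasionally jumps, observed through Gaussian noise) are all illustrative assumptions.

```python
import random

def simulate_td_tracking(volatility, noise_sd, alpha, n_trials=10000, seed=0):
    """Track a drifting reward mean with a delta-rule (TD-style) learner.

    volatility: per-trial probability that the true mean jumps to a new value.
    noise_sd:   standard deviation of observation noise around the true mean.
    alpha:      learning rate of the model-free learner.
    Returns the mean squared tracking error between estimate and true mean.
    (Illustrative sketch; not the paper's actual experiment or model.)
    """
    rng = random.Random(seed)
    true_mean = 0.0
    estimate = 0.0
    sq_err = 0.0
    for _ in range(n_trials):
        if rng.random() < volatility:              # environment jumps (volatility)
            true_mean = rng.uniform(-1.0, 1.0)
        reward = true_mean + rng.gauss(0.0, noise_sd)  # noisy observation
        estimate += alpha * (reward - estimate)        # delta-rule update
        sq_err += (estimate - true_mean) ** 2
    return sq_err / n_trials

# In a volatile, low-noise world a high learning rate tracks jumps quickly;
# in a stable, high-noise world a low learning rate averages out the noise.
volatile_low_noise = {a: simulate_td_tracking(0.1, 0.05, a) for a in (0.1, 0.9)}
stable_high_noise = {a: simulate_td_tracking(0.001, 1.0, a) for a in (0.1, 0.9)}
```

Under these assumed settings, the high learning rate yields the lower error in the volatile, low-noise regime, and the low learning rate wins in the stable, noisy regime, consistent with the abstract's point that the conditions favoring model-based RL are also those favoring a high learning rate.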

Original language: English (US)
Title of host publication: Advances in Neural Information Processing Systems 24: 25th Annual Conference on Neural Information Processing Systems 2011, NIPS 2011
State: Published - 2011
Event: 25th Annual Conference on Neural Information Processing Systems 2011, NIPS 2011 - Granada, Spain
Duration: Dec 12, 2011 - Dec 14, 2011



ASJC Scopus subject areas

  • Information Systems

Cite this

Simon, D. A., & Daw, N. D. (2011). Environmental statistics and the trade-off between model-based and TD learning in humans. In Advances in Neural Information Processing Systems 24: 25th Annual Conference on Neural Information Processing Systems 2011, NIPS 2011.

@inproceedings{aa75f39b14084b468789442bdfd24a46,
title = "Environmental statistics and the trade-off between model-based and TD learning in humans",
abstract = "There is much evidence that humans and other animals utilize a combination of model-based and model-free RL methods. Although it has been proposed that these systems may dominate according to their relative statistical efficiency in different circumstances, there is little specific evidence, especially in humans, as to the details of this trade-off. Accordingly, we examine the relative performance of different RL approaches under situations in which the statistics of reward are differentially noisy and volatile. Using theory and simulation, we show that model-free TD learning is relatively most disadvantaged in cases of high volatility and low noise. We present data from a decision-making experiment manipulating these parameters, showing that humans shift learning strategies in accord with these predictions. The statistical circumstances favoring model-based RL are also those that promote a high learning rate, which helps explain why, in psychology, the distinction between these strategies is traditionally conceived in terms of rule-based vs. incremental learning.",
author = "Simon, {Dylan A.} and Daw, {Nathaniel D.}",
year = "2011",
language = "English (US)",
isbn = "9781618395993",
booktitle = "Advances in Neural Information Processing Systems 24: 25th Annual Conference on Neural Information Processing Systems 2011, NIPS 2011",

}
