Sources of suboptimality in a minimalistic explore–exploit task

Mingyu Song, Zahy Bnaya, Wei Ji Ma

Research output: Contribution to journal › Letter

Abstract

People often choose between sticking with an available good option (exploitation) and trying out a new option that is uncertain but potentially more rewarding (exploration) [1,2]. Laboratory studies on explore–exploit decisions often contain real-world complexities such as non-stationary environments, stochasticity under exploitation and unknown reward distributions [3–7]. However, such factors might limit the researcher’s ability to understand the essence of people’s explore–exploit decisions. For this reason, we introduce a minimalistic task in which the optimal policy is to start off exploring and to switch to exploitation at most once in each sequence of decisions. The behaviour of 49 laboratory and 143 online participants deviated both qualitatively and quantitatively from the optimal policy, even when allowing for bias and decision noise. Instead, people seem to follow a suboptimal rule in which they switch from exploration to exploitation when the highest reward so far exceeds a certain threshold. Moreover, we show that this threshold decreases approximately linearly with the proportion of the sequence that remains, suggesting a temporal ratio law. Finally, we find evidence for ‘sequence-level’ variability that is shared across all decisions in the same sequence. Our results emphasize the importance of examining sequence-level strategies and their variability when studying sequential decision-making.
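The threshold rule described in the abstract can be illustrated with a small Python sketch. It is not the authors' model or code: the function names, the parameters `intercept` and `slope`, the example reward values, and the convention that the current decision counts towards the remaining part of the sequence are all assumptions made for illustration. The sketch also assumes that exploiting returns the highest reward found so far and that a switch to exploitation is never reversed within a sequence.

```python
def linear_threshold(step, length, intercept, slope):
    """Hypothetical threshold: linear in the proportion of the sequence remaining."""
    remaining = (length - step) / length  # assumption: current decision counts as remaining
    return intercept + slope * remaining


def run_sequence(rewards, intercept, slope):
    """Apply a threshold rule to one sequence of decisions.

    rewards[i] is the reward a new option would yield if explored at step i.
    Returns the list of actions taken and the total reward collected.
    """
    actions = []
    total = 0.0
    best = None          # highest reward observed so far
    exploiting = False   # assumption: once set, the switch is never undone
    length = len(rewards)

    for step, reward in enumerate(rewards):
        threshold = linear_threshold(step, length, intercept, slope)
        # Switch when the highest reward so far exceeds the current threshold.
        if not exploiting and best is not None and best > threshold:
            exploiting = True

        if exploiting:
            actions.append("exploit")
            total += best  # assumption: exploitation returns the best reward so far
        else:
            actions.append("explore")
            total += reward
            best = reward if best is None else max(best, reward)

    return actions, total


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the study.
    actions, total = run_sequence([0.2, 0.9, 0.1, 0.5], intercept=0.5, slope=0.0)
    print(actions, total)
```

Setting `slope` to zero gives a fixed threshold; a non-zero `slope` makes the threshold depend on how much of the sequence is left, which is the kind of dependence the abstract reports. The sign and size of `slope` are left as free parameters here.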

Original language: English (US)
Journal: Nature Human Behaviour
DOI: 10.1038/s41562-018-0526-x
State: Published - Jan 1 2019

ASJC Scopus subject areas

  • Social Psychology
  • Experimental and Cognitive Psychology
  • Behavioral Neuroscience

Cite this

Sources of suboptimality in a minimalistic explore–exploit task. / Song, Mingyu; Bnaya, Zahy; Ma, Wei Ji.

In: Nature Human Behaviour, 01.01.2019.

@article{e38b5945938f48ddaac072d728cf3027,
title = "Sources of suboptimality in a minimalistic explore–exploit task",
abstract = "People often choose between sticking with an available good option (exploitation) and trying out a new option that is uncertain but potentially more rewarding (exploration) [1,2]. Laboratory studies on explore–exploit decisions often contain real-world complexities such as non-stationary environments, stochasticity under exploitation and unknown reward distributions [3–7]. However, such factors might limit the researcher’s ability to understand the essence of people’s explore–exploit decisions. For this reason, we introduce a minimalistic task in which the optimal policy is to start off exploring and to switch to exploitation at most once in each sequence of decisions. The behaviour of 49 laboratory and 143 online participants deviated both qualitatively and quantitatively from the optimal policy, even when allowing for bias and decision noise. Instead, people seem to follow a suboptimal rule in which they switch from exploration to exploitation when the highest reward so far exceeds a certain threshold. Moreover, we show that this threshold decreases approximately linearly with the proportion of the sequence that remains, suggesting a temporal ratio law. Finally, we find evidence for ‘sequence-level’ variability that is shared across all decisions in the same sequence. Our results emphasize the importance of examining sequence-level strategies and their variability when studying sequential decision-making.",
author = "Mingyu Song and Zahy Bnaya and Ma, {Wei Ji}",
year = "2019",
month = "1",
day = "1",
doi = "10.1038/s41562-018-0526-x",
language = "English (US)",
journal = "Nature Human Behaviour",
issn = "2397-3374",
publisher = "Nature Publishing Group"
}

TY  - JOUR
T1  - Sources of suboptimality in a minimalistic explore–exploit task
AU  - Song, Mingyu
AU  - Bnaya, Zahy
AU  - Ma, Wei Ji
PY  - 2019/1/1
Y1  - 2019/1/1
N2  - People often choose between sticking with an available good option (exploitation) and trying out a new option that is uncertain but potentially more rewarding (exploration) [1,2]. Laboratory studies on explore–exploit decisions often contain real-world complexities such as non-stationary environments, stochasticity under exploitation and unknown reward distributions [3–7]. However, such factors might limit the researcher’s ability to understand the essence of people’s explore–exploit decisions. For this reason, we introduce a minimalistic task in which the optimal policy is to start off exploring and to switch to exploitation at most once in each sequence of decisions. The behaviour of 49 laboratory and 143 online participants deviated both qualitatively and quantitatively from the optimal policy, even when allowing for bias and decision noise. Instead, people seem to follow a suboptimal rule in which they switch from exploration to exploitation when the highest reward so far exceeds a certain threshold. Moreover, we show that this threshold decreases approximately linearly with the proportion of the sequence that remains, suggesting a temporal ratio law. Finally, we find evidence for ‘sequence-level’ variability that is shared across all decisions in the same sequence. Our results emphasize the importance of examining sequence-level strategies and their variability when studying sequential decision-making.
AB  - People often choose between sticking with an available good option (exploitation) and trying out a new option that is uncertain but potentially more rewarding (exploration) [1,2]. Laboratory studies on explore–exploit decisions often contain real-world complexities such as non-stationary environments, stochasticity under exploitation and unknown reward distributions [3–7]. However, such factors might limit the researcher’s ability to understand the essence of people’s explore–exploit decisions. For this reason, we introduce a minimalistic task in which the optimal policy is to start off exploring and to switch to exploitation at most once in each sequence of decisions. The behaviour of 49 laboratory and 143 online participants deviated both qualitatively and quantitatively from the optimal policy, even when allowing for bias and decision noise. Instead, people seem to follow a suboptimal rule in which they switch from exploration to exploitation when the highest reward so far exceeds a certain threshold. Moreover, we show that this threshold decreases approximately linearly with the proportion of the sequence that remains, suggesting a temporal ratio law. Finally, we find evidence for ‘sequence-level’ variability that is shared across all decisions in the same sequence. Our results emphasize the importance of examining sequence-level strategies and their variability when studying sequential decision-making.
UR  - http://www.scopus.com/inward/record.url?scp=85061373253&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85061373253&partnerID=8YFLogxK
U2  - 10.1038/s41562-018-0526-x
DO  - 10.1038/s41562-018-0526-x
M3  - Letter
C2  - 30971784
AN  - SCOPUS:85061373253
JO  - Nature Human Behaviour
JF  - Nature Human Behaviour
SN  - 2397-3374
ER  -