Multi-armed bandit algorithms and empirical evaluation

Joannès Vermorel, Mehryar Mohri

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

The multi-armed bandit problem for a gambler is to decide which arm of a K-slot machine to pull to maximize his total reward in a series of trials. Many real-world learning and optimization problems can be modeled in this way. Several strategies or algorithms have been proposed as a solution to this problem in the last two decades, but, to our knowledge, there has been no common evaluation of these algorithms. This paper provides a preliminary empirical evaluation of several multi-armed bandit algorithms. It also describes and analyzes a new algorithm, POKER (Price Of Knowledge and Estimated Reward), whose performance compares favorably to that of other existing algorithms in several experiments. One remarkable outcome of our experiments is that the most naive approach, the ε-greedy strategy, often proves hard to beat.
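
The abstract mentions the ε-greedy strategy only by name. As a rough, self-contained sketch (not the paper's experimental setup), the following Python snippet shows ε-greedy on a K-armed Bernoulli bandit: with probability ε the agent explores a uniformly random arm, otherwise it pulls the arm with the highest empirical mean reward. The arm probabilities, ε = 0.1, and horizon are illustrative assumptions, not settings taken from the paper.

import random

def epsilon_greedy(arm_probs, epsilon=0.1, horizon=10_000, seed=0):
    """Minimal epsilon-greedy sketch on a K-armed Bernoulli bandit.

    arm_probs, epsilon, and horizon are illustrative choices, not the
    settings used in Vermorel & Mohri (2005).
    """
    rng = random.Random(seed)
    k = len(arm_probs)
    counts = [0] * k    # number of pulls per arm
    means = [0.0] * k   # empirical mean reward per arm
    total = 0.0

    for _ in range(horizon):
        if rng.random() < epsilon:
            arm = rng.randrange(k)                       # explore: uniform random arm
        else:
            arm = max(range(k), key=lambda a: means[a])  # exploit: best estimate so far
        reward = 1.0 if rng.random() < arm_probs[arm] else 0.0
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # incremental mean update
        total += reward
    return total, means

if __name__ == "__main__":
    total, means = epsilon_greedy([0.2, 0.5, 0.7])
    print(f"total reward: {total:.0f}, estimated means: {[round(m, 3) for m in means]}")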

Original language: English (US)
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Pages: 437-448
Number of pages: 12
Volume: 3720 LNAI
DOIs: 10.1007/11564096_42
State: Published - 2005
Event: 16th European Conference on Machine Learning, ECML 2005 - Porto, Portugal
Duration: Oct 3, 2005 to Oct 7, 2005

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 3720 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 16th European Conference on Machine Learning, ECML 2005
Country: Portugal
City: Porto
Period: 10/3/05 to 10/7/05

ASJC Scopus subject areas

  • Biochemistry, Genetics and Molecular Biology (all)
  • Computer Science (all)
  • Theoretical Computer Science

Cite this

Vermorel, J., & Mohri, M. (2005). Multi-armed bandit algorithms and empirical evaluation. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 3720 LNAI, pp. 437-448). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 3720 LNAI). https://doi.org/10.1007/11564096_42

@inproceedings{c2b5eb14647f414ca28c898c8152e4fa,
title = "Multi-armed bandit algorithms and empirical evaluation",
abstract = "The multi-armed bandit problem for a gambler is to decide which arm of a K-slot machine to pull to maximize his total reward in a series of trials. Many real-world learning and optimization problems can be modeled in this way. Several strategies or algorithms have been proposed as a solution to this problem in the last two decades, but, to our knowledge, there has been no common evaluation of these algorithms. This paper provides a preliminary empirical evaluation of several multi-armed bandit algorithms. It also describes and analyzes a new algorithm, POKER (Price Of Knowledge and Estimated Reward) whose performance compares favorably to that of other existing algorithms in several experiments. One remarkable outcome of our experiments is that the most naive approach, the ε-greedy strategy, proves to be often hard to beat.",
author = "Joann{\`e}s Vermorel and Mehryar Mohri",
year = "2005",
doi = "10.1007/11564096_42",
language = "English (US)",
isbn = "3540292438",
volume = "3720 LNAI",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
pages = "437--448",
booktitle = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

}

TY - GEN

T1 - Multi-armed bandit algorithms and empirical evaluation

AU - Vermorel, Joannès

AU - Mohri, Mehryar

PY - 2005

Y1 - 2005

N2 - The multi-armed bandit problem for a gambler is to decide which arm of a K-slot machine to pull to maximize his total reward in a series of trials. Many real-world learning and optimization problems can be modeled in this way. Several strategies or algorithms have been proposed as a solution to this problem in the last two decades, but, to our knowledge, there has been no common evaluation of these algorithms. This paper provides a preliminary empirical evaluation of several multi-armed bandit algorithms. It also describes and analyzes a new algorithm, POKER (Price Of Knowledge and Estimated Reward), whose performance compares favorably to that of other existing algorithms in several experiments. One remarkable outcome of our experiments is that the most naive approach, the ε-greedy strategy, often proves hard to beat.

AB - The multi-armed bandit problem for a gambler is to decide which arm of a K-slot machine to pull to maximize his total reward in a series of trials. Many real-world learning and optimization problems can be modeled in this way. Several strategies or algorithms have been proposed as a solution to this problem in the last two decades, but, to our knowledge, there has been no common evaluation of these algorithms. This paper provides a preliminary empirical evaluation of several multi-armed bandit algorithms. It also describes and analyzes a new algorithm, POKER (Price Of Knowledge and Estimated Reward), whose performance compares favorably to that of other existing algorithms in several experiments. One remarkable outcome of our experiments is that the most naive approach, the ε-greedy strategy, often proves hard to beat.

UR - http://www.scopus.com/inward/record.url?scp=33646406807&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33646406807&partnerID=8YFLogxK

U2 - 10.1007/11564096_42

DO - 10.1007/11564096_42

M3 - Conference contribution

AN - SCOPUS:33646406807

SN - 3540292438

SN - 9783540292432

VL - 3720 LNAI

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 437

EP - 448

BT - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

ER -