Batched bandit problems

Vianney Perchet, Philippe Rigollet, Sylvain Chassang, Erik Snowberg

    Research output: Contribution to journal (Article)

    Abstract

    Motivated by practical applications, chiefly clinical trials, we study the regret achievable for stochastic bandits under the constraint that the employed policy must split trials into a small number of batches. We propose a simple policy, and show that a very small number of batches gives close to minimax optimal regret bounds. As a byproduct, we derive optimal policies with low switching cost for stochastic bandits.

    Original language: English (US)
    Pages (from-to): 660-681
    Number of pages: 22
    Journal: Annals of Statistics
    Volume: 44
    Issue number: 2
    DOI: https://doi.org/10.1214/15-AOS1381
    State: Published - Apr 1 2016

    Keywords

    • Batches
    • Grouped clinical trials
    • Sample size determination
    • Switching cost
    • Multi-armed bandit problems
    • Multi-phase allocation
    • Regret bounds

    ASJC Scopus subject areas

    • Statistics and Probability
    • Statistics, Probability and Uncertainty

    Cite this

    Perchet, V., Rigollet, P., Chassang, S., & Snowberg, E. (2016). Batched bandit problems. Annals of Statistics, 44(2), 660-681. https://doi.org/10.1214/15-AOS1381

    @article{a67565f7a73f40a3b50ca1be7e6b35fd,
    title = "Batched bandit problems",
    abstract = "Motivated by practical applications, chiefly clinical trials, we study the regret achievable for stochastic bandits under the constraint that the employed policy must split trials into a small number of batches. We propose a simple policy, and show that a very small number of batches gives close to minimax optimal regret bounds. As a byproduct, we derive optimal policies with low switching cost for stochastic bandits.",
    keywords = "Batches, Grouped clinical trials, sample size determination, switching cost, Multi-armed bandit problems, Multi-phase allocation, Regret bounds",
    author = "Vianney Perchet and Philippe Rigollet and Sylvain Chassang and Erik Snowberg",
    year = "2016",
    month = "4",
    day = "1",
    doi = "10.1214/15-AOS1381",
    language = "English (US)",
    volume = "44",
    pages = "660--681",
    journal = "Annals of Statistics",
    issn = "0090-5364",
    publisher = "Institute of Mathematical Statistics",
    number = "2"
    }
