A time and space efficient algorithm for contextual linear bandits

José Bento, Stratis Ioannidis, Shanmugavelayutham Muthukrishnan, Jinyun Yan

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    Abstract

    We consider a multi-armed bandit problem in which payoffs are a linear function of an observed stochastic contextual variable. In the scenario where a gap exists between optimal and suboptimal rewards, several algorithms have been proposed that achieve O(log T) regret after T time steps. However, these methods either have a per-iteration computation complexity that scales linearly with T, or incur regret that grows linearly with the number of contexts |χ|. We propose an ε-greedy type of algorithm that addresses both limitations. In particular, when contexts are variables in ℝ^d, we prove that our algorithm has a per-iteration computation complexity of O(poly(d)), constant in T, and achieves a regret of O(poly(d) log T) even when |χ| = Ω(2^d). In addition, unlike previous algorithms, its space complexity scales as O(Kd²) and does not grow with T.
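
    To make the complexity claims concrete, the sketch below is a generic ε-greedy contextual linear bandit, not the authors' exact method: each arm keeps a d×d inverse Gram matrix and a d-vector (O(Kd²) space) and is updated with a rank-one Sherman-Morrison step (O(d²) per round, independent of T). The class name EpsGreedyLinearBandit and the ε_t = ε₀/t exploration schedule are illustrative assumptions, not taken from the paper.

    import numpy as np

    class EpsGreedyLinearBandit:
        """Illustrative sketch of an epsilon-greedy contextual linear bandit.

        Each of the K arms keeps a d x d inverse Gram matrix and a d-vector,
        so memory is O(K d^2); each update is a rank-one Sherman-Morrison
        step costing O(d^2), independent of the horizon T.
        """

        def __init__(self, n_arms, dim, eps0=1.0, reg=1.0, seed=0):
            self.K, self.d = n_arms, dim
            self.eps0 = eps0                      # exploration scale (assumed schedule eps_t = eps0 / t)
            self.rng = np.random.default_rng(seed)
            # Regularized inverse Gram matrices, one per arm: shape (K, d, d).
            self.A_inv = np.stack([np.eye(dim) / reg for _ in range(n_arms)])
            self.b = np.zeros((n_arms, dim))      # per-arm reward-weighted context sums
            self.theta = np.zeros((n_arms, dim))  # current least-squares estimates
            self.t = 0

        def select_arm(self, x):
            """Pick an arm for context x in R^d: explore w.p. eps_t, else exploit."""
            self.t += 1
            eps_t = min(1.0, self.eps0 / self.t)
            if self.rng.random() < eps_t:
                return int(self.rng.integers(self.K))  # uniform exploration
            return int(np.argmax(self.theta @ x))      # greedy on x^T theta_k

        def update(self, arm, x, reward):
            """O(d^2) rank-one (Sherman-Morrison) update of the chosen arm's statistics."""
            A_inv = self.A_inv[arm]
            Ax = A_inv @ x
            self.A_inv[arm] = A_inv - np.outer(Ax, Ax) / (1.0 + x @ Ax)
            self.b[arm] += reward * x
            self.theta[arm] = self.A_inv[arm] @ self.b[arm]

    In a simulation loop, the caller draws a context x each round, calls select_arm(x), plays that arm, and passes the observed payoff to update. The 1/t exploration schedule is a common choice under a reward-gap assumption; the paper's actual schedule and estimator may differ.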

    Original language: English (US)
    Title of host publication: Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2013, Proceedings
    Pages: 257-272
    Number of pages: 16
    Edition: PART 1
    DOI: 10.1007/978-3-642-40988-2_17
    State: Published - Oct 31, 2013
    Event: European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD 2013 - Prague, Czech Republic
    Duration: Sep 23, 2013 - Sep 27, 2013

    Publication series

    Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
    Number: PART 1
    Volume: 8188 LNAI
    ISSN (Print): 0302-9743
    ISSN (Electronic): 1611-3349

    Conference

    Conference: European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD 2013
    Country: Czech Republic
    City: Prague
    Period: 9/23/13 - 9/27/13

    Keywords

    • Contextual Linear Bandits
    • Space and Time Efficiency

    ASJC Scopus subject areas

    • Theoretical Computer Science
    • Computer Science (all)

    Cite this

    Bento, J., Ioannidis, S., Muthukrishnan, S., & Yan, J. (2013). A time and space efficient algorithm for contextual linear bandits. In Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2013, Proceedings (PART 1 ed., pp. 257-272). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 8188 LNAI, No. PART 1). https://doi.org/10.1007/978-3-642-40988-2_17
