### Abstract

We consider a multi-armed bandit problem where payoffs are a linear function of an observed stochastic contextual variable. In the scenario where there exists a gap between optimal and suboptimal rewards, several algorithms have been proposed that achieve O(log T) regret after T time steps. However, the proposed methods either have a per-iteration computational complexity that scales linearly with T or incur regret that grows linearly with the number of contexts |χ|. We propose an ε-greedy type of algorithm that overcomes both limitations. In particular, when contexts are variables in ℝ^{d}, we prove that our algorithm has a constant per-iteration computational complexity of O(poly(d)) and can achieve a regret of O(poly(d) log T) even when |χ| = Ω(2^{d}). In addition, unlike previous algorithms, its space complexity scales like O(Kd^{2}) and does not grow with T.
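To make the space claim concrete, here is a minimal sketch of a generic ε-greedy learner that keeps one ridge-regression model per arm. It is not the authors' algorithm: the class name `EpsilonGreedyLinearBandit`, the per-arm reward model `dot(theta_k, x)`, the `c / t` exploration schedule, and the `ridge` regularizer are all assumptions made for illustration. The point is only that storing a d×d inverse matrix and a d-vector per arm takes O(Kd^{2}) memory regardless of T, and that each update costs O(d^{2}).

```python
import random


class EpsilonGreedyLinearBandit:
    """Illustrative epsilon-greedy learner with one ridge model per arm.

    Assumptions (not taken from the paper): the reward of arm k under
    context x is modeled as dot(theta_k, x) plus noise, and the
    exploration probability decays as min(1, c / t).
    """

    def __init__(self, n_arms, dim, c=5.0, ridge=1.0, seed=0):
        self.n_arms = n_arms
        self.dim = dim
        self.c = c
        self.t = 0
        self.rng = random.Random(seed)
        # Per arm: inverse of (ridge * I + sum of x x^T) and the estimate theta.
        # Storage is K * (d^2 + d) floats and never grows with the horizon T.
        self.a_inv = [
            [[1.0 / ridge if i == j else 0.0 for j in range(dim)]
             for i in range(dim)]
            for _ in range(n_arms)
        ]
        self.theta = [[0.0] * dim for _ in range(n_arms)]

    def select(self, x):
        """Pick an arm for context x: explore with probability eps_t."""
        self.t += 1
        eps = min(1.0, self.c / self.t)
        if self.rng.random() < eps:
            return self.rng.randrange(self.n_arms)
        scores = [sum(th_i * x_i for th_i, x_i in zip(th, x))
                  for th in self.theta]
        return max(range(self.n_arms), key=scores.__getitem__)

    def update(self, arm, x, reward):
        """Recursive least-squares update for the pulled arm, O(d^2) time."""
        a_inv = self.a_inv[arm]
        theta = self.theta[arm]
        # u = A^{-1} x
        u = [sum(row[j] * x[j] for j in range(self.dim)) for row in a_inv]
        denom = 1.0 + sum(x_i * u_i for x_i, u_i in zip(x, u))
        err = reward - sum(th_i * x_i for th_i, x_i in zip(theta, x))
        # theta moves along the updated A^{-1} x, which equals u / denom.
        for i in range(self.dim):
            theta[i] += u[i] * err / denom
        # Sherman-Morrison rank-one update of the (symmetric) inverse.
        for i in range(self.dim):
            for j in range(self.dim):
                a_inv[i][j] -= u[i] * u[j] / denom
```

A typical loop would call `arm = bandit.select(x)`, observe a reward for that arm, and then call `bandit.update(arm, x, reward)`. Maintaining the inverse with a rank-one update avoids storing past observations and avoids re-solving a linear system at every step, which is one simple way to keep both time and memory independent of T.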

Original language | English (US) |
---|---|
Title of host publication | Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2013, Proceedings |
Pages | 257-272 |
Number of pages | 16 |
Edition | PART 1 |
DOIs | https://doi.org/10.1007/978-3-642-40988-2_17 |
State | Published - Oct 31 2013 |
Event | European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD 2013 - Prague, Czech Republic; Duration: Sep 23 2013 → Sep 27 2013 |

### Publication series

Name | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
---|---|
Number | PART 1 |
Volume | 8188 LNAI |
ISSN (Print) | 0302-9743 |
ISSN (Electronic) | 1611-3349 |

### Conference

Conference | European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD 2013 |
---|---|
Country | Czech Republic |
City | Prague |
Period | 9/23/13 → 9/27/13 |

### Keywords

- Contextual Linear Bandits
- Space and Time Efficiency

### ASJC Scopus subject areas

- Theoretical Computer Science
- Computer Science (all)

### Cite this

*Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2013, Proceedings* (PART 1 ed., pp. 257-272). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 8188 LNAI, No. PART 1). https://doi.org/10.1007/978-3-642-40988-2_17

**A time and space efficient algorithm for contextual linear bandits.** / Bento, José; Ioannidis, Stratis; Muthukrishnan, Shanmugavelayutham; Yan, Jinyun.

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

*Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2013, Proceedings.* PART 1 edn, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), no. PART 1, vol. 8188 LNAI, pp. 257-272, European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD 2013, Prague, Czech Republic, 9/23/13. https://doi.org/10.1007/978-3-642-40988-2_17

TY - GEN

T1 - A time and space efficient algorithm for contextual linear bandits

AU - Bento, José

AU - Ioannidis, Stratis

AU - Muthukrishnan, Shanmugavelayutham

AU - Yan, Jinyun

PY - 2013/10/31

Y1 - 2013/10/31

AB - We consider a multi-armed bandit problem where payoffs are a linear function of an observed stochastic contextual variable. In the scenario where there exists a gap between optimal and suboptimal rewards, several algorithms have been proposed that achieve O(log T) regret after T time steps. However, the proposed methods either have a per-iteration computational complexity that scales linearly with T or incur regret that grows linearly with the number of contexts |χ|. We propose an ε-greedy type of algorithm that overcomes both limitations. In particular, when contexts are variables in ℝ^{d}, we prove that our algorithm has a constant per-iteration computational complexity of O(poly(d)) and can achieve a regret of O(poly(d) log T) even when |χ| = Ω(2^{d}). In addition, unlike previous algorithms, its space complexity scales like O(Kd^{2}) and does not grow with T.

KW - Contextual Linear Bandits

KW - Space and Time Efficiency

UR - http://www.scopus.com/inward/record.url?scp=84886538344&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84886538344&partnerID=8YFLogxK

U2 - 10.1007/978-3-642-40988-2_17

DO - 10.1007/978-3-642-40988-2_17

M3 - Conference contribution

AN - SCOPUS:84886538344

SN - 9783642409875

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 257

EP - 272

BT - Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2013, Proceedings

ER -