Skip to main navigation Skip to search Skip to main content

Bandits with many optimal arms

Research output: Chapter in Book/Report/Conference proceedingConference contributionAcademicpeer-review

5 Downloads (Pure)

Abstract

We consider a stochastic bandit problem with a possibly infinite number of arms. We write p* for the proportion of optimal arms and A for the minimal mean-gap between optimal and sub-optimal arms. We characterize the optimal learning rates both in the cumulative regret setting, and in the best-arm identification setting in terms of the problem parameters T (the budget), p* and A. For the objective of minimizing the cumulative regret, we provide a lower bound of order fi(log(T)/(p* A)) and a UCB-style algorithm with matching upper bound up to a factor of log(1 / A) . Our algorithm needs p* to calibrate its parameters, and we prove that this knowledge is necessary, since adapting to p* in this setting is impossible. For best-arm identification we also provide a lower bound of order fi(exp(—cTA2p*)) on the probability of outputting a sub-optimal arm where c > 0 is an absolute constant. We also provide an elimination algorithm with an upper bound matching the lower bound up to a factor of order log(T) in the exponential, and that does not need p* or A as parameter. Our results apply directly to the three related problems of competing against the j-th best arm, identifying an e good arm, and finding an arm with mean larger than a quantile of a known order.

Original languageEnglish
Title of host publication35th Conference on Neural Information Processing Systems, NeurIPS 2021
EditorsMarc'Aurelio Ranzato, Alina Beygelzimer, Yann Dauphin, Percy S. Liang, Jenn Wortman Vaughan
PublisherNeural information processing systems foundation
Pages22457-22469
Number of pages13
ISBN (Electronic)9781713845393
Publication statusPublished - 2021
Externally publishedYes
Event35th Conference on Neural Information Processing Systems, NeurIPS 2021 - Virtual, Online
Duration: 6 Dec 202114 Dec 2021
Conference number: 35

Publication series

NameAdvances in Neural Information Processing Systems
PublisherNeural Information Processing Systems Foundation
Volume34
ISSN (Print)1049-5258

Conference

Conference35th Conference on Neural Information Processing Systems, NeurIPS 2021
Abbreviated titleNeurIPS 2021
CityVirtual, Online
Period6/12/2114/12/21

Fingerprint

Dive into the research topics of 'Bandits with many optimal arms'. Together they form a unique fingerprint.

Cite this