Abstract
We consider a stochastic bandit problem with a possibly infinite number of arms. We write p* for the proportion of optimal arms and A for the minimal mean-gap between optimal and sub-optimal arms. We characterize the optimal learning rates both in the cumulative regret setting, and in the best-arm identification setting in terms of the problem parameters T (the budget), p* and A. For the objective of minimizing the cumulative regret, we provide a lower bound of order fi(log(T)/(p* A)) and a UCB-style algorithm with matching upper bound up to a factor of log(1 / A) . Our algorithm needs p* to calibrate its parameters, and we prove that this knowledge is necessary, since adapting to p* in this setting is impossible. For best-arm identification we also provide a lower bound of order fi(exp(—cTA2p*)) on the probability of outputting a sub-optimal arm where c > 0 is an absolute constant. We also provide an elimination algorithm with an upper bound matching the lower bound up to a factor of order log(T) in the exponential, and that does not need p* or A as parameter. Our results apply directly to the three related problems of competing against the j-th best arm, identifying an e good arm, and finding an arm with mean larger than a quantile of a known order.
| Original language | English |
|---|---|
| Title of host publication | 35th Conference on Neural Information Processing Systems, NeurIPS 2021 |
| Editors | Marc'Aurelio Ranzato, Alina Beygelzimer, Yann Dauphin, Percy S. Liang, Jenn Wortman Vaughan |
| Publisher | Neural information processing systems foundation |
| Pages | 22457-22469 |
| Number of pages | 13 |
| ISBN (Electronic) | 9781713845393 |
| Publication status | Published - 2021 |
| Externally published | Yes |
| Event | 35th Conference on Neural Information Processing Systems, NeurIPS 2021 - Virtual, Online Duration: 6 Dec 2021 → 14 Dec 2021 Conference number: 35 |
Publication series
| Name | Advances in Neural Information Processing Systems |
|---|---|
| Publisher | Neural Information Processing Systems Foundation |
| Volume | 34 |
| ISSN (Print) | 1049-5258 |
Conference
| Conference | 35th Conference on Neural Information Processing Systems, NeurIPS 2021 |
|---|---|
| Abbreviated title | NeurIPS 2021 |
| City | Virtual, Online |
| Period | 6/12/21 → 14/12/21 |
Fingerprint
Dive into the research topics of 'Bandits with many optimal arms'. Together they form a unique fingerprint.Cite this
- APA
- Author
- BIBTEX
- Harvard
- Standard
- RIS
- Vancouver