Hierarchical Knowledge-Gradient for Sequential Sampling

Martijn R.K. Mes, Warren B. Powel, Peter I. Frazier

Research output: Book/ReportReportOther research output

40 Downloads (Pure)

Abstract

We consider the problem of selecting the best of a finite but very large set of alternatives. Each alternative may be characterized by a multi-dimensional vector and has independent normal rewards. This problem arises in various settings such as (i) ranking and selection, (ii) simulation optimization where the unknown mean of each alternative is estimated with stochastic simulation output, and (iii) approximate dynamic programming where we need to estimate values based on Monte-Carlo simulation. We use a Bayesian probability model for the unknown reward of each alternative and follow a fully sequential sampling policy called the knowledge-gradient policy. This policy myopically optimizes the expected increment in the value of sampling information in each time period. Because the number of alternatives is large, we propose a hierarchical aggregation technique that uses the common features shared by alternatives to learn about many alternatives from even a single measurement, thus greatly reducing the measurement effort required. We demonstrate how this hierarchical knowledge-gradient policy can be applied to efficiently maximize a continuous function and prove that this policy finds a globally optimal alternative in the limit.
Original languageUndefined
Place of PublicationEnschede
PublisherUniversity of Twente, Research School for Operations Management and Logistics (BETA)
Number of pages33
ISBN (Print)9789038620787
Publication statusPublished - 2009

Publication series

NameBeta working papers
PublisherBeta Research School for Operations Management and Logistics, University of Twente
No.290

Keywords

  • ranking and selection
  • IR-70209
  • adaptive learning
  • Bayesian Statistics
  • hierarchical statistics
  • Sequential decision analysis

Cite this