Bayesian exploration for approximate dynamic programming

Ilya O. Ryzhov, Martijn R.K. Mes, Warren B. Powell, Gerald van den Berg

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

Approximate dynamic programming (ADP) is a general methodological framework for multistage stochastic optimization problems in transportation, finance, energy, and other domains. We propose a new approach to the exploration/exploitation dilemma in ADP that leverages two important concepts from the optimal learning literature: first, we show how a Bayesian belief structure can be used to express uncertainty about the value function in ADP; second, we develop a new exploration strategy based on the concept of value of information and prove that it systematically explores the state space. An important advantage of our framework is that it can be integrated into both parametric and nonparametric value function approximations, which are widely used in practical implementations of ADP. We evaluate this strategy on a variety of distinct resource allocation problems and demonstrate that, although more computationally intensive, it is highly competitive against other exploration strategies.
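
The abstract describes the approach only at a high level. As an illustration of the kind of machinery involved, the sketch below maintains a correlated Gaussian (normal-normal) belief over the values of a small discrete state space, performs a conjugate Bayesian update after each noisy observation, and uses a Monte Carlo estimate of a knowledge-gradient-style value of information to choose which state to explore next. This is a minimal sketch under those assumptions, not the authors' algorithm; the function names (`bayes_update`, `value_of_information`) and all numerical parameters are hypothetical.

```python
# Illustrative sketch only (hypothetical names, not the paper's code):
# correlated Gaussian beliefs about state values, with a value-of-information
# criterion guiding exploration.
import numpy as np

def bayes_update(mu, Sigma, x, y, noise_var):
    """Conjugate update of the belief (mu, Sigma) after observing a noisy
    value y at state index x, with known observation noise variance."""
    total_var = Sigma[x, x] + noise_var
    gain = Sigma[:, x] / total_var            # correlated, Kalman-style gain
    mu_new = mu + gain * (y - mu[x])
    Sigma_new = Sigma - np.outer(gain, Sigma[x, :])
    return mu_new, Sigma_new

def value_of_information(mu, Sigma, x, noise_var, n_samples=20_000, rng=None):
    """Monte Carlo estimate of the expected increase in the best posterior
    mean from taking one more observation at state x."""
    rng = np.random.default_rng(0) if rng is None else rng
    # Predictive law of the updated means: mu + sigma_tilde * Z, Z ~ N(0, 1).
    sigma_tilde = Sigma[:, x] / np.sqrt(Sigma[x, x] + noise_var)
    z = rng.standard_normal(n_samples)
    best_after = np.max(mu[:, None] + np.outer(sigma_tilde, z), axis=0)
    return best_after.mean() - mu.max()

# Toy run: three states with correlated prior beliefs.
mu = np.array([1.0, 0.5, 0.8])
Sigma = np.array([[1.0, 0.6, 0.2],
                  [0.6, 1.0, 0.4],
                  [0.2, 0.4, 1.0]])
noise_var = 0.5
voi = [value_of_information(mu, Sigma, x, noise_var) for x in range(len(mu))]
x_star = int(np.argmax(voi))        # explore where information is worth most
mu, Sigma = bayes_update(mu, Sigma, x_star, y=0.9, noise_var=noise_var)
```

Because the beliefs are correlated, a single observation updates the estimate at every state, which is what allows a value-of-information rule to explore the state space systematically rather than one state at a time.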

Original language: English
Pages (from-to): 198-214
Number of pages: 17
Journal: Operations research
Volume: 67
Issue number: 1
DOI: 10.1287/opre.2018.1772
Publication status: Published - 18 Jan 2019

Keywords

  • Approximate dynamic programming
  • Bayesian learning
  • Correlated beliefs
  • Optimal learning
  • Value of information

Cite this

Ryzhov, Ilya O.; Mes, Martijn R.K.; Powell, Warren B.; van den Berg, Gerald. / Bayesian exploration for approximate dynamic programming. In: Operations research. 2019; Vol. 67, No. 1. pp. 198-214.
@article{eaa8572c2f8446c3819580d593cfbe19,
title = "Bayesian exploration for approximate dynamic programming",
abstract = "Approximate dynamic programming (ADP) is a general methodological framework for multistage stochastic optimization problems in transportation, finance, energy, and other domains. We propose a new approach to the exploration/exploitation dilemma in ADP that leverages two important concepts from the optimal learning literature: first, we show how a Bayesian belief structure can be used to express uncertainty about the value function in ADP; second, we develop a new exploration strategy based on the concept of value of information and prove that it systematically explores the state space. An important advantage of our framework is that it can be integrated into both parametric and nonparametric value function approximations, which are widely used in practical implementations of ADP. We evaluate this strategy on a variety of distinct resource allocation problems and demonstrate that, although more computationally intensive, it is highly competitive against other exploration strategies.",
keywords = "Approximate dynamic programming, Bayesian learning, Correlated beliefs, Optimal learning, Value of information",
author = "Ryzhov, {Ilya O.} and Mes, {Martijn R.K.} and Powell, {Warren B.} and {van den Berg}, Gerald",
year = "2019",
month = jan,
day = "18",
doi = "10.1287/opre.2018.1772",
language = "English",
volume = "67",
pages = "198--214",
journal = "Operations research",
issn = "0030-364X",
publisher = "INFORMS Institute for Operations Research and the Management Sciences",
number = "1",
}
