Computing the exact solution of an MDP model is generally difficult and possibly intractable for realistically sized problem instances. A powerful technique for solving large-scale, discrete-time, multistage stochastic control processes is Approximate Dynamic Programming (ADP). Although ADP is used as an umbrella term for a broad spectrum of methods to approximate the optimal solution of MDPs, the common denominator is typically to combine optimization with simulation, use approximations of the optimal values of the Bellman equations, and use approximate policies. This chapter aims to present and illustrate the basics of these steps through a number of practical and instructive examples. We use three examples (1) to explain the basics of ADP, relying on value iteration with an approximation of the value functions, (2) to provide insight into implementation issues, and (3) to provide test cases for readers to validate their own ADP implementations.
|Title of host publication|Markov Decision Processes in Practice|
|Editors|Richard Boucherie, Nico M. van Dijk|
|Publication status|Published - 11 Mar 2017|
|Series name|International Series in Operations Research & Management Science|
Mes, M. R. K., & Perez Rivera, A. E. (2017). Approximate Dynamic Programming by Practical Examples. In R. Boucherie, & N. M. van Dijk (Eds.), Markov Decision Processes in Practice (pp. 63-101). (International Series in Operations Research & Management Science; No. 248). Springer. https://doi.org/10.1007/978-3-319-47766-4_3