Deep reinforcement learning in linear discrete action spaces

Wouter van Heeswijk, Han La Poutre

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review



Problems in operations research are typically combinatorial and high-dimensional. To a degree, linear programs may efficiently solve such large decision problems. For stochastic multi-period problems, decomposition into a sequence of one-stage decisions with approximated downstream effects is often necessary, e.g., by deploying reinforcement learning to obtain value function approximations (VFAs). When embedding such VFAs into one-stage linear programs, VFA design is restricted by linearity. This paper presents an integrated simulation approach for such complex optimization problems, developing a deep reinforcement learning algorithm that combines linear programming and neural network VFAs. Our proposed method embeds neural network VFAs into one-stage linear decision problems, combining the nonlinear expressive power of neural networks with the efficiency of solving linear programs. As a proof of concept, we perform numerical experiments on a transportation problem. The neural network VFAs consistently outperform polynomial VFAs as well as other benchmarks, with limited design and tuning effort.
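
The embedding described in the abstract can be illustrated with a standard big-M linearization of a ReLU network. This is a minimal sketch, not necessarily the formulation used in the paper: all symbols (c, A, d, B, s, w_j, β_j, v_j, L_j, U_j, z_j) are assumed notation. It also assumes a single hidden layer, a post-decision state s^x that is linear in the decision x, and known bounds L_j < 0 < U_j on each pre-activation a_j.

```latex
% One-stage decision problem with an embedded VFA (assumed notation).
\[
\max_{x \in \mathbb{Z}_{\ge 0}^{n}} \; c^{\top} x + \bar{V}(s^{x})
\quad \text{s.t.} \quad A x \le d, \qquad s^{x} = s + B x
\]

% VFA as a one-hidden-layer ReLU network with J hidden neurons.
\[
\bar{V}(s^{x}) = v_0 + \sum_{j=1}^{J} v_j y_j, \qquad
a_j = w_j^{\top} s^{x} + \beta_j, \qquad
y_j = \max(0, a_j)
\]

% Big-M linearization of y_j = max(0, a_j), given L_j <= a_j <= U_j.
\[
y_j \ge a_j, \qquad y_j \ge 0, \qquad
y_j \le a_j - L_j (1 - z_j), \qquad
y_j \le U_j z_j, \qquad z_j \in \{0, 1\}
\]
```

With z_j = 1 the constraints force y_j = a_j ≥ 0, and with z_j = 0 they force y_j = 0 and a_j ≤ 0, so y_j always equals the ReLU output. The objective and all constraints are then linear in (x, y, z), which is what would allow a nonlinear VFA to sit inside a one-stage problem that a linear (mixed-integer) solver can handle.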

Original language: English
Title of host publication: Proceedings of the 2020 Winter Simulation Conference, WSC 2020
Editors: K.-H. Bae, B. Feng, S. Kim, S. Lazarova-Molnar, Z. Zheng, T. Roeder, R. Thiesing
Place of Publication: Piscataway, NJ
Number of pages: 12
ISBN (Electronic): 978-1-7281-9499-8
ISBN (Print): 978-1-7281-9500-1
Publication status: Published - 29 Mar 2021
Externally published: Yes
Event: Winter Simulation Conference, WSC 2020: Simulation Drives Innovation - Virtual Conference, Orlando, United States
Duration: 14 Dec 2020 to 18 Dec 2020

Publication series

Name: Proceedings - Winter Simulation Conference
ISSN (Print): 0891-7736
ISSN (Electronic): 1558-4305


Conference: Winter Simulation Conference, WSC 2020
Abbreviated title: WSC 2020
Country/Territory: United States


  • 22/2 OA procedure


