Multi-echelon inventory optimization using deep reinforcement learning

Kevin Geevers*, Lotte van Hezewijk, Martijn R.K. Mes

*Corresponding author for this work

Research output: Contribution to journal › Article › Academic › Peer-reviewed



This paper studies the applicability of a deep reinforcement learning approach to three different multi-echelon inventory systems, with the objective of minimizing holding and backorder costs. First, we conduct an extensive literature review to map the current applications of reinforcement learning in multi-echelon inventory systems. Next, we apply our deep reinforcement learning method to three cases with different network structures (linear, divergent, and general). The linear and divergent cases are taken from the literature, whereas the general case is based on a real-life manufacturer. We apply the proximal policy optimization (PPO) algorithm with a continuous action space and show that it consistently outperforms the benchmark solution, achieving an average improvement of 16.4% for the linear case, 11.3% for the divergent case, and 6.6% for the general case. We discuss the limitations of our approach and propose avenues for future research.
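To make the setting concrete, the sketch below shows a toy serial (linear) multi-echelon inventory environment with a continuous order quantity per echelon and a reward equal to the negative of holding plus backorder costs. All numbers here (echelon count, cost rates, demand distribution, zero lead times) are illustrative assumptions, not the paper's experimental settings, and the class name is hypothetical.

```python
import random

class SerialInventoryEnv:
    """Toy serial (linear) multi-echelon inventory environment.

    Illustrative sketch only: the cost rates, demand distribution, and
    zero lead times are assumptions, not the paper's exact settings.
    Each period the agent chooses a continuous order quantity per
    echelon; the reward is minus the holding-plus-backorder cost.
    """

    def __init__(self, n_echelons=3, holding=(1.0, 1.5, 2.0),
                 backorder=19.0, seed=0):
        self.n = n_echelons
        self.holding = holding       # per-unit holding cost per echelon
        self.backorder = backorder   # per-unit backorder cost (most downstream)
        self.rng = random.Random(seed)
        self.reset()

    def reset(self):
        # Net inventory per echelon; negative values represent backorders.
        self.inventory = [10.0] * self.n
        return tuple(self.inventory)

    def step(self, orders):
        assert len(orders) == self.n
        demand = self.rng.uniform(0.0, 10.0)  # stochastic customer demand

        # The most upstream echelon orders from an uncapacitated supplier.
        self.inventory[self.n - 1] += orders[self.n - 1]

        # Echelon i+1 ships to echelon i, capped by its stock on hand.
        for i in range(self.n - 1):
            shipped = min(max(orders[i], 0.0),
                          max(self.inventory[i + 1], 0.0))
            self.inventory[i + 1] -= shipped
            self.inventory[i] += shipped

        # Customer demand hits the most downstream echelon.
        self.inventory[0] -= demand

        # Holding cost on positive stock, backorder cost on the shortfall.
        cost = sum(h * max(inv, 0.0)
                   for h, inv in zip(self.holding, self.inventory))
        cost += self.backorder * max(-self.inventory[0], 0.0)
        return tuple(self.inventory), -cost
```

A continuous-action PPO agent (e.g. from an off-the-shelf RL library) would observe the inventory vector and output the order quantities each period, learning to trade off holding cost against the much larger backorder penalty.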

Original language: English
Journal: Central European Journal of Operations Research
Publication status: E-pub ahead of print / First online - 19 Jul 2023


Keywords:
  • Backorders
  • Deep reinforcement learning
  • Inventory control
  • Multi-echelon
  • Proximal policy optimization


