A Comparison of Reinforcement Learning Policies for Dynamic Vehicle Routing Problems with Stochastic Customer Requests

Fabian Akkerman, Martijn Mes, Willem van Jaarsveld

Research output: Working paper



This paper presents directions for using reinforcement learning with neural networks for dynamic vehicle routing problems (DVRPs). DVRPs involve sequential decision-making under uncertainty, where the expected future consequences of a decision are ideally included in current decision-making. A frequently used framework for these problems is approximate dynamic programming (ADP) or reinforcement learning (RL), often in conjunction with a parametric value function approximation (VFA). A straightforward way to use VFA in DVRPs is linear regression (LVFA), but more complex, non-linear predictors, e.g., neural network VFAs (NNVFA), are also widely used. Alternatively, we may represent the policy directly, using a linear policy function approximation (LPFA) or neural network PFA (NNPFA). The abundance of policies and design choices complicates the use of neural networks for DVRPs in research and practice. This paper presents an empirical comparison of LVFA, LPFA, NNVFA, and NNPFA policies. The comparison is conducted on several problem variants of the DVRP with stochastic customer requests. To validate our findings, we study realistic extensions of the stylized problem on (i) a same-day parcel pickup and delivery case in the city of Amsterdam, the Netherlands, and (ii) the routing of robots in an automated storage and retrieval system (AS/RS). We find that (i) whether neural network-based approaches or linear policies perform better depends subtly on problem characteristics, (ii) the potential of neural networks to improve upon linear policies stems from their ability to capture complex relationships between state variables and the downstream costs of the state, but (iii) this advantage comes at the expense of considerably longer computational times. Furthermore, (iv) supplying engineered features may distort models, but can potentially help the neural network to find a better policy. Finally, (v) in most cases, NNPFAs outperform NNVFAs, while NNVFAs are significantly faster.
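To illustrate the simplest policy class compared in the paper, the sketch below shows a linear value function approximation (LVFA) combined with a greedy one-step lookahead. All feature names, the training setup, and the candidate-action structure are hypothetical illustrations, not the paper's actual implementation; a minimal sketch assuming engineered state features and observed downstream costs are available.

```python
import numpy as np

# Hypothetical illustration of an LVFA: engineered state features phi(s)
# (e.g., number of open requests, remaining route slack) are mapped to an
# estimate of expected downstream cost via a linear model.

rng = np.random.default_rng(0)

# Simulated training data: feature vectors phi(s) and observed downstream costs.
n_samples, n_features = 200, 3
phi = rng.uniform(0, 1, size=(n_samples, n_features))
true_weights = np.array([2.0, -1.0, 0.5])  # assumed ground truth for the demo
costs = phi @ true_weights + rng.normal(0, 0.1, size=n_samples)

# Fit the LVFA weights by least squares: V_hat(s) = phi(s) @ w.
w, *_ = np.linalg.lstsq(phi, costs, rcond=None)

def v_hat(features: np.ndarray) -> float:
    """Estimated downstream cost of a state under the linear VFA."""
    return float(features @ w)

def greedy_action(candidates):
    """One-step lookahead: pick the candidate minimizing immediate cost
    plus the estimated value of the successor state.

    candidates: list of (immediate_cost, successor_features) pairs.
    """
    return min(candidates, key=lambda c: c[0] + v_hat(c[1]))
```

An NNVFA would replace the least-squares fit with a neural network regressor, while a PFA (linear or neural) would map features directly to an action score rather than to a cost estimate.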
Original language: English
Publication status: In preparation - Sept 2022


  • Value function approximation
  • Policy function approximation
  • Neural networks
  • Dynamic vehicle routing
  • Reinforcement learning


