A comparison of reinforcement learning policies for dynamic vehicle routing problems with stochastic customer requests

Fabian Akkerman*, Martijn Mes, Willem van Jaarsveld

*Corresponding author for this work

Research output: Contribution to journal › Article › Academic › peer-review


Abstract

This paper presents directions for using reinforcement learning with neural networks for dynamic vehicle routing problems (DVRPs). DVRPs involve sequential decision-making under uncertainty, where the expected future consequences are ideally included in current decision-making. A frequently used framework for these problems is approximate dynamic programming (ADP) or reinforcement learning (RL), often in conjunction with a parametric value function approximation (VFA). A straightforward way to use VFA in DVRPs is linear regression (LVFA), but more complex, non-linear predictors, e.g., neural network VFAs (NNVFA), are also widely used. Alternatively, we may represent the policy directly, using a linear policy function approximation (LPFA) or a neural network PFA (NNPFA). The abundance of policies and design choices complicates the use of neural networks for DVRPs in research and practice. We provide a structured overview of the similarities and differences between the policy classes. Furthermore, we present an empirical comparison of LVFA, LPFA, NNVFA, and NNPFA policies. The comparison is conducted on several problem variants of the DVRP with stochastic customer requests. To validate our findings, we study realistic extensions of the stylized problem on (i) a same-day parcel pickup and delivery case in the city of Amsterdam, the Netherlands, and (ii) the routing of robots in an automated storage and retrieval system (AS/RS). Based on our empirical evaluation, we provide insights into the advantages and disadvantages of neural network policies compared to linear policies, and of value-based approaches compared to policy-based approaches.
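To make the distinction between the policy classes concrete, the following is a minimal sketch of a linear VFA versus a linear PFA on a toy accept/reject routing decision. The feature definitions, weights, and state representation are illustrative assumptions, not the models used in the paper: a VFA-based policy chooses the action maximizing immediate reward plus the estimated value of the resulting post-decision state, while a PFA scores actions directly without an explicit value estimate.

```python
# Hedged sketch: linear value function approximation (LVFA) vs. linear
# policy function approximation (LPFA) on a toy routing decision.
# All features and weights below are illustrative assumptions.

def features(state):
    """Illustrative post-decision-state features: bias, free capacity, time left."""
    free_capacity, time_left = state
    return [1.0, free_capacity, time_left]

def lvfa_value(state, weights):
    """Linear VFA: estimated downstream value of a post-decision state."""
    return sum(w * f for w, f in zip(weights, features(state)))

def lvfa_policy(actions, weights):
    """Value-based: pick the action maximizing reward + estimated future value."""
    return max(actions, key=lambda a: a["reward"] + lvfa_value(a["post_state"], weights))

def lpfa_policy(actions, theta):
    """Policy-based: score each action directly from its features."""
    return max(actions, key=lambda a: sum(t * f for t, f in zip(theta, features(a["post_state"]))))

# Two candidate actions: accept a customer request (consumes capacity and
# time) or reject it (state stays slack). Post-states: (capacity, time).
actions = [
    {"name": "accept", "reward": 1.0, "post_state": (2, 5)},
    {"name": "reject", "reward": 0.0, "post_state": (3, 8)},
]
w = [0.0, 0.1, 0.2]      # illustrative VFA weights
theta = [0.0, 0.3, 0.1]  # illustrative PFA weights

print(lvfa_policy(actions, w)["name"])       # → accept
print(lpfa_policy(actions, theta)["name"])   # → reject
```

With these (hypothetical) weights the two policies disagree on the same decision, which is the point of the empirical comparison: both are linear in their features, but the value-based policy folds the immediate reward into its ranking, whereas the policy-based one ranks actions directly.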

Original language: English
Article number: 110747
Number of pages: 18
Journal: Computers and Industrial Engineering
Volume: 200
Early online date: 30 Nov 2024
DOIs
Publication status: Published - Feb 2025

Keywords

  • Neural networks
  • Policy function approximation
  • Reinforcement learning
  • Stochastic customer requests
  • Value function approximation
  • Dynamic vehicle routing

