Reinforcement learning for humanitarian relief distribution with trucks and UAVs under travel time uncertainty

Robert M. van Steenbergen*, Martijn Mes, Wouter J.A. van Heeswijk

*Corresponding author for this work

Research output: Contribution to journal › Article › Academic › peer-review


Effective humanitarian relief operations are challenging in the aftermath of disasters, as trucks often face considerable travel time uncertainty due to damaged transportation networks. Deploying Unmanned Aerial Vehicles (UAVs) alongside truck fleets can potentially mitigate this problem. To plan last-mile relief distribution in this setting, we introduce a multi-trip, split-delivery vehicle routing problem with trucks and UAVs, soft time windows, and stochastic travel times, formulated as a stochastic dynamic program. Within a finite time horizon, we aim to maximize a weighted objective function comprising the number of goods delivered, the number of different locations visited, and late arrival penalties. Our study offers insights into dealing with travel time uncertainty in humanitarian logistics by (i) deploying UAVs as partial substitutes for trucks, (ii) evaluating dynamic solutions generated by two deep reinforcement learning (RL) approaches, namely value function approximation (VFA) and policy function approximation (PFA), and (iii) comparing the RL solutions with solutions stemming from mathematical programming and dynamic heuristics. Experiments are performed on both Solomon-based instances and two real-world cases. The real-world cases, the 2015 Nepal earthquake and the 2018 Indonesia tsunami, are based on locally collected field data and real-world UAV specifications, and aim to provide practical insights. The experimental results show that dynamic decision-making improves both the performance and the robustness of humanitarian operations, reducing lateness penalties by around 85% compared to static solutions based on expected travel times. Furthermore, replacing half of the trucks with UAVs improves the weighted objective value by 11% to 56%, benefitting both reliability and location coverage. The results indicate that both the deployment of UAVs and the use of dynamic methods successfully mitigate travel time uncertainties in humanitarian operations.
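
As an illustration of the weighted objective described in the abstract, the sketch below combines the three named components (goods delivered, distinct locations visited, and a lateness penalty) into a single score. The abstract does not state the functional form or the weights, so the linear combination, the weight names `w_goods`, `w_locations`, and `w_late`, their default values, and the `RouteOutcome` type are all assumptions made for this example only.

```python
from dataclasses import dataclass


@dataclass
class RouteOutcome:
    """Hypothetical summary of one planning horizon (not from the paper)."""
    goods_delivered: float   # total units of relief goods delivered
    locations_visited: int   # number of distinct locations served
    lateness: float          # accumulated late-arrival time (soft time windows)


def weighted_objective(outcome: RouteOutcome,
                       w_goods: float = 1.0,
                       w_locations: float = 1.0,
                       w_late: float = 1.0) -> float:
    """Assumed linear form: reward deliveries and coverage, penalize lateness."""
    return (w_goods * outcome.goods_delivered
            + w_locations * outcome.locations_visited
            - w_late * outcome.lateness)


# Usage: compare two hypothetical outcomes under the same (assumed) weights.
on_time = RouteOutcome(goods_delivered=10.0, locations_visited=3, lateness=0.0)
delayed = RouteOutcome(goods_delivered=10.0, locations_visited=3, lateness=4.0)
print(weighted_objective(on_time), weighted_objective(delayed))
```

Under this assumed form, two plans that deliver the same goods to the same locations differ only through the lateness term, which is the component the abstract reports dynamic decision-making reduces most.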
Original language: English
Article number: 104401
Number of pages: 28
Journal: Transportation Research Part C: Emerging Technologies
Early online date: 7 Nov 2023
Publication status: Published - Dec 2023


  • Humanitarian logistics
  • Last-mile relief distribution
  • Travel time uncertainty
  • UAVs
  • Reinforcement learning
  • Comparative analysis
  • UT-Hybrid-D

