TY - JOUR
T1 - Reinforcement learning for humanitarian relief distribution with trucks and UAVs under travel time uncertainty
AU - van Steenbergen, Robert M.
AU - Mes, Martijn
AU - van Heeswijk, Wouter J.A.
N1 - Funding Information:
We thank Akshayat Koirala for his scrutiny and perseverance during the local collection of field data about the 2015 earthquake by visiting government officials, NGOs, local agencies, VDC chiefs, and ward members at various villages in the districts of Nuwakot and Dhading in Nepal in August 2021. We thank the three anonymous reviewers for their valuable suggestions and insightful comments during the review process which has greatly enhanced this paper. We also thank Anne Zander for her knowledgeable feedback during the review process. This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
Publisher Copyright:
© 2023 The Author(s)
PY - 2023/12
Y1 - 2023/12
N2 - Effective humanitarian relief operations are challenging in the aftermath of disasters, as trucks are often faced with considerable travel time uncertainties due to damaged transportation networks. Efficient deployment of Unmanned Aerial Vehicles (UAVs) potentially mitigates this problem, supplementing truck fleets in an impactful manner. To plan last-mile relief distribution in this setting, we introduce a multi-trip, split-delivery vehicle routing problem with trucks and UAVs, soft time windows, and stochastic travel times for last-mile relief distribution, formulated as a stochastic dynamic program. Within a finite time horizon, we aim to maximize a weighted objective function comprising the number of goods delivered, the number of different locations visited, and late arrival penalties. Our study offers insights into dealing with travel time uncertainty in humanitarian logistics by (i) deploying Unmanned Aerial Vehicles (UAVs) as partial substitutes for trucks, (ii) evaluating dynamic solutions generated by two deep reinforcement learning (RL) approaches – specifically value function approximation (VFA) and policy function approximation (PFA) – and (iii) comparing the RL solutions with solutions stemming from mathematical programming and dynamic heuristics. Experiments are performed on both Solomon-based instances and two real-world cases. The real-world cases – the 2015 Nepal earthquake and the 2018 Indonesia tsunami – are based on locally collected field data and real-world UAV specifications, and aim to provide practical insights. The experimental results show that dynamic decision-making improves both performance and robustness of humanitarian operations, achieving reductions in lateness penalties of around 85% compared to static solutions based on expected travel times. Furthermore, the results show that replacing half of the trucks with UAVs improves the weighted objective value by 11% to 56%, benefitting both reliability and location coverage. The results indicate that both the deployment of UAVs and the use of dynamic methods successfully mitigate travel time uncertainties in humanitarian operations.
AB - Effective humanitarian relief operations are challenging in the aftermath of disasters, as trucks are often faced with considerable travel time uncertainties due to damaged transportation networks. Efficient deployment of Unmanned Aerial Vehicles (UAVs) potentially mitigates this problem, supplementing truck fleets in an impactful manner. To plan last-mile relief distribution in this setting, we introduce a multi-trip, split-delivery vehicle routing problem with trucks and UAVs, soft time windows, and stochastic travel times for last-mile relief distribution, formulated as a stochastic dynamic program. Within a finite time horizon, we aim to maximize a weighted objective function comprising the number of goods delivered, the number of different locations visited, and late arrival penalties. Our study offers insights into dealing with travel time uncertainty in humanitarian logistics by (i) deploying Unmanned Aerial Vehicles (UAVs) as partial substitutes for trucks, (ii) evaluating dynamic solutions generated by two deep reinforcement learning (RL) approaches – specifically value function approximation (VFA) and policy function approximation (PFA) – and (iii) comparing the RL solutions with solutions stemming from mathematical programming and dynamic heuristics. Experiments are performed on both Solomon-based instances and two real-world cases. The real-world cases – the 2015 Nepal earthquake and the 2018 Indonesia tsunami – are based on locally collected field data and real-world UAV specifications, and aim to provide practical insights. The experimental results show that dynamic decision-making improves both performance and robustness of humanitarian operations, achieving reductions in lateness penalties of around 85% compared to static solutions based on expected travel times. Furthermore, the results show that replacing half of the trucks with UAVs improves the weighted objective value by 11% to 56%, benefitting both reliability and location coverage. The results indicate that both the deployment of UAVs and the use of dynamic methods successfully mitigate travel time uncertainties in humanitarian operations.
KW - Humanitarian logistics
KW - Last-mile relief distribution
KW - Travel time uncertainty
KW - UAVs
KW - Reinforcement learning
KW - Comparative analysis
KW - UT-Hybrid-D
U2 - 10.1016/j.trc.2023.104401
DO - 10.1016/j.trc.2023.104401
M3 - Article
SN - 0968-090X
VL - 157
JO - Transportation Research Part C: Emerging Technologies
JF - Transportation Research Part C: Emerging Technologies
M1 - 104401
ER -