Reinforcement Learning and SLAM based approach for mobile robot navigation in unknown environments

Research output: Contribution to conference › Paper › Academic › peer-review

Abstract

Autonomous navigation of robots in unknown environments, from their current position to a desired target without colliding with obstacles, is an important problem in the field of mobile robotics. Traditional methods exist in the literature for the case where complete knowledge of the environment is available. However, this is rarely the case in real-life applications, where complete knowledge of a stochastic environment can hardly be obtained and the positions of the obstacles are unknown. Our main goal is to navigate a skid-steering mobile robot (SSMR) with non-holonomic constraints through an unknown environment to its desired target in minimum time while avoiding collisions with obstacles. In the context of autonomous cart navigation, the navigation of mobile robots in unstructured environments can be formulated as a reinforcement learning (RL) problem. Reinforcement learning is a paradigm in which the agent (the robot) learns its optimal path by interacting with the environment and receiving a reward based on its actions. Based on this action-reward scenario, the optimal action for each state can be discovered by maximizing a predefined accumulated reward that reflects the quality of the trajectory taken by the robot. Reinforcement learning methods can be categorized into two main families (Kober et al., 2013), namely value based methods (Q-learning, deep Q-learning) and policy based methods (REINFORCE with policy gradient).
(i) Value based methods: In these methods, the value function that maps each state-action pair to a value is learned. With these methods, the best action to take in each state, i.e. the action with the largest value, can be found. This works well in the case of a finite set of actions.
(ii) Policy based methods: Instead of learning value functions that indicate the expected total reward for each state-action pair, the policy function that maps states to actions is optimized directly (Arulkumaran et al., 2017). Policy search methods are more effective in high-dimensional action spaces, or when continuous actions are used, as in mobile robot navigation (Deisenroth et al., 2013). A minimal sketch contrasting the two families is given after this list.
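To make the distinction concrete, the following is a minimal Python sketch contrasting greedy action selection from a learned Q-function (value based, discrete actions) with a policy network that directly outputs continuous velocity commands (policy based). The state dimension, action set, and network sizes are illustrative assumptions, not the ones used in the paper.

```python
import torch
import torch.nn as nn

STATE_DIM = 16   # e.g. laser ranges + relative target position (assumption)
N_ACTIONS = 5    # discrete action set for the value-based case (assumption)

# (i) Value based: a Q-network maps a state to one value per discrete action;
#     the greedy action is simply the argmax over those values.
q_net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, N_ACTIONS))

def greedy_action(state):
    with torch.no_grad():
        return int(torch.argmax(q_net(state)))  # index of the best discrete action

# (ii) Policy based: the policy network maps a state directly to a continuous
#     action, here a (linear velocity, angular velocity) pair bounded by tanh.
policy_net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(),
                           nn.Linear(64, 2), nn.Tanh())

def continuous_action(state):
    with torch.no_grad():
        return policy_net(state)  # to be rescaled to the robot's velocity limits

state = torch.zeros(STATE_DIM)
print(greedy_action(state), continuous_action(state))
```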
However, since both of these methods have their own drawbacks, there is a hybrid method, called actor-critic, that employs both value functions and policy search (Arulkumaran et al., 2017). In this method, the actor adjusts the policy parameters by policy gradient ascent, whereas the critic estimates the action-value function using a policy evaluation algorithm such as temporal difference (TD) learning. This can be done with two neural networks running in parallel.
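As an illustration of this division of labour, the sketch below performs one actor-critic update step: the critic is regressed towards a one-step TD target, and the actor is updated by gradient ascent on the critic's estimate of its own action's value (a deterministic-policy variant, chosen here because deterministic policies are adopted later in the paper). All dimensions and hyperparameters are illustrative assumptions.

```python
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM, GAMMA = 16, 2, 0.99  # illustrative assumptions

actor = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(),
                      nn.Linear(64, ACTION_DIM), nn.Tanh())
critic = nn.Sequential(nn.Linear(STATE_DIM + ACTION_DIM, 64), nn.ReLU(),
                       nn.Linear(64, 1))
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-4)
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)

def update(s, a, r, s_next, done):
    # Critic: regress Q(s, a) towards the one-step TD target r + gamma * Q(s', pi(s')).
    with torch.no_grad():
        target = r + GAMMA * (1.0 - done) * critic(torch.cat([s_next, actor(s_next)]))
    td_error = critic(torch.cat([s, a])) - target
    critic_loss = td_error.pow(2).mean()
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    # Actor: gradient ascent on Q(s, pi(s)), implemented as minimizing -Q.
    actor_loss = -critic(torch.cat([s, actor(s)])).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

# One dummy transition just to show the call signature.
s, a = torch.zeros(STATE_DIM), torch.zeros(ACTION_DIM)
update(s, a, torch.tensor([0.0]), torch.zeros(STATE_DIM), torch.tensor([0.0]))
```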
Herein, we introduce a reinforcement learning based navigation system that enhances the navigation capabilities of a skid-steering mobile robot. This goal is achieved by using the asynchronous deep deterministic policy gradient (ADDPG) through an actor-critic algorithm (Mnih et al., 2016). Two deep neural networks are introduced in this case. The first one (actor network) maps the input vector representing the states of the robot to the linear and angular velocity commands that represent the actions, whereas the second one (critic network) is responsible for predicting the Q-value of the state-action pairs. Deterministic policies are utilized since they have been shown to outperform their stochastic counterparts in high-dimensional action spaces (Silver et al., 2014). In addition, Simultaneous Localization and Mapping (SLAM) techniques are integrated with reinforcement learning in an attempt to improve the learning rate by providing more accurate estimates of the robot's states. SLAM has been an active research topic for many years (Cadena et al., 2016) because it provides two fundamental components to robotics: where I am, and what I see.
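The following sketch shows one plausible shape for these two networks, with the SLAM pose estimate concatenated into the state vector fed to the actor. The state composition, layer widths, and velocity bounds are assumptions made for illustration, not the architecture actually used in the paper.

```python
import torch
import torch.nn as nn

N_LASER, POSE_DIM, TARGET_DIM = 10, 3, 2   # assumed state composition
STATE_DIM = N_LASER + POSE_DIM + TARGET_DIM
V_MAX, W_MAX = 0.5, 1.0                    # assumed velocity bounds (m/s, rad/s)

class Actor(nn.Module):
    """Maps the robot state to bounded (linear, angular) velocity commands."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(STATE_DIM, 128), nn.ReLU(),
                                 nn.Linear(128, 128), nn.ReLU(),
                                 nn.Linear(128, 2), nn.Tanh())

    def forward(self, state):
        v, w = self.net(state).unbind(-1)
        return torch.stack([V_MAX * v, W_MAX * w], dim=-1)

class Critic(nn.Module):
    """Predicts the Q-value of a state-action pair."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(STATE_DIM + 2, 128), nn.ReLU(),
                                 nn.Linear(128, 128), nn.ReLU(),
                                 nn.Linear(128, 1))

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

# State = laser ranges + SLAM pose estimate (x, y, yaw) + target in the robot frame.
state = torch.cat([torch.ones(N_LASER), torch.zeros(POSE_DIM), torch.zeros(TARGET_DIM)])
action = Actor()(state)
q_value = Critic()(state, action)
```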
Herein, we introduce how the navigation problem of non-holonomic mobile robots can be formulated as a reinforcement learning problem that can be solved using the ADDPG actor-critic algorithm. Besides, we share experimental results on whether and how SLAM helps reinforcement learning, by comparing the learning rate with and without SLAM (i.e. when a partial map of the environment is given and when no map is available). For the implementation of the parallel running neural networks, we carry out our initial experiments on the GPU cluster of the University of Twente. However, when it comes to real-life implementations and demonstrations on real robots, the hardware becomes the key component that determines whether the goal can be achieved. Therefore, we implement the trained networks on an NVIDIA Jetson TX2 in order to show real-life implementation possibilities when the algorithms need to run on a navigating robot (Jetson TX2 Module, online documentation, n.d.). In this sense, the NVIDIA Jetson TX2 gives the opportunity to run even multiple complex neural networks in real time on a credit-card-sized processing unit that can easily be carried by a small (less than 20 cm in length) robot. We present our experiments on sensor selection and on the OpenAI package in the ROS-Gazebo platform as a possible simulation environment. Last but not least, we discuss real-life implementation scenarios using our NVIDIA Jetson TX2 and how realistic the simulation results could be when the algorithms are implemented on the real robot. The early experiments indicate that, for an optimal design of a SLAM and reinforcement learning based system to truly work in real-world applications and reach a goal in an unknown environment, the hardware setup and the algorithms must be designed collaboratively.
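As a rough indication of what running the trained actor on the robot could look like, the sketch below loads a saved policy and publishes its output as velocity commands on /cmd_vel in a ROS node, as one might do on the Jetson TX2. The topic names, the state built only from the laser scan, and the file name actor.pt are illustrative assumptions.

```python
import rospy
import torch
from geometry_msgs.msg import Twist
from sensor_msgs.msg import LaserScan

actor = torch.jit.load("actor.pt")   # trained actor network (assumed TorchScript export)
actor.eval()

rospy.init_node("rl_navigation")
cmd_pub = rospy.Publisher("/cmd_vel", Twist, queue_size=1)

def on_scan(scan):
    # Build the state vector from the laser scan (pose and target terms omitted here).
    state = torch.tensor(scan.ranges, dtype=torch.float32)
    with torch.no_grad():
        v, w = actor(state).tolist()
    cmd = Twist()
    cmd.linear.x = v      # linear velocity command
    cmd.angular.z = w     # angular velocity command
    cmd_pub.publish(cmd)

rospy.Subscriber("/scan", LaserScan, on_scan)
rospy.spin()
```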
Original language: English
Publication status: Published - Jun 2019
Event: ISPRS Workshop Indoor 3D 2019 - Waaier, Carré and Hal B buildings, Enschede, Netherlands
Duration: 11 Jun 2019 - 12 Jun 2019
http://indoor3d.net/2019/

Workshop

Workshop: ISPRS Workshop Indoor 3D 2019
Country: Netherlands
City: Enschede
Period: 11/06/19 - 12/06/19
Internet address: http://indoor3d.net/2019/


Cite this

@conference{e6201d197a9d433b869083c9bf605491,
title = "Reinforcement Learning and SLAM based approach for mobile robot navigation in unknown environments",
abstract = "Autonomous navigation of robots in unknown environments from their current position to a desired target without colliding with obstacles represents an important aspect in the field of mobile robots. In literature, traditional methods do exist in case a complete knowledge of the environment is available. However, this is not the case in real-life applications where a complete knowledge about stochastic environments can hardly be obtained where the positions of the obstacles are unknown. Our main goal is to navigate a skid-steering mobile robot (SSMR) with non-holonomic constraints in an unknown environment to its desired target in minimum time while avoiding colliding with the obstacles. In the context of autonomous cart navigation, autonomous navigation of mobile robots in unstructured environments can be formulated as a Reinforcement learning (RL) problem. Reinforcement learning is a paradigm in which the agent (robot) learns its optimal path by interacting with the environment and receiving a reward based on its actions. Based on this action-reward scenario, the optimal action for each state can be discovered by maximizing a predefined accumulated reward that reflects the quality of the trajectory taken by the robot. Reinforcement learning can be categorized into two main methods (Bagnell J. Peters J. Kober, 2013), namely value based methods (Q-learning, Deep Q-learning) and policy based methods (REINFORCE with Policy Gradient).(i) Value based methods: In these methods, the value function that maps each state-action pair into a value is learned. Thanksto these methods, the best action to take for each state, the actionwith the biggest value, can be found. This works well in case ofa finite set of actions.(ii) Policy based methods: In policy based methods, instead oflearning value functions that give an indication of the expectedtotal reward for each state-action pair, the policy function thatmaps the state into action is optimized directly (Brundage. MArulkumaran. K and A., 2017). Policy search methods are moreeffective in high dimensional action spaces, or when using continuousactions like the case of mobile robot navigation (NeumannG. Peters J. Deisenroth, 2013).However, since both of these methods have their own drawbacks,there is hybrid method called Actor-Critic that employs both valuefunctions and policy search (Brundage. M Arulkumaran. K andA., 2017). In this method, the actor adjust the policy parametersby a policy gradient ascent whereas the critic estimates the actionvalue function using a policy evaluation algorithm such astemporal difference (TD) learning. This can be done through twoneural networks running in parallel.Herein, we introduce a reinforcement learning based navigationsystem that enhance the navigation capabilities of a skid steeringmobile robot. This goal will be achieved by using asynchronousdeep deterministic policy gradient (ADDPG) throughan actor-critic algorithm (Badia A. Mirza M. Graves A. HarleyT. Lillicrap T. Silver D. Kavukcuoglu K. Mnih, 2016). Two deepneural-networks are introduced in this case. The first one (actornetwork)transfers the input vector that represents the states ofthe robot to linear and angular velocity commands that representthe actions. Whereas the second neural network (critic-network)is responsible for predicting the Q-value of the state and actionpairs. Deterministic policies will be utilized since it was proposedthat they outperform their stochastic counterparts in highdimensionalaction spaces (Lever G. Hess N. 
Degris T. WierstraD. Riedmiller M. Silver, 2014). On the other hand, SimultaneousLocalization and Mapping also known as SLAM techniques areto be integrated with reinforcement learning in an attempt to improvethe learning rate by providing more accurate estimation ofthe robots states. SLAM has been an active topic for many years(C. Cadena, 2016) because it provides two fundamental componentsto robotics: where I am, and what I see.Herein, we introduce how the navigation problem of non-holonomicmobile robots can be formulated as a reinforcement learning problemthat could be solved by using ADDPG actor-critic algorithm.Besides, we share the experimental results on how SLAM wouldhelp reinforcement learning at all by comparing the learning ratewithout SLAM (comparison when a partial map of the environmentis given and when no map is available). For implementationof the parallel running neural networks, we do our initial experimentsusing the GPU cluster of University of Twente. However,when it comes to real-life implementations and demonstrationson real robots, the hardware becomes the key component to determinewhether the goal can be achieved or not. Therefore, weimplement the trained networks on NVIDIA Jetson TX2 in orderto show real-life implementation possibilities when the algorithmsneed to run in a navigating robot (Jetson TX2 Module,online documentation, n.d.). In this sense, NVIDIA Jetson TX2gives opportunity to run even multiple complex neural networksreal-time on a credic card size processor unit which can easilybe carried by a small (less than 20cm length) robot. We show ourexperiments on sensor selection and the OpenAI package in ROSGazeboplatform as a possible simulation environment. Last butnot least, we discuss the real-life implementation scenarios usingour NVIDIA Jetson TX2 and how realistic the simulation resultscould be when the algorithms are implemented on the real robot.The early experiments indicate that, for an optimal design of aSLAM and reinforcement learning based system to truly work inreal-world applications in order to reach a goal in an unknownenvironment, the hardware setup and the algorithms must be collaborativelydesigned.",
author = "Beril Sirma{\cc}ek and Nicol{\`o} Botteghi and Mustafa Khaled and Stefano Stramigioli and Mannes Poel",
year = "2019",
month = "6",
language = "English",
note = "ISPRS Workshop Indoor 3D 2019 ; Conference date: 11-06-2019 Through 12-06-2019",
url = "http://indoor3d.net/2019/",

}

Sirmaçek, B, Botteghi, N, Khaled, M, Stramigioli, S & Poel, M 2019, 'Reinforcement Learning and SLAM based approach for mobile robot navigation in unknown environments', paper presented at ISPRS Workshop Indoor 3D 2019, Enschede, Netherlands, 11/06/19 - 12/06/19.

Reinforcement Learning and SLAM based approach for mobile robot navigation in unknown environments. / Sirmaçek, Beril; Botteghi, Nicolò; Khaled, Mustafa; Stramigioli, Stefano; Poel, Mannes.

2019. Paper presented at ISPRS Workshop Indoor 3D 2019, Enschede, Netherlands.

Research output: Contribution to conference › Paper › Academic › peer-review

TY - CONF

T1 - Reinforcement Learning and SLAM based approach for mobile robot navigation in unknown environments

AU - Sirmaçek, Beril

AU - Botteghi, Nicolò

AU - Khaled, Mustafa

AU - Stramigioli, Stefano

AU - Poel, Mannes

PY - 2019/6

Y1 - 2019/6

M3 - Paper

ER -