### Abstract

(i) Value based methods: In these methods, the value function that maps each state-action pair into a value is learned. Thanks

to these methods, the best action to take for each state, the action

with the biggest value, can be found. This works well in case of

a finite set of actions.

(ii) Policy based methods: In policy based methods, instead of

learning value functions that give an indication of the expected

total reward for each state-action pair, the policy function that

maps the state into action is optimized directly (Brundage. M

Arulkumaran. K and A., 2017). Policy search methods are more

effective in high dimensional action spaces, or when using continuous

actions like the case of mobile robot navigation (Neumann

G. Peters J. Deisenroth, 2013).

However, since both of these methods have their own drawbacks,

there is hybrid method called Actor-Critic that employs both value

functions and policy search (Brundage. M Arulkumaran. K and

A., 2017). In this method, the actor adjust the policy parameters

by a policy gradient ascent whereas the critic estimates the action

value function using a policy evaluation algorithm such as

temporal difference (TD) learning. This can be done through two

neural networks running in parallel.

Herein, we introduce a reinforcement learning based navigation

system that enhance the navigation capabilities of a skid steering

mobile robot. This goal will be achieved by using asynchronous

deep deterministic policy gradient (ADDPG) through

an actor-critic algorithm (Badia A. Mirza M. Graves A. Harley

T. Lillicrap T. Silver D. Kavukcuoglu K. Mnih, 2016). Two deep

neural-networks are introduced in this case. The first one (actornetwork)

transfers the input vector that represents the states of

the robot to linear and angular velocity commands that represent

the actions. Whereas the second neural network (critic-network)

is responsible for predicting the Q-value of the state and action

pairs. Deterministic policies will be utilized since it was proposed

that they outperform their stochastic counterparts in highdimensional

action spaces (Lever G. Hess N. Degris T. Wierstra

D. Riedmiller M. Silver, 2014). On the other hand, Simultaneous

Localization and Mapping also known as SLAM techniques are

to be integrated with reinforcement learning in an attempt to improve

the learning rate by providing more accurate estimation of

the robots states. SLAM has been an active topic for many years

(C. Cadena, 2016) because it provides two fundamental components

to robotics: where I am, and what I see.

Herein, we introduce how the navigation problem of non-holonomic

mobile robots can be formulated as a reinforcement learning problem

that could be solved by using ADDPG actor-critic algorithm.

Besides, we share the experimental results on how SLAM would

help reinforcement learning at all by comparing the learning rate

without SLAM (comparison when a partial map of the environment

is given and when no map is available). For implementation

of the parallel running neural networks, we do our initial experiments

using the GPU cluster of University of Twente. However,

when it comes to real-life implementations and demonstrations

on real robots, the hardware becomes the key component to determine

whether the goal can be achieved or not. Therefore, we

implement the trained networks on NVIDIA Jetson TX2 in order

to show real-life implementation possibilities when the algorithms

need to run in a navigating robot (Jetson TX2 Module,

online documentation, n.d.). In this sense, NVIDIA Jetson TX2

gives opportunity to run even multiple complex neural networks

real-time on a credic card size processor unit which can easily

be carried by a small (less than 20cm length) robot. We show our

experiments on sensor selection and the OpenAI package in ROSGazebo

platform as a possible simulation environment. Last but

not least, we discuss the real-life implementation scenarios using

our NVIDIA Jetson TX2 and how realistic the simulation results

could be when the algorithms are implemented on the real robot.

The early experiments indicate that, for an optimal design of a

SLAM and reinforcement learning based system to truly work in

real-world applications in order to reach a goal in an unknown

environment, the hardware setup and the algorithms must be collaboratively

designed.

Original language | English |
---|---|

Publication status | Published - Jun 2019 |

Event | ISPRS Workshop Indoor 3D 2019 - Waaier, Carré and Hal B buildings, Enschede, Netherlands Duration: 11 Jun 2019 → 12 Jun 2019 http://indoor3d.net/2019/ |

### Workshop

Workshop | ISPRS Workshop Indoor 3D 2019 |
---|---|

Country | Netherlands |

City | Enschede |

Period | 11/06/19 → 12/06/19 |

Internet address |

### Fingerprint

### Cite this

*Reinforcement Learning and SLAM based approach for mobile robot navigation in unknown environments*. Paper presented at ISPRS Workshop Indoor 3D 2019, Enschede, Netherlands.

}

**Reinforcement Learning and SLAM based approach for mobile robot navigation in unknown environments.** / Sirmaçek, Beril ; Botteghi, Nicolò ; Khaled, Mustafa; Stramigioli, Stefano ; Poel, Mannes .

Research output: Contribution to conference › Paper

TY - CONF

T1 - Reinforcement Learning and SLAM based approach for mobile robot navigation in unknown environments

AU - Sirmaçek, Beril

AU - Botteghi, Nicolò

AU - Khaled, Mustafa

AU - Stramigioli, Stefano

AU - Poel, Mannes

PY - 2019/6

Y1 - 2019/6

N2 - Autonomous navigation of robots in unknown environments from their current position to a desired target without colliding with obstacles represents an important aspect in the field of mobile robots. In literature, traditional methods do exist in case a complete knowledge of the environment is available. However, this is not the case in real-life applications where a complete knowledge about stochastic environments can hardly be obtained where the positions of the obstacles are unknown. Our main goal is to navigate a skid-steering mobile robot (SSMR) with non-holonomic constraints in an unknown environment to its desired target in minimum time while avoiding colliding with the obstacles. In the context of autonomous cart navigation, autonomous navigation of mobile robots in unstructured environments can be formulated as a Reinforcement learning (RL) problem. Reinforcement learning is a paradigm in which the agent (robot) learns its optimal path by interacting with the environment and receiving a reward based on its actions. Based on this action-reward scenario, the optimal action for each state can be discovered by maximizing a predefined accumulated reward that reflects the quality of the trajectory taken by the robot. Reinforcement learning can be categorized into two main methods (Bagnell J. Peters J. Kober, 2013), namely value based methods (Q-learning, Deep Q-learning) and policy based methods (REINFORCE with Policy Gradient).(i) Value based methods: In these methods, the value function that maps each state-action pair into a value is learned. Thanksto these methods, the best action to take for each state, the actionwith the biggest value, can be found. This works well in case ofa finite set of actions.(ii) Policy based methods: In policy based methods, instead oflearning value functions that give an indication of the expectedtotal reward for each state-action pair, the policy function thatmaps the state into action is optimized directly (Brundage. MArulkumaran. K and A., 2017). Policy search methods are moreeffective in high dimensional action spaces, or when using continuousactions like the case of mobile robot navigation (NeumannG. Peters J. Deisenroth, 2013).However, since both of these methods have their own drawbacks,there is hybrid method called Actor-Critic that employs both valuefunctions and policy search (Brundage. M Arulkumaran. K andA., 2017). In this method, the actor adjust the policy parametersby a policy gradient ascent whereas the critic estimates the actionvalue function using a policy evaluation algorithm such astemporal difference (TD) learning. This can be done through twoneural networks running in parallel.Herein, we introduce a reinforcement learning based navigationsystem that enhance the navigation capabilities of a skid steeringmobile robot. This goal will be achieved by using asynchronousdeep deterministic policy gradient (ADDPG) throughan actor-critic algorithm (Badia A. Mirza M. Graves A. HarleyT. Lillicrap T. Silver D. Kavukcuoglu K. Mnih, 2016). Two deepneural-networks are introduced in this case. The first one (actornetwork)transfers the input vector that represents the states ofthe robot to linear and angular velocity commands that representthe actions. Whereas the second neural network (critic-network)is responsible for predicting the Q-value of the state and actionpairs. Deterministic policies will be utilized since it was proposedthat they outperform their stochastic counterparts in highdimensionalaction spaces (Lever G. Hess N. Degris T. WierstraD. Riedmiller M. Silver, 2014). On the other hand, SimultaneousLocalization and Mapping also known as SLAM techniques areto be integrated with reinforcement learning in an attempt to improvethe learning rate by providing more accurate estimation ofthe robots states. SLAM has been an active topic for many years(C. Cadena, 2016) because it provides two fundamental componentsto robotics: where I am, and what I see.Herein, we introduce how the navigation problem of non-holonomicmobile robots can be formulated as a reinforcement learning problemthat could be solved by using ADDPG actor-critic algorithm.Besides, we share the experimental results on how SLAM wouldhelp reinforcement learning at all by comparing the learning ratewithout SLAM (comparison when a partial map of the environmentis given and when no map is available). For implementationof the parallel running neural networks, we do our initial experimentsusing the GPU cluster of University of Twente. However,when it comes to real-life implementations and demonstrationson real robots, the hardware becomes the key component to determinewhether the goal can be achieved or not. Therefore, weimplement the trained networks on NVIDIA Jetson TX2 in orderto show real-life implementation possibilities when the algorithmsneed to run in a navigating robot (Jetson TX2 Module,online documentation, n.d.). In this sense, NVIDIA Jetson TX2gives opportunity to run even multiple complex neural networksreal-time on a credic card size processor unit which can easilybe carried by a small (less than 20cm length) robot. We show ourexperiments on sensor selection and the OpenAI package in ROSGazeboplatform as a possible simulation environment. Last butnot least, we discuss the real-life implementation scenarios usingour NVIDIA Jetson TX2 and how realistic the simulation resultscould be when the algorithms are implemented on the real robot.The early experiments indicate that, for an optimal design of aSLAM and reinforcement learning based system to truly work inreal-world applications in order to reach a goal in an unknownenvironment, the hardware setup and the algorithms must be collaborativelydesigned.

AB - Autonomous navigation of robots in unknown environments from their current position to a desired target without colliding with obstacles represents an important aspect in the field of mobile robots. In literature, traditional methods do exist in case a complete knowledge of the environment is available. However, this is not the case in real-life applications where a complete knowledge about stochastic environments can hardly be obtained where the positions of the obstacles are unknown. Our main goal is to navigate a skid-steering mobile robot (SSMR) with non-holonomic constraints in an unknown environment to its desired target in minimum time while avoiding colliding with the obstacles. In the context of autonomous cart navigation, autonomous navigation of mobile robots in unstructured environments can be formulated as a Reinforcement learning (RL) problem. Reinforcement learning is a paradigm in which the agent (robot) learns its optimal path by interacting with the environment and receiving a reward based on its actions. Based on this action-reward scenario, the optimal action for each state can be discovered by maximizing a predefined accumulated reward that reflects the quality of the trajectory taken by the robot. Reinforcement learning can be categorized into two main methods (Bagnell J. Peters J. Kober, 2013), namely value based methods (Q-learning, Deep Q-learning) and policy based methods (REINFORCE with Policy Gradient).(i) Value based methods: In these methods, the value function that maps each state-action pair into a value is learned. Thanksto these methods, the best action to take for each state, the actionwith the biggest value, can be found. This works well in case ofa finite set of actions.(ii) Policy based methods: In policy based methods, instead oflearning value functions that give an indication of the expectedtotal reward for each state-action pair, the policy function thatmaps the state into action is optimized directly (Brundage. MArulkumaran. K and A., 2017). Policy search methods are moreeffective in high dimensional action spaces, or when using continuousactions like the case of mobile robot navigation (NeumannG. Peters J. Deisenroth, 2013).However, since both of these methods have their own drawbacks,there is hybrid method called Actor-Critic that employs both valuefunctions and policy search (Brundage. M Arulkumaran. K andA., 2017). In this method, the actor adjust the policy parametersby a policy gradient ascent whereas the critic estimates the actionvalue function using a policy evaluation algorithm such astemporal difference (TD) learning. This can be done through twoneural networks running in parallel.Herein, we introduce a reinforcement learning based navigationsystem that enhance the navigation capabilities of a skid steeringmobile robot. This goal will be achieved by using asynchronousdeep deterministic policy gradient (ADDPG) throughan actor-critic algorithm (Badia A. Mirza M. Graves A. HarleyT. Lillicrap T. Silver D. Kavukcuoglu K. Mnih, 2016). Two deepneural-networks are introduced in this case. The first one (actornetwork)transfers the input vector that represents the states ofthe robot to linear and angular velocity commands that representthe actions. Whereas the second neural network (critic-network)is responsible for predicting the Q-value of the state and actionpairs. Deterministic policies will be utilized since it was proposedthat they outperform their stochastic counterparts in highdimensionalaction spaces (Lever G. Hess N. Degris T. WierstraD. Riedmiller M. Silver, 2014). On the other hand, SimultaneousLocalization and Mapping also known as SLAM techniques areto be integrated with reinforcement learning in an attempt to improvethe learning rate by providing more accurate estimation ofthe robots states. SLAM has been an active topic for many years(C. Cadena, 2016) because it provides two fundamental componentsto robotics: where I am, and what I see.Herein, we introduce how the navigation problem of non-holonomicmobile robots can be formulated as a reinforcement learning problemthat could be solved by using ADDPG actor-critic algorithm.Besides, we share the experimental results on how SLAM wouldhelp reinforcement learning at all by comparing the learning ratewithout SLAM (comparison when a partial map of the environmentis given and when no map is available). For implementationof the parallel running neural networks, we do our initial experimentsusing the GPU cluster of University of Twente. However,when it comes to real-life implementations and demonstrationson real robots, the hardware becomes the key component to determinewhether the goal can be achieved or not. Therefore, weimplement the trained networks on NVIDIA Jetson TX2 in orderto show real-life implementation possibilities when the algorithmsneed to run in a navigating robot (Jetson TX2 Module,online documentation, n.d.). In this sense, NVIDIA Jetson TX2gives opportunity to run even multiple complex neural networksreal-time on a credic card size processor unit which can easilybe carried by a small (less than 20cm length) robot. We show ourexperiments on sensor selection and the OpenAI package in ROSGazeboplatform as a possible simulation environment. Last butnot least, we discuss the real-life implementation scenarios usingour NVIDIA Jetson TX2 and how realistic the simulation resultscould be when the algorithms are implemented on the real robot.The early experiments indicate that, for an optimal design of aSLAM and reinforcement learning based system to truly work inreal-world applications in order to reach a goal in an unknownenvironment, the hardware setup and the algorithms must be collaborativelydesigned.

M3 - Paper

ER -