TOWARDS CONTINUOUS CONTROL FOR MOBILE ROBOT NAVIGATION: A REINFORCEMENT LEARNING AND SLAM BASED APPROACH

Mustafa Khaled, Beril Sirmaçek, Stefano Stramigioli, Mannes Poel

Research output: Contribution to conference › Paper › peer-review

    Abstract

Autonomous navigation of robots in unknown environments, from their current position to a desired target without colliding with obstacles, is an important problem in the field of mobile robotics. Traditional methods exist for the case where complete knowledge of the environment is available. However, this is rarely the case in real-life applications: in stochastic environments the positions of the obstacles are unknown, and complete knowledge can hardly be obtained. Our main goal is to navigate a skid-steering mobile robot (SSMR) with non-holonomic constraints through an unknown environment to its desired target in minimum time while avoiding collisions with obstacles. In the context of autonomous cart navigation, navigation of mobile robots in unstructured environments can be formulated as a reinforcement learning (RL) problem. Reinforcement learning is a paradigm in which the agent (robot) learns its optimal path by interacting with the environment and receiving a reward based on its actions. Based on this action-reward scenario, the optimal action for each state can be discovered by maximizing a predefined accumulated reward that reflects the quality of the trajectory taken by the robot. Reinforcement learning can be categorized into two main families of methods (Kober et al., 2013), namely value-based methods (Q-learning, deep Q-learning) and policy-based methods (REINFORCE with policy gradient).
(i) Value-based methods: the value function that maps each state-action pair to a value is learned. From this function, the best action to take in each state, i.e. the action with the largest value, can be found. This works well for a finite set of actions.
(ii) Policy-based methods: instead of learning a value function that indicates the expected total reward for each state-action pair, the policy function that maps states to actions is optimized directly (Arulkumaran et al., 2017). Policy search methods are more effective in high-dimensional action spaces, or when using continuous actions, as in mobile robot navigation (Deisenroth et al., 2013). A minimal sketch of one update step for each family is given after this list.
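To make the distinction concrete, the sketch below contrasts a single tabular Q-learning update (value-based) with a single REINFORCE update for a softmax policy (policy-based). It is a generic, minimal illustration rather than the method used in this work; the learning rates, discount factor, and feature vector are placeholder assumptions.

```python
import numpy as np

# --- (i) Value-based: one tabular Q-learning update ----------------------
# Q is a |S| x |A| table; the greedy action for a state is the column with
# the largest value, which only makes sense for a finite action set.
def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    td_target = r + gamma * np.max(Q[s_next])   # bootstrap from best next action
    Q[s, a] += alpha * (td_target - Q[s, a])    # move Q(s, a) toward the target
    return Q

# --- (ii) Policy-based: one REINFORCE gradient step -----------------------
# theta parameterises a softmax policy over actions; the gradient of
# log pi(a|s) is weighted by the episode return G.
def reinforce_update(theta, s_feats, a, G, lr=1e-2):
    logits = theta @ s_feats                    # unnormalised action preferences
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    grad_log_pi = -np.outer(probs, s_feats)     # d log pi / d theta for all actions
    grad_log_pi[a] += s_feats                   # extra term for the action taken
    return theta + lr * G * grad_log_pi         # ascend the policy gradient
```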
However, since both of these methods have their own drawbacks, there is a hybrid family called actor-critic that employs both value functions and policy search (Arulkumaran et al., 2017). In this scheme, the actor adjusts the policy parameters by policy-gradient ascent, whereas the critic estimates the action-value function using a policy evaluation algorithm such as temporal difference (TD) learning. This can be implemented with two neural networks running in parallel.
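The interplay between the two components can be summarized in one combined update step. The sketch below uses linear function approximation and, for brevity, a state-value critic (the description above refers to an action-value critic, but the structure of the TD-based update is the same); all parameters are illustrative assumptions.

```python
import numpy as np

# One actor-critic step with linear function approximation: the critic
# estimates V(s) = w . phi(s) and supplies a TD error that the actor uses
# as the learning signal for its policy-gradient update.
def actor_critic_step(theta, w, phi_s, phi_s_next, a, r,
                      gamma=0.99, lr_actor=1e-3, lr_critic=1e-2):
    # Critic: temporal-difference (TD) error and value-function update.
    td_error = r + gamma * (w @ phi_s_next) - (w @ phi_s)
    w = w + lr_critic * td_error * phi_s

    # Actor: policy-gradient ascent, scaled by the critic's TD error.
    logits = theta @ phi_s
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    grad_log_pi = -np.outer(probs, phi_s)
    grad_log_pi[a] += phi_s
    theta = theta + lr_actor * td_error * grad_log_pi
    return theta, w
```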
Herein, we introduce a reinforcement-learning-based navigation system that enhances the navigation capabilities of a skid-steering mobile robot. This goal is achieved by using the asynchronous deep deterministic policy gradient (ADDPG) through an actor-critic algorithm (Mnih et al., 2016). Two deep neural networks are introduced in this case. The first one (actor network) maps the input vector that represents the states of the robot to linear and angular velocity commands that represent the actions, whereas the second neural network (critic network) is responsible for predicting the Q-value of the state-action pairs. Deterministic policies are utilized, since they have been shown to outperform their stochastic counterparts in high-dimensional action spaces (Silver et al., 2014). In addition, Simultaneous Localization and Mapping (SLAM) techniques are to be integrated with reinforcement learning in an attempt to improve the learning rate by providing a more accurate estimate of the robot's state. SLAM has been an active research topic for many years (Cadena et al., 2016) because it provides two fundamental pieces of information to a robot: where it is, and what it sees.
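A possible shape for the two networks is sketched below. The state dimension (a down-sampled laser scan is assumed), the layer sizes, and the squashing of the outputs into normalized linear and angular velocities are illustrative assumptions, not the exact architecture used in our experiments.

```python
import torch
import torch.nn as nn

STATE_DIM = 10   # assumed: 10 down-sampled laser ranges (goal-relative terms could be appended)
ACTION_DIM = 2   # linear and angular velocity commands

class Actor(nn.Module):
    """Maps the robot's state vector to a deterministic (v, omega) command."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, ACTION_DIM),
        )
    def forward(self, state):
        raw = self.net(state)
        v = torch.sigmoid(raw[..., :1])   # forward velocity in [0, 1], scaled on the robot
        w = torch.tanh(raw[..., 1:])      # angular velocity in [-1, 1]
        return torch.cat([v, w], dim=-1)

class Critic(nn.Module):
    """Predicts the Q-value of a (state, action) pair."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + ACTION_DIM, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, 1),
        )
    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))
```

In a DDPG-style update, the critic is regressed toward the target y = r + γ Q'(s', μ'(s')), computed with slowly updated target copies of both networks, while the actor is updated by ascending the critic's estimate Q(s, μ(s)).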
Herein, we show how the navigation problem of non-holonomic mobile robots can be formulated as a reinforcement learning problem that can be solved with the ADDPG actor-critic algorithm. In addition, we share experimental results on whether and how SLAM helps reinforcement learning, by comparing the learning rate with and without SLAM (i.e., when a partial map of the environment is given versus when no map is available).
For the implementation of the parallel-running neural networks, we perform our initial experiments on the GPU cluster of the University of Twente. However, when it comes to real-life implementations and demonstrations on real robots, the hardware becomes the key component that determines whether the goal can be achieved. Therefore, we deploy the trained networks on an NVIDIA Jetson TX2 in order to demonstrate real-life implementation possibilities when the algorithms need to run on a navigating robot (Jetson TX2 Module, online documentation, n.d.). In this sense, the NVIDIA Jetson TX2 makes it possible to run even multiple complex neural networks in real time on a credit-card-sized processing unit that can easily be carried by a small robot (less than 20 cm in length); a sketch of how the trained actor can be wrapped in a ROS node follows below. We present our experiments on sensor selection and on the OpenAI package in the ROS-Gazebo platform as a possible simulation environment. Last but not least, we discuss real-life implementation scenarios using our NVIDIA Jetson TX2 and how realistic the simulation results are when the algorithms are implemented on the real robot. The early experiments indicate that, for a SLAM- and reinforcement-learning-based system to truly work in real-world applications and reach a goal in an unknown environment, the hardware setup and the algorithms must be designed collaboratively.
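As an illustration of the on-robot side, the sketch below shows how a trained actor could be deployed as a ROS node on the Jetson TX2: it builds a state vector from the incoming laser scan and publishes velocity commands for the SSMR. The topic names, the exported checkpoint file, the down-sampling to ten beams, the velocity limits, and the omission of the goal-relative part of the state are all hypothetical choices for illustration, not the exact configuration of our experiments.

```python
#!/usr/bin/env python
import numpy as np
import rospy
import torch
from geometry_msgs.msg import Twist
from sensor_msgs.msg import LaserScan

MAX_LIN_VEL = 0.5   # m/s, assumed limit for the SSMR platform
MAX_ANG_VEL = 1.0   # rad/s, assumed limit

class ActorNode:
    """Runs the trained actor network on-board and publishes velocity commands."""
    def __init__(self):
        self.actor = torch.jit.load("actor.pt")   # hypothetical exported checkpoint
        self.actor.eval()
        self.cmd_pub = rospy.Publisher("/cmd_vel", Twist, queue_size=1)
        rospy.Subscriber("/scan", LaserScan, self.on_scan, queue_size=1)

    def on_scan(self, scan):
        # Replace NaN/inf readings and down-sample the scan to the fixed-size
        # state vector assumed during training (10 beams here).
        ranges = np.nan_to_num(np.array(scan.ranges, dtype=np.float32))
        idx = np.linspace(0, len(ranges) - 1, 10).astype(int)
        state = torch.from_numpy(ranges[idx]).unsqueeze(0)

        with torch.no_grad():
            action = self.actor(state).squeeze(0).numpy()

        cmd = Twist()
        cmd.linear.x = float(action[0]) * MAX_LIN_VEL
        cmd.angular.z = float(action[1]) * MAX_ANG_VEL
        self.cmd_pub.publish(cmd)

if __name__ == "__main__":
    rospy.init_node("rl_navigation_actor")
    ActorNode()
    rospy.spin()
```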
Original language: English
Publication status: Published - Jun 2019
Event: ISPRS Workshop Indoor 3D 2019 - Waaier, Carré and Hal B buildings, Enschede, Netherlands
Duration: 11 Jun 2019 - 12 Jun 2019
    http://indoor3d.net/2019/

    Workshop

Workshop: ISPRS Workshop Indoor 3D 2019
Country/Territory: Netherlands
City: Enschede
Period: 11/06/19 - 12/06/19
Internet address: http://indoor3d.net/2019/
