Low Dimensional State Representation Learning with Reward-shaped Priors

Nicolò Botteghi, Ruben Obbink, Daan Geijs, Mannes Poel, Beril Sirmacek, Christoph Brune, Abeje Mersha, Stefano Stramigioli

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review



Reinforcement learning has been able to solve many complicated robotics tasks in an end-to-end fashion, without any need for feature engineering. However, learning the optimal policy directly from the sensory inputs, i.e., the observations, often requires processing and storing a huge amount of data. In the context of robotics, the cost of data from real robot hardware is usually very high, so solutions that achieve high sample efficiency are needed. We propose a method that learns a mapping from the observations into a lower-dimensional state space. This mapping is learned with unsupervised learning, using loss functions shaped to incorporate prior knowledge of the environment and the task. Using samples from the state space, the optimal policy is quickly and efficiently learned. We test the method on several mobile-robot navigation tasks in a simulation environment and on a real robot. A video of our experiments can be found at: https://youtu.be/dgWxmfSv95U.
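The idea sketched in the abstract (an encoder trained with prior-shaped loss functions, producing low-dimensional states for policy learning) can be illustrated with a toy example. The code below is a minimal sketch, not the paper's implementation: it uses a linear encoder on synthetic data and two illustrative priors — a temporal-coherence term and a hypothetical reward-similarity term standing in for "reward-shaped priors". All data, dimensions, and loss forms here are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Toy data (hypothetical; the paper uses real robot sensor observations) ---
T = 200
latent = np.cumsum(rng.normal(scale=0.1, size=(T, 2)), axis=0)   # true 2-D state
mixing = rng.normal(size=(2, 50))
observations = latent @ mixing + 0.01 * rng.normal(size=(T, 50))  # 50-D observations
rewards = -np.linalg.norm(latent, axis=1)  # toy reward: closer to origin is better

# Linear encoder as a stand-in for the paper's neural-network mapping
W = rng.normal(scale=0.1, size=(50, 2))
states = observations @ W  # low-dimensional state estimates

# Prior 1 -- temporal coherence: consecutive states should change slowly.
temporal_loss = np.mean(np.sum((states[1:] - states[:-1]) ** 2, axis=1))

# Prior 2 -- reward-shaped (illustrative reading): time steps with similar
# rewards should map to nearby states, dissimilar rewards to distant ones.
# The paper's exact loss formulation may differ.
i = rng.integers(0, T, size=500)
j = rng.integers(0, T, size=500)
similar = np.abs(rewards[i] - rewards[j]) < 0.1
dist2 = np.sum((states[i] - states[j]) ** 2, axis=1)
reward_loss = np.mean(np.where(similar, dist2, np.exp(-dist2)))

total_loss = temporal_loss + reward_loss
print(f"temporal={temporal_loss:.4f}  reward-shaped={reward_loss:.4f}")
```

In a full implementation these loss terms would be minimized over the encoder's parameters (e.g., by gradient descent on a neural network), and the resulting low-dimensional states would then be fed to a standard RL algorithm in place of the raw observations.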

Original language: English
Title of host publication: 2020 25th International Conference on Pattern Recognition (ICPR)
Number of pages: 8
ISBN (Electronic): 978-1-7281-8808-9
Publication status: Published - 5 May 2021
Event: 25th International Conference on Pattern Recognition, ICPR 2020 - Online conference, Virtual, Milan, Italy
Duration: 10 Jan 2021 - 15 Jan 2021
Conference number: 25


Conference: 25th International Conference on Pattern Recognition, ICPR 2020
Abbreviated title: ICPR
City: Virtual, Milan


  • Deep learning
  • Reinforcement learning
  • State representation learning
  • Robotics


