Faithful and Effective Reward Schemes for Model-Free Reinforcement Learning of Omega-Regular Objectives

Ernst Moritz Hahn, Mateo Perez*, Sven Schewe, Fabio Somenzi, Ashutosh Trivedi, Dominik Wojtczak

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

Abstract

Omega-regular properties, specified using linear time temporal logic or various forms of omega-automata, find increasing use in specifying the objectives of reinforcement learning (RL). The key problem that arises is that of faithful and effective translation of the objective into a scalar reward for model-free RL. A recent approach exploits Büchi automata with restricted nondeterminism to reduce the search for an optimal policy for an omega-regular property to that for a simple reachability objective. A possible drawback of this translation is that reachability rewards are sparse, being reaped only at the end of each episode. Another approach reduces the search for an optimal policy to an optimization problem with two interdependent discount parameters. While this approach provides denser rewards than the reduction to reachability, it is not easily mapped to off-the-shelf RL algorithms. We propose a reward scheme that reduces the search for an optimal policy to an optimization problem with a single discount parameter that produces dense rewards and is compatible with off-the-shelf RL algorithms. Finally, we report an experimental comparison of these and other reward schemes for model-free RL with omega-regular objectives.

Original language: English
Title of host publication: Automated Technology for Verification and Analysis
Subtitle of host publication: 18th International Symposium, ATVA 2020, Hanoi, Vietnam, October 19–23, 2020, Proceedings
Editors: Dang Van Hung, Oleg Sokolsky
Place of Publication: Cham
Publisher: Springer
Pages: 108-124
Number of pages: 17
ISBN (Electronic): 978-3-030-59152-6
ISBN (Print): 978-3-030-59151-9
DOIs
Publication status: Published - 2020
Event: 18th International Symposium on Automated Technology for Verification and Analysis, ATVA 2020 - Online Event, Hanoi, Viet Nam
Duration: 19 Oct 2020 → 23 Oct 2020
Conference number: 18

Publication series

Name: Lecture Notes in Computer Science
Publisher: Springer
Volume: 12302
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 18th International Symposium on Automated Technology for Verification and Analysis, ATVA 2020
Abbreviated title: ATVA 2020
Country: Viet Nam
City: Hanoi
Period: 19/10/20 → 23/10/20
