Abstract
This paper investigates the use of model-free reinforcement learning to compute the optimal value in two-player stochastic games with parity objectives. In this setting, two decision makers, player Min and player Max, compete on a finite game arena - a stochastic game graph with unknown but fixed probability distributions - to minimize and maximize, respectively, the probability of satisfying a parity objective. We give a reduction from stochastic parity games to a family of stochastic reachability games with a parameter ε, such that the value of a stochastic parity game equals the limit of the values of the corresponding simple stochastic games as the parameter ε tends to 0. Since this reduction does not require knowledge of the probabilistic transition structure of the underlying game arena, model-free reinforcement learning algorithms, such as minimax Q-learning, can be used to approximate the value and mutual best-response strategies for both players in the underlying stochastic parity game. We also present a streamlined reduction from 1½-player parity games to reachability games that avoids recourse to nondeterminism. Finally, we report on the experimental evaluations of both reductions.
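To make the role of minimax Q-learning concrete, the following is a minimal sketch on a hand-made turn-based stochastic reachability game. It does not implement the paper's reduction from parity objectives. The arena, the state and action names, the `1/n` learning rate, and the uniform exploration policy are all illustrative assumptions. The learner only samples transitions through `step` and never reads the probabilities.

```python
import random

# Hypothetical toy arena (an assumption, not taken from the paper):
# who moves in each non-terminal state, and which actions are available.
OWNER = {"s0": "max", "s1": "min"}
ACTIONS = {"s0": ["a", "b"], "s1": ["c", "d"]}

# Hidden dynamics: used only by the simulator `step`, never by the learner.
_HIDDEN = {
    ("s0", "a"): [("s1", 1.0)],
    ("s0", "b"): [("target", 0.5), ("sink", 0.5)],
    ("s1", "c"): [("target", 0.8), ("sink", 0.2)],
    ("s1", "d"): [("target", 0.3), ("sink", 0.7)],
}


def step(rng, state, action):
    """Sample one successor state from the hidden distribution."""
    successors, weights = zip(*_HIDDEN[(state, action)])
    return rng.choices(successors, weights=weights)[0]


def state_value(q, state):
    """Reachability value estimate: 1 at the target, 0 at the sink,
    otherwise max or min over Q-values depending on the state's owner."""
    if state == "target":
        return 1.0
    if state == "sink":
        return 0.0
    values = [q[(state, a)] for a in ACTIONS[state]]
    return max(values) if OWNER[state] == "max" else min(values)


def minimax_q_learning(episodes=20000, seed=0):
    """Tabular minimax Q-learning with uniform exploration."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in ACTIONS for a in ACTIONS[s]}
    visits = dict.fromkeys(q, 0)
    for _ in range(episodes):
        state = "s0"
        while state in ACTIONS:  # stop at "target" or "sink"
            action = rng.choice(ACTIONS[state])
            nxt = step(rng, state, action)
            visits[(state, action)] += 1
            alpha = 1.0 / visits[(state, action)]  # decaying learning rate
            # Move the estimate toward the minimax value of the successor.
            q[(state, action)] += alpha * (state_value(q, nxt) - q[(state, action)])
            state = nxt
    return q


q = minimax_q_learning()
print(round(state_value(q, "s0"), 2), round(state_value(q, "s1"), 2))
```

In this toy game the exact values are 0.5 at `s0` (Max prefers action `b`) and 0.3 at `s1` (Min prefers action `d`), so the printed estimates should land close to those numbers.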
| Original language | English |
|---|---|
| Title of host publication | 31st International Conference on Concurrency Theory (CONCUR 2020) |
| Editors | Igor Konnov, Laura Kovacs |
| Place of Publication | Leibniz |
| Publisher | Dagstuhl |
| Number of pages | 16 |
| ISBN (Electronic) | 9783959771603 |
| ISBN (Print) | 978-3-95977-160-3 |
| Publication status | Published - 2020 |
| Event | 31st International Conference on Concurrency Theory, CONCUR 2020, Online. Duration: 1 Sept 2020 → 4 Sept 2020. Conference number: 31 |
Publication series
| Name | Leibniz International Proceedings in Informatics (LIPIcs) |
|---|---|
| Publisher | Schloss Dagstuhl - Leibniz-Zentrum für Informatik |
| Volume | 171 |
| ISSN (Print) | 1868-8969 |
Conference
| Conference | 31st International Conference on Concurrency Theory, CONCUR 2020 |
|---|---|
| Abbreviated title | CONCUR |
| Period | 1/09/20 → 4/09/20 |
Keywords
- Reinforcement learning
- Stochastic games
- Omega-regular objectives