Model-Free Reinforcement Learning for Branching Markov Decision Processes

Ernst Moritz Hahn, Mateo Perez, Sven Schewe, Fabio Somenzi, Ashutosh Trivedi, Dominik Wojtczak

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

3 Citations (Scopus)
73 Downloads (Pure)

Abstract

We study reinforcement learning for the optimal control of Branching Markov Decision Processes (BMDPs), a natural extension of (multitype) Branching Markov Chains (BMCs). The state of a (discrete-time) BMC is a collection of entities of various types that, while spawning other entities, generate a payoff. In comparison with BMCs, where the evolution of each entity of the same type follows the same probabilistic pattern, BMDPs allow an external controller to pick from a range of options. This permits us to study the best/worst behaviour of the system. We generalise model-free reinforcement learning techniques to compute an optimal control strategy of an unknown BMDP in the limit. We present results of an implementation that demonstrate the practicality of the approach.
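To make the setting concrete, the sketch below simulates a tiny, hypothetical BMDP (the entity types, actions, payoffs, and offspring distributions are invented for illustration) and runs plain tabular Q-learning over entity types under the total-reward criterion. This is only an informal illustration of the model-free idea described in the abstract, not the algorithm developed in the paper.

```python
import random

# Hypothetical toy BMDP: entity types 0 and 1; each (type, action) pair
# yields an immediate payoff and spawns a random tuple of child entities.
# All numbers here are invented for illustration.
random.seed(0)

# (payoff, offspring distribution); distribution: list of (prob, children)
MODEL = {
    (0, 'a'): (1.0, [(0.5, (1,)), (0.5, ())]),
    (0, 'b'): (0.2, [(1.0, ())]),
    (1, 'a'): (0.5, [(1.0, ())]),
}
ACTIONS = {0: ['a', 'b'], 1: ['a']}

def step(entity_type, action):
    """Sample one transition: return (payoff, children spawned)."""
    payoff, dist = MODEL[(entity_type, action)]
    r, acc = random.random(), 0.0
    for prob, children in dist:
        acc += prob
        if r <= acc:
            return payoff, children
    return payoff, dist[-1][1]

# Model-free tabular Q-learning over entity *types*: the value of an
# entity is its payoff plus the summed values of its children.
Q = {key: 0.0 for key in MODEL}
alpha = 0.05
for _ in range(20000):
    t = random.choice([0, 1])
    a = random.choice(ACTIONS[t])
    payoff, children = step(t, a)
    target = payoff + sum(max(Q[(c, b)] for b in ACTIONS[c]) for c in children)
    Q[(t, a)] += alpha * (target - Q[(t, a)])

# Greedy policy per type; type 0 should prefer 'a' (1.0 + 0.5*0.5 > 0.2).
policy = {t: max(ACTIONS[t], key=lambda a: Q[(t, a)]) for t in ACTIONS}
print(policy)
```

Note the branching twist relative to ordinary Q-learning: the bootstrap target sums the estimated values of *all* spawned children rather than taking the value of a single successor state.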
Original language: English
Title of host publication: Computer Aided Verification - 33rd International Conference, CAV 2021, Virtual Event, July 20-23, 2021, Proceedings, Part II
Editors: Alexandra Silva, K. Rustan M. Leino
Publisher: Springer
Pages: 651-673
Number of pages: 23
Publication status: Published - 15 Jul 2021
Event: 33rd International Conference on Computer Aided Verification, CAV 2021 - Virtual Event
Duration: 20 Jul 2021 - 23 Jul 2021
Conference number: 33

Publication series

Name: Lecture Notes in Computer Science
Publisher: Springer
Volume: 12760

Conference

Conference: 33rd International Conference on Computer Aided Verification, CAV 2021
Abbreviated title: CAV 2021
City: Virtual Event
Period: 20/07/21 - 23/07/21
