Abstract
We study reinforcement learning for the optimal control of Branching Markov Decision Processes (BMDPs), a natural extension of (multitype) Branching Markov Chains (BMCs). The state of a (discrete-time) BMCs is a collection of entities of various types that, while spawning other entities, generate a payoff. In comparison with BMCs, where the evolution of a each entity of the same type follows the same probabilistic pattern, BMDPs allow an external controller to pick from a range of options. This permits us to study the best/worst behaviour of the system. We generalise model-free reinforcement learning techniques to compute an optimal control strategy of an unknown BMDP in the limit. We present results of an implementation that demonstrate the practicality of the approach.
Original language | English |
---|---|
Title of host publication | Computer Aided Verification - 33rd International Conference, CAV 2021, Virtual Event, July 20-23, 2021, Proceedings, Part II |
Editors | Alexandra Silva, K. Rustan M. Leino |
Publisher | Springer |
Pages | 651-673 |
Number of pages | 23 |
DOIs | |
Publication status | Published - 15 Jul 2021 |
Event | 33rd International Conference on Computer Aided Verification, CAV 2021 - Virtual Event Duration: 20 Jul 2021 → 23 Jul 2021 Conference number: 33 |
Publication series
Name | Lecture Notes in Computer Science |
---|---|
Publisher | Springer |
Volume | 12760 |
Conference
Conference | 33rd International Conference on Computer Aided Verification, CAV 2021 |
---|---|
Abbreviated title | CAV 2021 |
City | Virtual Event |
Period | 20/07/21 → 23/07/21 |