Abstract
We apply reinforcement learning to approximate the optimal probability that a stochastic hybrid system satisfies a temporal logic formula. We consider systems with (non)linear continuous dynamics, random events following general continuous probability distributions, and discrete nondeterministic choices. We present a discretized view of states to the learner, but simulate the continuous system. Once we have learned a near-optimal scheduler resolving the choices, we use statistical model checking to estimate its probability of satisfying the formula. We implemented the approach using Q-learning in the tools HYPEG and modes, which support Petri net- and hybrid automata-based models, respectively. Via two case studies, we show the feasibility of the approach, and compare its performance and effectiveness to existing analytical techniques for a linear model. We find that our new approach quickly finds near-optimal prophetic as well as non-prophetic schedulers, which maximize or minimize the probability that a specific signal temporal logic property is satisfied.
| Original language | English |
|---|---|
| Title of host publication | MEMOCODE '21 |
| Subtitle of host publication | Proceedings of the 19th ACM-IEEE International Conference on Formal Methods and Models for System Design, Virtual Event, China, November 20-22, 2021 |
| Editors | S. Arun-Kumar, Dominique Méry, Indranil Saha, Lijun Zhang |
| Publisher | ACM Press |
| Pages | 44-55 |
| Number of pages | 12 |
| ISBN (Print) | 978-1-4503-9127-6 |
| DOIs | |
| Publication status | Published - 20 Nov 2021 |
| Event | 19th ACM-IEEE International Conference on Formal Methods and Models for System Design, MEMOCODE 2021 - Virtual Event, 20 Nov 2021 → 22 Nov 2021 (Conference number: 19) |
Conference
| Conference | 19th ACM-IEEE International Conference on Formal Methods and Models for System Design, MEMOCODE 2021 |
|---|---|
| Abbreviated title | MEMOCODE 2021 |
| City | Virtual Event |
| Period | 20/11/21 → 22/11/21 |