Optimistic Value Iteration

Arnd Hartmanns*, Benjamin Lucien Kaminski

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceedingConference contributionAcademicpeer-review

35 Citations (Scopus)
96 Downloads (Pure)

Abstract

Markov decision processes are widely used for planning and verification in settings that combine controllable or adversarial choices with probabilistic behaviour. The standard analysis algorithm, value iteration, only provides lower bounds on infinite-horizon probabilities and rewards. Two “sound” variations, which also deliver an upper bound, have recently appeared. In this paper, we present a new sound approach that leverages value iteration’s ability to usually deliver good lower bounds: we obtain a lower bound via standard value iteration, use the result to “guess” an upper bound, and prove the latter’s correctness. We present this optimistic value iteration approach for computing reachability probabilities as well as expected rewards. It is easy to implement and performs well, as we show via an extensive experimental evaluation using our implementation within the mcsta model checker of the Modest Toolset.
Original languageEnglish
Title of host publicationComputer Aided Verification
Subtitle of host publication32nd International Conference, CAV 2020, Los Angeles, CA, USA, July 21–24, 2020, Proceedings, Part II
EditorsShuvendu K. Lahiri, Chao Wang
PublisherSpringer
Pages488-511
Number of pages24
ISBN (Electronic)978-3-030-53291-8
ISBN (Print)978-3-030-53290-1
DOIs
Publication statusPublished - 14 Jul 2020
Event32nd International Conference on Computer-Aided Verification, CAV 2020 - Virtual Conference, Los Angeles, United States
Duration: 19 Jul 202024 Jul 2020
Conference number: 32
http://i-cav.org/2020/

Publication series

NameLecture Notes in Computer Science
PublisherSpringer
Volume12225

Conference

Conference32nd International Conference on Computer-Aided Verification, CAV 2020
Abbreviated titleCAV 2020
Country/TerritoryUnited States
CityLos Angeles
Period19/07/2024/07/20
Internet address

Fingerprint

Dive into the research topics of 'Optimistic Value Iteration'. Together they form a unique fingerprint.

Cite this