PyDSMC: Statistical Model Checking for Neural Agents Using the Gymnasium Interface

  • Timo P. Gros*
  • Arnd Hartmanns
  • Ivo Hoese
  • Joshua Meyer
  • Nicola J. Müller
  • Verena Wolf
  • *Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

Abstract

Artificial intelligence (AI) has achieved remarkable success in sequential decision-making. However, evaluating its neural agents remains challenging, as current methods often rely on interpreting training curves only, overlooking key statistical factors. Existing tools that allow a formal evaluation also require white-box formal models, making them impractical for most AI benchmarks based on the black-box Gymnasium interface. We introduce PyDSMC, a lightweight and easy-to-use Python tool for statistical model checking of neural agents on arbitrary Gymnasium environments. PyDSMC automates the selection of statistical methods to compute confidence intervals, supporting both convergence-based and resource-limited evaluation settings. We empirically demonstrate the importance of rigorous agent evaluation and showcase PyDSMC's capabilities to more reliably judge and report an AI agent's performance.
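
The idea of estimating an agent's performance with a confidence interval through a black-box interface can be sketched as follows. This is a minimal illustration and not PyDSMC's actual API: `CoinWalkEnv` is a hypothetical toy environment that only mimics the Gymnasium `reset`/`step` signatures, the always-move-right policy is a stand-in for a neural agent, and the Hoeffding bound is one possible statistical method chosen here for simplicity (the abstract does not state which methods PyDSMC selects).

```python
import math
import random


class CoinWalkEnv:
    """Hypothetical toy environment mimicking the Gymnasium reset/step signatures."""

    def __init__(self, goal=3, horizon=10):
        self.goal = goal
        self.horizon = horizon

    def reset(self, seed=None):
        self.rng = random.Random(seed)
        self.pos = 0
        self.t = 0
        return self.pos, {}

    def step(self, action):
        # The chosen move (+1 or -1) is applied with probability 0.8, else reversed.
        move = action if self.rng.random() < 0.8 else -action
        self.pos += move
        self.t += 1
        terminated = self.pos >= self.goal
        truncated = self.t >= self.horizon and not terminated
        reward = 1.0 if terminated else 0.0
        return self.pos, reward, terminated, truncated, {}


def hoeffding_episodes(eps, delta):
    """Episodes needed so that P(|estimate - p| > eps) <= delta for a [0, 1]-valued outcome."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * eps ** 2))


def estimate_success(env, policy, eps=0.05, delta=0.05, seed=0):
    """Estimate the goal-reaching probability with half-width eps at confidence 1 - delta."""
    n = hoeffding_episodes(eps, delta)
    successes = 0
    for episode in range(n):
        obs, _ = env.reset(seed=seed + episode)
        done = False
        while not done:
            obs, reward, terminated, truncated, _ = env.step(policy(obs))
            done = terminated or truncated
        successes += int(terminated)
    mean = successes / n
    return mean, (max(0.0, mean - eps), min(1.0, mean + eps)), n


# Stand-in policy (an assumption for illustration): always move right.
mean, interval, n = estimate_success(CoinWalkEnv(), policy=lambda obs: 1)
print(f"estimated success probability: {mean:.3f}, interval: {interval}, episodes: {n}")
```

The number of episodes here is fixed in advance from `eps` and `delta`. A sequential scheme that stops once the interval is narrow enough, or one that reports the widest defensible interval for a fixed episode budget, would be alternative designs.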

Original language: English
Title of host publication: Quantitative Evaluation of Systems and Formal Modeling and Analysis of Timed Systems
Subtitle of host publication: Second International Joint Conference, QEST+FORMATS 2025, Proceedings
Editors: Pavithra Prabhakar, Andrea Vandin
Place of Publication: Cham
Publisher: Springer
Pages: 134-156
Number of pages: 23
Edition: 1
ISBN (Electronic): 978-3-032-05792-1
ISBN (Print): 978-3-032-05791-4
Publication status: Published - 2026
Event: 2nd International Joint Conference on Quantitative Evaluation of Systems and Formal Modeling and Analysis of Timed Systems, QEST+FORMATS 2025 - Aarhus University, Aarhus, Denmark
Duration: 26 Aug 2025 - 28 Aug 2025
Conference number: 2
https://www.qest.org/qest-formats-2025/

Publication series

Name: Lecture Notes in Computer Science
Publisher: Springer
Volume: 16143 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 2nd International Joint Conference on Quantitative Evaluation of Systems and Formal Modeling and Analysis of Timed Systems, QEST+FORMATS 2025
Abbreviated title: QEST+FORMATS 2025
Country/Territory: Denmark
City: Aarhus
Period: 26/08/25 - 28/08/25
Other: QEST - International Conference on Quantitative Evaluation of SysTems; FORMATS - International Conference on Formal Modeling and Analysis of Timed Systems.

Keywords

  • This work was part of the MISSION (Models in Space Systems: Integration, Operation, and Networking) project, funded by the European Union’s Horizon 2020 research and innovation programme under Marie Skłodowska-Curie Actions grant number 101008233.
  • 2026 OA procedure
