TY - JOUR

T1 - On hypothesis testing for statistical model checking

AU - Reijsbergen, D.P.

AU - de Boer, Pieter-Tjerk

AU - Scheinhardt, Willem R.W.

AU - Haverkort, Boudewijn R.H.M.

N1 - eemcs-eprint-26168

PY - 2015/8

Y1 - 2015/8

N2 - Hypothesis testing is an important part of statistical model checking (SMC). It is typically used to verify statements of the form $p>p_0$ or $p<p_0$, where $p$ is an unknown probability intrinsic to the system model and $p_0$ is a given threshold value. Many techniques for this have been introduced in the SMC literature. We give a comprehensive overview and comparison of these techniques, starting by introducing a framework in which they can all be described. We distinguish between three classes of techniques, which differ in the type of output correctness guarantees they give when the true $p$ is very close to the threshold $p_0$. For each technique, we show how to parametrise it in terms of quantities that are meaningful to the user. Having parametrised them consistently, we graphically compare the boundaries of their decision thresholds, and numerically compare the correctness, power and efficiency of the tests. A companion website allows users to gain more insight into the properties of the tests by interactively manipulating the parameters.

AB - Hypothesis testing is an important part of statistical model checking (SMC). It is typically used to verify statements of the form $p>p_0$ or $p<p_0$, where $p$ is an unknown probability intrinsic to the system model and $p_0$ is a given threshold value. Many techniques for this have been introduced in the SMC literature. We give a comprehensive overview and comparison of these techniques, starting by introducing a framework in which they can all be described. We distinguish between three classes of techniques, which differ in the type of output correctness guarantees they give when the true $p$ is very close to the threshold $p_0$. For each technique, we show how to parametrise it in terms of quantities that are meaningful to the user. Having parametrised them consistently, we graphically compare the boundaries of their decision thresholds, and numerically compare the correctness, power and efficiency of the tests. A companion website allows users to gain more insight into the properties of the tests by interactively manipulating the parameters.

KW - EWI-26168

KW - IR-96736

KW - METIS-312683

U2 - 10.1007/s10009-014-0350-1

DO - 10.1007/s10009-014-0350-1

M3 - Article

VL - 17

SP - 377

EP - 395

JO - International Journal on Software Tools for Technology Transfer

JF - International Journal on Software Tools for Technology Transfer

SN - 1433-2779

IS - 4

ER -