Flexible forms of testing, such as adaptive testing and testing on demand, usually require large item pools to avoid overexposure of the items. After an initial investment in the development of an item generator, test items can be (semi-)automatically generated in a negligible amount of time. A distinction can be made between item generation rules that have a fixed effect on item difficulty and rules that vary only the surface features of the items. The combination of these two types of rules results in clusters ("families") of items with similar psychometric properties: between-family variation is caused by the former type of rules, within-family variation by the latter. In this thesis, item response theory models are discussed that estimate the psychometric properties of item families from responses to samples of their items. A distinction is made between models that assume equal parameters for all items within a family and models that allow item-specific deviations from the family means. Knowledge of the applied item generation rules is used to define covariates for the family difficulty parameters of the models. Bayesian parameter estimation and model fit assessment methods are discussed, as well as methods for designing new tests based on a Fisher information measure for item families. The methodology is illustrated using simulated data and real datasets on intelligence test items and statistical word problems.
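The family structure described above can be sketched in a few lines. The following is a minimal illustration, not code from the thesis: it assumes a Rasch-type model in which each family has a mean difficulty, items deviate from that mean by a normally distributed within-family term, and the information a family contributes at a given ability level is approximated by averaging item information over the family's items. All names and the specific distributions are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative item-family structure (not the thesis's actual model):
# each family f has a mean difficulty mu[f]; items within the family
# deviate from it by a normal within-family term with spread tau.
n_families, items_per_family = 4, 5
mu = rng.normal(0.0, 1.0, size=n_families)   # family mean difficulties
tau = 0.3                                    # within-family standard deviation
b = mu[:, None] + rng.normal(0.0, tau, size=(n_families, items_per_family))

def p_correct(theta, b):
    """Rasch model: probability of a correct response at ability theta."""
    return 1.0 / (1.0 + np.exp(-(theta - b)))

def family_information(theta, family_items):
    """Approximate Fisher information of one item drawn from the family,
    averaging the Rasch item information p*(1-p) over the family's items."""
    p = p_correct(theta, family_items)
    return float(np.mean(p * (1.0 - p)))

# Rank families by the information they offer at a target ability level,
# as a test designer selecting families (rather than single items) might.
theta = 0.5
info = [family_information(theta, b[f]) for f in range(n_families)]
```

Selecting on family-level information rather than item-level information is what allows a new test form to draw a fresh item from each chosen family without re-calibrating every generated item.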
Award date: 23 Mar 2012
Place of Publication: Enschede
Publication status: Published - 23 Mar 2012