Two new methods have been proposed to determine unexpected sum scores on subtests (testlets), both for paper-and-pencil tests and for computer adaptive tests. A method based on a conservative bound using the hypergeometric distribution, denoted p, was compared with a method in which the probability of each score combination was calculated using a highest density region (HDR). Furthermore, these methods were compared with the standardized log-likelihood statistic with and without a correction for the estimated latent trait value (denoted l*z and lz, respectively). Data were simulated on the basis of the one-parameter logistic model, and both parametric and nonparametric logistic regression were used to obtain estimates of the latent trait. Results showed that it is important to take the trait level into account when comparing subtest scores. In a nonparametric item response theory (IRT) context, an adapted version of the HDR method was a powerful alternative to p. In a parametric IRT context, results showed that l*z had the highest power when the data were simulated conditionally on the estimated latent trait level.
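For readers unfamiliar with the standardized log-likelihood statistic lz, the following is a minimal sketch of its commonly used textbook form under the one-parameter logistic model. It is not taken from the article: the function names, the item difficulties, and the trait value are hypothetical, and the correction for an estimated trait value (l*z) is not implemented.

```python
import math


def rasch_prob(theta, b):
    """Probability of a correct response under the one-parameter logistic model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def lz_statistic(responses, theta, difficulties):
    """Standardized log-likelihood of a 0/1 response vector at a given theta.

    Large negative values indicate a response pattern that is unlikely
    given the trait level. Assumes at least one item has a success
    probability different from 0.5, otherwise the variance is zero.
    """
    log_lik = 0.0    # observed log-likelihood l0
    expected = 0.0   # E[l0] under the model
    variance = 0.0   # Var[l0] under the model
    for x, b in zip(responses, difficulties):
        p = rasch_prob(theta, b)
        q = 1.0 - p
        log_lik += x * math.log(p) + (1 - x) * math.log(q)
        expected += p * math.log(p) + q * math.log(q)
        variance += p * q * math.log(p / q) ** 2
    return (log_lik - expected) / math.sqrt(variance)


# Hypothetical five-item test, ordered from easy to hard.
difficulties = [-2.0, -1.0, 0.0, 1.0, 2.0]
theta = 0.0  # illustrative trait value, treated as known

consistent = [1, 1, 1, 0, 0]  # easy items right, hard items wrong
aberrant = [0, 0, 0, 1, 1]    # easy items wrong, hard items right

lz_consistent = lz_statistic(consistent, theta, difficulties)
lz_aberrant = lz_statistic(aberrant, theta, difficulties)
```

In this sketch the trait value is treated as known. In practice it is estimated from the same responses, which is the situation the l*z correction is meant to address.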
|Number of pages||17|
|Journal||Journal of Educational Measurement|
|Publication status||Published - 2004|