TY - JOUR
T1 - Obtaining a common scale for item response theory item parameters using separate versus concurrent estimation in the common-item equating design
AU - Beguin, Anton
AU - Hanson, Bradley A.
PY - 2001
Y1 - 2001
AB - Item response theory item parameters can be estimated using data from a common-item equating design either separately for each form or concurrently across forms. This paper reports the results of a simulation study of separate versus concurrent item parameter estimation. Using simulated data from a test with 60 dichotomous items, four factors were considered: (a) estimation program (MULTILOG versus BILOG-MG), (b) sample size per form (3,000 versus 1,000), (c) number of common items (20 versus 10), and (d) equivalent versus nonequivalent groups taking the two forms (no mean difference versus a mean difference of 1 SD). In addition, four methods of item parameter scaling were used in the separate estimation condition: two item characteristic curve methods (Stocking-Lord and Haebara) and two moment methods (Mean/Mean and Mean/Sigma). Concurrent estimation generally resulted in lower error than separate estimation, although not universally so. The results suggest that one factor accounting for the lower error when using concurrent estimation may be that the parameter estimates for the common item parameters are based on larger samples. It is argued that the results of this study, together with other research on this topic, are not sufficient to recommend completely avoiding separate estimation in favor of concurrent estimation.
KW - IR-60183
KW - METIS-313768
U2 - 10.1177/0146621602026001001
DO - 10.1177/0146621602026001001
M3 - Article
SN - 0146-6216
VL - 26
SP - 3
EP - 24
JO - Applied Psychological Measurement
JF - Applied Psychological Measurement
IS - 1
ER -