TY - CHAP
T1 - Classifying unstructured textual data using the Product Score Model
T2 - an alternative text mining algorithm
AU - He, Qiwei
AU - Veldkamp, Bernard P.
PY - 2012
Y1 - 2012
N2 - Unstructured textual data such as students’ essays and life narratives can provide helpful information in educational and psychological measurement, but often contain irregularities and ambiguities, which create difficulties in analysis. Text mining techniques that seek to extract useful information from textual data sources through identifying interesting patterns are promising. This chapter describes the general procedures of text classification using text mining and presents an alternative machine learning algorithm for text classification, named the product score model (PSM). Using the bag-of-words representation (single words), we conducted a comparative study between PSM and two commonly used classification models, decision tree and naïve Bayes. An application of these three models is illustrated for real textual data. The results showed the PSM performed the most efficiently and stably in classifying text. Implications of these results for the PSM are further discussed and recommendations about its use are given.
AB - Unstructured textual data such as students’ essays and life narratives can provide helpful information in educational and psychological measurement, but often contain irregularities and ambiguities, which create difficulties in analysis. Text mining techniques that seek to extract useful information from textual data sources through identifying interesting patterns are promising. This chapter describes the general procedures of text classification using text mining and presents an alternative machine learning algorithm for text classification, named the product score model (PSM). Using the bag-of-words representation (single words), we conducted a comparative study between PSM and two commonly used classification models, decision tree and naïve Bayes. An application of these three models is illustrated for real textual data. The results showed the PSM performed the most efficiently and stably in classifying text. Implications of these results for the PSM are further discussed and recommendations about its use are given.
U2 - 10.3990/3.9789036533744.ch5
DO - 10.3990/3.9789036533744.ch5
M3 - Chapter
SN - 9789036533744
SP - 47
EP - 62
BT - Psychometrics in Practice at RCEC
A2 - Eggen, T.J.H.M.
A2 - Veldkamp, B.P.
PB - RCEC
CY - Enschede
ER -