Classifying unstructured textual data using the Product Score Model: an alternative text mining algorithm

Research output: Chapter in Book/Report/Conference proceeding › Chapter › Academic


Abstract

Unstructured textual data such as students’ essays and life narratives can provide helpful information in educational and psychological measurement, but often contain irregularities and ambiguities, which create difficulties in analysis. Text mining techniques, which seek to extract useful information from textual data sources by identifying interesting patterns, are promising. This chapter describes the general procedures of text classification using text mining and presents an alternative machine learning algorithm for text classification, named the product score model (PSM). Using the bag-of-words representation (single words), we conducted a comparative study between the PSM and two commonly used classification models, decision tree and naïve Bayes. An application of these three models to real textual data is illustrated. The results showed that the PSM performed the most efficiently and stably in classifying text. Implications of these results for the PSM are further discussed, and recommendations about its use are given.
Original language: English
Title of host publication: Psychometrics in Practice at RCEC
Editors: T.J.H.M. Eggen, B.P. Veldkamp
Place of Publication: Enschede
Publisher: RCEC
Pages: 47-62
ISBN (Print): 9789036533744
Publication status: Published - 2012
