How Different are Language Models and Word Clouds?

Rianne Kaptein, Djoerd Hiemstra, Jaap Kamps

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, Academic, peer-review

17 Citations (Scopus)
262 Downloads (Pure)

Abstract

Word clouds are a summarised representation of a document’s text, similar to tag clouds which summarise the tags assigned to documents. Word clouds are similar to language models in the sense that they represent a document by its word distribution. In this paper we investigate the differences between word cloud and language modelling approaches, and specifically whether effective language modelling techniques also improve word clouds. We evaluate the quality of the language model using a system evaluation test bed, and evaluate the quality of the resulting word cloud with a user study. Our experiments show that different language modelling techniques can be applied to improve a standard word cloud that uses a TF weighting scheme in combination with stopword removal. Including bigrams in the word clouds and a parsimonious term weighting scheme are the most effective in both the system evaluation and the user study.
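To make the baseline named in the abstract concrete, the following is a minimal sketch of a TF-weighted word cloud with stopword removal and optional bigrams. It is an illustration under stated assumptions, not the authors' implementation: the stopword list, the tokenizer, the function names `tokenize` and `tf_cloud`, and the choice to normalise unigrams and bigrams over one shared total are all hypothetical. The parsimonious weighting scheme mentioned in the abstract is not covered here.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not the list used in the paper).
STOPWORDS = {"a", "an", "and", "are", "in", "is", "of", "the", "to"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tf_cloud(text, top_k=10, use_bigrams=False):
    """Return up to top_k (term, relative frequency) pairs for a word cloud."""
    tokens = tokenize(text)
    # Unigram candidates: every token that is not a stopword.
    terms = [t for t in tokens if t not in STOPWORDS]
    if use_bigrams:
        # Keep only adjacent pairs in which neither word is a stopword.
        terms += [
            f"{a} {b}"
            for a, b in zip(tokens, tokens[1:])
            if a not in STOPWORDS and b not in STOPWORDS
        ]
    counts = Counter(terms)
    total = sum(counts.values())
    if total == 0:
        return []
    # Plain TF weighting: raw count divided by the total number of candidate terms.
    return [(term, n / total) for term, n in counts.most_common(top_k)]


if __name__ == "__main__":
    sample = (
        "Word clouds summarise the text of a document. "
        "Word clouds are similar to language models."
    )
    for term, weight in tf_cloud(sample, top_k=5, use_bigrams=True):
        print(f"{term}\t{weight:.3f}")
```

In a rendered cloud, the returned weights would typically be mapped to font sizes; that step is omitted here.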
Original language: Undefined
Title of host publication: 32nd European Conference on Information Retrieval (ECIR 2010)
Place of Publication: Berlin
Publisher: Springer
Pages: 556-568
Number of pages: 13
ISBN (Print): 978-3-642-12274-3
Publication status: Published - Mar 2010
Event: 32nd European Conference on Information Retrieval, ECIR 2010 (IR Research) - Milton Keynes, United Kingdom
Duration: 28 Mar 2010 to 31 Mar 2010
Conference number: 32

Publication series

Name: Lecture Notes in Computer Science
Publisher: Springer Verlag
Volume: 5993
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 32nd European Conference on Information Retrieval, ECIR 2010
Abbreviated title: ECIR
Country/Territory: United Kingdom
City: Milton Keynes
Period: 28/03/10 to 31/03/10

Keywords

  • METIS-271094
  • EWI-18654
  • IR-74078
