Abstract
In text classification, the amount and quality of training data are crucial for the performance of the classifier. The generation of training data is done by human labelers - a tedious and time-consuming task. We propose to use condensed representations of text documents instead of the full-text document to reduce the labeling time for single documents. These condensed representations are key sentences and key phrases and can be generated in a fully unsupervised way. The key phrases are presented in a layout similar to a tag cloud. In a user study with 37 participants we evaluated whether document labeling with these condensed representations can be done faster and equally accurately by the human labelers. Our evaluation shows that the users labeled word clouds twice as fast but as accurately as full-text documents. While further investigations for different classification tasks are necessary, this insight could potentially reduce costs for the labeling process of text documents.
Original language | English |
---|---|
Title of host publication | Discovery Science |
Subtitle of host publication | 14th International Conference, DS 2011, Espoo, Finland, October 5-7, 2011. Proceedings |
Editors | Tapio Elomaa, Jaakko Hollmén, Heikki Mannila |
Place of Publication | Berlin, Heidelberg |
Publisher | Springer |
Pages | 292-306 |
ISBN (Electronic) | 978-3-642-24477-3 |
ISBN (Print) | 978-3-642-24476-6 |
DOIs | 10.1007/978-3-642-24477-3 |
Publication status | Published - 1 Oct 2011 |
Event | 14th International Conference on Discovery Science, DS 2011 - Espoo, Finland; Duration: 5 Oct 2011 → 7 Oct 2011; Conference number: 14 |
Publication series
Name | Lecture Notes in Computer Science |
---|---|
Publisher | Springer |
Volume | 6926 |
ISSN (Print) | 0302-9743 |
ISSN (Electronic) | 1611-3349 |
Name | Lecture Notes in Artificial Intelligence |
---|---|
Publisher | Springer |
Conference
Conference | 14th International Conference on Discovery Science, DS 2011 |
---|---|
Abbreviated title | DS |
Country | Finland |
City | Espoo |
Period | 5 Oct 2011 → 7 Oct 2011 |
Fingerprint
Keywords
- Text classification
- Visualization
- User Interface
- Word clouds
- Document labeling
- Document annotation
Cite this
Word Clouds for Efficient Document Labeling. / Seifert, Christin; Ulbrich, Eva; Granitzer, Michael.
Discovery Science: 14th International Conference, DS 2011, Espoo, Finland, October 5-7, 2011. Proceedings. ed. / Tapio Elomaa; Jaakko Hollmén; Heikki Mannila. Berlin, Heidelberg : Springer, 2011. p. 292-306 (Lecture Notes in Computer Science; Vol. 6926), (Lecture Notes in Artificial Intelligence).
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review
TY - GEN
T1 - Word Clouds for Efficient Document Labeling
AU - Seifert, Christin
AU - Ulbrich, Eva
AU - Granitzer, Michael
PY - 2011/10/1
Y1 - 2011/10/1
AB - In text classification, the amount and quality of training data are crucial for the performance of the classifier. The generation of training data is done by human labelers - a tedious and time-consuming task. We propose to use condensed representations of text documents instead of the full-text document to reduce the labeling time for single documents. These condensed representations are key sentences and key phrases and can be generated in a fully unsupervised way. The key phrases are presented in a layout similar to a tag cloud. In a user study with 37 participants we evaluated whether document labeling with these condensed representations can be done faster and equally accurately by the human labelers. Our evaluation shows that the users labeled word clouds twice as fast but as accurately as full-text documents. While further investigations for different classification tasks are necessary, this insight could potentially reduce costs for the labeling process of text documents.
KW - Text classification
KW - Visualization
KW - User Interface
KW - Word clouds
KW - Document labeling
KW - Document annotation
U2 - 10.1007/978-3-642-24477-3
DO - 10.1007/978-3-642-24477-3
M3 - Conference contribution
SN - 978-3-642-24476-6
T3 - Lecture Notes in Computer Science
SP - 292
EP - 306
BT - Discovery Science
A2 - Elomaa, Tapio
A2 - Hollmén, Jaakko
A2 - Mannila, Heikki
PB - Springer
CY - Berlin, Heidelberg
ER -
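
The abstract describes condensed representations that are generated without supervision and shown in a tag-cloud-like layout. This record does not say how the authors extract key phrases, so the sketch below is only an illustration under stated assumptions: it swaps in plain TF-IDF scoring of single words (not phrases) and a linear mapping from score to font size. The function names `tokenize`, `key_terms`, and `font_sizes` and the point-size range are hypothetical and do not come from the paper.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def key_terms(doc, corpus, top_k=5):
    """Rank the words of `doc` by a smoothed TF-IDF score against `corpus`.

    Assumption: TF-IDF is a stand-in for whatever unsupervised key-phrase
    extraction the paper actually uses.
    """
    tokenized = [tokenize(d) for d in corpus]
    n_docs = len(tokenized)

    # Document frequency: in how many corpus texts each word occurs.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    # Term frequency in the target document times smoothed inverse document frequency.
    tf = Counter(tokenize(doc))
    scores = {
        term: count * math.log((1 + n_docs) / (1 + df[term]))
        for term, count in tf.items()
    }

    # Highest score first; ties are broken alphabetically for a stable order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]


def font_sizes(ranked, min_pt=10, max_pt=32):
    """Map scores linearly onto font sizes, as a tag cloud would."""
    if not ranked:
        return {}
    values = [score for _, score in ranked]
    low, high = min(values), max(values)
    span = high - low
    sizes = {}
    for term, score in ranked:
        # If all scores are equal, every term gets the largest size.
        frac = (score - low) / span if span > 0 else 1.0
        sizes[term] = round(min_pt + frac * (max_pt - min_pt))
    return sizes
```

A labeler-facing tool could call `font_sizes(key_terms(document, corpus))` and render each returned word at its size. Positioning the words in the cloud is left out here, because the record gives no detail on the layout beyond its similarity to a tag cloud.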