Word Clouds for Efficient Document Labeling

Christin Seifert, Eva Ulbrich, Michael Granitzer

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

    3 Citations (Scopus)

    Abstract

    In text classification, the amount and quality of training data are crucial for the performance of the classifier. Training data is generated by human labelers, which is tedious and time-consuming work. We propose to use condensed representations of text documents instead of the full-text documents to reduce the labeling time for single documents. These condensed representations are key sentences and key phrases, and they can be generated in a fully unsupervised way. The key phrases are presented in a layout similar to a tag cloud. In a user study with 37 participants, we evaluated whether human labelers can label documents faster and equally accurately with these condensed representations. Our evaluation shows that users labeled word clouds twice as fast as full-text documents, and just as accurately. While further investigations for different classification tasks are necessary, this insight could potentially reduce the cost of labeling text documents.
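As a rough illustration of the idea in the abstract, the sketch below scores terms by plain frequency and maps the scores to font sizes, as a tag-cloud layout would. It is not the authors' extraction method, which the abstract does not specify. The stop-word list, the function names `key_terms` and `font_sizes`, and the point-size range are all assumptions made for this example.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real system would use a fuller one.
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "for", "on", "with"}

def key_terms(text, top_k=5):
    """Return the top_k most frequent non-stop-word terms with their counts."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t not in STOP_WORDS)
    return counts.most_common(top_k)

def font_sizes(terms, min_pt=10, max_pt=32):
    """Linearly map term counts to font sizes, as in a tag-cloud layout."""
    if not terms:
        return {}
    lo = min(c for _, c in terms)
    hi = max(c for _, c in terms)
    span = (hi - lo) or 1  # avoid division by zero when all counts are equal
    return {t: min_pt + (c - lo) * (max_pt - min_pt) / span for t, c in terms}

doc = ("The classifier needs training data. Training data is labeled by users. "
       "Labeled data improves the classifier.")
cloud = font_sizes(key_terms(doc, top_k=3))
for term, size in cloud.items():
    print(f"{term}: {size:.0f}pt")
```

Frequency counting needs no labeled data, which matches the unsupervised setting described above. A practical system would more likely weight terms against a background corpus and extract multi-word phrases.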
    Original language: English
    Title of host publication: Discovery Science
    Subtitle of host publication: 14th International Conference, DS 2011, Espoo, Finland, October 5-7, 2011. Proceedings
    Editors: Tapio Elomaa, Jaakko Hollmén, Heikki Mannila
    Place of publication: Berlin, Heidelberg
    Publisher: Springer
    Pages: 292-306
    ISBN (Electronic): 978-3-642-24477-3
    ISBN (Print): 978-3-642-24476-6
    DOI: https://doi.org/10.1007/978-3-642-24477-3
    Publication status: Published - 1 Oct 2011
    Event: 14th International Conference on Discovery Science, DS 2011 - Espoo, Finland
    Duration: 5 Oct 2011 to 7 Oct 2011
    Conference number: 14

    Publication series

    Name: Lecture Notes in Computer Science
    Publisher: Springer
    Volume: 6926
    ISSN (Print): 0302-9743
    ISSN (Electronic): 1611-3349

    Name: Lecture Notes in Artificial Intelligence
    Publisher: Springer

    Conference

    Conference: 14th International Conference on Discovery Science, DS 2011
    Abbreviated title: DS
    Country: Finland
    City: Espoo
    Period: 5/10/11 to 7/10/11

    Fingerprint

    Labeling
    Classifiers
    Costs

    Keywords

    • Text classification
    • Visualization
    • User Interface
    • Word clouds
    • Document labeling
    • Document annotation

    Cite this

    Seifert, C., Ulbrich, E., & Granitzer, M. (2011). Word Clouds for Efficient Document Labeling. In T. Elomaa, J. Hollmén, & H. Mannila (Eds.), Discovery Science: 14th International Conference, DS 2011, Espoo, Finland, October 5-7, 2011. Proceedings (pp. 292-306). (Lecture Notes in Computer Science; Vol. 6926), (Lecture Notes in Artificial Intelligence). Berlin, Heidelberg: Springer. https://doi.org/10.1007/978-3-642-24477-3
