Automated Blog Classification: A Cross-Domain Approach

Elisabeth Lex, Christin Seifert, Michael Granitzer, Andreas Juffinger

    Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, Academic, peer-review


    Abstract

    The automated classification of blogs is highly important for the relatively new field of blog analysis. To classify blogs into topics or other categories, supervised text classification algorithms are usually applied. However, supervised text classifiers need a sufficiently large amount of labeled data to learn a good model. For blogs in particular, data labeled with terms that capture current topics is not available, and data labeled in the past is usually not applicable due to topic drift. In addition, tagged blogs collected from the web exhibit a vocabulary that is heterogeneous, diverse, and not commonly agreed upon. In our work, we focus on news-related blogs dealing with current events. Our goal is to classify blog posts into given, common newspaper categories. As a baseline, we have high-quality labeled data from a German news corpus. Our approach is to exploit the labeled data from the news corpus and use this knowledge to perform cross-domain classification on the unlabeled blogs. We need a high-performance solution because both our corpora are dynamic and our classifier model needs to be kept up to date. In this work, we evaluated a number of text classification algorithms with different parameter settings in terms of accuracy and complexity. Qualitative and quantitative analysis revealed that a recently proposed centroid-based algorithm, the Class-Feature-Centroid classifier (CFC), serves best for our setting because it achieves accuracy comparable to state-of-the-art text classifiers and outperforms all other algorithms regarding complexity and memory consumption.
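
The cross-domain setup described in the abstract (train on labeled news, classify unlabeled blog posts) can be illustrated with a minimal centroid-based sketch. This is not the CFC weighting evaluated in the paper, which the abstract does not specify; it assumes plain term-frequency centroids with cosine similarity, and the function names and toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase whitespace tokenizer; a real system would add stemming and stop-word removal."""
    return text.lower().split()


def train_centroids(labeled_docs):
    """Build one L2-normalized term-frequency centroid per class.

    labeled_docs: iterable of (text, label) pairs from the source domain (news).
    """
    sums = defaultdict(Counter)
    for text, label in labeled_docs:
        sums[label].update(tokenize(text))
    centroids = {}
    for label, counts in sums.items():
        norm = math.sqrt(sum(v * v for v in counts.values()))
        centroids[label] = {t: v / norm for t, v in counts.items()}
    return centroids


def classify(text, centroids):
    """Return the label whose centroid has the highest cosine similarity to the text."""
    counts = Counter(tokenize(text))
    norm = math.sqrt(sum(v * v for v in counts.values())) or 1.0

    def score(centroid):
        return sum(v * centroid.get(t, 0.0) for t, v in counts.items()) / norm

    return max(centroids, key=lambda label: score(centroids[label]))


# Toy source-domain (news) training data, invented for illustration only.
news = [
    ("parliament votes on new tax law", "politics"),
    ("minister announces election date", "politics"),
    ("team wins the championship final", "sports"),
    ("striker scores twice in cup match", "sports"),
]
centroids = train_centroids(news)

# Unlabeled target-domain (blog) post, classified with the news-trained model.
print(classify("my thoughts on the election and the tax debate", centroids))
```

Training here is a single pass over the labeled documents, and the model stores only one vector per class. That is the general reason centroid-based classifiers tend to be cheap to rebuild when the corpora change.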
    Original language: English
    Title of host publication: WWW/Internet 2009
    Subtitle of host publication: Proceedings of the IADIS International Conference on WWW/Internet
    Editors: Bebo White, Pedro Isaías, Miguel Baptista Nunes
    Publisher: IADIS
    Pages: 598-605
    Number of pages: 8
    ISBN (Print): 978-972-8924-93-5
    Publication status: Published - 2009
    Event: IADIS International Conference WWW/Internet 2009 - Rome, Italy
    Duration: 19 Nov 2009 to 22 Nov 2009

    Conference

    Conference: IADIS International Conference WWW/Internet 2009
    Country: Italy
    City: Rome
    Period: 19/11/09 to 22/11/09


  • Cite this

    Lex, E., Seifert, C., Granitzer, M., & Juffinger, A. (2009). Automated Blog Classification: A Cross-Domain Approach. In B. White, P. Isaías, & M. B. Nunes (Eds.), WWW/Internet 2009: Proceedings of the IADIS International Conference on WWW/Internet (pp. 598-605). IADIS.