Cross-Domain Classification: Trade-Off between Complexity and Accuracy

Elisabeth Lex, Christin Seifert, Michael Granitzer, Andreas Juffinger

    Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, Academic, peer-reviewed

    Abstract

    Text classification is one of the core applications in data mining because of the huge amount of uncategorized digital data available. Training a text classifier generates a model that reflects the characteristics of the domain. However, if no training data is available, labeled data from a related but different domain can be exploited to perform cross-domain classification. In our work, we aim to accurately classify unlabeled blogs into commonly agreed newspaper categories using labeled data from the news domain. Both the labeled news corpus and the unlabeled blog corpus are highly dynamic: they grow hourly and exhibit topic drift, so a trade-off between accuracy and performance is required. Our approach is to apply a fast, novel centroid-based algorithm, the Class-Feature-Centroid Classifier (CFC), to perform efficient cross-domain classification. Experiments showed that this algorithm achieves accuracy comparable to k-NN and slightly better than Support Vector Machines (SVM), yet at linear time cost for training and classification. The benefit of this approach is that the linear time complexity enables us to efficiently generate an accurate classifier, reflecting the topic drift, several times per day on a huge dataset.
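
To illustrate the general idea of centroid-based cross-domain classification, here is a minimal sketch in Python. It is not the CFC algorithm from the paper: the abstract does not give CFC's term-weighting formula, so this example substitutes plain term-frequency centroids with cosine similarity. The function names (`train_centroids`, `classify`) and the toy news and blog texts are hypothetical, chosen only for illustration. The sketch builds one centroid per category from labeled news texts in a single pass, then assigns a blog post to the category whose centroid it is most similar to.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Simplistic whitespace tokenizer (assumption; real systems do more).
    return text.lower().split()


def normalize(vec):
    # Scale a sparse term vector to unit length for cosine similarity.
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}


def train_centroids(labeled_docs):
    # One pass over the labeled source-domain documents:
    # sum term counts per class, then normalize each class vector.
    sums = defaultdict(Counter)
    for text, label in labeled_docs:
        sums[label].update(tokenize(text))
    return {label: normalize(counts) for label, counts in sums.items()}


def classify(text, centroids):
    # Assign the target-domain document to the most similar centroid.
    doc = normalize(Counter(tokenize(text)))

    def score(label):
        centroid = centroids[label]
        return sum(w * centroid.get(t, 0.0) for t, w in doc.items())

    return max(centroids, key=score)


# Hypothetical labeled news items (source domain).
news = [
    ("parliament votes on new tax law", "politics"),
    ("minister resigns after election defeat", "politics"),
    ("striker scores twice in cup final", "sports"),
    ("team wins league title after penalty", "sports"),
]

centroids = train_centroids(news)

# Hypothetical unlabeled blog post (target domain).
label = classify("my thoughts on the election and the tax vote", centroids)
print(label)
```

In this sketch, training touches each token of the labeled corpus once, and classifying a document costs one sparse dot product per category. That is the property that makes frequent retraining on a growing, drifting corpus affordable.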
    Original language: English
    Title of host publication: 2009 International Conference for Internet Technology and Secured Transactions, (ICITST)
    Subtitle of host publication: London, UK, 9-12 November 2009
    Place of Publication: Piscataway, NJ
    Publisher: IEEE
    Number of pages: 6
    ISBN (Electronic): 978-1-4244-5648-2, 978-0-9546628-2-0
    ISBN (Print): 978-1-4244-5647-5
    Publication status: Published - 2009
    Event: 4th International Conference for Internet Technology and Secured Transactions, ICITST 2009 - London, United Kingdom
    Duration: 9 Nov 2009 - 12 Nov 2009
    Conference number: 4

    Conference

    Conference: 4th International Conference for Internet Technology and Secured Transactions, ICITST 2009
    Abbreviated title: ICITST
    Country: United Kingdom
    City: London
    Period: 9/11/09 - 12/11/09
