Towards designing an email classification system using multi-view based semi-supervised learning

Wenjuan Li, Weizhi Meng, Zhiyuan Tan, Yang Xiang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

14 Citations (Scopus)
505 Downloads (Pure)

Abstract

The goal of email classification is to separate user emails into spam and legitimate messages. Many supervised learning algorithms have been developed for this task, and these algorithms require a large amount of labeled training data. However, data labeling is labor-intensive and requires in-depth domain knowledge, so only a very small proportion of the data can be labeled in practice. This bottleneck greatly degrades the effectiveness of supervised email classification systems. To address this problem, we first identify some critical issues regarding supervised machine learning-based email classification. We then propose an effective classification model based on multi-view disagreement-based semi-supervised learning. The motivation for combining multi-view and semi-supervised learning is that multiple views provide richer information for classification, which is often ignored in the literature, and semi-supervised learning provides the capability of coping with both labeled and unlabeled data. In the evaluation, we demonstrate that multi-view data improves email classification compared with single-view data, and that the proposed model, working with our algorithm, achieves better performance than existing similar algorithms.
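The abstract does not spell out the proposed algorithm, so the following is only a generic illustration of the disagreement-based multi-view idea (classic co-training), not the authors' method. It assumes each email is described by two hypothetical feature views, uses a simple nearest-centroid classifier per view as a stand-in learner, and lets each view's classifier pass its most confident pseudo-label to the other view. All function names, parameters, and the toy data are assumptions made for this sketch.

```python
from math import dist


def centroids(X, y):
    """Mean feature vector per class label (a stand-in base learner)."""
    out = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        out[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return out


def predict(cents, x):
    """Return (label, confidence); confidence is the distance margin
    between the nearest and second-nearest class centroid."""
    ranked = sorted((dist(c, x), label) for label, c in cents.items())
    margin = ranked[1][0] - ranked[0][0] if len(ranked) > 1 else 0.0
    return ranked[0][1], margin


def co_train(view_a, view_b, labels, rounds=5, per_round=1):
    """Generic co-training over two views of the same emails.

    view_a, view_b: lists of feature vectors, one per email.
    labels: 0/1 for labeled emails, None for unlabeled ones.
    Returns the label list held by each view after training.
    """
    labels_a, labels_b = list(labels), list(labels)
    for _ in range(rounds):
        changed = False
        # Each view teaches the other: (teacher view, teacher labels, student labels).
        for src_view, src_labels, dst_labels in (
            (view_a, labels_a, labels_b),
            (view_b, labels_b, labels_a),
        ):
            known = [i for i, t in enumerate(src_labels) if t is not None]
            cents = centroids([src_view[i] for i in known],
                              [src_labels[i] for i in known])
            # Score every email the student view has not labeled yet.
            cands = [(predict(cents, src_view[i]), i)
                     for i, t in enumerate(dst_labels) if t is None]
            cands.sort(key=lambda c: c[0][1], reverse=True)
            # Hand over only the most confident pseudo-labels.
            for (label, _), i in cands[:per_round]:
                dst_labels[i] = label
                changed = True
        if not changed:
            break
    return labels_a, labels_b


# Toy data (hypothetical): six emails, two views, only two labeled.
view_a = [[0.0, 0.1], [1.0, 0.9], [0.1, 0.0], [0.9, 1.0], [0.2, 0.1], [0.8, 0.9]]
view_b = [[5.0], [1.0], [4.8], [1.2], [5.1], [0.9]]
labels = [0, 1, None, None, None, None]

labels_a, labels_b = co_train(view_a, view_b, labels)
```

Passing only the highest-margin predictions between views is one common way to limit the spread of wrong pseudo-labels. A real system would use stronger per-view classifiers and a held-out labeled set to check that the added pseudo-labels actually help.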
Original language: English
Title of host publication: 13th IEEE International Conference on Trust, Security and Privacy in Computing and Communications 2014
Publisher: IEEE
Pages: 174-181
Number of pages: 8
ISBN (Electronic): 978-1-4799-6513-7
ISBN (Print): 978-1-4799-6514-4
DOIs
Publication status: Published - 19 Jan 2015
Externally published: Yes
Event: 13th IEEE International Conference on Trust, Security and Privacy in Computing and Communications 2014 - Future Internet Technology (FIT) Building, Tsinghua University, Beijing, China
Duration: 24 Sept 2014 to 26 Sept 2014
Conference number: 13
http://www.greenorbs.org/TrustCom2014/

Conference

Conference: 13th IEEE International Conference on Trust, Security and Privacy in Computing and Communications 2014
Abbreviated title: TrustCom 2014
Country/Territory: China
City: Beijing
Period: 24/09/14 to 26/09/14
Internet address

Keywords

  • SCS-Cybersecurity
  • EWI-25643
  • IR-93921
  • Machine Learning Applications
  • METIS-309859
  • Network Security
  • Email Classification
  • Semi-Supervised Learning
  • Multi-View
