A Hybrid Feature Selection Approach for Arabic Documents Classification

Mena Badieh Habib, Ahmed A. E. Sarhan (Editor), Abdel-Badeeh M. Salem (Editor), Zaki T. Fayed, Tarek F. Gharib

Research output: Contribution to journal, Article, Academic, peer-review

Abstract

Text categorization (classification) is the process of assigning documents to a predefined set of categories based on their content. Text categorization algorithms usually represent documents as bags of words and consequently have to deal with a huge number of features. Feature selection tries to find a set of relevant terms that improves both efficiency and generalization. There are two main approaches to feature selection: local and global. In Arabic text categorization, global feature selection was found to give better classification results, but it may leave some documents without any terms in the set of selected features. Local feature selection overcomes this problem but gives a lower classification rate. In this paper, a hybrid of the global and local feature selection techniques is proposed and compared with both the local and the global technique. Results are reported on a set of 1132 documents covering six different topics, showing that the proposed hybrid feature selection overcomes the disadvantages of both approaches.
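
The trade-off described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' method, which the abstract does not specify. It assumes that "global" selection keeps the top-k terms by document frequency over the whole corpus, that "local" selection keeps a document's own most frequent terms, and that a hybrid falls back to local terms only when a document shares no term with the global set. The function names, the scoring choices, and the parameters `k` and `m` are all hypothetical.

```python
from collections import Counter

def global_select(docs, k):
    """Assumed global scheme: keep the k terms with the highest document frequency."""
    df = Counter(term for doc in docs for term in set(doc))
    # Sort by descending document frequency, then alphabetically for determinism.
    ranked = sorted(df, key=lambda t: (-df[t], t))
    return set(ranked[:k])

def local_select(doc, m):
    """Assumed local scheme: keep the m most frequent terms of one document."""
    tf = Counter(doc)
    ranked = sorted(tf, key=lambda t: (-tf[t], t))
    return set(ranked[:m])

def hybrid_represent(docs, k, m):
    """Hypothetical hybrid: use global terms, fall back to local terms if none match."""
    global_terms = global_select(docs, k)
    reps = []
    for doc in docs:
        kept = [t for t in doc if t in global_terms]
        if not kept:
            # The document has no globally selected term, so keep its own top terms.
            local_terms = local_select(doc, m)
            kept = [t for t in doc if t in local_terms]
        reps.append(kept)
    return global_terms, reps

# Toy corpus: the third document shares no term with the global set.
docs = [
    ["economy", "market", "bank"],
    ["market", "bank", "trade"],
    ["football", "goal", "goal"],
]
global_terms, reps = hybrid_represent(docs, k=2, m=1)
```

In this toy corpus, purely global selection would leave the third document empty, while the fallback keeps it represented. A real pipeline would also need to add the fallback terms to the shared feature vocabulary before training a classifier.
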
Original language: English
Pages (from-to): 1-7
Number of pages: 7
Journal: Egyptian Computer Science Journal
Volume: 28
Issue number: 3
Publication status: Published - Sep 2006

Keywords

  • Text Mining
  • EWI-19329
  • Feature Selection
  • Document Classification
  • IR-75677
