Text categorization (classification) is the process of assigning documents to a predefined set of categories based on their content. Text categorization algorithms usually represent documents as bags of words and consequently have to deal with a huge number of features. Feature selection tries to find a set of relevant terms that improves both efficiency and generalization. There are two main approaches to feature selection: local and global. In Arabic text categorization, global feature selection has been found to yield higher classification rates, but it may leave some documents without any terms in the set of selected features. Local feature selection overcomes this problem but gives a lower classification rate. In this paper, a hybrid of the global and local feature selection techniques is proposed and compared with both the local and the global technique. Results are reported on a set of 1132 documents covering six different topics, showing that the proposed hybrid feature selection overcomes the disadvantages of both approaches.
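The coverage problem described above can be illustrated with a small sketch. This is not the paper's method, since the abstract does not specify the scoring function or how the two selections are combined. It assumes document frequency as the term score, a toy English corpus, and hypothetical helper names (`global_selection`, `local_selection`, `hybrid_selection`, `uncovered`). The hybrid rule here is one plausible reading: start with the global set, then add per-category terms only for documents that the global set leaves empty.

```python
from collections import Counter

def doc_freq(docs):
    """Count, for each term, the number of documents that contain it."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    return df

def top_k(df, k):
    """Return the k highest-count terms (ties broken alphabetically)."""
    ranked = sorted(df.items(), key=lambda item: (-item[1], item[0]))
    return {term for term, _ in ranked[:k]}

def global_selection(docs, k):
    """Global selection: rank terms once over the whole corpus."""
    return top_k(doc_freq(docs), k)

def local_selection(docs, labels, k_per_class):
    """Local selection: rank terms inside each category, then take the union."""
    selected = set()
    for label in set(labels):
        class_docs = [d for d, y in zip(docs, labels) if y == label]
        selected |= top_k(doc_freq(class_docs), k_per_class)
    return selected

def uncovered(docs, features):
    """Indices of documents that share no term with the selected features."""
    return [i for i, d in enumerate(docs) if not features.intersection(d)]

def hybrid_selection(docs, labels, k_global, k_per_class):
    """Assumed hybrid rule: global set, plus local terms for uncovered documents."""
    features = global_selection(docs, k_global)
    missing = uncovered(docs, features)
    if missing:
        rescue_docs = [docs[i] for i in missing]
        rescue_labels = [labels[i] for i in missing]
        features |= local_selection(rescue_docs, rescue_labels, k_per_class)
    return features

# Toy corpus (assumption): token lists with one category label each.
docs = [
    ["goal", "match", "team"],
    ["team", "goal", "coach"],
    ["market", "stock", "price"],
    ["price", "market", "bank"],
    ["poem", "novel"],
]
labels = ["sport", "sport", "economy", "economy", "culture"]

global_features = global_selection(docs, 4)
hybrid_features = hybrid_selection(docs, labels, 4, 1)
print("uncovered by global:", uncovered(docs, global_features))
print("uncovered by hybrid:", uncovered(docs, hybrid_features))
```

In this toy corpus the single "culture" document contains no globally frequent term, so the global set alone gives it an empty feature vector, while the hybrid set adds one term from its category to cover it.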
|Number of pages||7|
|Journal||Egyptian Computer Science Journal|
|Publication status||Published - Sep 2006|
- Text Mining
- Feature Selection
- Document Classification