An Intelligent System For Arabic Text Categorization

M.M. Syiam, Mohamed F. Tolba (Editor), Z.T. Fayed, Mohamed S. Abdel-Wahab (Editor), Said A. Ghoniemy (Editor), Mena Badieh Habib

Research output: Contribution to journal › Article › Academic › peer-review



Text categorization (classification) is the process of assigning documents to a predefined set of categories based on their content. This paper presents an intelligent Arabic text categorization system built on machine learning algorithms. Several stemming and feature selection algorithms are evaluated. Documents are represented using several term weighting schemes, and the k-nearest neighbor and Rocchio classifiers are used for classification. Experiments on a self-collected corpus show that the proposed hybrid of statistical and light stemmers is the most suitable stemming algorithm for Arabic. The results also show that a hybrid of document frequency and information gain is the preferable feature selection criterion and that normalized tf-idf is the best weighting scheme. Finally, the Rocchio classifier outperforms the k-nearest neighbor classifier. The experimental results indicate that the proposed model is effective, with a generalization accuracy of about 98%.
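As a rough illustration of the last two stages named in the abstract, the sketch below combines a length-normalized tf-idf weighting with a centroid-based Rocchio classifier. It is not the authors' implementation: the abstract gives no formulas, so the idf smoothing, the positive-examples-only form of Rocchio (one centroid per category, no negative-example term), the cosine scoring, and all function names are assumptions made for this example. Stemming and feature selection are omitted, and the input is assumed to be already tokenized.

```python
import math
from collections import Counter, defaultdict


def normalize(vec):
    """Scale a sparse vector (dict term -> weight) to unit L2 length."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def weigh(tokens, idf):
    """Normalized tf-idf vector for one document; unseen terms are dropped."""
    tf = Counter(t for t in tokens if t in idf)
    return normalize({t: count * idf[t] for t, count in tf.items()})


def tfidf_vectors(docs):
    """docs is a list of token lists. Returns (vectors, idf table)."""
    n = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # document frequency: count each term once per doc
    # The "+ 1.0" keeps terms that occur in every document (assumed smoothing).
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}
    return [weigh(tokens, idf) for tokens in docs], idf


def train_rocchio(vectors, labels):
    """One prototype per category: the normalized mean of its training vectors."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = Counter(labels)
    for vec, label in zip(vectors, labels):
        for t, w in vec.items():
            sums[label][t] += w
    return {
        label: normalize({t: w / counts[label] for t, w in terms.items()})
        for label, terms in sums.items()
    }


def classify(vec, centroids):
    """Return the category whose prototype has the highest cosine similarity."""
    def cosine(a, b):
        # Both vectors are unit length, so the dot product is the cosine.
        return sum(w * b.get(t, 0.0) for t, w in a.items())
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))
```

A typical call sequence would be `vectors, idf = tfidf_vectors(train_docs)`, then `centroids = train_rocchio(vectors, train_labels)`, then `classify(weigh(new_tokens, idf), centroids)`. A k-nearest neighbor classifier could reuse the same vectors by comparing a new document against every training vector instead of one prototype per category.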
Original language: Undefined
Pages (from-to): 1-19
Number of pages: 19
Journal: International journal of cooperative information systems
Issue number: 1
Publication status: Published - Jan 2006


  • IR-75671
  • EWI-19190
