Journal title: Journal of Theoretical and Applied Information Technology
Print ISSN: 1992-8645
Online ISSN: 1817-3195
Year: 2016
Volume: 84
Issue: 3
Publisher: Journal of Theoretical and Applied
Abstract: Multi-label text classification, in which each document can be assigned multiple labels at the same time, has become increasingly important in recent years. It is a challenging task because the space of all possible label sets is exponential in the number of candidate labels. A drawback of earlier multi-label classification methods is that they typically do not scale with the number of labels or the number of training examples. Classifying a large number of high-dimensional text documents requires considerable computational time, especially for Arabic, a language with a rich and very complex morphology. Furthermore, current research has paid little attention to multi-label classification of Arabic text. This study therefore designs and develops a new multi-label text classification method for Arabic based on binary relevance, where the binary relevance model is built from a set of different machine learning classifiers. Four multi-label classification approaches were evaluated empirically: a set of SVM classifiers, a set of KNN classifiers, a set of NB classifiers, and a set of classifiers of different types. In addition, three feature selection methods (odds ratio, chi-square, and mutual information) were studied to improve the performance of Arabic multi-label text classification. The objective is to combine classification algorithms and feature selection efficiently to obtain a more accurate multi-label classification process. The model is evaluated on a manually annotated standard dataset. The results show that the binary relevance model composed of different machine learning classifiers achieves the best result.
It achieved an overall F-measure of 86.8% for multi-label classification of Arabic text. The results also show that the feature selection methods have an important effect on classification. Overall, the binary relevance model built from different types of algorithms proves to be an efficient and suitable method for Arabic multi-label text classification.
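The binary relevance scheme with per-label feature selection described in the abstract can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' implementation. It makes several assumptions. Documents are taken to be already tokenized (and, for Arabic, stemmed) into sets of terms. Chi-square is used as the feature scorer, while odds ratio or mutual information would slot into `select_features` the same way. Only a Jaccard-similarity KNN base learner is implemented; SVM or NB learners would plug in through the same `Learner` interface. The "overall F-measure" is assumed here to be micro-averaged. All function names are illustrative.

```python
# Hypothetical sketch of binary relevance with per-label chi-square feature
# selection; not the paper's code. Assumes pre-tokenized term-set documents.
from typing import Callable, Dict, List, Set, Tuple

Doc = Set[str]  # a document as a set of (already tokenized/stemmed) terms
Classifier = Callable[[Doc], bool]
Learner = Callable[[List[Doc], List[bool]], Classifier]


def chi_square(term: str, label: str, docs: List[Doc], labels: List[Set[str]]) -> float:
    """Chi-square score of `term` for `label` from the 2x2 term/label contingency table."""
    n = len(docs)
    a = sum(1 for d, ls in zip(docs, labels) if term in d and label in ls)      # term, label
    b = sum(1 for d, ls in zip(docs, labels) if term in d and label not in ls)  # term, no label
    c = sum(1 for d, ls in zip(docs, labels) if term not in d and label in ls)  # no term, label
    d = n - a - b - c                                                            # neither
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


def select_features(label: str, docs: List[Doc], labels: List[Set[str]], k: int) -> Set[str]:
    """Keep the k highest-scoring terms for one label (ties broken alphabetically)."""
    vocab = set().union(*docs)
    ranked = sorted(vocab, key=lambda t: (-chi_square(t, label, docs, labels), t))
    return set(ranked[:k])


def knn_learner(k: int = 3) -> Learner:
    """Binary KNN over Jaccard similarity; a stand-in for any base classifier."""
    def train(X: List[Doc], y: List[bool]) -> Classifier:
        def jaccard(p: Doc, q: Doc) -> float:
            union = p | q
            return len(p & q) / len(union) if union else 0.0

        def predict(x: Doc) -> bool:
            nearest = sorted(range(len(X)), key=lambda i: -jaccard(x, X[i]))[:k]
            return 2 * sum(y[i] for i in nearest) > len(nearest)  # strict majority vote
        return predict
    return train


def train_binary_relevance(docs: List[Doc], labels: List[Set[str]],
                           learners: Dict[str, Learner],
                           k_features: int) -> Dict[str, Tuple[Set[str], Classifier]]:
    """Binary relevance: one independent binary problem per label."""
    models = {}
    for label, learner in learners.items():  # each label may use a different learner
        feats = select_features(label, docs, labels, k_features)
        X = [doc & feats for doc in docs]   # per-label feature subset
        y = [label in ls for ls in labels]  # one-vs-rest targets
        models[label] = (feats, learner(X, y))
    return models


def predict_labels(models: Dict[str, Tuple[Set[str], Classifier]], doc: Doc) -> Set[str]:
    """Predicted label set = every label whose binary classifier fires."""
    return {label for label, (feats, clf) in models.items() if clf(doc & feats)}


def micro_f1(true: List[Set[str]], pred: List[Set[str]]) -> float:
    """Micro-averaged F-measure over all (document, label) decisions."""
    tp = sum(len(t & p) for t, p in zip(true, pred))
    fp = sum(len(p - t) for t, p in zip(true, pred))
    fn = sum(len(t - p) for t, p in zip(true, pred))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


if __name__ == "__main__":
    # Toy English terms stand in for Arabic tokens.
    docs = [{"goal", "team", "match"}, {"goal", "team", "league"},
            {"bank", "stock", "market"}, {"bank", "loan", "market"}]
    labels = [{"sports"}, {"sports"}, {"economy"}, {"economy"}]
    learners = {"sports": knn_learner(3), "economy": knn_learner(3)}
    models = train_binary_relevance(docs, labels, learners, k_features=4)
    print(predict_labels(models, {"goal", "team", "coach"}))   # {'sports'}
    print(predict_labels(models, {"bank", "market", "team"}))  # {'economy'}
```

Because each label gets its own feature subset and its own learner, the `learners` mapping is where a heterogeneous set of classifiers would go (for example, an SVM for one label and NB for another), which is the configuration the abstract reports as best.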
Keywords: Arabic Text Classification; Multi-label Classification; Feature Selection; Statistical Methods