摘要:Nowadays, many applications that use large data have been developed due to the existence of the Internet of Things. These applications are translated into different languages and require automated text classification (ATC). The ATC process depends on the content of one or more predefined classes. However, this process is problematic for the Arabic translation of the data. This study aims to solve this issue by investigating the performances of three classification algorithms, namely, k-nearest neighbor (KNN), decision tree (DT), and nave Bayes (NB) classifiers, on Saudi Press Agency datasets. Results showed that the NB algorithm outperformed DT and KNN algorithms in terms of precision, recall, and F1. In future works, a new algorithm that can improve the handling of the ATC problem will be developed.
关键词:arabic data sets; data mining; decision tree; K-nearest neighbors; NB algorithms; text classification;
term weighting