期刊名称:International Journal of Computer Science and Network Security
印刷版ISSN:1738-7906
出版年度:2018
卷号:18
期号:12
页码:95-101
出版社:International Journal of Computer Science and Network Security
摘要:Scientific publications has been increasing enormously, with this increase classification of scientific publications is becoming challenging task. The core objective of this research is to analyze the performance of classification algorithms using Scopus dataset. In text classification, classification and feature extraction from the document using extracted features are the major issues for decreasing the performances in different algorithms. In this paper, performances of classification algorithms such as Na?ve Bayes (NB) and K-Nearest Neighbor (K-NN) shown better improvement using Bayesian boost and bagging. The performance results were analyzed through selected classification algorithms over 10K documents from Scopus examined using F-measure and produced comparison matrices to estimate accuracy, precision and recall using NB and KNN classifier. Further, data preprocessing and cleaning steps are induced on the selected dataset and class imbalance issues are analyzed to increase the performance of text classification algorithms. Experimental results showed performances over 7% using K-NN and revealed better as compared to NB.
关键词:K-NN; na?ve bayes; text classification; rapid miner; feature extraction