Journal title: INTERNATIONAL JOURNAL OF INFORMATION SCIENCE AND MANAGEMENT
Print ISSN: 2008-8302
Electronic ISSN: 2008-8310
Publication year: 2019
Volume: 17
Issue: 1
Language: English
Publisher: REGIONAL INFORMATION CENTER FOR SCIENCE AND TECHNOLOGY
Abstract: Heterogeneous data of all kinds is growing on the web. Because web search results contain many types of data, the results are commonly classified to help users find the data they want. Many machine learning methods are used to classify textual data. The main challenges in data classification are the cost of the classifier and the performance of the classification. The vector space model (VSM) is a traditional model for information retrieval (IR) and text data representation. In this representation, the cost of computation depends on the dimension of the vectors. Another problem is selecting effective features and pruning unwanted terms. Latent semantic indexing (LSI) is used to transform the VSM into an orthogonal semantic space that takes relations between terms into account. Experimental results showed that the LSI semantic space can achieve better computation time and classification accuracy. These results suggest that the semantic topic space contains less noise, which increases accuracy. The lower vector dimension also reduces computational complexity.
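
The transformation described in the abstract can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: it assumes raw term counts as VSM weights and a plain truncated singular value decomposition from NumPy. The toy documents, the function names `build_term_document_matrix` and `lsi_project`, and the choice of `k=2` are hypothetical and do not come from the paper.

```python
import numpy as np


def build_term_document_matrix(docs):
    """Build a VSM term-document matrix: entry [i, j] counts term i in document j."""
    vocab = sorted({tok for doc in docs for tok in doc.lower().split()})
    index = {term: i for i, term in enumerate(vocab)}
    matrix = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for tok in doc.lower().split():
            matrix[index[tok], j] += 1.0
    return vocab, matrix


def lsi_project(matrix, k):
    """Map each document to a k-dimensional latent vector via truncated SVD."""
    # matrix = U @ diag(s) @ Vt, with orthonormal columns in U and rows in Vt.
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    # Keep the k largest singular values; each returned row is one document.
    return (np.diag(s[:k]) @ vt[:k, :]).T


# Hypothetical toy corpus, used only to show the shapes involved.
docs = [
    "web search results classification",
    "text classification machine learning",
    "semantic indexing of text",
    "latent semantic space",
]

vocab, tdm = build_term_document_matrix(docs)
doc_vectors = lsi_project(tdm, k=2)

print("VSM dimension per document:", tdm.shape[0])
print("LSI dimension per document:", doc_vectors.shape[1])
```

Each document goes from one coordinate per vocabulary term to `k` coordinates, which is the source of the lower computational cost the abstract mentions. A classifier would then be trained on the rows of `doc_vectors` instead of the columns of `tdm`; how `k` is chosen in the paper is not stated in the abstract.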