Journal title: International Journal of Information Science and Management (IJISM)
Print ISSN: 2008-8302
Electronic ISSN: 2008-8310
Publication year: 2019
Volume: 17
Issue: 1
Pages: 33-46
Publisher: REGIONAL INFORMATION CENTER FOR SCIENCE AND TECHNOLOGY
Abstract: Heterogeneous data of all kinds are growing on the web. Because web search
results contain a variety of data types, it is common to classify the results in
order to find the preferred data. Many machine learning methods are used to
classify textual data. The main challenges in data classification are the cost of
the classifier and the performance of the classification. A traditional model in
IR and text data representation is the vector space model (VSM). In this
representation, the cost of computation depends on the dimension of the vector.
Another problem is selecting effective features and pruning unwanted terms. Latent
semantic indexing (LSI) is used to transform the VSM into an orthogonal semantic
space that takes term relations into account. Experimental results showed that the
LSI semantic space can achieve better performance in computation time and
classification accuracy. The results suggest that the semantic topic space contains
less noise, which increases accuracy. The lower vector dimension also reduces the
computational complexity.
Keywords: Persian Text Classification; Vector Space Model; Latent Semantic Indexing (LSI).