Journal: IAENG International Journal of Computer Science
Print ISSN: 1819-656X
Electronic ISSN: 1819-9224
Publication year: 2021
Volume: 48
Issue: 1
Language: English
Publisher: IAENG - International Association of Engineers
Abstract: The effectiveness of a topic extraction method depends on its capability to extract information from a large amount of data. This paper proposes a new solution for selecting a set of topics from text documents, as well as a new approach to setting the weight of a term in clusters of documents. The process of revealing topics from multiple documents starts with a preprocessing step, which aims to remove unnecessary portions of the text. After that, the weights of terms in documents are calculated using the term frequency-inverse document frequency method. Then, feature transformation based on singular value decomposition is employed to build the weights used for clustering. The clustering process is conducted using the Growing Neural Gas method. Finally, to determine the weights of terms in clusters as a way of selecting topics from clusters, the proposed probabilistic inverse cluster frequency term-cluster method is applied. Experiments show that the framework attains satisfactory results, with average accuracies of 0.8606, 0.7406, 0.4039, and 0.6647 for topics obtained from the Binary2, Multi5, Multi7, and Multi10 categories of the 20Newsgroup dataset, respectively.
Keywords: topic extraction; growing neural gas clustering; probabilistic inverse cluster frequency term-cluster weighting; feature transformation
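
Illustrative note: the term-weighting step named in the abstract (term frequency-inverse document frequency) can be sketched as below. This is a minimal sketch, assuming the common textbook form tf × log(N/df); the abstract does not state which TF-IDF variant the paper uses. The function name `tfidf` and the toy documents are hypothetical, and the later steps of the pipeline (singular value decomposition, Growing Neural Gas, and the proposed cluster weighting) are not shown.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document.

    docs is a list of token lists. Assumed weighting (not taken from the
    paper): weight = (count / document length) * log(N / document frequency).
    """
    n_docs = len(docs)

    # Document frequency: number of documents that contain each term.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))

    weights = []
    for tokens in docs:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy corpus, already tokenized and preprocessed.
docs = [
    ["neural", "gas", "clustering"],
    ["topic", "extraction", "clustering"],
    ["topic", "weighting"],
]
w = tfidf(docs)
```

In this sketch a term that occurs in fewer documents receives a larger weight than an equally frequent term that occurs in many, and a term present in every document receives a weight of zero.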