Journal title: Indian Journal of Computer Science and Engineering
Print ISSN: 2231-3850
Electronic ISSN: 0976-5166
Publication year: 2021
Volume: 12
Issue: 4
Pages: 790-797
DOI: 10.21817/indjcse/2021/v12i4/211204014
Language: English
Publisher: Engg Journals Publications
Abstract: Text clustering is gaining importance among researchers because of the rapid increase in the availability of online text collections without class labels. It helps to organize, summarize, and retrieve useful information from corpora. The high dimensionality of text datasets leads to poor performance of clustering algorithms. Dimensionality can be reduced using feature extraction or feature selection methods; feature selection methods scale well and are easy to interpret. An unsupervised univariate filter feature selection method is proposed for dimensionality reduction. The proposed method outperformed nine other filter methods reported in the literature by identifying the most relevant features, which led to good clustering performance on eight popular text datasets.
Keywords: Feature Selection; Unsupervised; Filter Method; Text Clustering; Differential Inverse Document Frequency
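To illustrate what an unsupervised univariate filter does in general, here is a minimal sketch that scores every term on its own, without class labels, and keeps the k highest-scoring terms. It assumes pre-tokenized documents and uses plain document frequency (DF) as the per-term score, a common baseline filter chosen only for illustration. It is not the Differential Inverse Document Frequency method proposed in this paper, whose formula is not given in this record. The function names `document_frequency`, `select_top_k_terms`, and `reduce_vocabulary` are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Sequence

# Illustrative sketch only: DF stands in for the paper's (unspecified here)
# Differential IDF score. All names below are hypothetical.


def document_frequency(docs: Sequence[Iterable[str]]) -> Counter:
    """Count, for each term, the number of documents that contain it."""
    df: Counter = Counter()
    for doc in docs:
        df.update(set(doc))  # each term counted at most once per document
    return df


def select_top_k_terms(docs: Sequence[Iterable[str]], k: int) -> List[str]:
    """Univariate, unsupervised filter: each term is scored independently
    (no class labels, no interaction with other terms) and the k best are kept.
    Ties are broken alphabetically so the result is deterministic."""
    df = document_frequency(docs)
    ranked = sorted(df.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


def reduce_vocabulary(docs: Sequence[Iterable[str]], selected: Iterable[str]) -> List[List[str]]:
    """Drop every token that is not in the selected feature set."""
    keep = set(selected)
    return [[t for t in doc if t in keep] for doc in docs]


if __name__ == "__main__":
    corpus = [
        "text clustering groups similar documents".split(),
        "feature selection reduces text dimensionality".split(),
        "clustering text with selected features".split(),
    ]
    top = select_top_k_terms(corpus, k=2)
    print(top)                            # ['text', 'clustering']
    print(reduce_vocabulary(corpus, top))  # [['text', 'clustering'], ['text'], ['clustering', 'text']]
```

Replacing DF with a different per-term score (for example, an IDF-based one) changes only how each term is scored; the score, rank, and truncate structure that makes a method a univariate filter stays the same, which is also why such filters scale well to large vocabularies.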