Article Information

Title: Efficient Clustering Algorithms in Text Mining
Authors: Nataraj Gudapaty; G Loshma; Dr. Nagaratna P Hegde; et al.
Journal: International Journal of Computer Science & Technology
Print ISSN: 2229-4333
Electronic ISSN: 0976-8491
Publication year: 2012
Volume: 3
Issue: 2
Pages: 1068-1072
Language: English
Publisher: Ayushmaan Technologies
Abstract: Text mining processes unstructured (textual) information, extracts meaningful numeric indices from the text, and thus makes the information contained in the text accessible to various data mining (statistical and machine learning) algorithms. Information can be extracted to derive summaries of the words contained in the documents. Document clustering is a fundamental task of text mining, through which efficient organization, navigation, summarization, and retrieval of documents can be achieved. Clustering documents is difficult because text data is sparse and high-dimensional and because its semantics are complex. Based on the K-means and PAM (Partitioning Around Medoids) text clustering algorithms and a semantic-based vector space model, a semantic-based PAM text clustering model is proposed to address the high dimensionality and sparsity of text data sets. The model reduces the semantic loss of the text data and improves the quality of text clustering. We propose a novel adaptive kernel K-means clustering algorithm and a PAM algorithm that combine textual content and citation information for clustering. Within this semantics-based text mining process, K-means and PAM are compared. The time and space complexities of the two algorithms are compared and presented as bar charts and line charts.