Journal: International Journal of Data Mining & Knowledge Management Process
Print ISSN: 2231-007X
Online ISSN: 2230-9608
Year: 2015
Volume: 5
Issue: 6
Page: 53
DOI: 10.5121/ijdkp.2015.5605
Publisher: Academy & Industry Research Collaboration Center (AIRCC)
Abstract: Text mining is an emerging research field that evolved from information retrieval. Clustering and classification are two data mining approaches that can also be applied to text, as text clustering and text classification. Classification is supervised, whereas clustering is unsupervised. In this paper, our objective is to perform text clustering by defining an improved distance metric to compute the similarity between two text files. We use incremental frequent pattern mining to find frequent items and reduce dimensionality. The improved distance metric may also be used to perform text classification. The distance metric is validated for the worst-, average- and best-case situations [15]. The results show that the proposed distance metric outperforms existing measures.
Keywords: frequent items; text mining; dimensionality reduction
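The abstract does not define the improved metric, so the sketch below only illustrates the general pipeline it describes: keep frequent items to reduce dimensionality, then compare two text files with a distance measure. It is a minimal sketch under stated assumptions. The frequent-item step is a simplified stand-in that counts only single terms (1-itemsets), with document-frequency counts updated incrementally as each file arrives. The distance is standard cosine distance, used here as a baseline, and is NOT the paper's proposed metric. The names `IncrementalTermSupport`, `reduced_vector`, `cosine_distance` and the `min_support` threshold are hypothetical.

```python
import math
from collections import Counter


class IncrementalTermSupport:
    """Hypothetical helper: document-frequency counts updated one document at a time.

    Simplified stand-in for incremental frequent pattern mining: it tracks
    only single terms (1-itemsets), not multi-term patterns.
    """

    def __init__(self):
        self.doc_freq = Counter()
        self.n_docs = 0

    def add(self, text):
        # Count each term at most once per document (document frequency).
        self.doc_freq.update(set(text.lower().split()))
        self.n_docs += 1

    def frequent_terms(self, min_support):
        # Terms whose support (fraction of documents containing them) >= min_support.
        if self.n_docs == 0:
            return set()
        return {t for t, c in self.doc_freq.items() if c / self.n_docs >= min_support}


def reduced_vector(text, vocab):
    # Term-frequency vector restricted to the frequent vocabulary (dimensionality reduction).
    return Counter(w for w in text.lower().split() if w in vocab)


def cosine_distance(a, b):
    # Standard cosine distance (1 - cosine similarity); NOT the paper's improved metric.
    dot = sum(v * b[t] for t, v in a.items())  # Counter returns 0 for missing terms
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # treat an empty (all-infrequent) document as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


# Usage on three toy "text files".
docs = [
    "data mining finds patterns",
    "text mining finds patterns in text",
    "clustering groups text files",
]
support = IncrementalTermSupport()
for d in docs:
    support.add(d)  # counts grow incrementally; no full rescan is needed
vocab = support.frequent_terms(min_support=0.5)
vectors = [reduced_vector(d, vocab) for d in docs]
print(sorted(vocab))                                     # ['finds', 'mining', 'patterns', 'text']
print(round(cosine_distance(vectors[0], vectors[1]), 4))  # 0.3453
```

Keeping counts in a running `Counter` lets new documents update term support without recounting the whole corpus. The resulting reduced vectors can be passed to any distance function, so the baseline cosine distance could be swapped for another measure without changing the rest of the pipeline.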