期刊名称:International Journal of Computer Science and Communication Networks
电子版ISSN:2249-5789
出版年度:2012
卷号:2
期号:4
页码:553-558
出版社:Technopark Publications
摘要:Web mining involves activities such as document clustering, community mining etc. to be performed on web. Such tasks need measuring semantic similarity between words. This helps in performing web mining activities easily in many applications. However, the accuracy of measuring semantic similarity between any two words is difficult task. In this paper a new approach is proposed to measure similarity between words. This approach is based on text snippets and page counts. These two measures are taken from the results of a search engine like Google. To achieve the aim of this paper, lexical patterns are extracted from text snippets and word co-occurrence measures are defined using page counts. The results of these two are combined. Moreover, we proposed algorithms such as pattern clustering and pattern extraction in order to find various relationships between any given two words. Support Vector Machines, a data mining technique, is used to optimize the results. The empirical results reveal that the proposed techniques are finding best results that can be compared with human ratings and accuracy in web mining activities
关键词:Text snippets; word count; semantic similarity; web mining; lexical patterns