Article Information

Title: Efficient Hybrid Semantic Text Similarity using Wordnet and a Corpus
Authors: Issa Atoum; Ahmed Otoom
Journal: International Journal of Advanced Computer Science and Applications (IJACSA)
Print ISSN: 2158-107X
Online ISSN: 2156-5570
Year: 2016
Volume: 7
Issue: 9
DOI: 10.14569/IJACSA.2016.070917
Publisher: Science and Information Society (SAI)
Abstract: Text similarity plays an important role in natural language processing tasks such as question answering and text summarization. Current state-of-the-art text similarity algorithms rely on inefficient word pairings and/or knowledge derived from large corpora such as Wikipedia. This article evaluates previous word similarity measures on benchmark datasets and then uses a hybrid word similarity measure in a novel text similarity measure (TSM). The proposed TSM is based on information content and WordNet semantic relations. It takes into account exact word matches, the lengths of both sentences in a pair, and the maximum similarity between a word and the compared text. Compared with other well-known measures, TSM's results surpass or are comparable to those of the best algorithms in the literature.
Keywords: thesai; IJACSA; thesai.org; journal; IJACSA papers; text similarity; distributional similarity; information content; knowledge-based similarity; corpus-based similarity; WordNet
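
The abstract names three ingredients of the TSM: exact word matches, the lengths of both sentences, and the maximum similarity between one word and the compared text. The following is a minimal sketch of how such ingredients could be combined, not the authors' formula, which this record does not give. The names `TOY_SIM`, `word_sim`, `directed_score`, and `text_similarity` are hypothetical, the similarity values are invented for illustration, and the hand-made lookup table stands in for a hybrid WordNet and information-content word measure.

```python
from typing import Callable, Dict, FrozenSet, List

# Hypothetical, hand-made word similarities standing in for a hybrid
# WordNet / information-content measure. The values are illustrative only.
TOY_SIM: Dict[FrozenSet[str], float] = {
    frozenset({"car", "automobile"}): 0.9,
    frozenset({"boy", "lad"}): 0.8,
}


def word_sim(a: str, b: str) -> float:
    """Exact matches score 1.0; other pairs fall back to the toy table."""
    if a == b:
        return 1.0
    return TOY_SIM.get(frozenset({a, b}), 0.0)


def directed_score(src: List[str], tgt: List[str],
                   sim: Callable[[str, str], float]) -> float:
    """Sum, over the words of src, of each word's best similarity to tgt."""
    if not tgt:
        return 0.0
    return sum(max(sim(w, t) for t in tgt) for w in src)


def text_similarity(s1: str, s2: str,
                    sim: Callable[[str, str], float] = word_sim) -> float:
    """Symmetric score in [0, 1], normalised by the lengths of both sentences.

    Assumption: whitespace tokenisation and lower-casing are sufficient
    for this illustration.
    """
    t1, t2 = s1.lower().split(), s2.lower().split()
    if not t1 or not t2:
        return 0.0
    total = directed_score(t1, t2, sim) + directed_score(t2, t1, sim)
    return total / (len(t1) + len(t2))
```

Under these assumptions, two identical sentences score 1.0 and sentences with no related words score 0.0. Taking only the maximum per word avoids scoring every word pair against a threshold, which is one plausible reading of the efficiency concern the abstract raises about word pairings.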