Article Basic Information

  • Title: Dependency vs. Constituent Based Syntactic N-Grams in Text Similarity Measures for Paraphrase Recognition
  • Authors: Hiram Calvo; Andrea Segura-Olivares; Alejandro García
  • Journal: Computación y Sistemas
  • Print ISSN: 1405-5546
  • Year of publication: 2014
  • Volume: 18
  • Issue: 3
  • Pages: 517-554
  • Language: English
  • Publisher: Instituto Politécnico Nacional
  • Abstract: Paraphrase recognition consists of detecting whether an expression restated as another expression conveys the same information. Traditionally, several lexical, syntactic, and semantic techniques are used to solve this problem. Most works use n-grams to measure word overlap; syntactic n-grams, however, have scarcely been explored. We propose using syntactic dependency and constituent n-grams combined with common NLP techniques such as stemming, synonym detection, similarity measures, and linear combination, together with a similarity matrix built in turn from syntactic n-grams. We measure and compare the performance of our system using the Microsoft Research Paraphrase Corpus. An in-depth study presents the strengths and weaknesses of each approach, along with a common error analysis section. Our main motivation was to determine which syntactic approach performs better on this task: syntactic dependency n-grams or syntactic constituent n-grams. We also compare both approaches with traditional n-grams and state-of-the-art systems.
  • Keywords: Paraphrase recognition; Microsoft Research Paraphrase Corpus; similarity measures; syntactic n-grams; constituent analysis; dependency analysis
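
To illustrate the idea of syntactic dependency n-grams mentioned in the abstract, here is a minimal sketch. It is not the authors' system: the dependency trees are hand-written, hypothetical parses (a real pipeline would obtain them from a parser), the function names are invented for this example, and Jaccard overlap stands in for whichever similarity measures the paper actually evaluates. A syntactic n-gram is taken here to be a path of n words that follows head-to-dependent arcs in the tree, rather than n adjacent words in the surface string.

```python
from typing import Dict, List, Set, Tuple

# Assumption: a dependency tree is a mapping head word -> dependent words,
# and every word occurs only once in the sentence.
Tree = Dict[str, List[str]]
NGram = Tuple[str, ...]


def syntactic_ngrams(tree: Tree, n: int) -> Set[NGram]:
    """Collect every head-to-dependent path containing exactly n words."""

    def paths_from(word: str, length: int) -> List[NGram]:
        if length == 1:
            return [(word,)]
        return [
            (word,) + rest
            for child in tree.get(word, [])
            for rest in paths_from(child, length - 1)
        ]

    # Every word in the tree, whether it appears as a head or a dependent.
    words = set(tree) | {c for kids in tree.values() for c in kids}
    return {path for w in words for path in paths_from(w, n)}


def jaccard(a: Set[NGram], b: Set[NGram]) -> float:
    """Jaccard overlap of two n-gram sets (illustrative measure only)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical parses of "the company bought a startup" and
# "the company acquired a startup".
s1: Tree = {"bought": ["company", "startup"], "company": ["the"], "startup": ["a"]}
s2: Tree = {"acquired": ["company", "startup"], "company": ["the"], "startup": ["a"]}

bigrams1 = syntactic_ngrams(s1, 2)
bigrams2 = syntactic_ngrams(s2, 2)
score = jaccard(bigrams1, bigrams2)
```

In this toy pair the two sentences share the bigrams below the verb but none that include it, so the overlap is only partial. Synonym detection, which the abstract lists among the combined techniques, is the kind of step that could let "bought" and "acquired" match.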