
Article Information

  • Title: Joint Phrase Alignment and Extraction for Statistical Machine Translation
  • Authors: Graham Neubig; Taro Watanabe; Eiichiro Sumita
  • Journal: Information and Media Technologies
  • Electronic ISSN: 1881-0896
  • Year: 2012
  • Volume: 7
  • Issue: 2
  • Pages: 793-804
  • DOI: 10.11185/imt.7.793
  • Publisher: Information and Media Technologies Editorial Board
  • Abstract: The phrase table, a scored list of bilingual phrases, lies at the center of phrase-based machine translation systems. We present a method to directly learn this phrase table from a parallel corpus of sentences that are not aligned at the word level. The key contribution of this work is that while previous methods have generally only modeled phrases at one level of granularity, in the proposed method phrases of many granularities are included directly in the model. This allows for the direct learning of a phrase table that achieves competitive accuracy without the complicated multi-step process of word alignment and phrase extraction that is used in previous research. The model is achieved through the use of non-parametric Bayesian methods and inversion transduction grammars (ITGs), a variety of synchronous context-free grammars (SCFGs). Experiments on several language pairs demonstrate that the proposed model matches the accuracy of the more traditional two-step word alignment/phrase extraction approach while reducing its phrase table to a fraction of its original size.
  • Keywords: statistical machine translation; phrase alignment; non-parametric Bayesian statistics; inversion transduction grammars