Basic Article Information

Title: Unsupervised Word Alignment Using Frequency Constraint in Posterior Regularized EM
Authors: Hidetaka Kamigaito; Taro Watanabe; Hiroya Takamura; et al.
Journal: Information and Media Technologies
Electronic ISSN: 1881-0896
Publication year: 2017
Volume: 12
Pages: 46-70
DOI: 10.11185/imt.12.46
Publisher: Information and Media Technologies Editorial Board
Abstract:
Generative word alignment models, such as the IBM Models, are restricted to one-to-many alignment and cannot explicitly represent many-to-many relationships in bilingual texts. This problem is partially addressed either by heuristics or by agreement constraints that force the two directional word alignments to agree with each other. However, such constraints cannot account for grammatical differences between the languages in a pair. In particular, function words are not trivial to align for grammatically different language pairs, such as Japanese and English. In this paper, we focus on the posterior regularization framework (Ganchev, Graca, Gillenwater, and Taskar 2010), which can force two directional word alignment models to agree with each other during training, and propose new constraints that take into account the difference between function words and content words. We distinguish function words from content words using word frequency, in the same way as Setiawan, Kan, and Li (2007). Experimental results show that our proposed constraints achieve better alignment quality, measured by AER and F-measure, on the French-English Hansard task and the Japanese-English Kyoto Free Translation Task (KFTT). In translation evaluations, we achieved statistically significant gains in BLEU scores on the Japanese-English NTCIR10 task and the Spanish-English WMT06 task.
Keywords: Statistical Machine Translation; Unsupervised Word Alignment; Posterior Regularization Framework; Constrained EM