Venue: Conference of the European Chapter of the Association for Computational Linguistics (EACL)
Publication year: 2011
Volume: 2011
Publisher: ACL Anthology
Abstract: We present an unsupervised model for joint phrase alignment and extraction using nonparametric Bayesian methods and inversion transduction grammars (ITGs). The key contribution is that phrases of many granularities are included directly in the model through a novel formulation that memorizes phrases generated not only by terminal symbols but also by non-terminal symbols. This yields a completely probabilistic model that can build, directly from unaligned sentence pairs, a phrase table that achieves competitive accuracy on phrase-based machine translation tasks. Experiments on several language pairs demonstrate that the proposed model matches the accuracy of the traditional two-step word alignment/phrase extraction approach while reducing the phrase table to a fraction of its original size.
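The abstract's central idea is that a phrase pair built by a non-terminal (combination) step is memorized as a unit, just like one emitted directly by a terminal. The toy sketch below illustrates that idea only. It assumes a Dirichlet-process cache, which is one common nonparametric Bayesian prior; the abstract does not say which prior the model uses. It also relies on a hypothetical, unnormalized base score. The names `DPPhraseCache`, `combine` and `toy_base_prob` are illustrative, and the sketch performs no alignment or inference (for example, no sampling over derivations).

```python
from collections import Counter
from typing import Callable

# A phrase pair is (source_words, target_words), each a tuple of strings.
PhrasePair = tuple[tuple[str, ...], tuple[str, ...]]


class DPPhraseCache:
    """Hypothetical Dirichlet-process cache over phrase pairs (illustrative only)."""

    def __init__(self, alpha: float, base_prob: Callable[[PhrasePair], float]) -> None:
        self.alpha = alpha            # concentration parameter (assumed value)
        self.base_prob = base_prob    # base measure P_base(pair)
        self.counts: Counter = Counter()  # times each pair has been generated
        self.total = 0

    def prob(self, pair: PhrasePair) -> float:
        # DP predictive probability: (count(pair) + alpha * P_base(pair)) / (total + alpha)
        return (self.counts[pair] + self.alpha * self.base_prob(pair)) / (self.total + self.alpha)

    def add(self, pair: PhrasePair) -> None:
        # "Memorize" the pair so later draws can reuse it as a unit.
        self.counts[pair] += 1
        self.total += 1


def combine(left: PhrasePair, right: PhrasePair, inverted: bool) -> PhrasePair:
    # ITG-style binary combination: source sides are concatenated in order;
    # target sides keep the same order (straight) or are swapped (inverted).
    src = left[0] + right[0]
    tgt = right[1] + left[1] if inverted else left[1] + right[1]
    return (src, tgt)


def toy_base_prob(pair: PhrasePair) -> float:
    # Hypothetical toy score favoring short pairs; not a normalized distribution.
    return 0.5 ** (len(pair[0]) + len(pair[1]))


cache = DPPhraseCache(alpha=1.0, base_prob=toy_base_prob)
house = (("maison",), ("house",))
blue = (("bleue",), ("blue",))
cache.add(house)
cache.add(blue)

# The pair built by the inverted rule is memorized too, not only its parts.
blue_house = combine(house, blue, inverted=True)
cache.add(blue_house)
print(blue_house)                        # (('maison', 'bleue'), ('blue', 'house'))
print(round(cache.prob(blue_house), 4))  # 0.2656
```

Once the combined pair has been stored, it receives most of its probability from its own cache count. The straight-order variant, `combine(house, blue, inverted=False)`, has never been seen, so it falls back to the small base-measure share. This is the sense in which caching non-terminal outputs lets larger phrases enter the model directly.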