期刊名称:The Prague Bulletin of Mathematical Linguistics
印刷版ISSN:0032-6585
电子版ISSN:1804-0462
出版年度:2016
卷号:106
期号:1
页码:125-146
DOI:10.1515/pralin-2016-0013
语种:English
出版社:Walter de Gruyter GmbH
摘要:We present EFMARAL, a new system for efficient and accurate word alignment using a Bayesian model with Markov Chain Monte Carlo (MCMC) inference. Through careful selection of data structures and model architecture we are able to surpass the fast_align system, commonly used for performance-critical word alignment, both in computational efficiency and alignment accuracy. Our evaluation shows that a phrase-based statistical machine translation (SMT) system produces translations of higher quality when using word alignments from EFMARAL than from fast_align, and that translation quality is on par with what is obtained using GIZA++, a tool requiring orders of magnitude more processing time. More generally we hope to convince the reader that Monte Carlo sampling, rather than being viewed as a slow method of last resort, should actually be the method of choice for the SMT practitioner and others interested in word alignment.