Journal: The Prague Bulletin of Mathematical Linguistics
Print ISSN: 0032-6585
Online ISSN: 1804-0462
Year of publication: 2011
Volume: 96
Issue: 1
Pages: 89-98
DOI: 10.2478/v10108-011-0014-1
Language: English
Publisher: Walter de Gruyter GmbH
Abstract: We present a tool that extracts phrase pairs from a word-aligned parallel corpus and filters them on the fly based on a user-defined frequency threshold. The bulk of phrase pairs to be scored is much reduced, making the whole phrase table construction process faster with no significant harm to the ultimate phrase table quality as measured by BLEU. Technically, our tool is an alternative to the extract component of the phrase-extract toolkit bundled with Moses SMT software and covers some of the functionality of sigfilter.
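
Illustration: the idea of frequency-threshold filtering mentioned in the abstract can be sketched as follows. This is only a minimal sketch under stated assumptions, not the tool's implementation: it counts phrase pairs exactly and filters after counting, whereas the abstract describes filtering on the fly. The function name `filter_phrase_pairs`, the parameter `min_count`, and the sample data are hypothetical.

```python
from collections import Counter


def filter_phrase_pairs(phrase_pairs, min_count):
    """Keep only phrase pairs seen at least min_count times.

    phrase_pairs: iterable of (source_phrase, target_phrase) tuples,
    assumed to be already extracted from a word-aligned corpus.
    min_count: hypothetical name for the user-defined frequency threshold.
    """
    counts = Counter(phrase_pairs)  # exact counting, for illustration only
    return {pair: n for pair, n in counts.items() if n >= min_count}


# Invented sample data: extracted (source, target) phrase pairs.
extracted = [
    ("das haus", "the house"),
    ("das haus", "the house"),
    ("haus", "house"),
    ("das", "the"),
    ("das", "the"),
    ("das", "that"),
]

kept = filter_phrase_pairs(extracted, min_count=2)
for (source, target), count in kept.items():
    print(f"{source} ||| {target} ||| {count}")
```

With a threshold of 2, the pairs seen only once are dropped before any scoring step, which is the source of the reduced workload the abstract refers to.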