Venue: Conference of the European Chapter of the Association for Computational Linguistics (EACL)
Year: 2011
Volume: 2011
Publisher: ACL Anthology
Abstract: N-gram language models are a major resource bottleneck in machine translation. In this paper, we present several language model implementations that are both highly compact and fast to query. Our fastest implementation is as fast as the widely used SRILM while requiring only 25% of the storage. Our most compact representation can store all 4 billion n-grams and associated counts for the Google n-gram corpus in 23 bits per n-gram, the most compact lossless representation to date, and even more compact than recent lossy compression techniques. We also discuss techniques for improving query speed during decoding, including a simple but novel language model caching technique that improves the query speed of our language models (and SRILM) by up to 300%.
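
As a rough illustration of the caching idea mentioned in the abstract (a minimal sketch, not the paper's implementation; the wrapped model, its `log_prob` method, and the LRU policy are assumptions for illustration), a decoder can memoize recent n-gram probability lookups in a small hash map, since decoding repeatedly queries the same n-grams:

```python
from collections import OrderedDict

class CachedLM:
    """Minimal sketch of an LRU lookup cache wrapped around an
    n-gram language model. The wrapped `lm` object and its
    `log_prob(ngram)` method are assumptions for illustration."""

    def __init__(self, lm, capacity=100_000):
        self.lm = lm
        self.capacity = capacity
        self.cache = OrderedDict()  # n-gram tuple -> log-probability

    def log_prob(self, ngram):
        ngram = tuple(ngram)
        if ngram in self.cache:
            # Cache hit: refresh recency and skip the expensive lookup.
            self.cache.move_to_end(ngram)
            return self.cache[ngram]
        # Cache miss: query the underlying model and remember the result.
        score = self.lm.log_prob(ngram)
        self.cache[ngram] = score
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict least recently used
        return score
```

Because a decoder scores many hypotheses that share context words, repeated lookups are common, which is why a simple cache like this can pay off; the specific speedups reported in the paper come from its own caching scheme, not this sketch.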