
Article Information

  • Title: Approximating Probabilistic Models as Weighted Finite Automata
  • Authors: Ananda Theertha Suresh; Brian Roark; Michael Riley
  • Journal: Computational Linguistics
  • Print ISSN: 0891-2017
  • Electronic ISSN: 1530-9312
  • Year: 2021
  • Volume: 47
  • Issue: 2
  • Pages: 221-254
  • DOI:10.1162/coli_a_00401
  • Language: English
  • Publisher: MIT Press
  • Abstract: Weighted finite automata (WFAs) are often used to represent probabilistic models, such as n-gram language models, because, among other things, they are time- and space-efficient for recognition tasks. The probabilistic source to be represented as a WFA, however, may come in many forms. Given a generic probabilistic model over sequences, we propose an algorithm to approximate it as a WFA such that the Kullback-Leibler divergence between the source model and the WFA target model is minimized. The proposed algorithm involves a counting step and a difference of convex optimization step, both of which can be performed efficiently. We demonstrate the usefulness of our approach on various tasks, including distilling n-gram models from neural models, building compact language models, and building open-vocabulary character models. The algorithms used for these experiments are available in an open-source software library.
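
As a reading aid, the objective described in the abstract can be written out as a minimal LaTeX sketch; the notation here (source model p, WFA-representable family Q_WFA) is assumed for illustration and is not drawn from the paper itself.

% Minimal sketch of the objective stated in the abstract; the notation
% (p, q, \mathcal{Q}_{\mathrm{WFA}}) is assumed, not taken from the paper.
\documentclass{article}
\usepackage{amsmath,amssymb}
\begin{document}
Given a source model $p$ over sequences $x$ and the family
$\mathcal{Q}_{\mathrm{WFA}}$ of distributions representable by the target WFA,
the approximation seeks
\[
  \hat{q}
  = \operatorname*{arg\,min}_{q \in \mathcal{Q}_{\mathrm{WFA}}}
    D_{\mathrm{KL}}(p \,\|\, q)
  = \operatorname*{arg\,min}_{q \in \mathcal{Q}_{\mathrm{WFA}}}
    \sum_{x} p(x) \log \frac{p(x)}{q(x)} .
\]
\end{document}

Per the abstract, the proposed algorithm addresses this minimization through a counting step followed by a difference of convex optimization step.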