
Article Information

  • Title: Comparing Automatic and Human Evaluation of NLG Systems
  • Authors: Anja Belz; Ehud Reiter
  • Venue: Conference on European Chapter of the Association for Computational Linguistics (EACL)
  • Publication year: 2006
  • Volume: 2006
  • Publisher: ACL Anthology
  • Abstract: We consider the evaluation problem in Natural Language Generation (NLG) and present results for evaluating several NLG systems with similar functionality, including a knowledge-based generator and several statistical systems. We compare evaluation results for these systems by human domain experts, human non-experts, and several automatic evaluation metrics, including NIST, BLEU, and ROUGE. We find that NIST scores correlate best (> 0.8) with human judgments, but that all automatic metrics we examined are biased in favour of generators that select on the basis of frequency alone. We conclude that automatic evaluation of NLG systems has considerable potential, in particular where high-quality reference texts and only a small number of human evaluators are available. However, in general it is probably best for automatic evaluations to be supported by human-based evaluations, or at least by studies that demonstrate that a particular metric correlates well with human judgments in a given domain.
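As an illustration of the kind of correlation analysis the abstract describes (metric scores correlated with human judgments across systems), the sketch below computes a Pearson correlation coefficient in plain Python. All per-system scores in it are hypothetical placeholders, not data from the paper.

```python
# Sketch: Pearson correlation between per-system automatic metric scores
# and mean human ratings. The paper reports NIST correlating > 0.8 with
# human judgments; the numbers below are invented for illustration only.
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical scores for five NLG systems.
nist_scores = [6.2, 5.8, 5.1, 4.9, 4.3]      # automatic metric, one per system
human_ratings = [4.5, 4.1, 3.8, 3.9, 3.0]    # mean human judgment, same systems

print(pearson(nist_scores, human_ratings))
```

A correlation near 1.0 would suggest the metric ranks systems much as human judges do; in practice one would also report statistical significance before trusting the metric in a given domain.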