
Article Information

  • Title: Evaluating Topic Modeling Interpretability Using Topic Labeled Gold-standard Sets
  • Authors: Biagio Palese; Gabriele Piccoli
  • Journal: Communications of the Association for Information Systems
  • Print ISSN: 1529-3181
  • Year: 2020
  • Volume: 47
  • Pages: 433-451
  • DOI: 10.17705/1CAIS.04720
  • Language: English
  • Publisher: Association for Information Systems
  • Abstract: The paucity of rigorous evaluation measures undermines topic modeling results' validity and trustworthiness. Accordingly, we propose a method that researchers can use to select models when they assess topics' human interpretability. We show how they can evaluate different topic models using gold-standard sets that humans label. Our approach ensures that the topics extracted algorithmically from an entire corpus concur with the themes humans would have identified in the same documents. By doing so, we combine human coding's advantages for topic interpretability with algorithmic topic modeling's analytical efficiency and scalability. We demonstrate that one can rigorously identify optimal model parametrizations for maximum interpretability and justify model selection. We also contribute three open-access gold-standard sets in the hospitality context and make them available so other researchers can use them to benchmark their models or validate their results. Finally, we showcase a methodology for designing and developing gold-standard sets for validating topic models, which researchers interested in developing gold-standard sets in domains and contexts appropriate for their research can use.
  • Keywords: Human Interpretable Topics; Gold-standard Set; Text Mining; Topic Evaluation; Topic Interpretability Measure; Topic Modeling
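
The abstract sketches the core idea of scoring algorithmically extracted topics against human-labeled gold-standard themes, but it does not spell out a matching procedure or an agreement measure. The minimal Python sketch below is an illustrative assumption only: greedy one-to-one matching on Jaccard word overlap, with invented toy data, not the authors' actual measure or their hospitality gold-standard sets.

"""Hypothetical sketch: scoring topic-model output against a human-labeled
gold-standard set. The paper's exact matching procedure and interpretability
measure are not given in the abstract; Jaccard overlap and greedy matching
here are illustrative assumptions only."""

from itertools import product


def jaccard(a: set[str], b: set[str]) -> float:
    """Word-set overlap between one model topic and one gold-standard topic."""
    return len(a & b) / len(a | b) if a | b else 0.0


def gold_standard_agreement(model_topics: list[set[str]],
                            gold_topics: list[set[str]]) -> float:
    """Match each human-labeled gold topic to its best model topic (one-to-one,
    greedy by overlap) and return the average overlap as an interpretability proxy."""
    pairs = sorted(
        ((jaccard(m, g), i, j)
         for (i, m), (j, g) in product(enumerate(model_topics), enumerate(gold_topics))),
        reverse=True,
    )
    used_model, used_gold, scores = set(), set(), []
    for score, i, j in pairs:
        if i not in used_model and j not in used_gold:
            used_model.add(i)
            used_gold.add(j)
            scores.append(score)
    return sum(scores) / len(gold_topics) if gold_topics else 0.0


if __name__ == "__main__":
    # Toy hospitality-flavored example (invented data, not from the paper's sets).
    model = [{"room", "bed", "clean", "bathroom"}, {"staff", "friendly", "helpful", "desk"}]
    gold = [{"staff", "service", "friendly", "helpful"}, {"room", "clean", "bed", "spacious"}]
    # Higher agreement suggests the model's topics align with human-identified themes.
    print(f"agreement = {gold_standard_agreement(model, gold):.2f}")

Run on the toy data this prints agreement = 0.60; in practice one would sweep candidate model parametrizations and retain the one with the highest agreement against the gold-standard set, roughly the selection logic the abstract describes.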