Basic article information

  • Title: Study on Constants of Natural Language Texts
  • Authors: Daisuke Kimura; Kumiko Tanaka-Ishii
  • Journal: Information and Media Technologies
  • Electronic ISSN: 1881-0896
  • Publication year: 2014
  • Volume: 9
  • Issue: 4
  • Pages: 771-789
  • DOI: 10.11185/imt.9.771
  • Publisher: Information and Media Technologies Editorial Board
  • Abstract: This paper considers different measures that might become constants for any length of a given natural language text. Such measures indicate a potential for studying the complexity of natural language but have previously only been studied using relatively small English texts. In this study, we consider measures for texts in languages other than English, and for large-scale texts. Among the candidate measures, we consider Yule's K, Orlov's Z, and Golcher's VM, each of whose convergence has been previously argued empirically. Furthermore, we introduce entropy H, and a measure, r, related to the scale-free property of language. Our experiments show that both K and VM are convergent for texts in various languages, whereas the other measures are not.
  • Keywords: textual constants; multilingual texts; complexity; language models; redundancy
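
To make the idea of a length-independent measure concrete, here is a minimal sketch of Yule's K, one of the measures named in the abstract. It uses the commonly cited definition K = 10^4 · (Σ_m m²·V(m) − N) / N², where N is the number of tokens and V(m) is the number of distinct words that occur exactly m times. The function names `yules_k` and `k_by_prefix`, and the choice to treat the input as a pre-tokenized list, are assumptions made for illustration; the record does not describe how the authors tokenized their texts or implemented the measure.

```python
from collections import Counter


def yules_k(tokens):
    """Yule's K for a list of tokens, using the common textbook definition.

    K = 1e4 * (sum over m of m^2 * V(m) - N) / N^2
    N    = total number of tokens
    V(m) = number of distinct words occurring exactly m times
    """
    n = len(tokens)
    if n == 0:
        raise ValueError("need at least one token")
    freqs = Counter(tokens)             # word -> number of occurrences
    spectrum = Counter(freqs.values())  # m -> V(m)
    s2 = sum(m * m * v for m, v in spectrum.items())
    return 1e4 * (s2 - n) / (n * n)


def k_by_prefix(tokens, step):
    """Hypothetical helper: K on growing prefixes of the text.

    Returns (prefix_length, K) pairs so one can inspect whether the
    value settles as the text grows.
    """
    return [(i, yules_k(tokens[:i]))
            for i in range(step, len(tokens) + 1, step)]
```

Calling `k_by_prefix(text.split(), 10_000)` on a long text (whitespace splitting is itself a simplifying assumption, and unsuitable for languages written without spaces) gives a series of values that can be plotted against prefix length to check for convergence informally.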