
Article Information

  • Title: Creating an Aligned Russian Text Simplification Dataset from Language Learner Data
  • Authors: Anna Dmitrieva; Jörg Tiedemann
  • Journal name: Conference of the European Chapter of the Association for Computational Linguistics (EACL)
  • Year of publication: 2021
  • Volume: 2021
  • Pages: 73-79
  • Language: English
  • Publisher: ACL Anthology
  • Abstract: Parallel language corpora where regular texts are aligned with their simplified versions can be used in both natural language processing and theoretical linguistic studies. They are essential for the task of automatic text simplification, but can also provide valuable insights into the characteristics that make texts more accessible and reveal strategies that human experts use to simplify texts. Today, there exist a few parallel datasets for English and Simple English, but many other languages lack such data. In this paper we describe our work on creating an aligned Russian-Simple Russian dataset composed of Russian literature texts adapted for learners of Russian as a foreign language. This will be the first parallel dataset in this domain, and one of the first Simple Russian datasets in general.