Article Information

  • Title: Comparison of Turkish Word Representations Trained on Different Morphological Forms
  • Authors: Gökhan Güler; A. Cüneyd Tantuğ
  • Journal: Computer Science & Information Technology
  • Electronic ISSN: 2231-5403
  • Year of publication: 2020
  • Volume: 10
  • Issue: 1
  • Pages: 107-116
  • DOI: 10.5121/csit.2020.100110
  • Publisher: Academy & Industry Research Collaboration Center (AIRCC)
  • Abstract: The increased popularity of different text representations has brought many improvements in Natural Language Processing (NLP) tasks. Without the need for supervised data, embeddings trained on large corpora provide meaningful relations that can be used in different NLP tasks. Even though training these vectors is relatively easy with recent methods, the information gained from the data depends heavily on the structure of the corpus language. Since the most widely researched languages have a similar morphological structure, problems occurring for morphologically rich languages are mainly disregarded in studies. For morphologically rich languages, context-free word vectors ignore the morphological structure of the language. In this study, we prepared texts in morphologically different forms in a morphologically rich language, Turkish, and compared the results on different intrinsic and extrinsic tasks. To see the effect of morphological structure, we trained a word2vec model on texts in which lemmas and suffixes are treated differently. We also trained the subword model fastText and compared the embeddings on word analogy, text classification, sentiment analysis, and language model tasks.
  • Keywords: embedding; vector; morphology; Turkish; word2vec; fastText
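
The abstract mentions training word2vec on texts in which lemmas and suffixes are treated differently, but this record does not say which forms were used. The following is a minimal sketch of what such corpus variants might look like, assuming three illustrative forms (surface words, lemmas only, and lemmas followed by separate suffix tokens). The `ANALYSES` table, the function names, and the `+` suffix marker are hypothetical; a real pipeline would take its analyses from a Turkish morphological analyzer.

```python
# Hypothetical hand-written analyses: surface form -> (lemma, suffixes).
# These stand in for the output of a real morphological analyzer.
ANALYSES = {
    "evlerimizde": ("ev", ["ler", "imiz", "de"]),   # "in our houses"
    "kitapları": ("kitap", ["lar", "ı"]),           # "the books" (accusative)
    "okuduk": ("oku", ["du", "k"]),                 # "we read" (past)
}


def analyze(token):
    """Return (lemma, suffixes); unknown tokens are kept whole."""
    return ANALYSES.get(token, (token, []))


def to_surface(tokens):
    """Variant 1: leave the surface forms untouched."""
    return list(tokens)


def to_lemmas(tokens):
    """Variant 2: keep only the lemma of each word, dropping suffixes."""
    return [analyze(t)[0] for t in tokens]


def to_lemmas_and_suffixes(tokens, marker="+"):
    """Variant 3: emit the lemma, then each suffix as its own marked token."""
    out = []
    for t in tokens:
        lemma, suffixes = analyze(t)
        out.append(lemma)
        out.extend(marker + s for s in suffixes)
    return out


sentence = "evlerimizde kitapları okuduk".split()
variants = {
    "surface": to_surface(sentence),
    "lemma": to_lemmas(sentence),
    "lemma+suffix": to_lemmas_and_suffixes(sentence),
}
for name, tokens in variants.items():
    print(name, tokens)
```

Each variant is a plain list of tokens, so any of them could be fed as a tokenized sentence to a word-embedding trainer. The choice of variant changes the vocabulary: the lemma-only form merges all inflections of a word, while the split form gives suffixes their own vectors.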