Article Information

  • Title: The Evolution of Language Models Applied to Emotion Analysis of Arabic Tweets
  • Author: Nora Al-Twairesh
  • Journal: Information
  • Electronic ISSN: 2078-2489
  • Publication year: 2021
  • Volume: 12
  • Issue: 2
  • Article number: 84
  • DOI: 10.3390/info12020084
  • Publisher: MDPI Publishing
  • Abstract: The field of natural language processing (NLP) has witnessed a boom in language representation models with the introduction of pretrained language models, which are trained on massive amounts of text and then fine-tuned for downstream NLP tasks. In this paper, we study the evolution of language representation models by analyzing their effect on an under-researched NLP task, emotion analysis, for a low-resource language, Arabic. Most studies in the field of affect analysis have focused on sentiment analysis, i.e., classifying text by valence (positive, negative, neutral), while few go further to analyze finer-grained emotional states (happiness, sadness, anger, etc.). Emotion analysis is a text classification problem tackled with machine learning techniques, and different language representation models have been used as the features from which these models learn. We perform an empirical study of the evolution of language models: from the traditional term frequency–inverse document frequency (TF–IDF), to the more sophisticated word embedding model word2vec, and finally to the recent state-of-the-art pretrained language model, bidirectional encoder representations from transformers (BERT). We observe and analyze how performance increases as the language model changes. We also investigate different BERT models for Arabic. We find that the best performance is achieved with the ArabicBERT large model, a BERT model trained on a large dataset of Arabic text. The increase in F1-score was significant, ranging from 7% to 21%.
  • Keywords: pretrained language models; BERT; emotion analysis; Arabic