Basic article information

Title: The Evolution of Language Models Applied to Emotion Analysis of Arabic Tweets
Author: Nora Al-Twairesh
Journal: Information
Electronic ISSN: 2078-2489
Publication year: 2021
Volume: 12
Issue: 2
Page: 84
DOI: 10.3390/info12020084
Publisher: MDPI Publishing
Abstract: The field of natural language processing (NLP) has witnessed a boom in language representation models with the introduction of pretrained language models, which are trained on massive text corpora and then fine-tuned for downstream NLP tasks. In this paper, we aim to study the evolution of language representation models by analyzing their effect on an under-researched NLP task, emotion analysis, for a low-resource language, Arabic. Most studies in the field of affect analysis have focused on sentiment analysis, i.e., classifying text by valence (positive, negative, neutral), while few studies go further to analyze finer-grained emotional states (happiness, sadness, anger, etc.). Emotion analysis is a text classification problem that is tackled using machine learning techniques. Different language representation models have been used as features for these machine learning models to learn from. In this paper, we perform an empirical study on the evolution of language models, from the traditional term frequency–inverse document frequency (TF–IDF), to the more sophisticated word embedding model word2vec, and finally to the recent state-of-the-art pretrained language model, bidirectional encoder representations from transformers (BERT). We observe and analyze how performance increases as we change the language model. We also investigate different BERT models for Arabic. We find that the best performance is achieved with the ArabicBERT large model, which is a BERT model trained on a large dataset of Arabic text. The increase in F1-score was significant, ranging from 7% to 21%.
Keywords: pretrained language models; BERT; emotion analysis; Arabic