Basic Article Information

Title: Applying Distributional Semantics to Enhance Classifying Emotions in Arabic Tweets
Authors: Shahd Alharbi; Matthew Purver
Journal: Computer Science & Information Technology
Electronic ISSN: 2231-5403
Publication year: 2018
Volume: 8
Issue: 6
Pages: 15-34
DOI: 10.5121/csit.2018.80602
Publisher: Academy & Industry Research Collaboration Center (AIRCC)
Abstract: Most recent research has analysed sentiment and emotions in English texts, whereas the few studies conducted on Arabic content have focused on classifying sentiment as positive or negative rather than on distinct emotion classes. This paper therefore focuses on analysing six emotion classes in Arabic content, specifically Arabic tweets, whose unstructured nature makes the task more challenging than for the formal, structured content found in Arabic journals and books. In addition, recent developments in distributional semantic models have encouraged testing the effect of distributional measures on the classification process, which had not been investigated by any other classification-related study analysing Arabic texts. As a result, the model improved the average accuracy to more than 86% using a Support Vector Machine (SVM), compared with other sentiment and emotion studies classifying Arabic texts, through a semi-supervised approach that employs contextual and co-occurrence information from a large unlabelled dataset. In addition to the other results achieved, the model recorded a high average accuracy of 85.30% after removing the labels from the unlabelled contextual information that was used in the labelled dataset during the classification process. Moreover, given the unstructured nature of Twitter content, a general set of pre-processing techniques for Arabic texts was identified, which increased the accuracy for the six emotion classes to 85.95% while employing the contextual information from the unlabelled dataset.
Keywords: SVM; DSM; classifying; Arabic tweets; hashtags; emoticons; NLP; co-occurrence matrix.