文章基本信息

标题：An Implementation of Hybrid Enhanced Sentiment Analysis System using Spark ML Pipeline: A Big Data Analytics Framework
本地全文：下载
作者：Raviya K ; Mary Vennila S
期刊名称：International Journal of Advanced Computer Science and Applications(IJACSA)
印刷版ISSN：2158-107X
电子版ISSN：2156-5570
出版年度：2021
卷号：12
期号：5
页码：323
DOI：10.14569/IJACSA.2021.0120540
出版社：Science and Information Society (SAI)
摘要：Today, we live in the Big Data age. Social networks, online shopping, mobile data are main sources generating huge text data by users. This "text data" will provide companies with useful insight on how customers view their brand and encourage them to make business strategies actively in order to maintain their trade. Hence, it is essential for the enterprises to analyse the sentiments of social media big data to make predictions. Because of the variety and existence of data, the study of sentiment on broad data has become difficult. However, it includes open-source Big Data platforms and machine learning techniques to process large text information in real-time. The advancement in fields including Big Data and Deep Learning technology has influenced and overcome the traditional restrictions of distributed computing. The primary aim is to perform sentiment analysis on the pipelined architecture of Apache Spark ML to speed upward the computations and improve machine efficiency in different environments. Therefore, the Hybrid CNN-SVM model is designed and developed. Here, CNN is pipeline with SVM for sentiment feature extraction and classification in ML to improve the accuracy. It is more flexible, fast and scalable. In addition, Naive Bayes, Support Vector Machines (SVM), Random Forest, Logistic Regression classifiers have been used to measure the efficiency of the proposed system on multi-node environment. The experimental results demonstrate that in terms of different evaluation metrics, the hybrid sentiment analysis model outperforms the conventional models. The proposed method makes it convenient for effective handling of big sentiment datasets. It would be more beneficial for corporations, government and individuals to improve their great value.
关键词：Big data; sentiment analysis; machine learning; apache spark; ML pipeline