
Basic article information

  • Title: Real-time Twitter data analysis using Hadoop ecosystem
  • Authors: Anisha P. Rodrigues; Niranjan N. Chiplunkar
  • Journal: Cogent Engineering
  • Electronic ISSN: 2331-1916
  • Year of publication: 2018
  • Volume: 5
  • Issue: 1
  • Pages: 1-16
  • DOI: 10.1080/23311916.2018.1534519
  • Publisher: Taylor and Francis Ltd
  • Abstract: In the era of the Internet, social media has become an integral part of modern society. People use social media to share their opinions and to keep up to date with current trends on a daily basis. Twitter is one of the best-known social media platforms and receives a huge number of tweets each day. This information can be used for economic, industrial, social or governmental purposes by arranging and analyzing the tweets as required. Since Twitter contains a huge volume of data, storing and processing this data is a complex problem. Hadoop is a big data storage and processing tool for analyzing data with the 3Vs, i.e. data with huge volume, variety and velocity. Hadoop is a framework for handling big data, and it has its own family of tools, each supporting a different kind of processing, gathered under one umbrella called the Hadoop Ecosystem. In this paper, we analyze tweets streamed in real time. We used Apache Flume to capture real-time tweets. For the analysis, we propose a method for finding recent trends in tweets and perform sentiment analysis on real-time tweets. The analysis is done using Hadoop ecosystem tools such as Apache Hive and Apache Pig. Execution time is compared for the analysis of real-time tweets using Pig and Hive. From the experimental results, it can be concluded that Pig is more efficient than Hive, as Pig takes less time to execute than Hive.
  • Keywords: Apache Flume; Apache Hive; Apache Pig; Hadoop
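
The abstract mentions finding recent trends in tweets but does not describe the method itself. As a rough illustration only, and not the authors' approach, one common way to surface trends is to count hashtag frequencies over a batch of tweet texts. The sketch below assumes tweets are already available as plain strings; the function name `top_hashtags` and the sample data are hypothetical, and the paper performs its analysis with Apache Pig and Apache Hive rather than Python.

```python
import re
from collections import Counter

# Matches a '#' followed by word characters; a simplification of real hashtag rules.
HASHTAG = re.compile(r"#(\w+)")


def top_hashtags(tweets, n=3):
    """Return the n most frequent hashtags (lowercased) across tweet texts."""
    counts = Counter()
    for text in tweets:
        # Lowercase so that '#Hadoop' and '#hadoop' are counted together.
        counts.update(tag.lower() for tag in HASHTAG.findall(text))
    return counts.most_common(n)


# Hypothetical sample data, for illustration only.
sample = [
    "Loving the new #Hadoop release",
    "#hadoop and #Hive make a good pair",
    "Trying out #Pig scripts today",
]

print(top_hashtags(sample, n=1))
```

In a Hadoop setting the same idea would typically be expressed as a group-and-count query over the tweets collected by Flume.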