Article Information

Title: Clustering-Based Data Analysis With Hadoop
Authors: Ms. Shweta Bhonde; Prof. Mirza Baig
Journal: International Journal of Computer Trends and Technology
Electronic ISSN: 2231-2803
Year: 2019
Volume: 67
Issue: 5
Pages: 78-81
DOI: 10.14445/22312803/IJCTT-V67I5P113
Publisher: Seventh Sense Research Group
Abstract: Large collections of data sets include different types of data, such as structured, unstructured, and semi-structured data. Such data is categorized as “Big Data” because of its sheer volume, variety, and velocity. Traditional data management, warehousing, and analysis systems lack the tools to analyze this data. Big Data exceeds the capability of traditional databases to capture, manage, and process such voluminous amounts of data. Given the specific nature of Big Data, in this paper we first introduce how Big Data is stored in distributed file system architectures. Apache Hadoop and HDFS are widely used for storing and managing Big Data, and the data processing is done by the MapReduce system. Processing or analyzing this huge amount of data, or extracting meaningful information from it, is a challenging task.
Keywords: Big Data; HDFS; Map Reduced; Cluster