Basic Article Information

Title: Survey of Parallel Data Processing in Context with MapReduce
Author: Madhavi Vaidya
Journal: Computer Science & Information Technology
Electronic ISSN: 2231-5403
Year of publication: 2011
Volume: 1
Issue: 3
Pages: 69-80
DOI: 10.5121/csit.2011.1307
Publisher: Academy & Industry Research Collaboration Center (AIRCC)
Abstract: MapReduce is a parallel programming model and an associated implementation introduced by Google. In this model, a user specifies the computation with two functions, Map and Reduce. The underlying MapReduce library automatically parallelizes the computation and handles complicated issues such as data distribution, load balancing, and fault tolerance. The original MapReduce implementation by Google, as well as its open-source counterpart, Hadoop, is aimed at parallelizing computation on large clusters of commodity machines. This paper gives an overview of the MapReduce programming model and its applications. The author describes the workflow of the MapReduce process, studies some important issues, such as fault tolerance, in more detail, and illustrates how MapReduce works. The data locality issue in heterogeneous environments can noticeably reduce MapReduce performance. The paper illustrates how data can be placed across nodes so that each node has a balanced data-processing load, with data stored in parallel. For a data-intensive application running on a Hadoop MapReduce cluster, the author shows by example how data placement is done in the Hadoop architecture and what role MapReduce plays in that architecture. The paper also explains the amount of data stored on each node to achieve improved data-processing performance.
Keywords: parallelization; Hadoop; Google File Systems; Map Reduce; Distributed File System