
Article Information

  • Title: Extract Transform and Load Strategy for Unstructured Data into Data Warehouse Using Map Reduce Paradigm and Big Data Analytics
  • Authors: P. Saravana kumar; M. Athigopal; S. Vetrivel
  • Journal: International Journal of Innovative Research in Computer and Communication Engineering
  • Print ISSN: 2320-9798
  • Online ISSN: 2320-9801
  • Year: 2014
  • Volume: 2
  • Issue: 12
  • Publisher: S&S Publications
  • Abstract: Big Data now makes analytics over huge volumes of data possible. Data accumulate every minute from a multitude of sources such as social media, mobile devices, and sensors. To extract insights from diverse information feeds drawn from multiple, often unrelated sources, the data must be correlated or harmonized to a common level of granularity, which makes loading unstructured data into a data warehouse increasingly complex. This paper discusses a strategy for fetching unstructured data into the Hadoop Distributed File System (HDFS). Cleansing and profiling the extracted data is important for overcoming data quality concerns. The transform phase is carried out with the MapReduce framework. Computation ratio, network bandwidth, and data locality are monitored for both full-dump and incremental load operations. Pig Latin is used to process data from HDFS and finally load the processed data into an HDFS file or the data warehouse. Only the aggregated output of Pig, a minimal subset of the data, is loaded into the data warehouse for business analytics and enterprise reporting. Based on these performance-related parameters, an appropriate strategy is suggested for each type of application.
  • Keywords: Big data Analytics; Data warehouse; Map-reduce Paradigm; ETL Process
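The abstract names the ETL stages (extract and cleanse, a MapReduce transform, full-dump versus incremental load of a small aggregate) without showing how they fit together. The following is a minimal single-machine sketch under stated assumptions, not the authors' implementation: in-memory lists stand in for HDFS, and plain Python functions stand in for Hadoop MapReduce and Pig Latin. The record format (`timestamp source value`), every function name, and the timestamp watermark used for incremental loading are hypothetical choices made for illustration.

```python
from collections import defaultdict
from collections.abc import Iterable, Iterator
from typing import Optional

# Hypothetical record layout: "<ISO timestamp> <source id> <numeric reading>"
Record = tuple[str, str, float]
Key = tuple[str, str]          # (day, source): the harmonised level of granularity
Aggregate = tuple[int, float]  # (count, total), kept so deltas can be merged exactly


def extract_and_cleanse(lines: Iterable[str]) -> Iterator[Record]:
    """Extract: parse raw lines, dropping any that fail simple profiling checks."""
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # wrong number of fields
        ts, source, raw_value = parts
        try:
            value = float(raw_value)
        except ValueError:
            continue  # non-numeric reading
        yield ts, source.lower(), value  # normalise source ids to lower case


def map_phase(records: Iterable[Record]) -> Iterator[tuple[Key, float]]:
    """Map: emit ((day, source), value), truncating each timestamp to its day."""
    for ts, source, value in records:
        yield (ts[:10], source), value


def shuffle(pairs: Iterable[tuple[Key, float]]) -> dict[Key, list[float]]:
    """Shuffle: group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: dict[Key, list[float]]) -> dict[Key, Aggregate]:
    """Reduce: collapse each group into a small (count, total) aggregate."""
    return {key: (len(vals), sum(vals)) for key, vals in groups.items()}


def transform(lines: Iterable[str], since: Optional[str] = None) -> dict[Key, Aggregate]:
    records = extract_and_cleanse(lines)
    if since is not None:
        # Same-format ISO-8601 timestamps compare correctly as plain strings.
        records = (r for r in records if r[0] > since)
    return reduce_phase(shuffle(map_phase(records)))


def full_dump(lines: Iterable[str]) -> dict[Key, Aggregate]:
    """Full dump: rebuild the warehouse table from every source line."""
    return transform(lines)


def incremental_load(warehouse: dict[Key, Aggregate], lines: Iterable[str],
                     watermark: str) -> dict[Key, Aggregate]:
    """Incremental load: aggregate only records newer than the watermark, then merge."""
    merged = dict(warehouse)
    for key, (count, total) in transform(lines, since=watermark).items():
        old_count, old_total = merged.get(key, (0, 0.0))
        merged[key] = (old_count + count, old_total + total)
    return merged


batch = ["2014-12-01T10:00:00 Sensor-7 20.0",
         "2014-12-01T11:00:00 sensor-7 22.0",
         "garbage line"]
warehouse = full_dump(batch)  # {('2014-12-01', 'sensor-7'): (2, 42.0)}
batch += ["2014-12-01T12:00:00 sensor-7 24.0",
          "2014-12-02T09:00:00 sensor-9 n/a"]
warehouse = incremental_load(warehouse, batch, watermark="2014-12-01T11:00:00")
print(warehouse)  # {('2014-12-01', 'sensor-7'): (3, 66.0)}
```

Storing `(count, total)` rather than a precomputed average is what lets an incremental delta merge into the existing table with the same result as a full dump, while the warehouse still receives only the small aggregated subset. In Pig Latin the same aggregation would typically be a `GROUP` followed by `COUNT` and `SUM`.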