Article Information

  • Title: An Empirical Study for Handling Scientific Datasets
  • Authors: Yunhee Kang; Heeyoul Choi
  • Journal: International Journal of Grid and Distributed Computing
  • Print ISSN: 2005-4262
  • Year of publication: 2012
  • Volume: 5
  • Issue: 3
  • Publisher: SERSC
  • Abstract: Since the volume of data generated by scientific data experiments has grown exponentially, new scientific methods to analyze and organize the data are required. These methods in turn need an effective infrastructure of computing resources for pre-processing and post-processing data. This demand has led to the development of methods that reduce the size of datasets and to the adoption of new programming models and their implementations, such as MapReduce. In this paper, we describe an empirical study on handling the dataset of a scientific data experiment to support data transformation, an essential phase in handling large-scale data in such experiments. In this experiment, we show a way to optimize the dataset written in netCDF through data reduction by sub-setting, and to process the US tornado outbreak dataset with Hadoop, a MapReduce framework. These methods can be applied to pre-processing and post-processing in scientific data experiments.
  • Keywords: MapReduce; Scientific Data Experiment; Sub-Setting; Data Transformation