
Article Information

  • Title: Importance of Memory Management Layer in Big Data Architecture
  • Authors: Maha Dessokey; Sherif M. Saif; Hesham Eldeeb
  • Journal: International Journal of Advanced Computer Science and Applications (IJACSA)
  • Print ISSN: 2158-107X
  • Electronic ISSN: 2156-5570
  • Publication year: 2022
  • Volume: 13
  • Issue: 5
  • DOI: 10.14569/IJACSA.2022.0130554
  • Language: English
  • Publisher: Science and Information Society (SAI)
  • Abstract: The daily generation of massive amounts of heterogeneous data from a variety of sources challenges storage and analysis capabilities and introduces new problems in high-performance computing clusters. To make better use of this huge and heterogeneous data, advanced Big Data platforms and Big Data analytic techniques must be continuously developed. One of the significant issues with in-memory Big Data processing platforms such as Apache Spark is that the user is responsible for deciding whether intermediate data should be cached. In addition, data may be kept in several storage systems and physically scattered across different racks, regions, and clouds. Data must be close to the computation nodes, so data locality is a challenge. This paper evaluates a distinct memory management layer, placed between the data processing layer and the data storage layer, that caches data automatically without any interaction from application developers. The framework is evaluated using the K-means, PageRank, and WordCount workloads from the HiBench benchmark suite, together with a real-world case that predicts real estate prices using a Gradient Boosting Regression Tree model. Experiments show that the memory management layer outperforms Apache Spark in reducing execution time.
  • Keywords: Apache Spark; Big Data; data analytics algorithms; memory management
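The abstract contrasts Spark, where the developer must decide which intermediate data to cache (for example by calling `cache()` or `persist()` on an RDD or DataFrame), with a separate layer that caches data automatically. The following is a minimal Python sketch of that general idea under stated assumptions: a hypothetical `CachingLayer` class sits between computation and storage and keeps recently read blocks in memory with least-recently-used (LRU) eviction, so application code only calls `read()` and never chooses what to cache. The class name, the LRU policy, and the dict standing in for remote storage are illustrative assumptions; this is not the framework evaluated in the paper.

```python
from collections import OrderedDict
from typing import Callable, Dict, Hashable


class CachingLayer:
    """Hypothetical memory layer between compute and storage (illustrative only).

    Every read goes through the layer; recently used blocks are kept in
    memory automatically with LRU eviction, so the application never
    decides what to cache.
    """

    def __init__(self, fetch_from_storage: Callable[[Hashable], bytes], capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._fetch = fetch_from_storage  # slow path: the storage layer
        self._capacity = capacity         # max number of blocks held in memory
        self._cache: "OrderedDict[Hashable, bytes]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def read(self, key: Hashable) -> bytes:
        if key in self._cache:
            # Cache hit: mark the block as most recently used.
            self._cache.move_to_end(key)
            self.hits += 1
            return self._cache[key]
        # Cache miss: fetch from storage, then keep a copy in memory.
        self.misses += 1
        value = self._fetch(key)
        self._cache[key] = value
        if len(self._cache) > self._capacity:
            # Evict the least recently used block.
            self._cache.popitem(last=False)
        return value


if __name__ == "__main__":
    # Assumed stand-in for remote storage: a plain dict of blocks.
    storage: Dict[str, bytes] = {f"block-{i}": f"data-{i}".encode() for i in range(4)}
    layer = CachingLayer(storage.__getitem__, capacity=2)

    # An iterative job (e.g. K-means) re-reads the same input every iteration.
    for _ in range(3):
        layer.read("block-0")
        layer.read("block-1")

    print(layer.hits, layer.misses)  # 4 2
```

LRU is used here only because it is the simplest policy that benefits iterative workloads such as K-means and PageRank, which re-read the same input on every pass: after the first iteration, every read is served from memory. A real memory management layer would also have to address the data locality and multi-storage issues the abstract mentions.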