Journal: Journal of Theoretical and Applied Information Technology
Print ISSN: 1992-8645
Electronic ISSN: 1817-3195
Publication year: 2021
Volume: 99
Issue: 11
Language: English
Publisher: Journal of Theoretical and Applied
Abstract: IoT has recently proven to be of key value in smart cities and has improved the level of comfort in daily life. The technology requires extensive data processing, especially for real-time data. The Apache Hadoop framework is a necessary and efficient model that can be combined with IoT technology. Hadoop, an open-source framework, is typically used for offline batch processing on large-scale clusters. It has a wide range of applications in the big data industry because it can process massive data in distributed and parallel environments. However, several aspects should be carefully evaluated before deploying Hadoop-based solutions. The authors thoroughly investigate the Apache Hadoop framework, focusing on the factors that directly affect its performance. The work discusses and evaluates two crucial dimensions of Hadoop systems: monitoring tools and their impact on the performance of Apache Hadoop-based clusters, and the most influential parameters and optimization techniques of Apache Hadoop-based systems. The results show that monitoring tools play a major role in the planning and maintenance of Hadoop-based solutions. Under the experimental settings used, the Cacti monitoring tool accounts for around 45% of memory usage, whereas Ganglia is more memory-efficient (around 2.5% on average). For CPU utilization, both monitoring tools are efficient, and their overhead is almost negligible. The results also show that a short list of critical parameters significantly affects overall performance. Based on these results, the authors conclude the paper with future directions and possible improvements that need further exploration and experimentation.