Journal: International Journal of Networking and Computing
Print ISSN: 2185-2847
Year: 2018
Volume: 8
Issue: 2
Pages: 387-407
Language: English
Publisher: International Journal of Networking and Computing
Abstract: This paper presents a practical implementation of a 2D flood simulation model using hybrid distributed-parallel technologies, including MPI, OpenMP, and CUDA, and evaluates its performance under various configurations that utilize these technologies. The main objective of this research work was to improve the computational performance of the flood simulation on a hybrid architecture. Modern desktops and small cluster systems owned by domain researchers are capable of performing these simulations efficiently thanks to multicore CPUs and GPU computing devices, but those researchers often lack the expertise needed to fully utilize software libraries designed to take advantage of the latest hardware. By leveraging knowledge of our experimentation environment, we were able to incorporate MPI and multiple GPU devices to improve performance over a single-process OpenMP version by up to 18x, depending on the size of the input data. We discuss observations that have significant effects on overall performance, including process-to-device mapping, communication strategies, and data partitioning, and present experimental results. The limitations of this work are discussed, and we propose ideas to relieve or overcome them in future work.
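The process-to-device mapping that the abstract highlights is commonly handled by having each MPI process select one CUDA device according to its node-local rank, so that several GPUs on the same machine are all put to work. The sketch below illustrates only that general pattern; it is not taken from the paper, and the round-robin assignment is an assumption made for illustration.

/* Minimal sketch (assumed, not the authors' code): map each MPI process
 * to a CUDA device by its node-local rank, round-robin across the GPUs
 * visible on that node. */
#include <mpi.h>
#include <cuda_runtime.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int world_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    /* Communicator of ranks sharing this node, so each process knows its
     * position among the processes running on the same machine. */
    MPI_Comm node_comm;
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, world_rank,
                        MPI_INFO_NULL, &node_comm);
    int local_rank;
    MPI_Comm_rank(node_comm, &local_rank);

    /* Bind this process to one of the node's GPUs (round-robin). */
    int device_count = 0;
    cudaGetDeviceCount(&device_count);
    int device = (device_count > 0) ? local_rank % device_count : -1;
    if (device >= 0) {
        cudaSetDevice(device);
    }

    printf("world rank %d (node-local %d) -> CUDA device %d of %d\n",
           world_rank, local_rank, device, device_count);

    MPI_Comm_free(&node_comm);
    MPI_Finalize();
    return 0;
}

Compiled with an MPI-aware CUDA toolchain (e.g. mpicc/nvcc), this kind of mapping lets one process per GPU own its partition of the 2D domain, with halo exchange between neighboring partitions handled over MPI.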