TY - GEN
T1 - Analysis of MURaM, a Solar Physics Application, for Scalability, Performance and Portability
AU - Wright, Eric
AU - Brown, Cena
AU - Przybylski, Damien
AU - Rempel, Matthias
AU - Suresh, Supreeth
AU - Chandrasekaran, Sunita
N1 - Publisher Copyright:
© 2023 ACM.
PY - 2023/11/12
Y1 - 2023/11/12
N2 - With the advent of GPUs in parallel computing, several languages, tools, and compilers are being developed. Many impactful applications can benefit from the performance capabilities these GPUs provide, but moving large, complex code bases to GPU execution often poses many hurdles and growing pains as developers adapt to unfamiliar programming models and interface with increasingly complex but powerful hardware. One such model is OpenACC, a directive-based parallel programming model designed to target various architectures from a single source code. In this paper we present our experiences using OpenACC to bring GPU acceleration to MURaM, a state-of-the-art solar physics application jointly developed and used by the National Center for Atmospheric Research (NCAR) and the Max Planck Institute for Solar System Research (MPS). We describe several challenges we faced in mapping general parallel concepts to low-level GPU hardware and the corresponding performance penalties inherent to these models. While OpenACC provides architecture portability, it lacks performance portability in some cases. We discuss possible solutions to this performance portability problem and create prototypes to explore what could be gained from these solutions in the context of MURaM. We then provide scaling results and findings from transitioning to current-generation GPU architectures, with strong and weak scaling on up to 512 NVIDIA A100 GPUs, observing that several portions of the code could perform and scale significantly better with the inclusion of more advanced hardware features in OpenACC. On our HPC systems, the current performance of MURaM shows that one A100 GPU provides roughly as much throughput as 90-100 CPU cores, while also scaling further than CPU runs are capable of.
AB - With the advent of GPUs in parallel computing, several languages, tools, and compilers are being developed. Many impactful applications can benefit from the performance capabilities these GPUs provide, but moving large, complex code bases to GPU execution often poses many hurdles and growing pains as developers adapt to unfamiliar programming models and interface with increasingly complex but powerful hardware. One such model is OpenACC, a directive-based parallel programming model designed to target various architectures from a single source code. In this paper we present our experiences using OpenACC to bring GPU acceleration to MURaM, a state-of-the-art solar physics application jointly developed and used by the National Center for Atmospheric Research (NCAR) and the Max Planck Institute for Solar System Research (MPS). We describe several challenges we faced in mapping general parallel concepts to low-level GPU hardware and the corresponding performance penalties inherent to these models. While OpenACC provides architecture portability, it lacks performance portability in some cases. We discuss possible solutions to this performance portability problem and create prototypes to explore what could be gained from these solutions in the context of MURaM. We then provide scaling results and findings from transitioning to current-generation GPU architectures, with strong and weak scaling on up to 512 NVIDIA A100 GPUs, observing that several portions of the code could perform and scale significantly better with the inclusion of more advanced hardware features in OpenACC. On our HPC systems, the current performance of MURaM shows that one A100 GPU provides roughly as much throughput as 90-100 CPU cores, while also scaling further than CPU runs are capable of.
KW - GPU
KW - solar physics
KW - directive-based programming models
KW - magnetohydrodynamics
KW - radiation transport
UR - https://www.scopus.com/pages/publications/85178117243
U2 - 10.1145/3624062.3624606
DO - 10.1145/3624062.3624606
M3 - Conference contribution
AN - SCOPUS:85178117243
T3 - ACM International Conference Proceeding Series
SP - 1929
EP - 1938
BT - Proceedings of 2023 SC Workshops of the International Conference on High Performance Computing, Network, Storage, and Analysis, SC Workshops 2023
PB - Association for Computing Machinery
T2 - 2023 International Conference on High Performance Computing, Network, Storage, and Analysis, SC Workshops 2023
Y2 - 12 November 2023 through 17 November 2023
ER -