TY - JOUR
T1 - Acceleration of the Parameterization of Unified Microphysics Across Scales (PUMAS) on the Graphics Processing Unit (GPU) With Directive-Based Methods
AU - Sun, Jian
AU - Dennis, John M.
AU - Mickelson, Sheri A.
AU - Vanderwende, Brian
AU - Gettelman, Andrew
AU - Thayer-Calder, Katherine
N1 - Publisher Copyright: © 2023 The Authors. Journal of Advances in Modeling Earth Systems published by Wiley Periodicals LLC on behalf of American Geophysical Union.
PY - 2023/5
Y1 - 2023/5
AB - Cloud microphysics is one of the most time-consuming components of a climate model. In this study, we port the cloud microphysics parameterization in the Community Atmosphere Model (CAM), known as the Parameterization of Unified Microphysics Across Scales (PUMAS), from CPU to GPU to seek a computational speedup. Directive-based methods (OpenACC and OpenMP target offload) are identified as the best fit for our development practices: they enable a single version of the source code to run on either the CPU or the GPU, which improves portability and maintainability. Their performance is first examined in a PUMAS stand-alone kernel, where the directive-based methods can outperform a CPU node as long as there is enough computational work on the GPU. Consistent behavior is observed when we run PUMAS on the GPU in a practical CAM simulation. A 3.6× speedup of the PUMAS execution time, including data movement between CPU and GPU, is achieved at a coarse horizontal resolution (8 NVIDIA V100 GPUs against 36 Intel Skylake CPU cores). This speedup increases to as much as 5.4× at a high resolution (24 NVIDIA V100 GPUs against 108 Intel Skylake CPU cores), which shows that the GPU favors larger problem sizes. This study demonstrates that using GPUs in a CAM simulation can yield noticeable computational savings even when only a small portion of the code is GPU-enabled. We are therefore encouraged to port more parameterizations to the GPU to take advantage of its computational benefits.
KW - CAM
KW - GPU
KW - OpenACC
KW - OpenMP target offload
KW - PUMAS
UR - https://www.scopus.com/pages/publications/85151751895
U2 - 10.1029/2022MS003515
DO - 10.1029/2022MS003515
M3 - Article
AN - SCOPUS:85151751895
SN - 1942-2466
VL - 15
JO - Journal of Advances in Modeling Earth Systems
JF - Journal of Advances in Modeling Earth Systems
IS - 5
M1 - e2022MS003515
ER -