TY - JOUR
T1 - Using Machine Learning at scale in numerical simulations with SmartSim
T2 - An application to ocean climate modeling
AU - Partee, Sam
AU - Ellis, Matthew
AU - Rigazzi, Alessandro
AU - Shao, Andrew E.
AU - Bachman, Scott
AU - Marques, Gustavo
AU - Robbins, Benjamin
N1 - Publisher Copyright: © 2022 The Author(s)
PY - 2022/7
Y1 - 2022/7
AB - We demonstrate the first climate-scale, numerical ocean simulations improved through distributed, online inference of Deep Neural Networks (DNN) using SmartSim. SmartSim is a library dedicated to enabling online analysis and Machine Learning (ML) for high performance, numerical simulations. In this paper, we detail the SmartSim architecture and provide benchmarks including online inference with a shared ML model, EKE-ResNet, on heterogeneous HPC systems. We demonstrate the capability of SmartSim by using it to run a 12-member ensemble of global-scale, high-resolution ocean simulations, each spanning 19 compute nodes, all communicating with the same ML architecture at each simulation timestep. In total, 970 billion inferences are collectively served by running the ensemble for a total of 120 simulated years. The inferences are used to predict the oceanic eddy kinetic energy (EKE), which is a variable that is used to tune different turbulence closures in the model and thus directly affects the simulation. The root-mean-square of the error in EKE (as compared to an eddy-resolving simulation) is 20% lower when using the ML-prediction than the previous state of the art. This demonstration is an example of how machine learning methods can be integrated into traditional numerical simulations, replace prognostic equations, and preserve overall simulation stability without significantly affecting the time to solution.
KW - Climate modeling
KW - Deep learning
KW - High performance computing
KW - Numerical simulation
KW - SmartSim
UR - https://www.scopus.com/pages/publications/85132546131
U2 - 10.1016/j.jocs.2022.101707
DO - 10.1016/j.jocs.2022.101707
M3 - Article
AN - SCOPUS:85132546131
SN - 1877-7503
VL - 62
JO - Journal of Computational Science
JF - Journal of Computational Science
M1 - 101707
ER -