TY - GEN
T1 - Automatically Parallelizing Batch Inference on Deep Neural Networks Using Fiats and Fortran 2023 “Do Concurrent”
AU - Rouson, Damian
AU - Bai, Zhe
AU - Bonachea, Dan
AU - Ergawy, Kareem
AU - Gutmann, Ethan
AU - Klemm, Michael
AU - Rasmussen, Katherine
AU - Richardson, Brad
AU - Shende, Sameer
AU - Torres, David
AU - Zhang, Yunhao
N1 - Publisher Copyright:
© This is a U.S. government work and not under copyright protection in the U.S.; foreign copyright protection may apply 2026.
PY - 2026
Y1 - 2026
N2 - This paper introduces novel programming strategies that leverage features of the Fortran 2023 standard of the International Organization for Standardization (ISO) to automatically parallelize computations on deep neural networks. The paper focuses on the interplay of object-oriented, parallel, and functional programming paradigms in the Fiats deep learning library. We demonstrate how several infrequently used language features play a role in enabling efficient, parallel execution. Specifically, the ability to explicitly declare that a procedure is pure facilitates inference in the context of the language’s loop-parallelism construct do concurrent. Also, explicitly prohibiting the overriding of a parent type’s type-bound procedures eliminates the need for dynamic dispatch in performance-critical code. Finally, this paper uses batch inference calculations on a neural network surrogate for atmospheric aerosol dynamics to demonstrate that the LLVM Flang compiler’s automatic parallelization of do concurrent achieves roughly the same performance and scalability as OpenMP compiler directives. We also demonstrate that double-precision inference incurs 37–72% longer runtimes than default-real precision, with most values in the 57–60% range.
AB - This paper introduces novel programming strategies that leverage features of the Fortran 2023 standard of the International Organization for Standardization (ISO) to automatically parallelize computations on deep neural networks. The paper focuses on the interplay of object-oriented, parallel, and functional programming paradigms in the Fiats deep learning library. We demonstrate how several infrequently used language features play a role in enabling efficient, parallel execution. Specifically, the ability to explicitly declare that a procedure is pure facilitates inference in the context of the language’s loop-parallelism construct do concurrent. Also, explicitly prohibiting the overriding of a parent type’s type-bound procedures eliminates the need for dynamic dispatch in performance-critical code. Finally, this paper uses batch inference calculations on a neural network surrogate for atmospheric aerosol dynamics to demonstrate that the LLVM Flang compiler’s automatic parallelization of do concurrent achieves roughly the same performance and scalability as OpenMP compiler directives. We also demonstrate that double-precision inference incurs 37–72% longer runtimes than default-real precision, with most values in the 57–60% range.
KW - Atmospheric Sciences
KW - Deep learning
KW - Fortran
UR - https://www.scopus.com/pages/publications/105023476831
U2 - 10.1007/978-3-032-07612-0_11
DO - 10.1007/978-3-032-07612-0_11
M3 - Conference contribution
AN - SCOPUS:105023476831
SN - 9783032076113
T3 - Lecture Notes in Computer Science
SP - 135
EP - 147
BT - High Performance Computing - ISC High Performance 2025 International Workshops, Revised Selected Papers
A2 - Neuwirth, Sarah
A2 - Paul, Arnab Kumar
A2 - Weinzierl, Tobias
A2 - Carson, Erin Claire
PB - Springer Science and Business Media Deutschland GmbH
T2 - 40th International Conference on High Performance Computing, ISC High Performance 2025
Y2 - 10 June 2025 through 13 June 2025
ER -