Machine-learning-based load balancing for community ice code component in CESM

Prasanna Balaprakash, Yuri Alexeev, Sheri A. Mickelson, Sven Leyffer, Robert Jacob, Anthony Craig

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

12 Scopus citations

Abstract

Load balancing scientific codes on massively parallel architectures is becoming an increasingly challenging task. In this paper, we focus on the Community Earth System Model, a widely used climate modeling code. It comprises six components, each of which exhibits different scalability patterns. Previously, an analytical performance model was used to find optimal load-balancing parameter configurations for each component. For the Community Ice Code component, however, the analytical performance model is too restrictive to capture its scalability patterns. We therefore developed a machine-learning-based load-balancing algorithm. It involves fitting a surrogate model to a small number of load-balancing configurations and their corresponding runtimes. This model is then used to find high-quality parameter configurations. Compared with the current practice of expert-knowledge-based enumeration over feasible configurations, the machine-learning-based load-balancing algorithm requires six times fewer evaluations to find the optimal configuration.
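The surrogate-based search described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm or model: the k-nearest-neighbor surrogate, the two-parameter block-size configuration space, and the `synthetic_runtime` cost function are all assumptions made up for illustration. The sketch evaluates a small initial sample, fits the surrogate to the observed runtimes, and repeatedly evaluates whichever unevaluated configuration the surrogate predicts to be fastest.

```python
import math


def synthetic_runtime(config):
    """Hypothetical stand-in for timing one model run with this configuration."""
    bx, by = config
    # Made-up cost: small blocks pay overhead, large blocks pay imbalance.
    return 100.0 / (bx * by) + 0.5 * (bx + by)


def knn_predict(config, evaluated, k=3):
    """Surrogate (assumed): mean runtime of the k nearest evaluated configurations."""
    nearest = sorted(evaluated, key=lambda c: math.dist(c, config))[:k]
    return sum(evaluated[c] for c in nearest) / len(nearest)


def surrogate_search(candidates, runtime_fn, n_initial=5, budget=12):
    """Return (best config, its runtime, number of evaluations used)."""
    # Evaluate a small, evenly spaced initial sample of configurations.
    step = max(1, len(candidates) // n_initial)
    evaluated = {c: runtime_fn(c) for c in candidates[::step][:n_initial]}
    while len(evaluated) < budget:
        remaining = [c for c in candidates if c not in evaluated]
        if not remaining:
            break
        # Ask the surrogate which unevaluated configuration looks fastest,
        # then spend one real evaluation on it and refit implicitly.
        nxt = min(remaining, key=lambda c: knn_predict(c, evaluated))
        evaluated[nxt] = runtime_fn(nxt)
    best = min(evaluated, key=evaluated.get)
    return best, evaluated[best], len(evaluated)


# Hypothetical configuration space: block sizes 1..8 in each direction.
candidates = [(bx, by) for bx in range(1, 9) for by in range(1, 9)]
best, best_time, n_evals = surrogate_search(candidates, synthetic_runtime)
print(best, round(best_time, 3), n_evals)
```

The search here spends 12 evaluations on a space of 64 candidates, whereas exhaustive enumeration would time all 64. Whether the best configuration found is the global optimum depends on the surrogate and the cost surface, so the sketch makes no claim about matching the savings reported in the paper.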

Original language: English
Title of host publication: High Performance Computing for Computational Science - VECPAR 2014 - 11th International Conference, Revised Selected Papers
Editors: Osni Marques, Michel Dayde, Kengo Nakajima
Publisher: Springer Verlag
Pages: 79-91
Number of pages: 13
ISBN (Print): 9783319173528
State: Published - 2015
Event: 11th International Conference on High Performance Computing for Computational Science, VECPAR 2014 - Eugene, United States
Duration: Jun 30 2014 to Jul 3 2014

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 8969
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 11th International Conference on High Performance Computing for Computational Science, VECPAR 2014
Country/Territory: United States
City: Eugene
Period: 06/30/14 to 07/3/14
