TY - JOUR
T1 - Systematic Benchmarking of Climate Models
T2 - Methodologies, Applications, and New Directions
AU - Hassler, Birgit
AU - Hoffman, Forrest M.
AU - Beadling, Rebecca
AU - Blockley, Ed
AU - Huang, Bo
AU - Lee, Jiwoo
AU - Lembo, Valerio
AU - Lewis, Jared
AU - Lu, Jianhua
AU - Madaus, Luke
AU - Malinina, Elizaveta
AU - Medeiros, Brian
AU - Pokam, Wilfried
AU - Scoccimarro, Enrico
AU - Swaminathan, Ranjini
N1 - Publisher Copyright:
© 2026 His Majesty the King in Right of Canada. Crown copyright. Oak Ridge National Laboratory. Climate Resource and The Author(s). Reproduced with the permission of the Minister of Environment and Climate Change. This article is published with the permission of the Controller of HMSO and the King's Printer for Scotland.
PY - 2026/3
Y1 - 2026/3
N2 - As climate models become increasingly complex, there is a growing need to comprehensively and systematically assess model performance with respect to observations. Given the increasing number and diversity of climate model simulations in use, the community has moved beyond simple model intercomparison and toward developing methods capable of benchmarking a large number of simulations against a suite of climate metrics. Here, we present a detailed review of evaluation and benchmarking methods and approaches developed in the last decade, focusing primarily on scientific implications for Coupled Model Intercomparison Project (CMIP) simulations and CMIP6 results that contributed to the Intergovernmental Panel on Climate Change (IPCC) Sixth Assessment Report (AR6). Based on this review, we explain the resulting contemporary philosophy of model benchmarking, and provide clear distinctions and definitions of the terms model verification, process validation, evaluation, and benchmarking. While significant progress has been made in model development based on systematic evaluation and benchmarking efforts, some climate system biases remain. The development of open-source community software packages has played a fundamental role in identifying areas of significant model improvement and bias reduction. We review the key features of several software packages that have been commonly used over the past decade to evaluate and benchmark global and regional climate models. Additionally, we discuss best practices for selecting evaluation and benchmarking metrics and for interpreting the resulting scores, as well as the importance of choosing suitable reference data sources and of accurate uncertainty quantification.
AB - As climate models become increasingly complex, there is a growing need to comprehensively and systematically assess model performance with respect to observations. Given the increasing number and diversity of climate model simulations in use, the community has moved beyond simple model intercomparison and toward developing methods capable of benchmarking a large number of simulations against a suite of climate metrics. Here, we present a detailed review of evaluation and benchmarking methods and approaches developed in the last decade, focusing primarily on scientific implications for Coupled Model Intercomparison Project (CMIP) simulations and CMIP6 results that contributed to the Intergovernmental Panel on Climate Change (IPCC) Sixth Assessment Report (AR6). Based on this review, we explain the resulting contemporary philosophy of model benchmarking, and provide clear distinctions and definitions of the terms model verification, process validation, evaluation, and benchmarking. While significant progress has been made in model development based on systematic evaluation and benchmarking efforts, some climate system biases remain. The development of open-source community software packages has played a fundamental role in identifying areas of significant model improvement and bias reduction. We review the key features of several software packages that have been commonly used over the past decade to evaluate and benchmark global and regional climate models. Additionally, we discuss best practices for selecting evaluation and benchmarking metrics and for interpreting the resulting scores, as well as the importance of choosing suitable reference data sources and of accurate uncertainty quantification.
KW - CMIP
KW - climate models
KW - model benchmarking
KW - model evaluation
UR - https://www.scopus.com/pages/publications/105029813559
U2 - 10.1029/2025RG000891
DO - 10.1029/2025RG000891
M3 - Review article
AN - SCOPUS:105029813559
SN - 8755-1209
VL - 64
JO - Reviews of Geophysics
JF - Reviews of Geophysics
IS - 1
M1 - e2025RG000891
ER -