TY - JOUR
T1 - Informing Robust Functional Relationship Benchmarks
T2 - An Evaluation of the Temperature Sensitivity of Ecosystem Respiration Across the Arctic-Boreal Region
AU - Poe, Jeralyn
AU - Huntzinger, Deborah
AU - Riley, William J.
AU - Wells, Jon M.
AU - Schuur, Edward A.G.
AU - Schwalm, Christopher
AU - Berner, Logan T.
AU - Rodenhizer, Heidi
AU - Bouskill, Nicholas J.
AU - Brovkin, Victor
AU - Burke, Eleanor J.
AU - Ciais, Philippe
AU - Georgievski, Goran
AU - Gustafson, Adrian
AU - Lawrence, David M.
AU - MacDougall, Andrew H.
AU - Mekonnen, Zelalem A.
AU - Melton, Joe R.
AU - Meyer, Gesa
AU - Pongracz, Alexandra
AU - Qiu, Chunjing
AU - Sulman, Benjamin N.
AU - Swenson, Sean C.
AU - Tao, Jing
AU - Wårlind, David
AU - Xi, Yi
AU - Yuan, Fengming
AU - Zhu, Qing
AU - Schädel, Christina
N1 - Publisher Copyright: © 2026. The Author(s).
PY - 2026/5
Y1 - 2026/5
N2 - During land model development, simulated carbon dynamics are often benchmarked against observational data sets to evaluate model performance. Functional relationship benchmarks are the relationship between a driving variable (e.g., temperature) and a response variable (e.g., ecosystem respiration) and are a promising tool for assessing model performance by evaluating modeled sensitivities to changing environmental conditions. However, observed functional relationships can be influenced by choices made during data collection and throughout the benchmarking process, impacting the inferred skill of land models. To avoid misrepresenting a model's true performance, it is necessary to systematically evaluate best practices when constructing functional relationship benchmarks. We developed a set of guidelines for constructing functional relationship benchmarks, considering the choice of data set, number of daily observations, temporal extent, and temporal resolution across Alaska and Canada over a 20-year period from 2001 to 2020. The temperature sensitivity of ecosystem respiration from observations, evaluated through an apparent Q10, is highly variable both spatially and as a result of the data processing approach applied in the benchmark formation. When benchmarking 13 models from the Warming Permafrost Model Intercomparison Project (WrPMIP), the range in inferred model skill is substantially impacted by the choices applied in constructing functional relationship benchmarks. The inferred performance of a given model is most sensitive to the number of daily observations and temporal extent, followed by choice of benchmark data set and temporal averaging. Results from this analysis can guide the development of consistent and robust functional relationships for future model evaluation studies.
UR - https://www.scopus.com/pages/publications/105038356759
U2 - 10.1029/2025JG009307
DO - 10.1029/2025JG009307
M3 - Article
AN - SCOPUS:105038356759
SN - 2169-8953
VL - 131
JO - Journal of Geophysical Research: Biogeosciences
JF - Journal of Geophysical Research: Biogeosciences
IS - 5
M1 - e2025JG009307
ER -