TY - JOUR
T1 - An Assessment of How Domain Experts Evaluate Machine Learning in Operational Meteorology
AU - Harrison, David R.
AU - McGovern, Amy
AU - Karstens, Christopher D.
AU - Bostrom, Ann
AU - Demuth, Julie L.
AU - Jirak, Israel L.
AU - Marsh, Patrick T.
N1 - Publisher Copyright:
© 2025 American Meteorological Society.
PY - 2025/3
Y1 - 2025/3
N2 - As an increasing number of machine learning (ML) products enter the research-to-operations (R2O) pipeline, researchers have anecdotally noted a perceived hesitancy by operational forecasters to adopt this relatively new technology. One explanation often cited in the literature is that this perceived hesitancy derives from the complex and opaque nature of ML methods. Because modern ML models are trained to solve tasks by optimizing a potentially complex combination of mathematical weights, thresholds, and nonlinear cost functions, it can be difficult to determine how these models reach a solution from their given input. However, it remains unclear to what degree a model’s transparency may influence a forecaster’s decision to use that model or if that impact differs between ML and more traditional (i.e., non-ML) methods. To address this question, a survey was offered to forecaster and researcher participants attending the 2021 NOAA Hazardous Weather Testbed (HWT) Spring Forecasting Experiment (SFE) with questions about how participants subjectively perceive and compare machine learning products to more traditionally derived products. Results from this study revealed few differences in how participants evaluated machine learning products compared to other types of guidance. However, comparing the responses between operational forecasters, researchers, and academics exposed notable differences in what factors the three groups considered to be most important for determining the operational success of a new forecast product. These results support the need for increased collaboration between the operational and research communities.
AB - As an increasing number of machine learning (ML) products enter the research-to-operations (R2O) pipeline, researchers have anecdotally noted a perceived hesitancy by operational forecasters to adopt this relatively new technology. One explanation often cited in the literature is that this perceived hesitancy derives from the complex and opaque nature of ML methods. Because modern ML models are trained to solve tasks by optimizing a potentially complex combination of mathematical weights, thresholds, and nonlinear cost functions, it can be difficult to determine how these models reach a solution from their given input. However, it remains unclear to what degree a model’s transparency may influence a forecaster’s decision to use that model or if that impact differs between ML and more traditional (i.e., non-ML) methods. To address this question, a survey was offered to forecaster and researcher participants attending the 2021 NOAA Hazardous Weather Testbed (HWT) Spring Forecasting Experiment (SFE) with questions about how participants subjectively perceive and compare machine learning products to more traditionally derived products. Results from this study revealed few differences in how participants evaluated machine learning products compared to other types of guidance. However, comparing the responses between operational forecasters, researchers, and academics exposed notable differences in what factors the three groups considered to be most important for determining the operational success of a new forecast product. These results support the need for increased collaboration between the operational and research communities.
KW - Artificial intelligence
KW - Decision making
KW - Forecasting
KW - Forecasting techniques
KW - Operational forecasting
UR - https://www.scopus.com/pages/publications/105000184495
U2 - 10.1175/WAF-D-24-0144.1
DO - 10.1175/WAF-D-24-0144.1
M3 - Article
AN - SCOPUS:105000184495
SN - 0882-8156
VL - 40
SP - 393
EP - 410
JO - Weather and Forecasting
JF - Weather and Forecasting
IS - 3
ER -