Comment on Williams (2025): “Friends don't let friends use NSE or KGE for hydrologic model accuracy evaluation: A rant with data and suggestions for better practice”

  • Martyn P. Clark
  • Wouter J.M. Knoben
  • Diana Spieler
  • Gaby J. Gründemann
  • Cyril Thébault
  • Nicolás A. Vásquez
  • Andrew W. Wood
  • Yalan Song
  • Chaopeng Shen
  • Shaun Carney
  • Katie van Werkhoven

Research output: Contribution to journal › Letter › peer-review

Abstract

Williams (2025), hereafter W25, raises valid concerns about the widespread use of the Nash–Sutcliffe Efficiency (NSE) and Kling–Gupta Efficiency (KGE) metrics in hydrologic model evaluation, arguing that these skill scores confound model accuracy with flow variability and should be replaced by error-based metrics such as the Root Mean Squared Error (RMSE) and the Mean Absolute Error (MAE). While we agree that model evaluation often lacks critical interpretation, we disagree that abandoning skill scores offers a constructive path forward. In this commentary, we discuss three main limitations in the W25 paper. First, we contend that W25 gives little attention to the broader literature on hydrologic model evaluation, leaving its recommendations weakly grounded in existing research. Second, we note that W25's recommendation to replace skill scores with error-based metrics such as RMSE does not resolve the underlying issue: both skill scores and error-based metrics conflate spatial variations in model accuracy with variations in flow variability. Third, we suggest that W25 overlooks the value of NSE and KGE in supporting standardized test environments that enable consistent model comparison. More generally, we argue that the W25 paper points the field in less productive directions for future research: simply replacing NSE and KGE with error-based metrics does not help the community address the core challenges in hydrologic model evaluation.
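To make the abstract's central point concrete, the sketch below computes NSE, KGE, RMSE and MAE for three hypothetical synthetic hydrographs. It assumes the standard definitions NSE = 1 − Σ(sim − obs)² / Σ(obs − mean(obs))² and the original 2009 KGE formulation combining correlation r, variability ratio α = σ_sim/σ_obs and bias ratio β = μ_sim/μ_obs. The function names and data are illustrative only and do not reproduce the analyses in W25 or in this commentary. Case B rescales case A by a factor of 10, so NSE and KGE are unchanged while RMSE and MAE grow tenfold; case C keeps the same error series as A but weakens the seasonal cycle, so RMSE is unchanged while NSE drops.

```python
import numpy as np


def rmse(sim: np.ndarray, obs: np.ndarray) -> float:
    """Root Mean Squared Error, in flow units."""
    return float(np.sqrt(np.mean((sim - obs) ** 2)))


def mae(sim: np.ndarray, obs: np.ndarray) -> float:
    """Mean Absolute Error, in flow units."""
    return float(np.mean(np.abs(sim - obs)))


def nse(sim: np.ndarray, obs: np.ndarray) -> float:
    """Nash-Sutcliffe Efficiency: 1 - SSE / sum of squared deviations of obs."""
    sse = np.sum((sim - obs) ** 2)
    sst = np.sum((obs - obs.mean()) ** 2)
    return float(1.0 - sse / sst)


def kge(sim: np.ndarray, obs: np.ndarray) -> float:
    """Kling-Gupta Efficiency, original (2009) form with alpha = std ratio."""
    r = np.corrcoef(sim, obs)[0, 1]   # linear correlation
    alpha = sim.std() / obs.std()     # variability ratio
    beta = sim.mean() / obs.mean()    # bias ratio
    return float(1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2))


# Hypothetical daily series: one year with a seasonal cycle.
t = np.arange(365)
season = np.sin(2 * np.pi * t / 365)
err = np.sin(2 * np.pi * t / 30)      # same error pattern reused below

# Case A: baseline catchment.
obs_a = 10.0 + 5.0 * season
sim_a = obs_a + err

# Case B: case A scaled by 10 (larger flows, identical relative errors).
obs_b, sim_b = 10.0 * obs_a, 10.0 * sim_a

# Case C: same absolute errors as A, but a weaker seasonal cycle.
obs_c = 10.0 + 2.0 * season
sim_c = obs_c + err

for name, sim, obs in [("A", sim_a, obs_a), ("B", sim_b, obs_b), ("C", sim_c, obs_c)]:
    print(f"{name}: NSE={nse(sim, obs):.3f} KGE={kge(sim, obs):.3f} "
          f"RMSE={rmse(sim, obs):.3f} MAE={mae(sim, obs):.3f}")
```

In this toy example, NSE penalises the same absolute error more heavily where flow variability is low (case C), while RMSE and MAE scale with flow magnitude (case B), so neither type of metric by itself separates model accuracy from catchment flow characteristics.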

Original language: English
Article number: 106869
Journal: Environmental Modelling and Software
Volume: 197
State: Published, Feb 2026
Externally published: Yes

Keywords

  • Benchmarking
  • Diagnostic evaluation
  • Hydrologic model evaluation
  • Kling–Gupta efficiency
  • Model comparison
  • Nash–Sutcliffe efficiency
  • Performance metrics
