TY - JOUR
T1 - Toward Emulating an Explicit Organic Chemistry Mechanism With Random Forest Models
AU - Mouchel-Vallon, Camille
AU - Hodzic, Alma
N1 - Publisher Copyright:
© 2023. The Authors.
PY - 2023/5/27
Y1 - 2023/5/27
AB - Predicting secondary organic aerosol (SOA) formation relies either on extremely detailed, numerically expensive models accounting for the condensation of individual species or on extremely simplified, numerically affordable models parameterizing SOA formation for large-scale simulations. In this work, we explore the possibility of creating a random forest to reproduce the behavior of a detailed atmospheric organic chemistry model at a fraction of the numerical cost. A comprehensive data set was created based on thousands of individual detailed simulations, randomly initialized to account for the variety of atmospheric chemical environments. Recurrent random forests were trained to predict organic matter formation from dodecane and toluene precursors, and the partitioning between gas and particle phases. Validation tests show that the random forests perform well without any divergence over 10 days of simulations. The distribution of errors shows that the sampling of initial conditions for the training simulations needs to focus on chemical regimes where SOA production is the most sensitive. Sensitivity tests show that specializing multiple random forests for a specific chemical regime is not more efficient than training a single general random forest for the entire data set. The most important predictors are those providing information about the chemical regime, oxidant levels, and existing organic mass. The choice of predictors is crucial as using too many unimportant predictors reduces the performance of the random forests.
KW - explicit organic chemistry
KW - machine learning
KW - organic aerosol modeling
UR - https://www.scopus.com/pages/publications/85160409492
U2 - 10.1029/2022JD038227
DO - 10.1029/2022JD038227
M3 - Article
AN - SCOPUS:85160409492
SN - 2169-897X
VL - 128
JO - Journal of Geophysical Research: Atmospheres
JF - Journal of Geophysical Research: Atmospheres
IS - 10
M1 - e2022JD038227
ER -