TY - JOUR
T1 - Novel Application of Machine Learning Techniques for Rapid Source Apportionment of Aerosol Mass Spectrometer Datasets
AU - Pande, Paritosh
AU - Shrivastava, Manish
AU - Shilling, John E.
AU - Zelenyuk, Alla
AU - Zhang, Qi
AU - Chen, Qi
AU - Ng, Nga Lee
AU - Zhang, Yue
AU - Takeuchi, Masayuki
AU - Nah, Theodora
AU - Rasool, Quazi Z.
AU - Zhang, Yuwei
AU - Zhao, Bin
AU - Liu, Ying
N1 - Publisher Copyright:
© 2022 American Chemical Society. All rights reserved.
PY - 2022/4/21
Y1 - 2022/4/21
N2 - We apply a machine learning approach, sparse multinomial logistic regression, to classify aerosol mass spectrometer (AMS) unit mass resolution (UMR) data, followed by an ensemble regression technique for source apportionment of organic aerosols (OA). The classifier was trained on 60 well-characterized laboratory and positive matrix factorization (PMF) deconvolved reference spectra to identify eight OA types. These comprise four laboratory-derived secondary organic aerosol (SOA) types, namely isoprene photooxidation SOA, isoprene epoxydiols (IEPOX) SOA, a monoterpene SOA type that includes α-pinene and β-pinene SOA, and aromatic SOA from oxidation of naphthalene and m-xylene precursors; PMF deconvolved spectra for three primary organic aerosol (POA) types, namely hydrocarbon-like organic aerosol (HOA), biomass burning organic aerosol (BBOA), and cooking OA (COA); and a more oxidized oxygenated OA type (MO-OOA). A 5-fold cross-validation strategy, repeated 10 times, was used to assess the classifier's performance. The classifier had high classification accuracy for COA, aromatic SOA, and isoprene SOA spectra but misclassified ∼9% (by number) of MO-OOA spectra as BBOA, 12% of BBOA spectra as HOA (and vice versa), and 18% of IEPOX-SOA spectra as aromatic SOA. Next, an ensemble regression model was trained on an artificially generated dataset of mixtures of different OA types to assess its ability to predict fractional mass abundances from the classification probabilities of the OA types, as obtained from the multinomial logistic regression classifier trained on the reference spectra. Finally, the proposed approach was applied to source apportionment of aircraft-based AMS measurements of OA UMR spectra during the HI-SCALE field campaign.
On two representative days (May 6th and 18th, 2016), the algorithm determined that ∼50-60% of OA by mass was MO-OOA, representing a highly aged organic aerosol mixture from different sources. On both days, BBOA was estimated to contribute less than 10% of OA by mass. However, the aromatic SOA fraction was higher on May 18th than on May 6th. The proposed approach can analyze AMS data rapidly and in real time, making it suitable for applications where rapid source apportionment of AMS OA spectra is desirable.
AB - We apply a machine learning approach, sparse multinomial logistic regression, to classify aerosol mass spectrometer (AMS) unit mass resolution (UMR) data, followed by an ensemble regression technique for source apportionment of organic aerosols (OA). The classifier was trained on 60 well-characterized laboratory and positive matrix factorization (PMF) deconvolved reference spectra to identify eight OA types. These comprise four laboratory-derived secondary organic aerosol (SOA) types, namely isoprene photooxidation SOA, isoprene epoxydiols (IEPOX) SOA, a monoterpene SOA type that includes α-pinene and β-pinene SOA, and aromatic SOA from oxidation of naphthalene and m-xylene precursors; PMF deconvolved spectra for three primary organic aerosol (POA) types, namely hydrocarbon-like organic aerosol (HOA), biomass burning organic aerosol (BBOA), and cooking OA (COA); and a more oxidized oxygenated OA type (MO-OOA). A 5-fold cross-validation strategy, repeated 10 times, was used to assess the classifier's performance. The classifier had high classification accuracy for COA, aromatic SOA, and isoprene SOA spectra but misclassified ∼9% (by number) of MO-OOA spectra as BBOA, 12% of BBOA spectra as HOA (and vice versa), and 18% of IEPOX-SOA spectra as aromatic SOA. Next, an ensemble regression model was trained on an artificially generated dataset of mixtures of different OA types to assess its ability to predict fractional mass abundances from the classification probabilities of the OA types, as obtained from the multinomial logistic regression classifier trained on the reference spectra. Finally, the proposed approach was applied to source apportionment of aircraft-based AMS measurements of OA UMR spectra during the HI-SCALE field campaign.
On two representative days (May 6th and 18th, 2016), the algorithm determined that ∼50-60% of OA by mass was MO-OOA, representing a highly aged organic aerosol mixture from different sources. On both days, BBOA was estimated to contribute less than 10% of OA by mass. However, the aromatic SOA fraction was higher on May 18th than on May 6th. The proposed approach can analyze AMS data rapidly and in real time, making it suitable for applications where rapid source apportionment of AMS OA spectra is desirable.
KW - SOA
KW - aerosol mass spectrometer
KW - classification
KW - ensemble regression
KW - logistic regression
KW - machine learning
KW - source apportionment
UR - https://www.scopus.com/pages/publications/85128340589
U2 - 10.1021/acsearthspacechem.1c00344
DO - 10.1021/acsearthspacechem.1c00344
M3 - Article
AN - SCOPUS:85128340589
SN - 2472-3452
VL - 6
SP - 932
EP - 942
JO - ACS Earth and Space Chemistry
JF - ACS Earth and Space Chemistry
IS - 4
ER -