A QSAR for the hydroxyl radical reaction rate constant: validation, domain
of application, and prediction
Öberg, T.
Atmospheric Environment 39, 2189-2200 (2005)
Abstract
A large number of anthropogenic organic chemicals are emitted into the
troposphere. Reactions with the hydroxyl radical are a dominant removal pathway
for most organic compounds, but experimentally determined gas-phase reaction
rate constants are only available for about 750 compounds. The lack of
experimental data increases the importance of applying quantitative structure–activity
relationships (QSAR) to evaluate and predict reactivities. It is generally
acknowledged that these empirical relationships are valid only within the
domain for which they were developed. However, model validation is sometimes
neglected and the application domain is not always well defined. The purpose of
this paper is to outline how validation and domain definition can facilitate the
modeling and prediction of the hydroxyl radical reaction rates for a large
database. A substantial number of theoretical descriptors (867) were generated
from 2D molecular structures for compounds present in the Syracuse Research
Corporation's PhysProp Database. A QSAR model was developed for the hydroxyl
radical reaction rate constant using a projection-based regression technique,
partial least squares regression (PLSR). The PLSR model was subsequently
validated with an external test set. The main factors of variation could be
attributed to two reaction pathways: hydrogen atom abstraction and addition to
double bonds or aromatic systems. A set of 17 293 compounds, drawn from the
PhysProp Database, was projected onto the PLSR model and 74% were inside the
applicability domain. The predicted hydroxyl radical reaction rates for 25% of these
compounds were slow or negligible, with atmospheric half-lives in the range from
days to years. Finally, the list of persistent organic compounds was matched
against the OECD list of high production volume chemicals (HPVC). Together with
the experimental data, nearly three hundred compounds were identified as both
persistent and in high volume production.
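
The link between a reaction rate constant and an atmospheric half-life mentioned above can be illustrated with the standard pseudo-first-order relation t1/2 = ln 2 / (k_OH [OH]). The following is a minimal sketch, not the paper's own procedure: the function name, the hydroxyl radical concentration of 1.0e6 molecules per cm^3, and the example rate constants are all assumptions chosen for illustration.

```python
import math

# ASSUMPTION: illustrative average tropospheric OH concentration in
# molecules/cm^3. This value is not taken from the paper.
ASSUMED_OH_CONC = 1.0e6

SECONDS_PER_DAY = 86400.0


def atmospheric_half_life_days(k_oh, oh_conc=ASSUMED_OH_CONC):
    """Pseudo-first-order half-life in days.

    k_oh    -- rate constant in cm^3 molecule^-1 s^-1
    oh_conc -- OH concentration in molecules cm^-3 (assumed constant)
    """
    if k_oh <= 0 or oh_conc <= 0:
        raise ValueError("rate constant and OH concentration must be positive")
    # First-order loss rate is k_oh * [OH]; half-life is ln(2) over that rate.
    half_life_seconds = math.log(2) / (k_oh * oh_conc)
    return half_life_seconds / SECONDS_PER_DAY


if __name__ == "__main__":
    # Hypothetical rate constants spanning fast to very slow reactions.
    for k in (1e-11, 1e-12, 1e-14):
        print(f"k_OH = {k:.0e}: half-life = {atmospheric_half_life_days(k):.1f} days")
```

Because the half-life scales inversely with the rate constant, each tenfold decrease in k_OH lengthens the half-life tenfold, which is why slow-reacting compounds span the range from days to years. The result is also inversely proportional to the assumed OH concentration, so any half-life estimate should state the concentration it was computed with.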