Abstract
Selecting the most rigorous quantitative structure-activity relationship (QSAR)
approaches is of great importance in the development of robust and predictive
models of chemical toxicity. To address this issue in a systematic way, we have
formed an international virtual collaboratory consisting of six independent
groups with shared interests in computational chemical toxicology. We have
compiled an aqueous toxicity data set containing 983 unique compounds tested in
the same laboratory over a decade against Tetrahymena pyriformis. A
modeling set including 644 compounds was selected randomly from the original set
and distributed to all groups that used their own QSAR tools for model
development. The remaining 339 compounds in the original set (external set I) as
well as 110 additional compounds (external set II) published recently by the
same laboratory (after this computational study was already in progress) were
used as two independent validation sets to assess the external predictive power
of individual models. In total, our virtual collaboratory has developed 15
different types of QSAR models of aquatic toxicity for the training set. The
internal prediction accuracy for the modeling set ranged from 0.76 to 0.93 as
measured by the leave-one-out cross-validation correlation coefficient
($Q_{\mathrm{abs}}^{2}$). The prediction accuracy for the external validation
sets I and II ranged from 0.71 to 0.85 (linear regression coefficient
$R_{\mathrm{abs,I}}^{2}$) and from 0.38 to 0.83 (linear regression coefficient
$R_{\mathrm{abs,II}}^{2}$),
respectively. The use of an applicability domain threshold implemented in most
models generally improved the external prediction accuracy but at the same time
led to a decrease in chemical space coverage. Finally, several consensus models
were developed by averaging the predicted aquatic toxicity for every compound
using all 15 models, with or without taking into account their respective
applicability domains. We find that consensus models afford higher prediction
accuracy for the external validation data sets, together with the highest
chemical space coverage, than the individual constituent models. Our studies
demonstrate the power of a collaborative and consensual approach to QSAR model
development. The best
validated models of aquatic toxicity developed by our collaboratory (both
individual and consensus) can be used as reliable computational predictors of
aquatic toxicity and are available from any of the participating laboratories.
DOI: 10.1021/ci700443v
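
The consensus step described in the abstract (averaging the predicted toxicity of each compound over the individual models, with or without regard to each model's applicability domain) can be illustrated with a minimal Python sketch. This is not the collaboratory's implementation: the function names, the boolean applicability-domain flags, and the numbers in the example are hypothetical and serve only to show how averaging and chemical space coverage interact.

```python
from statistics import mean
from typing import List, Optional, Sequence


def consensus_predict(
    predictions: Sequence[float],
    in_domain: Optional[Sequence[bool]] = None,
) -> Optional[float]:
    """Average the per-model predictions for a single compound.

    predictions: one predicted toxicity value per model.
    in_domain:   optional flags, one per model; when given, only models
                 whose applicability domain covers the compound contribute.
    Returns None when no model covers the compound.
    """
    if in_domain is None:
        used = list(predictions)
    else:
        used = [p for p, ok in zip(predictions, in_domain) if ok]
    return mean(used) if used else None


def coverage(consensus_values: Sequence[Optional[float]]) -> float:
    """Fraction of compounds that received a consensus prediction."""
    covered = sum(v is not None for v in consensus_values)
    return covered / len(consensus_values)


# Hypothetical example: three models, two compounds.
preds = [[1.0, 2.0, 3.0], [0.5, 1.5, 2.5]]
domains = [[True, True, False], [False, False, False]]

# Consensus that ignores applicability domains: every compound is covered.
plain: List[Optional[float]] = [consensus_predict(p) for p in preds]

# Consensus restricted to in-domain models: the second compound is dropped.
with_ad: List[Optional[float]] = [
    consensus_predict(p, d) for p, d in zip(preds, domains)
]

print(plain, coverage(plain))
print(with_ad, coverage(with_ad))
```

In this toy case the domain-restricted consensus leaves the second compound without a prediction, which mirrors the trade-off noted in the abstract between applying an applicability domain threshold and chemical space coverage.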