Functional dependence study of QSAR models
A functional dependence study can provide insights into the types of molecular characteristics that are important for a particular biological property, and into how changes in these characteristics affect that property. This information is useful for guiding structural modifications during computer-aided drug design so that compounds with the desired biological property can be obtained. It is also useful for validating a QSAR model, because a valid QSAR model should be consistent with previous findings on the important factors that affect the biological property.
In QSAR models developed with linear modeling methods, each descriptor is either positively or negatively correlated with the biological property through a linear relationship. In contrast, models developed with machine learning methods can capture non-linear relationships between descriptors and biological properties. These models can therefore potentially provide more information about how descriptors relate to biological properties.
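The difference can be stated through the partial derivative of the predicted property y with respect to a descriptor. The notation below is illustrative and not taken from the original text, and the second model is a hypothetical example of a non-linear form. In a linear model the effect of descriptor x_j is a single constant coefficient, whereas in a non-linear model the effect depends on where in descriptor space it is evaluated:

$$
y = \beta_0 + \sum_{j=1}^{p} \beta_j x_j \quad\Longrightarrow\quad \frac{\partial y}{\partial x_j} = \beta_j
$$

$$
y = \beta_1 x_1 + \beta_2 x_1^2 + \beta_3 x_1 x_2 \quad\Longrightarrow\quad \frac{\partial y}{\partial x_1} = \beta_1 + 2\beta_2 x_1 + \beta_3 x_2
$$

In the second model the effect of x_1 can change in magnitude and even in sign as x_1 and x_2 change, which a single linear coefficient cannot express.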
The relationships between descriptors and biological properties can be examined with functional dependence plots, in which the value of a single descriptor is varied through its range while all other descriptors are held constant at fixed values (Wessel et al. 1998). However, QSAR models usually contain descriptors that are correlated with one another, and these intercorrelations can drastically alter the shape of a functional dependence plot if the values at which the other descriptors are held constant are changed (Andrea and Kalayeh 1991). In addition, a single descriptor may encode multiple physicochemical and structural aspects of a molecule, which makes it difficult to determine the relationship between a specific molecular characteristic and a biological property.
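The one-at-a-time procedure, and its sensitivity to the held-constant values, can be sketched in Python with NumPy as follows. This is a minimal illustration under stated assumptions: `predict` is a hypothetical toy model with an interaction term, standing in for a trained non-linear QSAR model, and `functional_dependence` is an illustrative helper rather than code from the cited work.

```python
import numpy as np

def predict(X):
    """Hypothetical stand-in for a trained non-linear QSAR model.

    X has shape (n_compounds, 2). The x1*x2 interaction term makes the
    effect of x1 depend on the value at which x2 is held.
    """
    x1, x2 = X[:, 0], X[:, 1]
    return np.tanh(x1) + 0.8 * x1 * x2

def functional_dependence(predict, X_train, j, fixed, n_points=50):
    """Vary descriptor j through its training range while every other
    descriptor is held at the value given in `fixed`."""
    grid = np.linspace(X_train[:, j].min(), X_train[:, j].max(), n_points)
    X = np.tile(np.asarray(fixed, dtype=float), (n_points, 1))
    X[:, j] = grid  # overwrite column j with the varied descriptor
    return grid, predict(X)

rng = np.random.default_rng(0)
X_train = rng.uniform(-3.0, 3.0, size=(200, 2))  # toy training descriptors

# Same descriptor (x1), two different values for the held-constant x2.
grid, y_x2_high = functional_dependence(predict, X_train, j=0, fixed=[0.0, 1.0])
_, y_x2_low = functional_dependence(predict, X_train, j=0, fixed=[0.0, -1.0])
```

With x2 held at +1 the curve for x1 rises monotonically. With x2 held at -1 it rises only near x1 = 0 and falls elsewhere, so the apparent trend for x1 depends on a choice that is external to x1 itself.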
Principal component analysis (PCA) can be used to overcome both problems (Yap and Chen 2005). PCA extracts the dominant patterns in the selected descriptors and groups similar descriptors under a single principal component (PC). Different PCs encode different molecular characteristics, and because the PCs are orthogonal, the relationship between one molecular characteristic and a biological property can be determined without the influence of other molecular characteristics. A descriptor may contribute to multiple PCs, and the explained variance of the descriptor in each PC can be used to determine its level of contribution to that PC (Eriksson et al. 2001).

Artificial testing sets can be created to determine the relationship between each PC and the biological property. Each artificial testing set contains 1000 artificial compounds, which are initially described by their PC values. The PC being evaluated is varied uniformly from -5 to 5, while all other PCs are set to zero. The loadings derived from PCA are then used to transform the PC values back into the original molecular descriptors. Artificial compounds with any descriptor value outside the range of that descriptor in the training set are removed to avoid extrapolating the model. The biological property of the remaining artificial compounds is then predicted with the developed QSAR models, and functional dependence plots of the predicted property against each PC reveal the trends between the various molecular characteristics and the biological property.
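The PCA-based procedure can be sketched in Python with NumPy. This is a minimal sketch under several assumptions not stated in the text: descriptors are autoscaled (mean-centred, unit variance) before PCA, PCA is computed with an SVD, and the -5 to 5 range is applied to raw PC scores of the autoscaled data. `predict` is a hypothetical toy model standing in for the developed QSAR model, and `explained_by_pc` computes the fraction of each descriptor's variance captured by each PC, which is one common way to express a descriptor's contribution.

```python
import numpy as np

def fit_pca(X):
    """PCA of autoscaled descriptors via SVD."""
    mean, std = X.mean(axis=0), X.std(axis=0, ddof=1)
    Z = (X - mean) / std
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    loadings = Vt.T                  # column k = loading vector of PC k
    eigvals = s**2 / (len(X) - 1)    # variance of the scores of each PC
    return mean, std, loadings, eigvals

def explained_by_pc(loadings, eigvals):
    """Fraction of each autoscaled descriptor's variance explained by each PC.
    Rows = descriptors, columns = PCs; rows sum to 1 when all PCs are kept."""
    return loadings**2 * eigvals

def artificial_testing_set(X_train, pc, n_compounds=1000, lo=-5.0, hi=5.0):
    """Vary one PC uniformly, set the others to zero, map back to descriptors,
    and drop compounds outside the training range of any descriptor."""
    mean, std, loadings, _ = fit_pca(X_train)
    scores = np.zeros((n_compounds, loadings.shape[1]))
    scores[:, pc] = np.linspace(lo, hi, n_compounds)
    X_art = (scores @ loadings.T) * std + mean   # back to original descriptor units
    inside = np.all((X_art >= X_train.min(axis=0)) &
                    (X_art <= X_train.max(axis=0)), axis=1)
    return scores[inside, pc], X_art[inside]

def predict(X):
    """Hypothetical stand-in for a trained QSAR model."""
    return np.sin(X[:, 0]) + 0.5 * X[:, 1] - 0.2 * X[:, 2] ** 2

# Toy training set: descriptors 0 and 1 are intercorrelated, descriptor 2 is not.
rng = np.random.default_rng(1)
latent = rng.normal(size=100)
X_train = np.column_stack([
    latent + 0.3 * rng.normal(size=100),
    2.0 * latent + 0.3 * rng.normal(size=100),
    rng.normal(size=100),
])

pc1_values, X_pc1 = artificial_testing_set(X_train, pc=0)
y_pc1 = predict(X_pc1)   # plot y_pc1 against pc1_values
```

Because only the evaluated PC is non-zero, every surviving artificial compound lies on that PC's axis through the training-set mean; calling `artificial_testing_set` with `pc=1` or `pc=2` gives the corresponding curves for the other PCs. The range filter checks each descriptor against its own training minimum and maximum, as described in the text.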
References
- Andrea TA and Kalayeh H (1991). Applications of neural networks in quantitative structure-activity relationships of dihydrofolate reductase inhibitors. Journal of Medicinal Chemistry 34: 2824-2836.
- Eriksson L, Johansson E, Kettaneh-Wold N and Wade KM (2001). PCA. In: Multi- and megavariate data analysis: Principles and applications. Umetrics AB, Umea, Sweden: 43-70.
- Wessel MD, Jurs PC, Tolan JW and Muskal SM (1998). Prediction of human intestinal absorption of drug compounds from molecular structure. Journal of Chemical Information and Computer Sciences 38(4): 726-735.
- Yap CW and Chen YZ (2005). Quantitative structure-pharmacokinetic relationships for drug distribution properties by using general regression neural network. Journal of Pharmaceutical Sciences 94(1): 153-168.