Molecular descriptors - Scaling
Molecular descriptors are usually scaled before they are used for QSAR/qSAR modeling. Scaling ensures that all descriptors have equal potential to affect the QSAR/qSAR model, regardless of their original units or magnitudes (Livingstone 1995). There are four main types of descriptor scaling: autoscaling (Livingstone 1995), range scaling (Livingstone 1995), feature weighting (Livingstone 1995) and Pareto scaling (Eriksson et al. 2001). Autoscaling and range scaling are the two most commonly used in QSAR/qSAR modeling.
Autoscaling
In autoscaling, the mean is subtracted from the descriptor values and the resultant values are divided by the standard deviation:
$$X'_{ij} = \frac{X_{ij} - \bar{X}_j}{\sigma_j}$$
where $X'_{ij}$ is the new scaled value for descriptor j of compound i, $X_{ij}$ is the original value, and $\bar{X}_j$ and $\sigma_j$ are the mean and standard deviation of descriptor j, respectively. The autoscaled descriptors have a mean of zero and a standard deviation of one. The advantage of autoscaling is that it is less susceptible to the effects of compounds with extreme values because the descriptors are mean centred. In addition, unit variance is useful in variance-related methods, since each descriptor then contributes one unit of variance to the overall variance of the dataset.
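As a minimal sketch of autoscaling a single descriptor column, the following uses only the Python standard library. The function name `autoscale` and the descriptor values are hypothetical, and the sample standard deviation (n - 1 denominator) is an assumption, since the text does not say which estimator is intended.

```python
from statistics import mean, stdev


def autoscale(column):
    """Subtract the column mean and divide by the column standard deviation."""
    mu = mean(column)
    sigma = stdev(column)  # assumption: sample standard deviation (n - 1)
    if sigma == 0:
        # A constant descriptor has no spread and cannot be autoscaled.
        raise ValueError("descriptor has zero standard deviation")
    return [(x - mu) / sigma for x in column]


# Hypothetical values of one descriptor for five compounds.
descriptor = [1.2, 0.8, 3.5, 2.1, 2.4]
scaled = autoscale(descriptor)

# The scaled column has mean zero and standard deviation one
# (up to floating-point rounding).
print(round(mean(scaled), 9), round(stdev(scaled), 9))
```

In a full descriptor matrix the same function would be applied to each descriptor (column) separately, so that every descriptor ends up with unit variance.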
Range scaling (Normalization)
In range scaling, the minimum value of the descriptor is subtracted from the descriptor values and the resultant values are divided by the range:
$$X'_{ij} = \frac{X_{ij} - X_{j,\min}}{X_{j,\max} - X_{j,\min}}$$
where $X_{j,\min}$ and $X_{j,\max}$ are the minimum and maximum values of descriptor j, respectively. The range-scaled descriptors have a minimum and maximum value of 0 and 1, respectively. Range scaling can be carried out over any preferred range by multiplying the range-scaled values by a factor and, where the new range does not start at zero, adding an offset; for example, multiplying by 2 and subtracting 1 gives values between -1 and 1. The disadvantage of range scaling is that it depends on the minimum and maximum values of the descriptors, which makes it very sensitive to outliers.
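Range scaling of one descriptor column might be sketched as follows, again with the Python standard library only. The function name `range_scale`, its `new_min` and `new_max` parameters, and the data are hypothetical and chosen purely for illustration.

```python
def range_scale(column, new_min=0.0, new_max=1.0):
    """Map a descriptor column linearly onto [new_min, new_max]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant descriptor has zero range and cannot be range scaled.
        raise ValueError("descriptor has zero range")
    width = new_max - new_min
    return [new_min + (x - lo) / (hi - lo) * width for x in column]


# Hypothetical values of one descriptor for four compounds.
values = [2.0, 4.0, 6.0, 10.0]

unit = range_scale(values)                  # scaled to 0..1
symmetric = range_scale(values, -1.0, 1.0)  # scaled to -1..1

# A single extreme compound changes the range and compresses
# all other scaled values towards the minimum.
with_outlier = range_scale(values + [100.0])

print(unit)
print(symmetric)
print(with_outlier)
```

Comparing `unit` with `with_outlier` illustrates the outlier sensitivity: once the extreme value 100.0 is included, the compound that previously mapped to the maximum is scaled to less than 0.1.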
References
- Eriksson L, Johansson E, Kettaneh-Wold N and Wade KM (2001). Multi- and megavariate data analysis: Principles and applications. Umeå, Sweden, Umetrics AB.
- Livingstone DJ (1995). Data pre-treatment. Data analysis for chemists: Applications to QSAR and chemical product design. Oxford, Oxford University Press: 48-64.