Recursive feature elimination (RFE)

It has been suggested that the ranking criterion for descriptor selection can be formulated from the variation in an objective function upon removing each descriptor (Kohavi et al. 1997). In order to improve the efficiency of support vector machine (SVM) training, this objective function is represented by a cost function J for the i-th descriptor and it is computed by using the training set only. When the i-th descriptor is removed or its weight wi is reduced to zero, the variation of the cost function DJ(i) is given by

dj.jpg

The case of Dwi = wi - 0 corresponds to the removal of descriptor i.

Guyon et al have used RFE to reduce the descriptors of a linear SVM classification system for cancer detection from gene selection data (Guyon et al. 2002). In the corresponding linear SVM classifier, the cost function is

j.jpg

where l is an m dimensional identity vector (m is the number of compounds in the training set). Therefore DJ(i) = (1/2) wi2 and wi2 can be used as a descriptor ranking criterion. Yu et al have used RFE to reduce the descriptors of a non-linear SVM classification system of polynomial kernels for prediction of drug activity (Yu et al. 2003). However, because of the diversity and complexity of the compounds to be classed, the use of linear and polynomial kernels may not always be sufficient for accurate prediction of various pharmaceutical and biological properties. Thus SVM classification systems of Gaussian kernels were frequently used. In this case, the cost function to be minimized, under the constraints 0 ≤ αk ≤ C and ay.jpg, is

j2.jpg

where H is the matrix with elements yi yj exp(-||xi - xj||2/(2σ2)).

To compute the variation in the cost function upon removal of input component i, the parameters αs were kept unchanged and the matrix H was re-computed. The resulting ranking coefficient is

dj2.jpg

where H(-i) is the matrix computed by using the same method as that of matrix H but with its i-th component removed. One or more of the descriptors with the smallest DJ(i) can thus be eliminated.

References

  • Guyon I, Weston J, Barnhill S and Vapnik V (2002). Gene selection for cancer classification using support vector machines. Machine Learning 46(1-3): 389-422.
  • Kohavi R and John GH (1997). Wrappers for feature subset selection. Artificial Intelligence 97(1-2): 273-324.
  • Yu H, Yang J, Wang W and Han J (2003). Discovering compact and highly discriminative features or feature combinations of drug activities using support vector machines. Proceeding of the IEEE computer society bioinformatics conference (CSB): 220-228.
Share This

Leave a Reply


Close
E-mail It