Support vector machine (SVM)

SVM is based on the structural risk minimization principle from statistical learning theory (Vapnik 1995; Burges 1998; Evgeniou and Pontil 2001). Each compound i is represented by a vector $x_i$ whose components are its molecular descriptors. In linearly separable cases, SVM constructs a hyperplane that separates the two classes of compounds with maximum margin. This is accomplished by finding a vector $w$ and a parameter $b$ that minimize $\|w\|^2$ and satisfy the following conditions:

$w \cdot x_i + b \geq +1$, for $y_i = +1$ Class 1 (D+)

$w \cdot x_i + b \leq -1$, for $y_i = -1$ Class 2 (D–)

where $y_i$ is the class index of compound i, $w$ is a vector normal to the hyperplane, $|b| / \|w\|$ is the perpendicular distance from the hyperplane to the origin and $\|w\|$ is the Euclidean norm of $w$. Once $w$ and $b$ have been determined, a given compound with vector $x$ can be classified by:

$$y = \mathrm{sign}[(w \cdot x) + b]$$
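The linear case can be illustrated with a minimal Python sketch. It assumes the usual conditions y_i (w · x_i + b) ≥ 1 for the two classes and the sign rule for classification. The values of `w`, `b` and the four toy "compounds" are hypothetical and chosen by hand, not produced by a solver, and the helper names (`dot`, `satisfies_margin`, `classify`, `origin_distance`) are illustrative only. Treating a score of exactly zero as the positive class is a convention assumed here.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def satisfies_margin(w: Vector, b: float, x: Vector, y: int) -> bool:
    """Check the training condition y * (w . x + b) >= 1 for one compound."""
    return y * (dot(w, x) + b) >= 1.0


def classify(w: Vector, b: float, x: Vector) -> int:
    """Return +1 or -1 according to the sign of w . x + b.

    A score of exactly 0 is mapped to +1 (an assumed convention).
    """
    return 1 if dot(w, x) + b >= 0 else -1


def origin_distance(w: Vector, b: float) -> float:
    """Perpendicular distance |b| / ||w|| from the hyperplane to the origin."""
    return abs(b) / math.sqrt(dot(w, w))


# Hypothetical hyperplane and descriptor vectors (two descriptors each).
w = [1.0, 1.0]
b = -3.0
training = [
    ([2.0, 2.0], +1),
    ([3.0, 3.0], +1),
    ([1.0, 1.0], -1),
    ([0.0, 0.0], -1),
]

# Every toy compound lies on or outside its margin boundary.
all_ok = all(satisfies_margin(w, b, x, y) for x, y in training)
print(all_ok)

# Classify two unseen descriptor vectors.
print(classify(w, b, [2.5, 2.5]), classify(w, b, [0.5, 0.5]))
print(origin_distance(w, b))
```

In this toy setup the points [2, 2] and [1, 1] sit exactly on the margin boundaries, so they play the role of support vectors.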

In non-linearly separable cases, SVM maps the vectors into a higher-dimensional feature space using a kernel function $K(x_i, x_j)$. The table below lists three commonly used types of kernel function. The Gaussian radial basis function kernel has been used extensively, with good results, in a number of studies (Burbidge et al. 2001; Czerminski et al. 2001; Trotter et al. 2001).

Commonly used kernel functions

Kernel Equation
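Polynomial $K(x_i, x_j) = (x_i \cdot x_j + 1)^p$
Gaussian radial basis function $K(x_i, x_j) = \exp\left(-\|x_j - x_i\|^2 / 2\sigma^2\right)$
Sigmoidal $K(x_i, x_j) = \tanh(\kappa \, x_i \cdot x_j - \delta)$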
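The three kernels can be written as short Python functions. This is a sketch under assumptions: it uses the commonly cited forms (x_i · x_j + 1)^p for the polynomial kernel, exp(-||x_j - x_i||^2 / (2σ^2)) for the Gaussian radial basis function kernel and tanh(κ x_i · x_j - δ) for the sigmoidal kernel. The parameter names `p`, `sigma`, `kappa` and `delta` and their default values are illustrative, not recommendations.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def polynomial_kernel(xi: Vector, xj: Vector, p: int = 2) -> float:
    """Polynomial kernel (xi . xj + 1)^p; p is an assumed default degree."""
    return (dot(xi, xj) + 1.0) ** p


def rbf_kernel(xi: Vector, xj: Vector, sigma: float = 1.0) -> float:
    """Gaussian RBF kernel exp(-||xj - xi||^2 / (2 * sigma^2))."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


def sigmoid_kernel(xi: Vector, xj: Vector,
                   kappa: float = 1.0, delta: float = 0.0) -> float:
    """Sigmoidal kernel tanh(kappa * xi . xj - delta)."""
    return math.tanh(kappa * dot(xi, xj) - delta)


# Hypothetical descriptor vectors for two compounds.
a = [1.0, 2.0]
b = [3.0, 4.0]
print(polynomial_kernel(a, b))
print(rbf_kernel(a, b))
print(sigmoid_kernel(a, b))
```

Note that the Gaussian kernel of a vector with itself is always 1 and decays towards 0 as two compounds move apart in descriptor space, with σ controlling how quickly.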

A linear SVM is then applied in this feature space, and the decision function is given by:

$$f(x) = \mathrm{sign}\left(\sum_{i=1}^{l} \alpha_i^0 y_i K(x, x_i) + b\right)$$

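where $l$ is the number of support vectors and the coefficients $\alpha_i^0$ and $b$ are determined by maximizing the following Lagrangian expression:

$$\sum_{i=1}^{l} \alpha_i - \frac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l} \alpha_i \alpha_j y_i y_j K(x_i, x_j)$$

under the following conditions:

$$0 \leq \alpha_i \leq C$$

$$\sum_{i=1}^{l} \alpha_i y_i = 0$$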

where $C$ is a penalty parameter for training errors. A positive or negative value of the decision function indicates that the compound with vector $x$ belongs to the positive or negative class, respectively.
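The kernel decision function and its constraints can be sketched in Python as follows. The sketch assumes a Gaussian radial basis function kernel, the standard dual objective Σ α_i - ½ Σ Σ α_i α_j y_i y_j K(x_i, x_j) and the constraints 0 ≤ α_i ≤ C and Σ α_i y_i = 0. The support vectors, the coefficients `alphas`, the offset `b` and the value of `C` are hypothetical values picked by hand for illustration. They are not the output of an optimizer, and all function names are illustrative.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]
Kernel = Callable[[Vector, Vector], float]


def rbf_kernel(xi: Vector, xj: Vector, sigma: float = 1.0) -> float:
    """Gaussian RBF kernel exp(-||xj - xi||^2 / (2 * sigma^2))."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


def decision_value(x: Vector, vectors: List[Vector], labels: List[int],
                   alphas: List[float], b: float, kernel: Kernel) -> float:
    """Sum over support vectors of alpha_i * y_i * K(x, x_i), plus b."""
    return sum(a * y * kernel(x, sv)
               for a, y, sv in zip(alphas, labels, vectors)) + b


def classify(x: Vector, vectors: List[Vector], labels: List[int],
             alphas: List[float], b: float, kernel: Kernel) -> int:
    """Sign of the decision value; exactly 0 maps to +1 (assumed convention)."""
    value = decision_value(x, vectors, labels, alphas, b, kernel)
    return 1 if value >= 0 else -1


def dual_objective(alphas: List[float], labels: List[int],
                   vectors: List[Vector], kernel: Kernel) -> float:
    """Value of the expression that the alphas are chosen to maximize."""
    pairwise = sum(ai * aj * yi * yj * kernel(xi, xj)
                   for ai, yi, xi in zip(alphas, labels, vectors)
                   for aj, yj, xj in zip(alphas, labels, vectors))
    return sum(alphas) - 0.5 * pairwise


def is_feasible(alphas: List[float], labels: List[int], C: float,
                tol: float = 1e-9) -> bool:
    """Check 0 <= alpha_i <= C and sum(alpha_i * y_i) == 0 within tol."""
    in_box = all(-tol <= a <= C + tol for a in alphas)
    balanced = abs(sum(a * y for a, y in zip(alphas, labels))) <= tol
    return in_box and balanced


# Hypothetical support vectors, labels and coefficients (not solver output).
vectors = [[1.0, 1.0], [-1.0, -1.0]]
labels = [+1, -1]
alphas = [1.0, 1.0]
b = 0.0
C = 10.0

print(is_feasible(alphas, labels, C))
print(dual_objective(alphas, labels, vectors, rbf_kernel))
print(classify([2.0, 2.0], vectors, labels, alphas, b, rbf_kernel))
print(classify([-2.0, -2.0], vectors, labels, alphas, b, rbf_kernel))
```

A real training procedure would search over the α_i for the feasible point that maximizes `dual_objective`; here the functions only evaluate a given candidate.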

References

  • Burbidge R, Trotter M, Buxton B and Holden S (2001). Drug design by machine learning: support vector machines for pharmaceutical data analysis. Computers and Chemistry 26(1): 5-14.
  • Burges CJC (1998). A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery 2(2): 127-167.
  • Czerminski R, Yasri A and Hartsough D (2001). Use of support vector machine in pattern classification: Application to QSAR studies. Quantitative Structure-Activity Relationships 20(3): 227-240.
  • Evgeniou T and Pontil M (2001). Support vector machines: theory and applications. In: Machine learning and its applications: advanced lectures. Paliouras G, Karkaletsis V and Spyropoulos CD (eds). New York, Springer: 249-257.
  • Trotter MWB, Buxton BF and Holden SB (2001). Support vector machines in combinatorial chemistry. Measurement and Control 34(8): 235-239.
  • Vapnik VN (1995). The nature of statistical learning theory. New York, Springer.

2 Responses to “Support vector machine (SVM)”

  1. www.preiserhoehung.de Says:

    I found your topic “Support vector machine (SVM) | pharmine” when I was searching for combinatorial chemistry, and it is really interesting to me. If it's OK with you, I would like to translate your topic and post it on my German blog about combinatorial chemistry. I will link back to your topic, of course!

  2. Yap Chun Wei Says:

    No problem, you can translate it to German.
