k nearest neighbour (kNN)

kNN is a basic instance-based method introduced by Fix and Hodges (Fix and Hodges 1951). It measures the Euclidean distance between a given compound, represented by the vector x, and each compound in the training set, represented by its own vector xi (Fix and Hodges 1951; Johnson and Wichern 1982). The Euclidean distance for each vector pair is calculated using the following formula:

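$$d(x, x_i) = \sqrt{\sum_{j=1}^{n} (x_j - x_{ij})^2}$$

where $x_j$ and $x_{ij}$ are the j-th components of x and xi, and n is the number of components in each vector.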
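The distance calculation can be sketched in a few lines of Python. This is a minimal illustration, assuming each compound is represented by a plain list of numbers; the function name `euclidean_distance` and the sample vectors are hypothetical and not part of the original description.

```python
import math

def euclidean_distance(x, xi):
    """Euclidean distance between two equal-length numeric vectors."""
    if len(x) != len(xi):
        raise ValueError("vectors must have the same length")
    # Sum the squared component-wise differences, then take the square root.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, xi)))

# Hypothetical two-component vectors: a query compound and one training compound.
print(euclidean_distance([0.0, 0.0], [3.0, 4.0]))  # 5.0
```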

The k training compounds nearest to the given compound are then used to determine its data class:

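$$y = \operatorname*{argmax}_{v \in V} \sum_{i=1}^{k} \sigma(v, y_i)$$

where $y_i$ is the class of the i-th nearest training compound, $\sigma(a,b) = 1$ if $a = b$ and $\sigma(a,b) = 0$ if $a \neq b$, argmax returns the class v for which the sum is largest, and V is the finite set of data classes. For two-class problems, k is usually chosen to be an odd number so that the vote cannot end in a tie and the estimate of y is unambiguous.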
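The voting step can be sketched as follows. This is a minimal illustration under stated assumptions rather than a reference implementation: compounds are plain lists of numbers, and the function name `knn_classify`, the class labels "active" and "inactive", and the sample data are all hypothetical.

```python
import math
from collections import Counter

def knn_classify(x, training_vectors, training_classes, k=3):
    """Predict the class of x by majority vote among its k nearest training vectors."""
    if not 1 <= k <= len(training_vectors):
        raise ValueError("k must be between 1 and the number of training vectors")
    # Euclidean distance from x to every training vector, paired with its class.
    distances = [
        (math.dist(x, xi), yi)
        for xi, yi in zip(training_vectors, training_classes)
    ]
    # Keep the k nearest neighbours, sorting by distance only.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Count the votes per class among those k neighbours.
    votes = Counter(yi for _, yi in nearest)
    # Return the class with the most votes (the argmax over classes).
    return votes.most_common(1)[0][0]

# Hypothetical training set: two-component vectors with made-up class labels.
training_vectors = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [5.0, 5.0], [5.2, 4.9]]
training_classes = ["active", "active", "active", "inactive", "inactive"]

print(knn_classify([1.1, 1.0], training_vectors, training_classes, k=3))  # active
print(knn_classify([5.1, 5.0], training_vectors, training_classes, k=3))  # inactive
```

In this sketch a tie between classes is resolved by whichever tied class appears first among the nearest neighbours, which is one reason an odd k is convenient when there are only two classes.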

References

  • Fix E and Hodges JL (1951). Discriminatory analysis: Non-parametric discrimination: Consistency properties. Texas, USAF School of Aviation Medicine, Randolph Field: 261-279.
  • Johnson RA and Wichern DW (1982). Applied multivariate statistical analysis. Englewood Cliffs, NJ, Prentice Hall.