k-nearest neighbour (kNN)
kNN is a basic instance-based method introduced by Fix and Hodges (1951). It measures the Euclidean distance between a given compound, represented by the vector x, and each compound in the training set, represented by its own vector x_i (Fix and Hodges 1951; Johnson and Wichern 1982). For vectors with n components, the Euclidean distance for each pair is calculated as:

$$d(\mathbf{x}, \mathbf{x}_i) = \sqrt{\sum_{j=1}^{n} \left(x_j - x_{ij}\right)^2}$$

where x_j and x_ij are the j-th components of x and x_i, respectively.
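The distance calculation can be sketched in Python as follows. This is a minimal illustration of the formula, not the original authors' implementation; the function name `euclidean_distance` and the example vectors are hypothetical.

```python
import math
from collections.abc import Sequence


def euclidean_distance(x: Sequence[float], x_i: Sequence[float]) -> float:
    """Euclidean distance between a query vector x and a training vector x_i."""
    if len(x) != len(x_i):
        raise ValueError("vectors must have the same number of components")
    # Sum the squared component-wise differences, then take the square root.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, x_i)))


# Hypothetical example: distance from one query compound to three training compounds.
query = [1.0, 2.0]
training = [[1.0, 2.0], [4.0, 6.0], [0.0, 0.0]]
print([euclidean_distance(query, t) for t in training])  # distances: 0.0, 5.0 and sqrt(5)
```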
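The k training compounds nearest to the given compound then determine its class by majority vote. If y_i denotes the class of the i-th nearest training compound, the estimated class y is:

$$y = \underset{v \in V}{\operatorname{argmax}} \sum_{i=1}^{k} \sigma(v, y_i)$$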
where σ(a, b) = 1 if a = b and σ(a, b) = 0 if a ≠ b, argmax returns the argument (here the class v) at which the summed vote count is largest, and V is the finite set of data classes. In binary classification, k is usually chosen to be odd so that the vote cannot end in a tie when estimating y.
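A minimal Python sketch of the voting rule is shown below, assuming each compound is already encoded as a numeric vector. The function name `knn_classify` and the class labels "active"/"inactive" in the toy data are hypothetical; the standard-library `math.dist` (Python 3.8+) supplies the Euclidean distance.

```python
import math
from collections import Counter
from collections.abc import Hashable, Sequence


def knn_classify(
    x: Sequence[float],
    train_X: Sequence[Sequence[float]],
    train_y: Sequence[Hashable],
    k: int,
) -> Hashable:
    """Predict the class of x by majority vote among its k nearest training compounds."""
    if not 1 <= k <= len(train_X):
        raise ValueError("k must be between 1 and the number of training compounds")
    # Rank training compounds by Euclidean distance to x and keep the k closest.
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(x, train_X[i]))[:k]
    # Counting labels among the neighbours equals summing sigma(v, y_i) for each class v.
    votes = Counter(train_y[i] for i in nearest)
    # argmax over V: return the class with the most votes.
    return max(votes, key=votes.get)


# Hypothetical toy data: two well-separated groups of compounds.
train_X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
train_y = ["inactive", "inactive", "inactive", "active", "active", "active"]
print(knn_classify([0.5, 0.5], train_X, train_y, k=3))  # inactive
print(knn_classify([5.2, 5.1], train_X, train_y, k=3))  # active
```

If a tie does occur (possible with an even k or with more than two classes), this sketch returns the tied class that appears first among the sorted neighbours, i.e. the one with the closest member to x; other tie-breaking rules are equally valid.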
References
- Fix E and Hodges JL (1951). Discriminatory analysis: Non-parametric discrimination: Consistency properties. Texas, USAF School of Aviation Medicine, Randolph Field: 261-279.
- Johnson RA and Wichern DW (1982). Applied multivariate statistical analysis. Englewood Cliffs, NJ, Prentice Hall.

