Probabilistic neural network (PNN)

The PNN, introduced by Specht (Specht 1990), is a form of neural network designed for classification using Bayes’ optimal decision rule:

$$h_i c_i f_i(x) > h_j c_j f_j(x)$$

where h_i and h_j are the prior probabilities, c_i and c_j are the costs of misclassification, and f_i(x) and f_j(x) are the probability density functions for data classes i and j, respectively. A given compound with vector x is classified into data class i if the product of the three terms is greater for data class i than for any other data class j ≠ i. In most applications, the prior probabilities and the costs of misclassification are treated as equal, so the decision depends only on the density functions. For the univariate case, the probability density function of each data class can be estimated with Parzen’s nonparametric estimator (Parzen 1962):

$$g(x) = \frac{1}{n\sigma} \sum_{i=1}^{n} W\!\left(\frac{x - x_i}{\sigma}\right)$$

where n is the sample size, sigma is a scaling parameter that defines the width of the bell curve surrounding each compound, W(d) is a weight function that has its largest value at d = 0, and (x − x_i) is the distance between a given compound and a compound in the training set. Parzen’s nonparametric estimator was later extended to the multivariate case by Cacoullos (Cacoullos 1966):

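$$g(x_1, \ldots, x_p) = \frac{1}{n\,\sigma_1 \cdots \sigma_p} \sum_{i=1}^{n} W\!\left(\frac{x_1 - x_{1,i}}{\sigma_1}, \ldots, \frac{x_p - x_{p,i}}{\sigma_p}\right)$$

where x_{j,i} is the value of descriptor j (j = 1, ..., p) for training compound i.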
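To make the univariate estimator concrete, here is a minimal Python sketch. The function names `parzen_density` and `gaussian_weight` and the sample values are assumptions made for illustration, and the Gaussian is only one possible choice of weight function W.

```python
import math

def gaussian_weight(d):
    """Standard normal density: one illustrative weight function W, largest at d = 0."""
    return math.exp(-0.5 * d * d) / math.sqrt(2.0 * math.pi)

def parzen_density(x, samples, sigma, weight=gaussian_weight):
    """Univariate Parzen estimate: (1 / (n * sigma)) * sum_i W((x - x_i) / sigma)."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    n = len(samples)
    return sum(weight((x - xi) / sigma) for xi in samples) / (n * sigma)

# Made-up descriptor values for the compounds of one data class.
training = [1.0, 1.2, 0.8, 3.0]

near = parzen_density(1.0, training, sigma=0.5)  # close to most training values
far = parzen_density(5.0, training, sigma=0.5)   # far from every training value
print(near > far)
```

A small sigma gives a spiky estimate that follows the individual training compounds closely, while a large sigma smooths the estimate out, which is why sigma is usually tuned on the training data.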

The Gaussian function is frequently used as the weight function because it is well behaved, easily calculated and satisfies the conditions required by Parzen’s estimator. Thus the probability density function for the multivariate case becomes

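$$g(x) = \frac{1}{n\,(2\pi)^{p/2}\,\sigma_1 \cdots \sigma_p} \sum_{i=1}^{n} \exp\!\left(-\sum_{j=1}^{p} \frac{(x_j - x_{j,i})^2}{2\sigma_j^2}\right)$$

where p is the number of descriptors and sigma_j is the scaling parameter of descriptor j.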

To simplify the equation, a single sigma common to all descriptors (single-sigma model) can be used instead of an individual sigma for each descriptor (multi-sigma model). Single-sigma models are faster to compute and can give reasonable results when all descriptors are of approximately equal importance. Multi-sigma models are more general, however, and are useful when the descriptors differ in nature and importance (Masters 1995).
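The difference between the two models can be sketched in Python as follows. The function name `gaussian_density`, the convention of passing either one float or a list of floats, and the descriptor values are all assumptions made for this illustration. The sketch uses a fully normalized Gaussian weight function.

```python
import math

def gaussian_density(x, training, sigmas):
    """Gaussian Parzen estimate of the density of one data class at x.

    x        : list of p descriptor values for the query compound
    training : list of training compounds, each a list of p values
    sigmas   : one number (single-sigma) or a list of p numbers (multi-sigma)
    """
    p = len(x)
    if isinstance(sigmas, (int, float)):
        sigmas = [float(sigmas)] * p  # single-sigma: same width for every descriptor
    norm = (2.0 * math.pi) ** (p / 2.0) * math.prod(sigmas)
    total = 0.0
    for xi in training:
        # Squared distance with each descriptor scaled by its own sigma.
        sq = sum(((a - b) / s) ** 2 for a, b, s in zip(x, xi, sigmas))
        total += math.exp(-0.5 * sq)
    return total / (len(training) * norm)

# Made-up compounds with two descriptors on very different scales.
train = [[1.0, 250.0], [1.5, 300.0], [0.5, 280.0]]
query = [1.0, 275.0]

single = gaussian_density(query, train, 0.5)          # one sigma for both descriptors
multi = gaussian_density(query, train, [0.5, 25.0])   # one sigma per descriptor
print(single, multi)
```

In this made-up example the second descriptor spans tens of units, so a single sigma of 0.5 makes every training compound look far away and the estimated density is practically zero. The multi-sigma call gives that descriptor its own, wider sigma. Scaling the descriptors to a common range beforehand is another way to make a single-sigma model workable.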

A PNN can be implemented as a neural network (Masters 1995), as shown in the figure below. Its architecture is determined by the number of compounds and descriptors in the training set, and it has four layers. The input layer has one neuron per descriptor and passes the input values to every neuron in the pattern layer. The pattern layer has one neuron per compound in the training set; each pattern neuron computes a distance measure between the input compound and the training compound it represents, and then applies the weight function of Parzen’s nonparametric estimator to that distance. The summation layer has one neuron per data class; each summation neuron sums the outputs of the pattern neurons that belong to its data class to obtain the estimated probability density function for that class. The single neuron in the output layer then compares the probability density functions from the summation neurons and assigns the input compound to the data class with the highest value.

[Figure: architecture of a PNN, showing the input, pattern, summation and output layers]
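The four layers described above can be sketched as a small Python class. This is a minimal illustration, not the implementation from Masters (1995): the class name `PNN`, its methods, the single-sigma Gaussian weight function and the toy data are all assumptions. Priors and misclassification costs default to 1.0, which corresponds to the common case where they are treated as equal.

```python
import math
from collections import defaultdict

class PNN:
    """Minimal single-sigma PNN sketch (all names are illustrative)."""

    def __init__(self, sigma=1.0, priors=None, costs=None):
        self.sigma = sigma
        self.priors = priors or {}  # h_i per class; missing classes default to 1.0
        self.costs = costs or {}    # c_i per class; missing classes default to 1.0
        self.patterns = []          # one (vector, label) pair per training compound

    def fit(self, X, y):
        # "Training" only stores the compounds: one pattern neuron for each.
        self.patterns = [(list(x), label) for x, label in zip(X, y)]
        return self

    def class_densities(self, x):
        sums = defaultdict(float)
        counts = defaultdict(int)
        for xi, label in self.patterns:
            # Pattern layer: squared distance passed through a Gaussian weight.
            sq = sum((a - b) ** 2 for a, b in zip(x, xi))
            out = math.exp(-sq / (2.0 * self.sigma ** 2))
            # Summation layer: accumulate the outputs per data class.
            sums[label] += out
            counts[label] += 1
        # The Gaussian normalization constant is identical for every class in a
        # single-sigma model, so it is omitted; the values are unnormalized.
        return {c: sums[c] / counts[c] for c in sums}

    def predict(self, x):
        dens = self.class_densities(x)

        def score(c):
            # Output layer: Bayes decision rule, h_i * c_i * f_i(x).
            return self.priors.get(c, 1.0) * self.costs.get(c, 1.0) * dens[c]

        return max(dens, key=score)

# Made-up training set: four compounds, two descriptors, two data classes.
X = [[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]]
y = ["inactive", "inactive", "active", "active"]

model = PNN(sigma=0.3).fit(X, y)
print(model.predict([0.05, 0.1]), model.predict([1.0, 0.9]))
```

Because every training compound becomes a pattern neuron, fitting is immediate, but each prediction costs time proportional to the size of the training set. Passing a `costs` or `priors` dictionary shifts the decision boundary toward or away from a class without changing the density estimates.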

References

  • Cacoullos T (1966). Estimation of a multivariate density. Annals of the Institute of Statistical Mathematics 18: 179-189.
  • Masters T (1995). Advanced algorithms for neural networks: a C++ sourcebook. New York, Wiley.
  • Parzen E (1962). On estimation of a probability density function and mode. The Annals of Mathematical Statistics 33(3): 1065-1076.
  • Specht DF (1990). Probabilistic neural networks. Neural Networks 3(1): 109-118.