Consensus methods for classification models

I have used two types of consensus methods in my research (Yap and Chen 2005). The first is a ‘positive majority’ consensus method, which classifies a compound as positive if the majority of the models classify it as positive (Eriksson et al. 2003). This method requires an odd number of models so that the vote cannot end in a tie. The second is a ‘positive probability’ consensus method, which explicitly computes the probability that a compound is positive using the following formulas (McDowell and Jaworska 2002):

prpos.jpg … (1)
prneg.jpg … (2)

where pr.jpg is the posterior probability that a compound is positive given the classification result from model i, and alphapos.jpg and alphaneg.jpg are the sensitivity and specificity of model i, respectively. Equation (1) is used when model i classifies the compound as positive, and Equation (2) when it classifies the compound as negative. When the ratio of positive to negative compounds in the population is unknown, the prior probability that a compound is positive can be tentatively set to 0.5. In practice, the exact value of the prior probability matters little if a large number of models are used in the consensus process.
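
Both consensus methods can be sketched in a few lines of Python. This is only an illustration, not the code used in the cited papers. It assumes that Equations (1) and (2) take the standard Bayes' rule form for a binary classifier with known sensitivity and specificity, and that the posterior after model i serves as the prior for model i+1. The function names (positive_majority, positive_probability) and the example values are hypothetical.

```python
from typing import Sequence


def positive_majority(votes: Sequence[bool]) -> bool:
    """Return True if more than half of the models vote positive."""
    if len(votes) % 2 == 0:
        # An even number of models could produce a tie.
        raise ValueError("use an odd number of models to avoid ties")
    return sum(votes) > len(votes) / 2


def positive_probability(
    votes: Sequence[bool],
    sensitivities: Sequence[float],
    specificities: Sequence[float],
    prior: float = 0.5,
) -> float:
    """Update P(positive) with each model's vote in turn (assumed Bayes form)."""
    p = prior
    for vote, sens, spec in zip(votes, sensitivities, specificities):
        if vote:
            # Assumed form of Equation (1): model i says "positive".
            numerator = sens * p
            denominator = sens * p + (1.0 - spec) * (1.0 - p)
        else:
            # Assumed form of Equation (2): model i says "negative".
            numerator = (1.0 - sens) * p
            denominator = (1.0 - sens) * p + spec * (1.0 - p)
        # Note: the denominator is zero only in degenerate cases,
        # e.g. a positive vote from a model with sens = 0 and spec = 1.
        p = numerator / denominator
    return p


# Hypothetical example: three models, two vote positive and one negative.
votes = [True, True, False]
sensitivities = [0.8, 0.7, 0.9]
specificities = [0.9, 0.8, 0.7]

print(positive_majority(votes))
print(positive_probability(votes, sensitivities, specificities))
```

With these example values the majority vote is positive, and the positive probability works out to 0.8 (up to floating-point rounding): the likelihood ratios of the three votes are 8, 3.5 and 1/7, which multiply to posterior odds of 4 to 1 when the prior is 0.5.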

References

  • Eriksson L, Jaworska J, Cronin M, Worth A, Gramatica P and McDowell R (2003). Methods for reliability and uncertainty assessment and for applicability evaluations of classification- and regression-based QSARs. Environmental Health Perspectives 111(10): 1361-1375.
  • McDowell R and Jaworska J (2002). Bayesian analysis and inference from QSAR predictive model results. SAR and QSAR in Environmental Research 13: 111-125.
  • Yap CW and Chen YZ (2005). Prediction of cytochrome P450 3A4, 2D6, and 2C9 inhibitors and substrates by using support vector machines. Journal of Chemical Information and Modeling 45(4): 982-992.