
This entry was posted by Yap Chun Wei
on Wednesday, January 28th, 2009 at 10:38 am and is filed under Data mining, Pharmacy.
January 30th, 2009 at 5:35 pm
Nice!
A long time ago I stumbled upon the term “external test set”. So, what is the difference between the validation and testing types of dataset?
January 30th, 2009 at 10:21 pm
The testing set is used to help fine-tune the model. For example, when using k-nearest neighbours to build a model, we need to find the optimum k. To do that, we build different models using different values of k and compare their performance on the testing set.
The validation set is used only to assess the performance of the final model (i.e. the one with the optimum k). It is not used at any stage during model development.
Some researchers prefer to switch the two terms, using “validation set” to mean the dataset that is used to fine-tune model parameters and “testing set” to mean the dataset that is used to assess the final model.
But generally, if the dataset (whether validation or testing) is prefixed by “external”, it means that the dataset is used to assess the final model.
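To make the k-nearest neighbour example concrete, here is a minimal sketch in plain Python. It follows the naming used in this reply (testing set for choosing k, validation set for the final assessment). The toy data, the 180/60/60 split, the candidate values of k and the helper names (`make_data`, `knn_predict`, `accuracy`) are all assumptions made up for illustration, not part of the original post.

```python
import random
from collections import Counter


def knn_predict(train, x, k):
    """Predict the label of x by majority vote among its k nearest training points."""
    nearest = sorted(
        train,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], x)),
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(train, data, k):
    """Fraction of points in data that a k-NN model built on train labels correctly."""
    correct = sum(knn_predict(train, x, k) == y for x, y in data)
    return correct / len(data)


def make_data(n, rng):
    """Hypothetical toy data: two overlapping 2-D Gaussian clusters, labels 0 and 1."""
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        centre = 0.0 if label == 0 else 2.0
        point = (rng.gauss(centre, 1.0), rng.gauss(centre, 1.0))
        data.append((point, label))
    return data


rng = random.Random(0)
data = make_data(300, rng)

# Assumed split: 180 training, 60 testing (tuning), 60 validation (final check).
train, testing, validation = data[:180], data[180:240], data[240:]

# Model development: compare candidate values of k on the testing set.
# Odd values avoid tied votes with two classes.
scores = {k: accuracy(train, testing, k) for k in (1, 3, 5, 7, 9)}
best_k = max(scores, key=scores.get)

# Final assessment: the validation set is touched only once, after k is fixed.
final_accuracy = accuracy(train, validation, best_k)
print("optimum k:", best_k)
print("validation accuracy:", round(final_accuracy, 3))
```

The point of the sketch is the order of operations: the validation set plays no part in choosing k, so its accuracy is not biased by the tuning step. Under the switched terminology, the roles of `testing` and `validation` in the code would simply swap names.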