KNIME - Part VI: Model validation using cross-validation and/or independent validation set.
The previous post already provides the steps for model validation using cross-validation. So how do we validate a model using an independent validation set?
Develop a model
- Add a File Reader node (IO->Read) to the workbench and configure it to load the training set from a file.
- Add an SVM Learner node (Mining->SVM) to the workbench and connect the output port of the File Reader node to its input port.
- Configure the SVM Learner node by selecting the correct column in the Class column listbox.
- Set the kernel and its parameters to the optimum values determined by the parameter optimization procedure.
- Add a Model Writer node (IO->Write) to the workbench and connect the model port of the SVM Learner node to its input port. Configure it to save the model to a file.
- Execute all nodes.
Validate model on an independent validation set
- Add a File Reader node (IO->Read) to the workbench and configure it to load the independent validation set from a file.
- Add an SVM Predictor node (Mining->SVM) to the workbench and connect the output port of the File Reader node to its test data input port.
- Add a Model Reader node (IO->Read) to the workbench and connect its output port to the model port of the SVM Predictor node. Configure it to load the model that was saved in the model development phase.
- Add a Cross validation node (Meta) to the workbench and open its Meta-workflow editor.
- Copy the Aggregator node from the Meta-workflow editor and paste it onto the workbench.
- Exit the Meta-workflow editor and delete the Cross validation node from the workbench.
- Connect the output port of the SVM Predictor node to the input port of the Aggregator node.
- Execute all nodes.
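Outside KNIME, the same two-phase pattern (train and save a model, then load it and score it on data it has never seen) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a nearest-centroid classifier stands in for the SVM so that the example needs nothing beyond the standard library, and the data, the file name model.json and the function names are all made up.

```python
import json
import os
import tempfile


def train_centroids(rows, labels):
    """Return the mean feature vector per class (stand-in for the SVM Learner)."""
    sums, counts = {}, {}
    for row, label in zip(rows, labels):
        acc = sums.setdefault(label, [0.0] * len(row))
        for i, value in enumerate(row):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(model, row):
    """Assign the class whose centroid is closest (stand-in for the SVM Predictor)."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(row, centroid))
    return min(model, key=lambda label: squared_distance(model[label]))


# Hypothetical training set and independent validation set.
train_X = [[1.0, 1.2], [0.8, 1.0], [3.0, 3.1], [3.2, 2.9]]
train_y = ["neg", "neg", "pos", "pos"]
valid_X = [[0.9, 1.1], [3.1, 3.0], [2.9, 3.3], [1.1, 0.7]]
valid_y = ["neg", "pos", "pos", "neg"]

# Phase 1: develop the model and write it to a file (Model Writer).
model_path = os.path.join(tempfile.mkdtemp(), "model.json")
with open(model_path, "w") as fh:
    json.dump(train_centroids(train_X, train_y), fh)

# Phase 2: read the model back (Model Reader) and score the validation set.
with open(model_path) as fh:
    loaded_model = json.load(fh)
predictions = [predict(loaded_model, row) for row in valid_X]
accuracy = sum(p == t for p, t in zip(predictions, valid_y)) / len(valid_y)
print(predictions, accuracy)
```

The key point is that the validation rows are never passed to the training function; they only meet the model after it has been reloaded from disk.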
As shown, KNIME can validate a model using either cross-validation or an independent validation set. However, it offers only a limited set of error measures. For example, for classification problems it does not report sensitivity or specificity, and for regression problems it does not report R² or mean squared error.
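Sensitivity and specificity are straightforward to compute by hand once the actual and predicted classes are available. The following is a minimal sketch, assuming a two-class problem in which one label is treated as the positive class; the labels, the data and the function names are hypothetical and not part of KNIME.

```python
def confusion_counts(actual, predicted, positive):
    """Count true/false positives and negatives for a two-class problem."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1
            else:
                fp += 1
        else:
            if a == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def sensitivity_specificity(actual, predicted, positive):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp, fp, tn, fn = confusion_counts(actual, predicted, positive)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Made-up actual and predicted classes for ten validation samples.
actual = ["pos", "pos", "pos", "pos", "neg", "neg", "neg", "neg", "neg", "neg"]
predicted = ["pos", "pos", "pos", "neg", "neg", "neg", "neg", "neg", "pos", "pos"]

sens, spec = sensitivity_specificity(actual, predicted, positive="pos")
print(f"sensitivity={sens:.3f} specificity={spec:.3f}")
```

In this example 3 of the 4 positive samples and 4 of the 6 negative samples are classified correctly.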
January 28th, 2009 at 1:48 pm
Great! Thank you very much!
I always wanted to write something like that on my site. Can I use part of your post on my blog?
Of course, I will add a backlink.
Sincerely, Your Reader
February 6th, 2009 at 11:24 am
Hi. Your site displays incorrectly in Opera, but the content is excellent! Thanks for your wise words =)
February 6th, 2009 at 12:02 pm
Hi Timur,
No problem, you can put part of the post in your blog.
January 2nd, 2011 at 11:17 pm
There are error measures in the Weka add-ons, but they are only presented in the view, not in the output data, so you can’t work with them in loops =(.
To calculate error measures with the KNIME nodes, you can use some math formula modules, computing the means of the results and so on. You can then calculate the MSE and SSE, and from those R² or even a nonlinear correlation coefficient.
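The arithmetic described in this comment can be sketched as follows. This is an illustration only: it assumes "r2" means the coefficient of determination, 1 - SSE/SST, and the data and the function name are made up.

```python
def regression_errors(actual, predicted):
    """Return (SSE, MSE, R^2) for paired actual and predicted values."""
    n = len(actual)
    mean_actual = sum(actual) / n
    # Sum of squared errors between actual and predicted values.
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Total sum of squares around the mean of the actual values.
    sst = sum((a - mean_actual) ** 2 for a in actual)
    mse = sse / n
    r2 = 1 - sse / sst if sst else float("nan")
    return sse, mse, r2


# Made-up actual and predicted values for four validation samples.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]

sse, mse, r2 = regression_errors(actual, predicted)
print(f"SSE={sse:.3f} MSE={mse:.3f} R2={r2:.3f}")
```

Here SSE is 0.5, MSE is 0.125 and SST is 5.0, which gives an R² of 0.9.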
September 20th, 2011 at 1:12 pm
It’s a shame you don’t have a donate button! I’d definitely donate to this brilliant blog! I suppose for now I’ll settle for bookmarking and adding your RSS feed to my Google account. I look forward to brand new updates and will talk about this blog with my Facebook group. Talk soon!