KNIME - Part V: Parameter optimization of machine learning/statistical methods
- Place a File Reader node (IO->Read) on the workbench and configure it to load a training set from a file.
- Place a Cross validation node (Meta) on the workbench and connect the output port of the File Reader node to its input port.
- Configure it by setting Number of validations to 10.
- Ensure the Random sampling box is checked.
- Select the correct column for the Column with class labels listbox.
- Open the Cross validation node Meta-workflow editor.
- Place a K Nearest Neighbor node (Mining->Misc Classifiers) in the editor and connect the training data and test data output ports of the X-Partitioner node to the training data and test data input ports of the K Nearest Neighbor node, respectively.
- Configure it by setting the Number of neighbours to consider (k) to 3.
- Exit the Meta-workflow editor, place a Statistics View node (Statistics) on the workbench, and connect the error rates port of the Cross validation node to its input port.
- Execute all nodes.
The above procedure shows how KNIME can be used to train a model and assess its performance. However, KNIME cannot automatically determine the optimum parameter value for a machine learning/statistical method, e.g. Number of neighbours to consider (k) in the above procedure (according to the KNIME forum, this feature may be available in version 2.0). Instead, you have to search manually: set a parameter value, execute all nodes, and record the mean error rate given in the Statistics View node; then set another parameter value and repeat, until you have evaluated all the parameter values that you are interested in. The parameter value that gives the lowest mean error rate is the optimum parameter value of the machine learning/statistical method for the training set.
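For readers who want to see the manual search written out, here is a minimal Python sketch of the same idea: 10-fold cross-validation of a k nearest neighbour classifier, repeated for each candidate k, keeping the k with the lowest mean error rate. It is not KNIME code. The function names (`knn_predict`, `cv_error`, `best_k`), the toy data set and the candidate list are all assumptions made for illustration.

```python
import random
from collections import Counter

def knn_predict(train, query, k):
    """Majority vote among the k training rows closest to query."""
    nearest = sorted(
        train,
        key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], query)),
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def cv_error(data, k, folds=10, seed=0):
    """Mean error rate of k-NN over `folds` randomly sampled folds."""
    rows = data[:]
    random.Random(seed).shuffle(rows)  # random sampling of the folds
    fold_errors = []
    for i in range(folds):
        test = rows[i::folds]
        train = [r for j, r in enumerate(rows) if j % folds != i]
        wrong = sum(knn_predict(train, x, k) != y for x, y in test)
        fold_errors.append(wrong / len(test))
    return sum(fold_errors) / folds

def best_k(data, candidates, folds=10):
    """Evaluate every candidate k and return the one with the lowest error."""
    scores = {k: cv_error(data, k, folds) for k in candidates}
    return min(scores, key=scores.get), scores

# Hypothetical toy training set: two classes of 2D points.
rng = random.Random(42)
data = [((rng.gauss(0, 1), rng.gauss(0, 1)), "a") for _ in range(50)]
data += [((rng.gauss(5, 1), rng.gauss(5, 1)), "b") for _ in range(50)]

k, scores = best_k(data, [1, 3, 5, 7, 9])
print("best k:", k)
print("mean error rates:", scores)
```

Each entry in `scores` plays the role of one recorded reading from the Statistics View node, and `best_k` simply automates the "set a value, run, record" cycle.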
January 2nd, 2011 at 11:10 pm
hi!
For k-means clustering you can run a brute-force search: execute k-means in a loop, collecting the error from each run, and increase k by 1 on every iteration. Then look at the errors on a plot. The elbow of the curve will give you a good k for k-means. I programmed a node which detects this elbow automatically.
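The loop described in this comment could be sketched roughly as follows in Python. This is not the commenter's node, whose detection method is not described. Two assumptions are made here: the elbow is taken to be the k with the largest second difference of the error curve, and the centers are initialised with a deterministic farthest-first rule so that repeated runs give the same curve. All function names and the toy data are hypothetical.

```python
import random

def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean(cluster):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(cluster)
    return tuple(sum(coord) / n for coord in zip(*cluster))

def init_centers(points, k):
    """Farthest-first initialisation (an assumption, chosen for determinism)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(dist2(p, c) for c in centers))
        )
    return centers

def kmeans_sse(points, k, iters=100):
    """Run Lloyd's algorithm and return the sum of squared errors."""
    centers = init_centers(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist2(p, centers[i]))
            clusters[nearest].append(p)
        # Keep the old center if a cluster ends up empty.
        updated = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
        if updated == centers:
            break
        centers = updated
    return sum(min(dist2(p, c) for c in centers) for p in points)

def find_elbow(ks, errors):
    """Return the k where the error curve bends most (largest second difference)."""
    best = max(
        range(1, len(ks) - 1),
        key=lambda i: errors[i - 1] - 2 * errors[i] + errors[i + 1],
    )
    return ks[best]

# Hypothetical toy data: three well-separated groups of 2D points.
rng = random.Random(1)
points = [
    (rng.gauss(cx, 0.5), rng.gauss(cy, 0.5))
    for cx, cy in [(0, 0), (10, 0), (5, 9)]
    for _ in range(20)
]

ks = list(range(1, 7))
errors = [kmeans_sse(points, k) for k in ks]  # each pass of the loop: k gets k+1
elbow = find_elbow(ks, errors)
print("errors:", errors)
print("elbow at k =", elbow)
```

The second-difference rule is only one of several possible heuristics; on noisy error curves a plot is still worth checking by eye.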