Weka (KnowledgeFlow) - Part V: Parameter optimization of machine learning/statistical methods

  1. Place an ArffLoader component (DataSources) on the layout area and configure it to load a training set from a file.
  2. Place a ClassAssigner component (Filters) on the layout area and connect the dataSet connection from the ArffLoader component to it.
    • Configure it by setting classIndex to the class column.
  3. Place a CrossValidationSplitMaker component (Evaluation) on the layout area and connect the dataSet connection from the ClassAssigner component to it.
    • Configure it by setting folds to 10.
  4. Place an SMO component (Classifiers) on the layout area and connect the trainingSet and testSet connections from the CrossValidationSplitMaker component to it.
    • Configure it by choosing RBFKernel and setting the gamma value for the kernel to 0.01.
  5. Place a ClassifierPerformanceEvaluator component (Evaluation) on the layout area and connect the batchClassifier connection from the SMO component to it.
  6. Place a TextViewer component (Visualization) on the layout area and connect the text connection from the ClassifierPerformanceEvaluator component to it.
  7. Run the flow.
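
To make step 3 more concrete, here is a minimal sketch in plain Java of what splitting a data set into folds means: each fold is held out once as the test set while the remaining rows form the training set. This is not Weka's code. The class name FoldSplitSketch, the method testFolds, and the round-robin assignment are assumptions made for illustration (a real splitter would normally shuffle the rows first), and the example uses a hypothetical data set of 25 rows.

```java
import java.util.ArrayList;
import java.util.List;

public class FoldSplitSketch {

    /**
     * Returns, for each fold, the row indices held out as that fold's test set.
     * Rows are assigned round-robin (a simplification; no shuffling or stratification).
     */
    public static List<List<Integer>> testFolds(int numInstances, int numFolds) {
        if (numFolds < 2) {
            throw new IllegalArgumentException("numFolds must be at least 2");
        }
        if (numFolds > numInstances) {
            // Every fold needs at least one row to test on.
            throw new IllegalArgumentException("numFolds must not exceed numInstances");
        }
        List<List<Integer>> folds = new ArrayList<>();
        for (int f = 0; f < numFolds; f++) {
            folds.add(new ArrayList<>());
        }
        for (int i = 0; i < numInstances; i++) {
            folds.get(i % numFolds).add(i);
        }
        return folds;
    }

    public static void main(String[] args) {
        int numInstances = 25; // hypothetical data set size
        List<List<Integer>> folds = testFolds(numInstances, 10);
        for (int f = 0; f < folds.size(); f++) {
            int testSize = folds.get(f).size();
            System.out.println("fold " + f + ": test rows=" + testSize
                    + ", training rows=" + (numInstances - testSize));
        }
    }
}
```

With 10 folds, the classifier is trained and tested 10 times, and the evaluator reports statistics accumulated over all 10 test sets.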

The above procedure shows how KnowledgeFlow can be used to train a model and assess its performance. However, it is not possible to automatically determine the optimum parameter values (e.g. c and the gamma value for the kernel) for a machine learning/statistical method. (There is a GridSearch component (Classifiers), but I could not get it to work: I kept getting the error “Can’t have more folds than instances” even though I was not using cross validation.) To determine the optimum parameter values, you have to do it manually: set a parameter value, run, and record the mean absolute error given in the TextViewer component; then set another parameter value, run again, record the mean absolute error, and so on, until you have evaluated all the parameter values that you are interested in. The parameter values that give the lowest mean absolute error among those tested are then the optimum parameter values of the machine learning/statistical method for the training set.
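
The manual procedure amounts to a loop over candidate values that keeps the combination with the lowest recorded error. Below is a minimal sketch of that bookkeeping in plain Java. It does not call Weka: the class ManualGridSearchSketch, its methods, the candidate values, and especially standInError (a made-up formula that stands in for "run the flow and read the mean absolute error from the TextViewer") are all assumptions for illustration.

```java
import java.util.function.DoubleBinaryOperator;

public class ManualGridSearchSketch {

    /** Best parameter pair found so far, together with its recorded error. */
    public static final class Result {
        public final double c;
        public final double gamma;
        public final double error;

        public Result(double c, double gamma, double error) {
            this.c = c;
            this.gamma = gamma;
            this.error = error;
        }
    }

    /** Tries every (c, gamma) combination and returns the one with the lowest error. */
    public static Result search(double[] cValues, double[] gammaValues,
                                DoubleBinaryOperator evaluate) {
        Result best = null;
        for (double c : cValues) {
            for (double gamma : gammaValues) {
                double error = evaluate.applyAsDouble(c, gamma);
                if (best == null || error < best.error) {
                    best = new Result(c, gamma, error);
                }
            }
        }
        return best;
    }

    /**
     * Hypothetical stand-in for one run of the flow. It is NOT a real error
     * measure; it is simply smallest at c = 10 and gamma = 0.01.
     */
    public static double standInError(double c, double gamma) {
        return Math.abs(Math.log10(c) - 1) + Math.abs(Math.log10(gamma) + 2);
    }

    public static void main(String[] args) {
        double[] cValues = {0.1, 1, 10, 100};         // assumed candidates
        double[] gammaValues = {0.001, 0.01, 0.1, 1}; // assumed candidates
        Result best = search(cValues, gammaValues, ManualGridSearchSketch::standInError);
        System.out.println("best c=" + best.c + ", gamma=" + best.gamma
                + ", error=" + best.error);
    }
}
```

In practice, the evaluate argument would be replaced by whatever produces the mean absolute error for one parameter setting, and the result is only the best of the values actually listed.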
