Weka (KnowledgeFlow) - Part II: Partitioning of dataset into training and testing sets
- Place an ArffLoader component (DataSources) on the layout area and configure it to load a dataset from a file.
- Place a TrainTestSplitMaker component (Evaluation) on the layout area and connect the dataSet connection from the ArffLoader component to it.
- Configure the TrainTestSplitMaker by setting trainPercent to 80, so that 80% of the instances go to the training set and the remaining 20% to the testing set.
- Place two ArffSaver components (DataSinks) on the layout area. Connect the trainingSet connection from the TrainTestSplitMaker component to the first ArffSaver component and configure it to save the training set to a file. Connect the testSet connection from the TrainTestSplitMaker component to the second ArffSaver component and configure it to save the testing set to a file.
- Run.
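For readers who want to see what the flow above does in code, here is a minimal sketch of a random percentage split in plain Java. It does not use Weka's classes; the names TrainTestSplit, Split, split and the seed parameter are made up for illustration, and the shuffle-then-cut logic is an assumption about how a split of this kind is typically done, not a copy of the TrainTestSplitMaker source.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical helper, not part of Weka: shuffles the rows and cuts them
// into a training part and a testing part.
final class TrainTestSplit {

    /** Holds the two partitions produced by a split. */
    static final class Split<T> {
        public final List<T> train;
        public final List<T> test;

        Split(List<T> train, List<T> test) {
            this.train = train;
            this.test = test;
        }
    }

    /**
     * Randomly partitions rows so that roughly trainPercent percent of them
     * end up in the training set and the rest in the testing set.
     */
    static <T> Split<T> split(List<T> rows, double trainPercent, long seed) {
        if (trainPercent < 0 || trainPercent > 100) {
            throw new IllegalArgumentException("trainPercent must be between 0 and 100");
        }
        // Work on a copy so the caller's list keeps its original order.
        List<T> shuffled = new ArrayList<>(rows);
        Collections.shuffle(shuffled, new Random(seed));

        int trainSize = (int) Math.round(shuffled.size() * trainPercent / 100.0);
        List<T> train = new ArrayList<>(shuffled.subList(0, trainSize));
        List<T> test = new ArrayList<>(shuffled.subList(trainSize, shuffled.size()));
        return new Split<>(train, test);
    }

    public static void main(String[] args) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            rows.add(i);
        }
        Split<Integer> s = split(rows, 80, 1L);
        System.out.println("train size = " + s.train.size());
        System.out.println("test size = " + s.test.size());
    }
}
```

With 10 rows and trainPercent set to 80, the sketch puts 8 rows in the training list and 2 in the testing list. Fixing the seed makes the partition reproducible between runs.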
As can be seen from the above procedure, it is very easy to partition a dataset randomly into a training set and a testing set. However, KnowledgeFlow does not seem to contain other algorithms, such as the Kennard and Stone algorithm, for partitioning datasets.
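To show what a non-random alternative looks like, here is a rough sketch of Kennard and Stone style selection in plain Java. It is not a Weka component; the class name KennardStone and the method names are hypothetical, and the sketch assumes purely numeric attributes compared by Euclidean distance. The idea is to start with the two samples that lie farthest apart and then repeatedly add the sample whose distance to its nearest already selected sample is largest. The selected rows would form the training set and the remaining rows the testing set.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Weka: picks k rows that cover the
// attribute space as evenly as possible (Kennard and Stone style).
final class KennardStone {

    /** Squared Euclidean distance between two rows of equal length. */
    static double dist2(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** Returns the indices of the k selected rows, in selection order. */
    static List<Integer> select(double[][] x, int k) {
        int n = x.length;
        if (k < 2 || k > n) {
            throw new IllegalArgumentException("k must be between 2 and the number of rows");
        }

        // Step 1: find the pair of rows that are farthest apart.
        int first = 0;
        int second = 1;
        double best = -1.0;
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                double d = dist2(x[i], x[j]);
                if (d > best) {
                    best = d;
                    first = i;
                    second = j;
                }
            }
        }

        boolean[] chosen = new boolean[n];
        List<Integer> selected = new ArrayList<>();
        chosen[first] = true;
        chosen[second] = true;
        selected.add(first);
        selected.add(second);

        // minDist[i] is the squared distance from row i to its nearest selected row.
        double[] minDist = new double[n];
        for (int i = 0; i < n; i++) {
            minDist[i] = Math.min(dist2(x[i], x[first]), dist2(x[i], x[second]));
        }

        // Step 2: repeatedly add the unselected row farthest from the selected set.
        while (selected.size() < k) {
            int next = -1;
            double farthest = -1.0;
            for (int i = 0; i < n; i++) {
                if (!chosen[i] && minDist[i] > farthest) {
                    farthest = minDist[i];
                    next = i;
                }
            }
            chosen[next] = true;
            selected.add(next);
            for (int i = 0; i < n; i++) {
                if (!chosen[i]) {
                    minDist[i] = Math.min(minDist[i], dist2(x[i], x[next]));
                }
            }
        }
        return selected;
    }
}
```

Unlike the random split, this selection is deterministic for a given dataset. The sketch scans all pairs of rows, so it is only practical for small to medium datasets.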