TANAGRA - Part II: Partitioning a dataset into training and testing sets
- Create a new diagram and configure it to load a dataset from a file. This will put a Dataset operator on the diagram.
- Add a Sampling operator (Instance selection) to the diagram under the Dataset operator and set its proportion size to 80%.
- Add an Export dataset operator (Data visualization) to the diagram under the Sampling operator.
- Configure it by setting Examples selection to selected examples.
- Set the filename that the training set will be saved to.
- Add a Recover examples operator to the diagram under the Sampling operator and set its Examples to recover option to unselected.
- Add an Export dataset operator (Data visualization) to the diagram under the Recover examples operator.
- Configure it by setting Examples selection to selected examples. At this point the selected examples are the 20% that the Sampling operator left out.
- Set the filename that the testing set will be saved to.
- Execute.
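For readers who want to reproduce the same kind of split outside TANAGRA, here is a minimal Python sketch. It is not TANAGRA's implementation: it assumes sampling without replacement, and the function name `split_rows`, its parameters and the toy data are made up for illustration.

```python
import random


def split_rows(rows, proportion=0.8, seed=0):
    """Randomly split rows into (training, testing) lists.

    Hypothetical helper: the "selected" rows play the role of the
    Sampling output, the "unselected" rows play the role of the
    Recover examples output.
    """
    rng = random.Random(seed)  # fixed seed so the split is repeatable
    indices = list(range(len(rows)))
    rng.shuffle(indices)
    n_train = round(len(rows) * proportion)
    selected = set(indices[:n_train])
    # Keep the original row order inside each subset.
    training = [row for i, row in enumerate(rows) if i in selected]
    testing = [row for i, row in enumerate(rows) if i not in selected]
    return training, testing


# Toy dataset of 10 rows standing in for the loaded file.
rows = [[i, i * 2] for i in range(10)]
training, testing = split_rows(rows)
print(len(training), len(testing))  # prints "8 2"
```

Every row ends up in exactly one of the two lists, which corresponds to exporting the selected examples and then the recovered, unselected ones.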
As can be seen from the above procedure, it is very easy to partition a dataset randomly into a training set and a testing set. However, TANAGRA does not seem to offer other partitioning algorithms, such as the Kennard and Stone algorithm.
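For comparison, the Kennard and Stone approach picks training examples so that they cover the data space evenly rather than at random. The sketch below is one common formulation, assuming Euclidean distance on numeric features: start with the two most distant points, then repeatedly add the point whose nearest already-selected point is farthest away. The function name `kennard_stone` and the toy points are illustrative only.

```python
import math


def kennard_stone(points, n_select):
    """Return the indices of n_select points chosen by Kennard-Stone.

    Illustrative sketch using Euclidean distance; builds the full
    distance matrix, so it is only suitable for small datasets.
    """
    n = len(points)
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and len(points)")
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Step 1: start with the two points that are farthest apart.
    first, second = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: dist[pair[0]][pair[1]],
    )
    selected = [first, second]
    remaining = set(range(n)) - {first, second}
    # Distance from each remaining point to its nearest selected point.
    min_dist = {i: min(dist[i][first], dist[i][second]) for i in remaining}

    # Step 2: repeatedly add the point farthest from the selected set.
    while len(selected) < n_select:
        nxt = max(sorted(remaining), key=lambda i: min_dist[i])
        selected.append(nxt)
        remaining.remove(nxt)
        for i in remaining:
            min_dist[i] = min(min_dist[i], dist[i][nxt])
    return selected


points = [(0, 0), (1, 0), (2, 0), (10, 0), (5, 0)]
train_idx = kennard_stone(points, 3)
test_idx = [i for i in range(len(points)) if i not in train_idx]
print(train_idx, test_idx)  # prints "[0, 3, 4] [1, 2]"
```

The points left unselected form the testing set, so the two extremes and the midpoint go to training here while the clustered points near the origin go to testing.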