TANAGRA - Part II: Partitioning a dataset into training and testing sets

  1. Create a new diagram and configure it to load a dataset from a file. This will put a Dataset operator on the diagram.
  2. Add a Sampling operator (Instance selection) to the diagram under the Dataset operator and set its proportion size to 80%.
  3. Add an Export dataset operator (Data visualization) to the diagram under the Sampling operator.
    • Configure it by setting the Examples selection to selected examples.
    • Set the filename to save the training set to.
  4. Add a Recover examples operator to the diagram under the Sampling operator and set its Examples to recover option to unselected.
  5. Add an Export dataset operator (Data visualization) to the diagram under the Recover examples operator.
    • Configure it by setting the Examples selection to selected examples (the recovered examples, i.e. the 20% left out by the Sampling operator, are now the active selection).
    • Set the filename to save the testing set to.
  6. Execute.
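
For readers who want to check the result outside TANAGRA, the same random 80/20 partition can be sketched in a few lines of Python. This is only an illustration of the idea, not TANAGRA's implementation; the function name `split_dataset`, its parameters, and the toy data are assumptions made for this example.

```python
import random


def split_dataset(rows, train_fraction=0.8, seed=0):
    """Randomly partition rows into (training, testing) lists.

    Hypothetical helper: mirrors the Sampling / Recover examples steps,
    where "selected" rows form the training set and the rest the testing set.
    """
    rng = random.Random(seed)            # fixed seed so the split is repeatable
    indices = list(range(len(rows)))
    rng.shuffle(indices)
    n_train = round(train_fraction * len(rows))
    selected = set(indices[:n_train])    # rows picked by the sampling step
    train = [row for i, row in enumerate(rows) if i in selected]
    test = [row for i, row in enumerate(rows) if i not in selected]
    return train, test


rows = list(range(10))                   # stand-in for ten dataset rows
train, test = split_dataset(rows)
print(len(train), len(test))
```

Every row ends up in exactly one of the two lists, which is the same guarantee the Sampling and Recover examples operators give.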

As the procedure above shows, it is very easy to partition a dataset randomly into a training set and a testing set. However, TANAGRA does not seem to offer other partitioning algorithms, such as the Kennard and Stone algorithm.
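
For comparison, here is a minimal sketch of a Kennard and Stone style selection in Python, assuming numeric features and Euclidean distance. It starts from the two most distant points and then repeatedly adds the point that lies farthest from its nearest already-selected point. The function name `kennard_stone` and the toy data are assumptions for illustration; none of this is part of TANAGRA.

```python
import math


def kennard_stone(points, n_train):
    """Return the indices of n_train points chosen Kennard-Stone style."""
    n = len(points)
    if not 2 <= n_train <= n:
        raise ValueError("n_train must be between 2 and len(points)")
    # Full pairwise Euclidean distance matrix (fine for small datasets).
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Start with the two points that are farthest apart.
    first, second = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: dist[pair[0]][pair[1]],
    )
    selected = [first, second]
    remaining = [i for i in range(n) if i not in (first, second)]
    # min_dist[i]: distance from candidate i to its nearest selected point.
    min_dist = {i: min(dist[i][first], dist[i][second]) for i in remaining}
    while len(selected) < n_train:
        # Pick the candidate that is farthest from everything selected so far.
        nxt = max(remaining, key=lambda i: min_dist[i])
        selected.append(nxt)
        remaining.remove(nxt)
        for i in remaining:
            min_dist[i] = min(min_dist[i], dist[i][nxt])
    return selected


points = [(0.0, 0.0), (10.0, 0.0), (5.0, 0.0), (1.0, 0.0), (9.0, 0.0)]
train_idx = kennard_stone(points, 3)
test_idx = [i for i in range(len(points)) if i not in train_idx]
```

Unlike the random split, this selection is deterministic and spreads the training set evenly over the feature space, which is why it is sometimes preferred for small datasets.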
