KNIME - Part II: Partitioning of dataset into training and testing sets

  1. Drag a File Reader node (IO->Read) onto the workbench and configure it to load the dataset from a file.
  2. Drag a Partitioning node (Data Manipulation->Row) onto the workbench and connect the output port of the File Reader node to its input port.
    • Configure it by choosing Relative and setting it to 80%.
    • Ensure the Draw randomly box is checked.
  3. Drag two CSV Writer nodes (IO->Write) onto the workbench. Connect the first output port of the Partitioning node to the input port of the first CSV Writer and configure it to save the first partition (the training set) to a file. Connect the second output port of the Partitioning node to the input port of the second CSV Writer and configure it to save the second partition (the testing set) to a file.
  4. Execute all nodes.
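For comparison, the same random 80/20 split can be sketched in plain Python. This is only an illustration under stated assumptions, not what KNIME runs internally: `partition_rows` and `split_csv` are hypothetical helper names, the input is assumed to be a CSV file with a header row, and the 80% cut is rounded down (KNIME's own rounding rule is not assumed).

```python
import csv
import random


def partition_rows(rows, fraction=0.8, seed=None):
    """Randomly split rows into (first, second) partitions.

    A `fraction` share of the rows goes to the first partition (training),
    the rest to the second (testing). The cut is rounded down; this is an
    assumption of the sketch, not KNIME's documented behaviour.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    shuffled = list(rows)                  # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # analogue of "Draw randomly"
    cut = int(len(shuffled) * fraction)
    return shuffled[:cut], shuffled[cut:]


def split_csv(input_path, train_path, test_path, fraction=0.8, seed=None):
    """Read a CSV with a header row and write training and testing CSVs."""
    with open(input_path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        rows = list(reader)

    train, test = partition_rows(rows, fraction, seed)

    # Write each partition with the original header, like two CSV Writers.
    for path, part in ((train_path, train), (test_path, test)):
        with open(path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(header)
            writer.writerows(part)
    return len(train), len(test)
```

A call such as `split_csv("dataset.csv", "training.csv", "testing.csv", seed=42)` (file names hypothetical) writes both sets; fixing `seed` makes the random split reproducible between runs.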

As the above procedure shows, partitioning a dataset randomly into a training set and a testing set is very easy. However, KNIME does not seem to include other partitioning algorithms, such as the Kennard-Stone algorithm.
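To show what such an alternative does, here is a minimal Python sketch of the classic Kennard-Stone selection, written outside KNIME. It assumes Euclidean distances on numeric rows and treats the selected rows as the training set; `kennard_stone` and `kennard_stone_split` are hypothetical helper names. The rule starts from the two most distant samples, then repeatedly adds the sample whose distance to its nearest already-selected sample is largest.

```python
import math


def kennard_stone(points, n_select):
    """Return indices of n_select points chosen by the Kennard-Stone rule.

    Assumes Euclidean distance. Seeds with the two most distant points,
    then applies the max-min criterion until n_select points are chosen.
    """
    n = len(points)
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and len(points)")

    # Seed the selection with the most distant pair of points.
    best = (-1.0, 0, 1)
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(points[i], points[j])
            if d > best[0]:
                best = (d, i, j)
    selected = [best[1], best[2]]
    remaining = [k for k in range(n) if k not in selected]

    # Distance from each remaining point to its nearest selected point.
    min_dist = {k: min(math.dist(points[k], points[s]) for s in selected)
                for k in remaining}

    while len(selected) < n_select:
        # Add the remaining point that is farthest from the selected set.
        pick = max(remaining, key=lambda k: min_dist[k])
        selected.append(pick)
        remaining.remove(pick)
        # Refresh nearest-selected distances using the newly added point.
        for k in remaining:
            min_dist[k] = min(min_dist[k], math.dist(points[k], points[pick]))
    return selected


def kennard_stone_split(points, fraction=0.8):
    """Split row indices into (training, testing) using Kennard-Stone."""
    n_train = max(2, int(len(points) * fraction))
    train = kennard_stone(points, n_train)
    chosen = set(train)
    test = [i for i in range(len(points)) if i not in chosen]
    return train, test
```

Unlike the random split, this selection is deterministic and spreads the training set across the whole descriptor space, so the testing set tends to fall inside the region the training set covers. The pairwise search makes it quadratic in the number of rows, which is fine for small datasets.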
