RapidMiner - Part II: Partitioning of dataset into training and testing sets

  1. Add ArffExampleSource operator (IO->Examples) to Root.
    • Configure it to load a dataset from a file.
    • Set the value of the label_attribute to the class column.
  2. Add SimpleValidation operator (Validation) to Root and configure it by setting the split_ratio value to 0.8.
    • Add OperatorChain operator to SimpleValidation operator.
      • Add ArffExampleSetWriter operator (IO->Examples) to the current OperatorChain operator and configure it to save the training set to a file.
      • Add NearestNeighbors operator (Learner->Supervised->Lazy) to the current OperatorChain operator.
    • Add another OperatorChain operator to SimpleValidation operator.
      • Add ArffExampleSetWriter operator (IO->Examples) to the current OperatorChain operator and configure it to save the testing set to a file.
      • Add ModelApplier operator to the current OperatorChain operator.
      • Add Performance operator (Validation) to the current OperatorChain operator.
  3. Run.
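
For readers who want to see what this process does outside the GUI, the following is a minimal Python sketch of the same flow: split the examples 80/20, learn a nearest-neighbor model from the training part, apply it to the testing part, and measure performance. It is not RapidMiner code. The function names (split_dataset, nearest_neighbor_predict, accuracy), the toy data, the shuffled sampling, and the use of a single neighbor with Euclidean distance are all assumptions made for illustration.

```python
import math
import random

def split_dataset(rows, split_ratio=0.8, seed=0):
    """Shuffle the rows and split them into (training, testing) lists.

    Shuffled sampling is an assumption; the fixed seed only makes the
    split repeatable.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * split_ratio))
    return shuffled[:cut], shuffled[cut:]

def nearest_neighbor_predict(training, features):
    """Return the label of the training row closest to `features` (1-NN)."""
    nearest = min(training, key=lambda row: math.dist(row[0], features))
    return nearest[1]

def accuracy(training, testing):
    """Fraction of testing rows whose label the 1-NN model predicts correctly."""
    correct = sum(
        1 for features, label in testing
        if nearest_neighbor_predict(training, features) == label
    )
    return correct / len(testing)

# Hypothetical toy data: each row is (feature tuple, class label),
# forming two well-separated clusters.
rows = [((float(i), float(i % 3)), "low") for i in range(10)]
rows += [((100.0 + i, float(i % 3)), "high") for i in range(10)]

training, testing = split_dataset(rows, split_ratio=0.8)
score = accuracy(training, testing)
print(len(training), len(testing), score)
```

The two lists returned by split_dataset correspond to what the two ArffExampleSetWriter operators save to file; writing them out is left to the reader.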

In RapidMiner, partitioning a dataset must be accompanied by learning a model from the newly created training set and evaluating that model on the newly created testing set. There is no operator that simply partitions a dataset into a training set and a testing set without model building and evaluation. Although RapidMiner has a lot of operators, its selection of operators for partitioning datasets seems rather limited. For example, it does not offer other partitioning algorithms, such as the Kennard and Stone algorithm.
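
To give an idea of what such an alternative looks like, here is a rough Python sketch of the Kennard and Stone selection procedure as it is commonly described: start with the two points that are farthest apart, then repeatedly add the point whose distance to its nearest already-selected point is largest. The function name kennard_stone_split, the Euclidean distance, and the toy points are assumptions for illustration; nothing here is part of RapidMiner.

```python
import math

def kennard_stone_split(points, n_train):
    """Return (train_indices, test_indices) picked by a Kennard-Stone style procedure."""
    n = len(points)
    if not 2 <= n_train <= n:
        raise ValueError("n_train must be between 2 and len(points)")

    # Start with the pair of points that are farthest apart.
    first, second = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: math.dist(points[pair[0]], points[pair[1]]),
    )
    selected = [first, second]
    remaining = [i for i in range(n) if i not in (first, second)]

    # min_dist[i] is the distance from candidate i to its nearest selected point.
    min_dist = {
        i: min(math.dist(points[i], points[s]) for s in selected)
        for i in remaining
    }

    while len(selected) < n_train:
        # Pick the candidate that is farthest from everything selected so far.
        nxt = max(remaining, key=lambda i: min_dist[i])
        selected.append(nxt)
        remaining.remove(nxt)
        del min_dist[nxt]
        # Only the newly selected point can shrink the remaining distances.
        for i in remaining:
            min_dist[i] = min(min_dist[i], math.dist(points[i], points[nxt]))

    return selected, remaining

# Hypothetical one-dimensional example with five points.
points = [(0.0,), (1.0,), (2.0,), (3.0,), (10.0,)]
train_idx, test_idx = kennard_stone_split(points, n_train=3)
print(train_idx, test_idx)
```

Unlike a random split, this selection depends only on the feature values, so the same dataset always yields the same training set, and that set tends to cover the extremes of the data.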
