Data mining tools comparison methodology

As mentioned in my previous post, I will explore Weka, RapidMiner, and KNIME in more detail. A reader has suggested that I also look at TANAGRA, so I will compare all four tools. However, I will not do the usual side-by-side feature comparison, nor will I cover every feature of these tools. Instead, I will gauge how easily each tool can be used for QSAR experiments, by evaluating it on a few procedures that are widely used in such experiments. These procedures have been described in my previous posts and are:

  1. Partitioning of the dataset into training and testing sets.
  2. Descriptor scaling.
  3. Descriptor selection.
  4. Parameter optimization of machine learning/statistical methods.
  5. Model validation using cross-validation and/or an independent validation set.
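
To make the five procedures concrete, here is a minimal, tool-agnostic sketch in plain Python (standard library only). It is not how Weka, RapidMiner, KNIME, or TANAGRA implement these steps. The k-nearest-neighbour regressor, the correlation-based descriptor ranking, the RMSE metric, all function names, and the toy data are assumptions chosen purely for illustration.

```python
import math
import random
import statistics

def split(X, y, test_fraction=0.25, seed=0):
    """Step 1: shuffle, then partition into training and testing sets."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    n_test = max(1, int(len(idx) * test_fraction))
    test, train = idx[:n_test], idx[n_test:]
    return ([X[i] for i in train], [y[i] for i in train],
            [X[i] for i in test], [y[i] for i in test])

def fit_scaler(X):
    """Step 2: learn per-descriptor mean and std from the training set only."""
    cols = list(zip(*X))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) or 1.0 for c in cols]  # constant column: divide by 1
    return means, stds

def apply_scaler(X, scaler):
    means, stds = scaler
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in X]

def pearson(a, b):
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    num = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    den = math.sqrt(sum((p - ma) ** 2 for p in a) * sum((q - mb) ** 2 for q in b))
    return num / den if den else 0.0

def select_descriptors(X, y, keep):
    """Step 3: keep the descriptors most correlated (in absolute value) with y."""
    scores = [abs(pearson(col, y)) for col in zip(*X)]
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:keep])

def take_columns(X, cols):
    return [[row[j] for j in cols] for row in X]

def knn_predict(X_train, y_train, x, k):
    """Predict the mean activity of the k nearest training compounds."""
    order = sorted(range(len(X_train)), key=lambda i: math.dist(X_train[i], x))
    return statistics.fmean([y_train[i] for i in order[:k]])

def rmse(X_train, y_train, X_eval, y_eval, k):
    errs = [(knn_predict(X_train, y_train, x, k) - t) ** 2
            for x, t in zip(X_eval, y_eval)]
    return math.sqrt(statistics.fmean(errs))

def cv_rmse(X, y, k, folds=5):
    """Step 5a: k-fold cross-validation on the training set."""
    scores = []
    for f in range(folds):
        held = set(range(f, len(X), folds))  # every folds-th sample is held out
        X_fit = [x for i, x in enumerate(X) if i not in held]
        y_fit = [t for i, t in enumerate(y) if i not in held]
        X_val = [X[i] for i in sorted(held)]
        y_val = [y[i] for i in sorted(held)]
        scores.append(rmse(X_fit, y_fit, X_val, y_val, k))
    return statistics.fmean(scores)

def run_pipeline(X, y, keep=2, k_grid=(1, 3, 5), seed=0):
    X_tr, y_tr, X_te, y_te = split(X, y, seed=seed)            # step 1
    scaler = fit_scaler(X_tr)                                   # step 2
    X_tr, X_te = apply_scaler(X_tr, scaler), apply_scaler(X_te, scaler)
    cols = select_descriptors(X_tr, y_tr, keep)                 # step 3
    X_tr, X_te = take_columns(X_tr, cols), take_columns(X_te, cols)
    best_k = min(k_grid, key=lambda k: cv_rmse(X_tr, y_tr, k))  # step 4
    return {"descriptors": cols, "k": best_k,
            "cv_rmse": cv_rmse(X_tr, y_tr, best_k),             # step 5a
            "test_rmse": rmse(X_tr, y_tr, X_te, y_te, best_k)}  # step 5b

def toy_data(n=40):
    """Made-up data: 4 descriptors, activity depends only on descriptor 0."""
    X = [[float(i), 1.0, float((i * 7) % 5), float((i * 3) % 11)] for i in range(n)]
    y = [2.0 * i + 1.0 for i in range(n)]
    return X, y
```

Calling `run_pipeline(*toy_data(), keep=1)` returns the selected descriptor indices, the chosen `k`, and the two error estimates. Note that this sketch fits the scaler and the descriptor selection once on the whole training set; a stricter protocol would refit both inside every cross-validation fold so that the held-out fold never influences them.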

As I am not very familiar with these tools, my comments on them will be highly subjective.
