Weka (version 3.5.7)
From their official website, “Weka is a collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a dataset or called from your own Java code. Weka contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization. It is also well-suited for developing new machine learning schemes. Weka is open source software issued under the GNU General Public License”.
Weka consists of four different applications: Explorer, Experimenter, KnowledgeFlow and SimpleCLI. For this review, I will concentrate mainly on KnowledgeFlow since it is similar to both RapidMiner and KNIME in terms of user interface.
If you install the current developer version of Weka, you will have a total of 225 components in KnowledgeFlow, distributed as follows:
- DataSources: 8
- DataSinks: 7
- Filters: 69
- Classifiers: 110
- Clusterers: 9
- Associations: 5
- Evaluation: 10
- Visualization: 7
As mentioned before, I am interested in using it for QSAR experiments, so I will only examine those components that are relevant. KnowledgeFlow can read data from quite a number of sources, e.g. ARFF, CSV and LIBSVM files, databases, etc., so most users should not have any problems opening their existing data files in KnowledgeFlow. It does not have a component for reading Microsoft Excel files, but this is not a big deal since you can easily convert them to CSV format using Microsoft Excel.
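For readers who have not seen ARFF before, the following is a minimal sketch of how a CSV export could be turned into an ARFF file. It is not part of Weka; the function name `csv_to_arff`, the relation name `qsar_data` and the sample descriptor columns are hypothetical, and the sketch assumes that every column is numeric and that column names contain no spaces.

```python
import csv
import io


def csv_to_arff(csv_text, relation="qsar_data"):
    """Convert CSV text with a header row into ARFF text.

    Assumption: every column holds numeric values, so each
    attribute is declared as 'numeric'. Column names are assumed
    to contain no spaces (ARFF would require quoting otherwise).
    """
    rows = list(csv.reader(io.StringIO(csv_text.strip())))
    header, data = rows[0], rows[1:]

    # Header section: relation name, then one @attribute line per column.
    lines = ["@relation " + relation, ""]
    for name in header:
        lines.append("@attribute " + name.strip() + " numeric")

    # Data section: one comma-separated line per instance.
    lines.append("")
    lines.append("@data")
    for row in data:
        lines.append(",".join(value.strip() for value in row))
    return "\n".join(lines) + "\n"


# Hypothetical QSAR-style sample: two descriptors and an activity value.
sample_csv = """logP,mol_weight,activity
1.2,180.2,5.1
3.4,310.4,6.7
"""
arff_text = csv_to_arff(sample_csv)
print(arff_text)
```

Real data sets often mix numeric and nominal columns, in which case each nominal attribute would need its list of allowed values declared instead of `numeric`.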
At first sight, KnowledgeFlow does not seem to have any descriptor selection capability. I will explore this in more detail when I start the testing proper.
Currently, KnowledgeFlow contains 22 algorithms for developing regression models and 66 algorithms for constructing classification models.
KnowledgeFlow supports cross-validation for model validation and also provides bagging, which is an ensemble method.
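To make the idea of cross-validation concrete, here is a minimal sketch of how a data set can be split into k folds, with each fold used once as the test set. This is an illustration only and not how Weka implements it; the function name `k_fold_indices` is hypothetical, and the sketch assumes no shuffling or stratification of the samples.

```python
def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists for k-fold cross-validation.

    Assumption: samples are taken in their original order, with no
    shuffling or stratification.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")

    indices = list(range(n_samples))
    # The first (n_samples % k) folds receive one extra sample.
    fold_sizes = [
        n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)
    ]

    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


# Split 10 samples into 3 folds and show each train/test partition.
folds = list(k_fold_indices(10, 3))
for train, test in folds:
    print("train:", train, "test:", test)
```

Each sample ends up in exactly one test fold, so every data point is used for testing once and for training in the remaining rounds.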
Overall, my first impression of KnowledgeFlow is that its graphical user interface seems easy to use. However, the layout of the components may make it difficult to find those components that you require in an experiment.