Orange - Part I: Overview

Orange (Snapshot 11 April 2008)

From their official website, “Orange is a component-based data mining software. It includes a range of preprocessing, modelling and data exploration techniques. It is based on C++ components, that are accessed either directly (not very common), through Python scripts (easier and better), or through GUI objects called Orange Widgets”. Orange is distributed under GPL.

If you install the current version of Orange, you will have a total of 77 widgets, distributed across the following categories:

  • Data: 15
  • Classify: 14
  • Evaluate: 6
  • Visualize: 13
  • Associate: 13
  • Prototypes: 13
  • Regression: 3

However, since I am interested in using it for QSAR experiments, I will only examine those widgets that are relevant. Orange can read data in five file formats: text-delimited files (which include csv files), C4.5 files, and three other formats which I am not familiar with. Orange cannot read data from SVMlight files, LIBSVM files or Microsoft Excel files. The lack of support for Microsoft Excel files is no big deal, since you can easily save them in csv format from Microsoft Excel. However, the lack of support for SVMlight and LIBSVM files will inconvenience users who already work with these two popular support vector machine packages.
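One possible workaround for the missing SVMlight/LIBSVM support is to convert such files to csv before loading them. The following is only a minimal sketch in plain Python: it does not use any Orange API, the function names and the `f1`, `f2`, ... column names are my own choices for illustration, and I have not checked which header conventions Orange expects in a delimited file. It assumes the usual sparse `label index:value ...` layout with 1-based indices and fills missing descriptors with 0.

```python
import csv
import io


def parse_svmlight_line(line):
    """Parse one 'label index:value ...' line into (label, {index: value})."""
    line = line.split("#", 1)[0].strip()  # drop trailing comments
    if not line:
        return None
    tokens = line.split()
    label = tokens[0]
    features = {}
    for token in tokens[1:]:
        index, value = token.split(":", 1)
        if index == "qid":  # SVMlight ranking field, ignored in this sketch
            continue
        features[int(index)] = float(value)
    return label, features


def svmlight_to_csv(lines, out, class_name="class"):
    """Write sparse SVMlight/LIBSVM lines as a dense csv table (hypothetical helper)."""
    parsed = (parse_svmlight_line(line) for line in lines)
    rows = [row for row in parsed if row is not None]
    # The highest descriptor index seen decides how many columns to write.
    n_features = max((max(f) for _, f in rows if f), default=0)
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["f%d" % i for i in range(1, n_features + 1)] + [class_name])
    for label, features in rows:
        dense = [features.get(i, 0.0) for i in range(1, n_features + 1)]
        writer.writerow(dense + [label])


sample = ["+1 1:0.5 3:1.2", "-1 2:2.0  # second compound"]
buffer = io.StringIO()
svmlight_to_csv(sample, buffer)
print(buffer.getvalue())
```

To convert a real file, you would pass an open input file as `lines` and an open output file as `out` instead of the in-memory sample.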

Orange has a few filter descriptor selection methods such as ReliefF, Information gain, Gain ratio and Gini gain.
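To make three of these scores concrete, here is a small sketch of the textbook definitions of information gain, gain ratio and Gini gain for a single discrete descriptor. This is plain Python written for illustration, not Orange's implementation, and all function names are my own. ReliefF is left out because it works differently (it compares each sample with its nearest neighbours).

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of discrete values."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity of a list of discrete values."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def gain(values, labels, impurity):
    """Impurity of the class minus the weighted impurity after splitting on the descriptor."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    n = len(labels)
    remainder = sum(len(g) / n * impurity(g) for g in groups.values())
    return impurity(labels) - remainder


def gain_ratio(values, labels):
    """Information gain divided by the entropy of the descriptor itself."""
    split_info = entropy(values)
    return gain(values, labels, entropy) / split_info if split_info > 0 else 0.0


activity = ["active", "active", "inactive", "inactive"]
informative = ["high", "high", "low", "low"]    # separates the classes perfectly
uninformative = ["a", "b", "a", "b"]            # tells us nothing about the class

for name, descriptor in [("informative", informative), ("uninformative", uninformative)]:
    print(name,
          gain(descriptor, activity, entropy),   # information gain
          gain_ratio(descriptor, activity),      # gain ratio
          gain(descriptor, activity, gini))      # Gini gain
```

A filter method simply ranks the descriptors by such a score and keeps the top ones, without training any model. Continuous descriptors would have to be discretized first before these formulas apply.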

Currently, Orange contains one algorithm for developing regression models and 10 algorithms for constructing classification models. It seems strange that Orange does not have a multiple linear regression algorithm, which is the most basic of regression algorithms.
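For reference, multiple linear regression fits an intercept and one coefficient per descriptor by least squares. The sketch below solves the normal equations in plain Python; it is only meant to show how little is involved, it is not part of Orange, and the function name `fit_mlr` and the toy data are made up for this example. A real QSAR workflow would more likely rely on a numerical library.

```python
def fit_mlr(X, y):
    """Return [intercept, b1, ..., bp] minimising the sum of squared errors."""
    rows = [[1.0] + [float(v) for v in x] for x in X]  # prepend intercept column
    p = len(rows[0])
    # Augmented normal equations: (A^T A | A^T y)
    M = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * t for r, t in zip(rows, y))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("descriptors are collinear; no unique solution")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(p):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [M[i][p] / M[i][i] for i in range(p)]


# Toy data generated from y = 1 + 2*x1 + 3*x2 (no noise)
X = [[1, 2], [2, 1], [3, 4], [4, 3]]
y = [9, 8, 19, 18]
coefficients = fit_mlr(X, y)
print(coefficients)
```

On this noise-free toy data the fitted coefficients should come out close to 1, 2 and 3, up to floating-point rounding.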

Orange has a Data Sampler widget that provides validation methods like cross-validation and leave-one-out.
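The two validation schemes mentioned above differ only in how the samples are split. Here is a minimal sketch of the index bookkeeping in plain Python; it does not call the Data Sampler widget or any Orange function, the names are my own, and it assigns samples to folds in order without shuffling, which you would normally add.

```python
def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists for k-fold cross-validation."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    for fold in range(k):
        test = indices[fold::k]          # every k-th sample goes to this fold
        held_out = set(test)
        train = [i for i in indices if i not in held_out]
        yield train, test


def leave_one_out_indices(n_samples):
    """Leave-one-out is k-fold cross-validation with k equal to the number of samples."""
    return k_fold_indices(n_samples, n_samples)


for train, test in k_fold_indices(6, 3):
    print("train:", train, "test:", test)

for train, test in leave_one_out_indices(4):
    print("train:", train, "test:", test)
```

Each sample appears in exactly one test set, so every compound is predicted once by a model that never saw it during training.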

Overall, my first impression of Orange is that it has a nice graphical user interface but it seems quite inadequate for QSAR experiments.
