TANAGRA - Part I: Overview

TANAGRA (version 1.4.21)

According to the official website, “TANAGRA is the successor of SIPINA which implements various supervised learning algorithms, especially an interactive and visual construction of decision trees. TANAGRA is more powerful, it contains some supervised learning but also other paradigms such as clustering, factorial analysis, parametric and nonparametric statistics, association rule, feature selection and construction algorithms”. TANAGRA “is an ‘open source project’ as every researcher can access to the source code, and add his own algorithms, as far as he agrees and conforms to the software distribution license”. According to the English translation of the license (the original is in French), the software is free to use, but if you use it in your research, you must cite it in your publications.

If you install the current version of TANAGRA, you will have a total of 137 operators, distributed across the following categories:

  • Data visualization: 6
  • Statistics: 17
  • Nonparametric statistics: 20
  • Instance selection: 6
  • Feature construction: 12
  • Feature selection: 12
  • Regression: 6
  • Factorial analysis: 6
  • PLS: 4
  • Clustering: 12
  • Spv learning: 17
  • Meta-spv learning: 4
  • Spv learning assessment: 6
  • Scoring: 3
  • Association: 6

However, since I am interested in using it for QSAR experiments, I will only examine the operators that are relevant to that purpose. TANAGRA can read data from only three sources: delimited text files (which include csv files), ARFF files (the Weka format), and Microsoft Excel files. There are no operators for reading SVMlight or LIBSVM files, which will inconvenience users who already work with these two popular support vector machine packages.
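
One possible workaround is to convert SVMlight/LIBSVM data to a delimited text file before loading it. The following is only a minimal sketch in Python, not something TANAGRA provides. It assumes the usual sparse layout of a label followed by 1-based index:value pairs, and it assumes that a tab-delimited file with a header row is acceptable to TANAGRA's text import. The function names and the column names (label, f1, f2, ...) are my own, chosen for illustration.

```python
import csv
import io


def libsvm_to_table(lines):
    """Parse LIBSVM/SVMlight-style lines into a dense (header, rows) table."""
    parsed = []
    max_index = 0
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        tokens = line.split()
        label = tokens[0]
        features = {}
        for token in tokens[1:]:
            index, value = token.split(":", 1)
            if index == "qid":  # SVMlight ranking field, not a feature
                continue
            features[int(index)] = value
        if features:
            max_index = max(max_index, max(features))
        parsed.append((label, features))
    # Hypothetical column names; features absent from a sparse line are zero.
    columns = range(1, max_index + 1)
    header = ["label"] + [f"f{i}" for i in columns]
    rows = [[label] + [feats.get(i, "0") for i in columns]
            for label, feats in parsed]
    return header, rows


def write_tab_delimited(header, rows, handle):
    """Write the table as tab-delimited text with a header row."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)


# Small in-memory demonstration with two made-up records.
sample = ["1 1:0.5 3:2.0", "-1 2:1.5 # second compound"]
header, rows = libsvm_to_table(sample)
buffer = io.StringIO()
write_tab_delimited(header, rows, buffer)
text = buffer.getvalue()
print(text)
```

To write an actual file, pass a handle opened with open(path, "w", newline="") instead of the in-memory buffer. Note that the dense table can become very large if the original data is high-dimensional and sparse.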

TANAGRA has a few descriptor selection operators. However, it does not seem to have some common ones, such as genetic algorithms. Also, the filter and wrapper descriptor selection methods seem to be mixed together. I will explore this in more detail when I start the testing proper.

Currently, TANAGRA contains 6 algorithms for developing regression models and 17 algorithms for constructing classification models.

TANAGRA contains a few validation methods, such as cross-validation, as well as ensemble (meta-learning) methods such as boosting and bagging.
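
For readers unfamiliar with cross-validation, here is a minimal sketch of the k-fold idea in Python. It is not TANAGRA's implementation; the function names are hypothetical, and the "model" is just the mean of the training values, used as a stand-in for a real regression method.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def cross_validated_rmse(y, k, seed=0):
    """Estimate the RMSE of a mean-only predictor by k-fold cross-validation."""
    squared_errors = []
    for train, test in k_fold_indices(len(y), k, seed):
        # "Fit" on the training folds only, then predict the held-out fold.
        prediction = sum(y[i] for i in train) / len(train)
        squared_errors.extend((y[i] - prediction) ** 2 for i in test)
    return (sum(squared_errors) / len(squared_errors)) ** 0.5


# Made-up activity values, for illustration only.
activities = [5.1, 6.3, 4.8, 7.2, 5.9, 6.6, 5.4, 6.1, 4.9, 7.0]
print(cross_validated_rmse(activities, k=5))
```

Every sample is held out exactly once, so the error estimate is always computed on data the model did not see during fitting.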

Overall, my first impression of TANAGRA is that although it may not have a very pretty graphical user interface or a lot of operators, it seems to be quite suitable for QSAR experiments.
