KNIME - Part I: Overview

KNIME - Konstanz Information Miner (version 1.3.3)

According to the official KNIME website, “KNIME is a modular data exploration platform that enables the user to visually create data flows (often referred to as pipelines), selectively execute some or all analysis steps, and later investigate the results through interactive views on data and models”. KNIME uses a non-profit open source license which “allows KNIME to be downloaded, distributed, and used freely as long as the software or its use is not distributed per profit”.

If you install the current version of KNIME with all its optional plugins, you will have a total of 189 nodes, distributed as follows:

  • IO: 11
  • Database: 2
  • Data manipulation: 36
  • Data views: 21
  • Statistics: 4
  • Machines: 28
  • Chemistry: 22
  • Meta: 7
  • Misc: 3
  • Weka: 47
  • Python: 3
  • R: 4
  • Reporting: 2

However, since I am interested in using it for QSAR experiments, I will only examine the nodes that are relevant to that task. KNIME can read data from only three sources: ARFF files (the Weka format), delimited text files (including csv files), and databases. There are no nodes for reading SVMlight, LIBSVM, or Microsoft Excel files. The lack of support for Excel files is no big deal, since you can easily convert them to csv format from within Excel. However, the lack of support for SVMlight and LIBSVM files will inconvenience users who already use these two popular support vector machine packages.
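One possible workaround for the missing SVMlight/LIBSVM reader is to convert the sparse files to csv before loading them with the File Reader node. The Python sketch below is not part of KNIME, and the function names `parse_sparse_line` and `sparse_to_csv` are my own. It assumes the usual sparse layout of one sample per line (`label index:value index:value ...`) with 1-based feature indices, fills absent features with 0.0, and ignores SVMlight-style `#` comments and `qid:` tokens.

```python
import csv
import io


def parse_sparse_line(line):
    """Parse one 'label index:value ...' line into (label, {index: value}).

    Returns None for blank or comment-only lines.
    """
    # Drop a trailing '# ...' comment, which SVMlight files may contain.
    line = line.split("#", 1)[0].strip()
    if not line:
        return None
    tokens = line.split()
    label = tokens[0]
    features = {}
    for token in tokens[1:]:
        index, value = token.split(":", 1)
        if index == "qid":  # ranking query id, not a feature
            continue
        features[int(index)] = float(value)
    return label, features


def sparse_to_csv(src, dst):
    """Read sparse lines from file object src and write dense csv rows to dst.

    Returns the number of samples written.
    """
    rows = [r for r in (parse_sparse_line(l) for l in src) if r is not None]
    # The widest sample determines the number of csv columns.
    n_features = max((max(f) for _, f in rows if f), default=0)
    writer = csv.writer(dst, lineterminator="\n")
    writer.writerow(["label"] + [f"x{i}" for i in range(1, n_features + 1)])
    for label, features in rows:
        dense = [features.get(i, 0.0) for i in range(1, n_features + 1)]
        writer.writerow([label] + dense)
    return len(rows)


if __name__ == "__main__":
    # Tiny in-memory example instead of real files.
    example = io.StringIO("+1 1:0.5 3:2\n-1 2:1.5 # second sample\n")
    out = io.StringIO()
    sparse_to_csv(example, out)
    print(out.getvalue())
```

To convert real files, open the input and output with `open(...)` and pass the file objects to `sparse_to_csv`. Note that writing every feature as a dense column can produce very wide files for high-dimensional sparse data.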

At first sight, KNIME does not seem to have any descriptor selection capability. I will explore this in more detail when I start the testing proper.

Currently, KNIME contains 3 algorithms for developing regression models and 8 algorithms for constructing classification models. I did not count those algorithms that are under the Weka branch because those algorithms are just wrappers over algorithms that are present in Weka and do not have the ability to load and save developed models.

KNIME contains a Cross validation meta-node. Though the website states that it also has boosting and bagging nodes, they were not present in the downloadable version.

Overall, my first impression of KNIME is that it has a very good graphical user interface and seems easy to use. However, it may not contain sufficient tools for a full QSAR experiment.


8 Responses to “KNIME - Part I: Overview”

  1. Profnick Says:

    Hi,
    You say “Basically, KNIME can only read data from three sources: ARFF files (which are Weka files), text-delimited files (which include csv files), and from a database” ….”The lack of support for Microsoft Excel files is no big deal since you can easily convert them to csv format using Microsoft Excel.”

    In fact, you don’t need to convert from Excel, since the file reader node will handle Excel files and automatically detect delimiters etc. Also, as far as output is concerned, there is an xls writer.
    From the QSAR point of view, a distinct advantage of KNIME is that it allows direct reading of .sdf files via the SDF reader node in the Chemistry/CDK plugin.

  2. Yap Chun Wei Says:

    That is strange. I have checked KNIME again and the File Reader node is able to read ASCII files only and not native Excel files.

    For output, KNIME does have an XLS writer but input is a more important criterion than output to me personally.

    Yes, the ability to read SDF files directly is definitely a distinct advantage of KNIME. However, I did not highlight this as I assumed that most QSAR modellers will use their favorite descriptor calculation software to calculate the descriptors before the modelling process. Thus they will not need to read molecular files directly.

  3. Profnick Says:

    Strange indeed. I was convinced that I had used .xls files on my Linux machine at home, but when I checked on my Windows machine in the office, you are quite right. Maybe I was confused by the Excel-like icon associated with .csv files from Excel. I only mentioned the sdf file handling capability since KNIME enables calculation of CDK descriptors from input of sdf files followed by the CDK conversion node. Regrettably, KNIME has no parameter reduction routines though, as you say.

  4. Yap Chun Wei Says:

    Personally, I feel that KNIME has the potential to be a very good tool for QSAR. I like the GUI and the workflow is rather intuitive. Hopefully, the next major release will bring a lot of advances to this software.

  5. Anand Says:

    We use KNIME for chemoinformatics applications. However, I am interested in using it for QSAR predictions as well, and I am looking for suitable material on that. I would be thankful if anybody could point me to some resources.

  6. Frank Xavier Says:

    You can use RapidMiner ( http://www.RapidMiner.com/ ), another open source data mining tool, to convert Microsoft Excel sheets, SVM^light or LibSVM files to ARFF or CSV format or into a database of your choice and then load the data into KNIME from there.

    While RapidMiner is no dedicated ETL tool (Extract, Transform, Load), you can use it for that purpose as well.

    And if you install RapidMiner anyway, you can also use it as an alternative or additional data mining tool. It’s downloadable for free and well worth a try.

    Have fun,
    Frank

  7. Charles Bergren Says:

    In Excel, I’ve saved xls files using ‘other’ (comma delimited) formats and then used the KNIME file reader to import them. Works fine on the KNIME version I downloaded a couple of days back.

  8. Yotkes Says:

    I am looking for an outlier detection algorithm, open source and written in Java, that runs on top of a database or a CSV file.
    All the parameters are strings; there are no numeric parameters.
    I need it for high volumes of data.

    Can anyone please help?
    Sorry if this is not the right forum.
