Visualization software for exploratory data analysis

A dataset may contain anywhere from one to several thousand features. When the number of features exceeds three, it is difficult to visualize how different instances are related to one another. Luckily, there are methods available to help us visualize these high-dimensional datasets. What these methods have in common is that they reduce the original features to no more than three, while trying to preserve the distance relationships between the instances. This allows the instances to be plotted as a 2D or 3D graph, providing us with a visual overview of the structure of the dataset.

The usual method for visualizing datasets in QSAR is principal component analysis (PCA). PCA is used to convert the existing features into another set of orthogonal features, with the first few features capturing the bulk of the variance in the dataset. A 2D plot is usually made from the first two principal components and is useful for showing clusters in the dataset, areas where data is sparse, possible outliers, and whether it is possible to separate the different classes in the dataset using PCA alone.
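To make the idea concrete, here is a minimal sketch of PCA using NumPy's singular value decomposition. It is only an illustration: the function name `pca_project` and the synthetic two-cluster data are my own assumptions, not taken from any QSAR package or real dataset.

```python
import numpy as np

def pca_project(X, n_components=2):
    """Project the rows of X onto the first n_components principal components.

    Hypothetical helper for illustration; assumes X is not constant.
    """
    X = np.asarray(X, dtype=float)
    # PCA works on mean-centered data
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the orthogonal principal axes, ordered by variance captured
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    scores = X_centered @ Vt[:n_components].T
    # Fraction of the total variance captured by each component
    explained = (S ** 2) / np.sum(S ** 2)
    return scores, explained[:n_components]

# Synthetic stand-in data: two groups of 50 instances with 10 features each
rng = np.random.default_rng(0)
cluster_a = rng.normal(loc=0.0, scale=1.0, size=(50, 10))
cluster_b = rng.normal(loc=4.0, scale=1.0, size=(50, 10))
X = np.vstack([cluster_a, cluster_b])

scores, explained = pca_project(X)
print(scores.shape)  # one (PC1, PC2) coordinate pair per instance
print(explained)     # variance fraction captured by PC1 and PC2
```

The two columns of `scores` can then be passed to any scatter plot routine to get the usual 2D PCA plot. In practice the features would normally be scaled before this step, which the sketch leaves out.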

Apart from PCA, dimensionality reduction methods are seldom used in QSAR. The reasons are not clear. Perhaps it is due to the lack of software, or the lack of expertise in interpreting such graphs. Indeed, it is for both reasons that I do not often use visualization methods in my research. Previously, I was too focused on data mining alone. Now that I have broadened my approach to data exploration, I need to learn how to visualize data properly.

A search on the internet shows that several visualization packages are available. I have selected three of them, VisuMap, OmniViz, and GGobi, to explore in more detail.

One Response to “Visualization software for exploratory data analysis”

  1. Nick Price Says:

    These applications look very interesting; I shall look forward to seeing your reviews.
    Nick
