VisuMap
VisuMap is a high-dimensional data visualizer. It provides a number of dimensionality reduction methods, such as principal component analysis, Sammon mapping, curvilinear component analysis, relational perspective map and SMACOF MDS. It also includes several data clustering methods, such as k-means clustering, agglomerative clustering, self-organizing maps and metric sampling.
The website contains sample maps and sample datasets for you to work on. There are also white papers and demo videos on the software (available only after you register with the website).
To evaluate this software, I used my own dataset. In one of my previous research projects, I gathered three congeneric groups of compounds: penicillins, cephalosporins and fluoroquinolones. I computed fingerprints (1025 dimensions) for these compounds using openbabel and combined them into one dataset. I then loaded the dataset into VisuMap and ran it through each of the dimensionality reduction methods.
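A minimal sketch of the "combine into one dataset" step, assuming each fingerprint has already been exported from openbabel as the set of its on-bit indices. The helper names, the CSV layout and the toy compounds below are hypothetical illustrations, not openbabel's or VisuMap's formats; the class column is kept only for colouring the maps afterwards and is not one of the 1025 dimensions.

```python
import csv
import io

N_BITS = 1025  # fingerprint length used in this post

def to_dense(on_bits, n_bits=N_BITS):
    """Expand a collection of on-bit indices (0-based) into a 0/1 row."""
    row = [0] * n_bits
    for b in on_bits:
        if not 0 <= b < n_bits:
            raise ValueError(f"bit index {b} out of range 0..{n_bits - 1}")
        row[b] = 1
    return row

def combine_groups(groups, n_bits=N_BITS):
    """Merge several compound classes into one table.

    groups maps a class name to a list of (compound_id, on_bits) pairs.
    Each output row is [compound_id, class_name, bit_0, ..., bit_{n_bits-1}].
    """
    rows = []
    for label, compounds in groups.items():
        for cid, on_bits in compounds:
            rows.append([cid, label] + to_dense(on_bits, n_bits))
    return rows

def to_csv(rows, n_bits=N_BITS):
    """Serialize the combined rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["id", "class"] + [f"bit{i}" for i in range(n_bits)])
    writer.writerows(rows)
    return buf.getvalue()

# Toy usage with made-up IDs and bit positions (not the real compounds).
groups = {
    "penicillin": [("pen-1", {3, 17, 512})],
    "cephalosporin": [("ceph-1", {3, 18, 600})],
    "fluoroquinolone": [("fq-1", {40, 1024})],
}
rows = combine_groups(groups)
print(len(rows), len(rows[0]))  # 3 rows, 2 + 1025 columns each
```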
Results from principal component analysis. Yellow squares are cephalosporins, red circles are penicillins, blue triangles are fluoroquinolones.
Results from Sammon mapping. Yellow squares are cephalosporins, red circles are penicillins, blue triangles are fluoroquinolones.
Results from curvilinear component analysis. Yellow squares are cephalosporins, red circles are penicillins, blue triangles are fluoroquinolones.
Results from relational perspective map. Yellow squares are cephalosporins, red circles are penicillins, blue triangles are fluoroquinolones.
Results from SMACOF MDS. Yellow squares are cephalosporins, red circles are penicillins, blue triangles are fluoroquinolones.
All the pictures above (except PCA) are the 2D maps produced by the various algorithms. Although the software can also produce 3D maps, they are not easy to inspect because the software does not provide very good controls for rotating the map. I could not get the 3D animation to work in my VMware machine, so I don't know whether it offers an easier way to view 3D maps. It would be good if the software adopted the approach that molecular structure viewers such as Sybyl use for 3D structures (i.e., hold down the right mouse button and move the mouse to rotate).
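To illustrate the Sybyl-style interaction, here is a sketch of how a right-button drag could be turned into a rotation of a 3D point cloud. It is a generic yaw/pitch scheme, not how Sybyl or VisuMap actually implement it, and the `degrees_per_pixel` sensitivity is an arbitrary assumption.

```python
import math

def matmul3(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def rotation_from_drag(dx, dy, degrees_per_pixel=0.5):
    """Turn a mouse drag of (dx, dy) pixels into a 3x3 rotation matrix.

    Horizontal motion turns the map about the vertical (y) axis and
    vertical motion tilts it about the horizontal (x) axis.
    """
    yaw = math.radians(dx * degrees_per_pixel)
    pitch = math.radians(dy * degrees_per_pixel)
    c, s = math.cos(yaw), math.sin(yaw)
    ry = [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]
    c, s = math.cos(pitch), math.sin(pitch)
    rx = [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]
    return matmul3(rx, ry)  # yaw is applied first, then pitch

def centroid(points):
    """Mean position of a list of 3D points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))

def rotate_points(points, rot, center):
    """Rotate 3D points about `center` so the map spins in place."""
    out = []
    for p in points:
        v = [p[i] - center[i] for i in range(3)]
        out.append(tuple(sum(rot[i][k] * v[k] for k in range(3)) + center[i]
                         for i in range(3)))
    return out

# While the right button is held, each mouse-move event would do this:
points = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
rot = rotation_from_drag(dx=180, dy=0)  # 180 px * 0.5 deg/px = 90 deg yaw
rotated = rotate_points(points, rot, centroid(points))
```

Rotating about the centroid keeps the map centred on screen while it turns; because the matrix is orthogonal, inter-point distances, and hence the map itself, are unchanged.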
It can be seen from the pictures that PCA, Sammon mapping and MDS did a very good job of revealing three distinct groups, with an obvious separation between them. (The colours and shapes of the different groups were added manually to enhance the visual effect. Bear in mind that when you process a dataset with unknown groupings, every point looks the same, so the only way to differentiate groups is an obvious separation band.) For the other algorithms, the separation between the groups is less pronounced, although members of each group still do not mix with those from other groups. The Sammon and MDS maps also correctly showed that penicillins and cephalosporins (both β-lactam antibiotics) are closer to each other than they are to fluoroquinolones.
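Rather than judging separation purely by eye, one can put numbers on it. The sketch below assumes the 2D map coordinates of each compound can be exported together with its class; it computes the distance between class centroids and the fraction of points whose nearest neighbour shares their class. The function names and toy coordinates are hypothetical. The same functions also work on the original 1025-dimensional fingerprints, which gives a way to check a relation such as "penicillins are closer to cephalosporins" independently of any map.

```python
import math
from itertools import combinations

def class_centroids(points, labels):
    """Mean position of each class."""
    sums, counts = {}, {}
    for p, lab in zip(points, labels):
        s = sums.setdefault(lab, [0.0] * len(p))
        for i, v in enumerate(p):
            s[i] += v
        counts[lab] = counts.get(lab, 0) + 1
    return {lab: [v / counts[lab] for v in s] for lab, s in sums.items()}

def centroid_distances(points, labels):
    """Euclidean distance between every pair of class centroids."""
    c = class_centroids(points, labels)
    return {(a, b): math.dist(c[a], c[b]) for a, b in combinations(sorted(c), 2)}

def nearest_neighbour_agreement(points, labels):
    """Fraction of points whose nearest neighbour belongs to the same class.

    1.0 means every point's nearest neighbour is from its own class.
    """
    hits = 0
    for i, p in enumerate(points):
        j = min((k for k in range(len(points)) if k != i),
                key=lambda k: math.dist(p, points[k]))
        hits += labels[i] == labels[j]
    return hits / len(points)

# Toy map coordinates (made up, not taken from the figures above).
pts = [(0, 0), (0, 1), (10, 0), (10, 1), (50, 0), (50, 1)]
labs = ["pen", "pen", "ceph", "ceph", "fq", "fq"]
dists = centroid_distances(pts, labs)
agreement = nearest_neighbour_agreement(pts, labs)
```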

July 22nd, 2008 at 4:53 pm
You said you worked with 1025 dimensions. How many data points did you use (1 million, 0.5 million, etc.)?
That is, what dataset size did you consider?
July 23rd, 2008 at 11:55 pm
Thanks for reviewing our software. The 3D animation service in VisuMap
requires the DirectX library, as documented in the installation guide.
The navigation of the 3D maps is very similar to that
of the PCA window, except that it is much faster for large datasets
(>5K data points). The 3D navigation interface
is modeled on Google Earth, so that you can virtually
fly through your data using your mouse.
For most mapping algorithms in VisuMap, the dataset size is limited
to 5,000 to 10,000 data points. If you have more data points, you
should use one of the integrated clustering algorithms to
reduce the dataset to a more manageable size. For instance,
you can easily reduce a dataset with 1 million data points to
a few thousand clusters with the self-organizing map within a few hours.
You can also use the clustering services to color data points
automatically according to their clusters.
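The reduce-then-map workflow can be sketched with plain k-means (Lloyd's algorithm), used here as a simpler stand-in for the self-organizing map mentioned above; it is not VisuMap's implementation, and the function names are hypothetical. The cluster centers become the smaller dataset to be mapped, and the per-point labels can be used to colour the original points.

```python
import math
import random

def assign(points, centers):
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
            for p in points]

def kmeans_reduce(points, k, iterations=20, seed=0):
    """Lloyd's k-means: return k representative centers and a label per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]  # random initial centers
    for _ in range(iterations):
        labels = assign(points, centers)  # assignment step
        for c in range(k):                # update step
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers, assign(points, centers)

# Two well-separated toy blobs reduced to two representatives.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels = kmeans_reduce(points, k=2)
```

This pure-Python version is only meant to show the logic; each iteration costs time proportional to points × clusters × dimensions, so a million-point dataset would need a vectorized or compiled implementation.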
It should also be pointed out that mapping algorithms like the Sammon map and PCA emphasize the global inter-cluster structure, whereas other mapping algorithms (like RPM and CCA) put more emphasis on the details within clusters.
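For reference, Sammon mapping minimizes Sammon's stress, E = (1 / sum of d*_ij) * sum over pairs i < j of (d*_ij - d_ij)^2 / d*_ij, where d*_ij is the distance in the original space and d_ij the distance on the map. The sketch below simply evaluates E for a given map, which can be handy for comparing layouts produced by different algorithms on the same data; compared with an unweighted squared-error stress, the 1/d*_ij factor gives relatively more weight to errors on small original distances. The function name is a hypothetical choice.

```python
import math
from itertools import combinations

def sammon_stress(original, mapped):
    """Sammon's stress between original-space points and their map positions.

    E = (1 / sum d*_ij) * sum (d*_ij - d_ij)^2 / d*_ij over all pairs i < j,
    where d*_ij is the original distance and d_ij the distance on the map.
    """
    scale = 0.0
    error = 0.0
    for i, j in combinations(range(len(original)), 2):
        d_star = math.dist(original[i], original[j])
        if d_star == 0.0:
            continue  # identical points: skip to avoid division by zero
        d = math.dist(mapped[i], mapped[j])
        scale += d_star
        error += (d_star - d) ** 2 / d_star
    return error / scale if scale > 0.0 else 0.0

# A perfect map has zero stress; doubling every distance does not.
high = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
doubled = [(2 * x, 2 * y) for x, y in high]
```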
July 25th, 2008 at 10:10 am
Hi Krishnakumari,
The dataset I used contains only 171 compounds.
Hi James,
Thanks for the clarification. Yes, I am aware that VisuMap requires the DirectX library, but since my machines are all Linux-based, I can only run it under Windows in a VMware virtual machine. I had also asked my graduate student to try the software, and she mentioned that the 3D navigation control is rather good.
I will be looking more into the clustering algorithms next. However, as I mentioned in my previous post, I am not an expert in visualization software. I am still learning how best to use such software for research, in particular for QSAR research. I therefore hope readers will not treat these posts as reviews of the software but rather as the personal observations of an amateur user. I welcome comments from readers and from you on how to use such software more effectively, and corrections of any inaccuracies that may inadvertently arise.
August 26th, 2008 at 10:54 am
It should be noted that, apart from the mapping algorithm, the distance metric
you choose to measure the dissimilarity between data points also plays a very important role in this kind of analysis. If you are using fingerprint vectors, which I suppose are binary feature flags, I would suggest trying other metrics such as the Jaccard or Dice distance.
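For binary fingerprints stored as sets of on-bit indices, the two suggested metrics are easy to write down. This is a plain sketch of the formulas, not VisuMap's plugin interface (which is not shown here), and the function names are hypothetical; Jaccard distance on binary fingerprints is one minus the Tanimoto coefficient commonly used in cheminformatics.

```python
def jaccard_distance(a, b):
    """1 - |A & B| / |A | B| for two collections of on-bit indices."""
    a, b = set(a), set(b)
    union = len(a | b)
    if union == 0:
        return 0.0  # two empty fingerprints: treat as identical
    return 1.0 - len(a & b) / union

def dice_distance(a, b):
    """1 - 2|A & B| / (|A| + |B|) for two collections of on-bit indices."""
    a, b = set(a), set(b)
    total = len(a) + len(b)
    if total == 0:
        return 0.0  # two empty fingerprints: treat as identical
    return 1.0 - 2.0 * len(a & b) / total

fp1 = {1, 2, 3}
fp2 = {2, 3, 4}
```

Unlike Euclidean distance on 0/1 vectors, which depends only on the number of differing bits, both metrics normalize by how many bits are set, so two sparse fingerprints that share their few on-bits come out as similar.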
VisuMap allows you to plug in your own distance functions to characterize dissimilarities. For standard metrics such as the Jaccard distance, free ready-to-use plugin modules are available.
October 15th, 2008 at 12:21 am
James,
How can I contact you? Your server/site bounces email and won't redirect it.