VisuMap
Tuesday, July 22nd, 2008
VisuMap is a high-dimensional data visualizer. It provides a number of dimensionality reduction methods, such as principal component analysis (PCA), Sammon mapping, curvilinear component analysis, relational perspective map and SMACOF MDS. It also has a few data clustering methods, such as k-means clustering, agglomerative clustering, self-organizing map and metric sampling.
The website contains some sample maps and sample datasets for you to work on. There are also white papers and demo videos on the software (which is only available after you register with the website).
To evaluate this software, I used my own dataset. In one of my previous research projects, I gathered three congeneric groups of compounds: penicillins, cephalosporins and fluoroquinolones. I computed fingerprints (1025 dimensions) for these compounds using Open Babel and combined them into one dataset. Then I loaded the dataset into VisuMap and ran it through each of the dimensionality reduction methods.
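To make the input concrete, here is a minimal sketch of how a combined fingerprint dataset can be turned into a pairwise distance matrix, which is the kind of input that distance-based methods such as Sammon mapping and MDS work from. The Tanimoto distance is my assumption here (it is a common choice for binary fingerprints); the post does not say which metric VisuMap was configured with. The compound names and 8-bit fingerprints are made up for illustration and are not the real data.

```python
from itertools import combinations


def tanimoto_distance(a, b):
    """Return 1 - Tanimoto similarity for two equal-length 0/1 fingerprints."""
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same length")
    both = sum(1 for x, y in zip(a, b) if x and y)    # bits set in both
    either = sum(1 for x, y in zip(a, b) if x or y)   # bits set in at least one
    # Two all-zero fingerprints are treated as identical.
    return 0.0 if either == 0 else 1.0 - both / either


def distance_matrix(fingerprints):
    """Build a symmetric matrix of pairwise Tanimoto distances."""
    n = len(fingerprints)
    dist = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d = tanimoto_distance(fingerprints[i], fingerprints[j])
        dist[i][j] = dist[j][i] = d
    return dist


# Hypothetical toy dataset: (name, fingerprint) pairs from several groups
# combined into one list, as described above.
dataset = [
    ("penicillin_a", [1, 1, 1, 0, 0, 0, 0, 0]),
    ("penicillin_b", [1, 1, 0, 1, 0, 0, 0, 0]),
    ("quinolone_a", [0, 0, 0, 0, 1, 1, 1, 0]),
]
labels = [name for name, _ in dataset]
dist = distance_matrix([fp for _, fp in dataset])
```

In this toy example the two penicillin-like fingerprints share half of their set bits, while the quinolone-like one shares none with either, so it ends up at the maximum distance from both.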
Results from principal component analysis. Yellow squares are cephalosporins, red circles are penicillins, blue triangles are fluoroquinolones.
Results from Sammon mapping. Yellow squares are cephalosporins, red circles are penicillins, blue triangles are fluoroquinolones.
Results from curvilinear component analysis. Yellow squares are cephalosporins, red circles are penicillins, blue triangles are fluoroquinolones.
Results from relational perspective map. Yellow squares are cephalosporins, red circles are penicillins, blue triangles are fluoroquinolones.
Results from SMACOF MDS. Yellow squares are cephalosporins, red circles are penicillins, blue triangles are fluoroquinolones.
All the pictures above (except PCA) are the 2D maps produced by the various algorithms. Although the software can also produce 3D maps, they are not easy to visualize because the software does not provide very good controls for rotating the map. I could not get the 3D animation to work in my VMware machine, so I don't know whether it provides an easy way to view 3D maps. It would be good if the software adopted the way that molecular structure viewers like Sybyl handle 3D structures (i.e. hold down the right mouse button and move the mouse to rotate).
It can be seen from the pictures that PCA, Sammon mapping and MDS did a very good job of showing that there are three distinct groups: the groups are clearly separated from one another. (The colours and shapes of the different groups were added manually to enhance the visual effect. Bear in mind that when you process a dataset with unknown groupings, every point will look the same, so the only way to differentiate groups is an obvious separation band.) For the other algorithms, the separation between the groups is not as good, although members of each group do not mix with those from other groups. The Sammon and MDS algorithms also correctly showed that penicillins and cephalosporins are closer to each other than they are to fluoroquinolones.
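The "closer to each other" observation is a visual judgment, but it can be checked numerically once the 2D coordinates of a map are exported. The sketch below compares the distances between group centroids. This is my own simple check, not something VisuMap reports, and the coordinates are made up for illustration; they are not taken from the maps above.

```python
from itertools import combinations
from math import dist


def centroids(points, labels):
    """Return the mean (x, y) position of each labelled group."""
    sums = {}
    for (x, y), label in zip(points, labels):
        sx, sy, n = sums.get(label, (0.0, 0.0, 0))
        sums[label] = (sx + x, sy + y, n + 1)
    return {label: (sx / n, sy / n) for label, (sx, sy, n) in sums.items()}


def centroid_distances(points, labels):
    """Return the Euclidean distance between every pair of group centroids."""
    c = centroids(points, labels)
    return {(a, b): dist(c[a], c[b]) for a, b in combinations(sorted(c), 2)}


# Hypothetical 2D map coordinates, two points per group.
points = [(0.0, 0.0), (0.0, 2.0), (3.0, 0.0), (3.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
labels = [
    "penicillin", "penicillin",
    "cephalosporin", "cephalosporin",
    "fluoroquinolone", "fluoroquinolone",
]
gaps = centroid_distances(points, labels)
```

With these made-up points, the penicillin and cephalosporin centroids are 3 units apart, while the fluoroquinolone centroid is 7 and 10 units from them, which is the pattern described for the Sammon and MDS maps. Centroid distances say nothing about whether the groups overlap, so this complements rather than replaces looking at the map.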




