<?xml version="1.0" encoding="UTF-8"?><!-- generator="wordpress/2.3.2" -->
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	>
<channel>
	<title>Comments on: VisuMap - Part 2</title>
	<link>http://voyagememoirs.com/pharmine/2008/08/29/visumap-part-2/</link>
	<description>Data mining in Pharmacy</description>
	<pubDate>Sun, 20 May 2012 14:18:10 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.3.2</generator>
		<item>
		<title>By: gift cards</title>
		<link>http://voyagememoirs.com/pharmine/2008/08/29/visumap-part-2/#comment-9248</link>
		<dc:creator>gift cards</dc:creator>
		<pubDate>Mon, 19 Mar 2012 08:52:26 +0000</pubDate>
		<guid>http://voyagememoirs.com/pharmine/2008/08/29/visumap-part-2/#comment-9248</guid>
		<description>Wow, superb weblog structure! How long have you been blogging for? You make blogging look easy. The overall look of your website is superb, as is the content!</description>
		<content:encoded><![CDATA[<p>Wow, superb weblog structure! How long have you been blogging for? You make blogging look easy. The overall look of your website is superb, as is the content!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Alvin</title>
		<link>http://voyagememoirs.com/pharmine/2008/08/29/visumap-part-2/#comment-7000</link>
		<dc:creator>Alvin</dc:creator>
		<pubDate>Wed, 30 Nov 2011 01:48:53 +0000</pubDate>
		<guid>http://voyagememoirs.com/pharmine/2008/08/29/visumap-part-2/#comment-7000</guid>
		<description>I would really like to say thank you so much for the work you have put into writing this blog post. I hope to see the same high-quality work from you in the future too.</description>
		<content:encoded><![CDATA[<p>I would really like to say thank you so much for the work you have put into writing this blog post. I hope to see the same high-quality work from you in the future too.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Lucio Forsman</title>
		<link>http://voyagememoirs.com/pharmine/2008/08/29/visumap-part-2/#comment-6618</link>
		<dc:creator>Lucio Forsman</dc:creator>
		<pubDate>Mon, 14 Nov 2011 22:06:41 +0000</pubDate>
		<guid>http://voyagememoirs.com/pharmine/2008/08/29/visumap-part-2/#comment-6618</guid>
		<description>Hey! I know this is kind of off topic, but I was wondering which blog platform you are using for this site. I'm getting sick and tired of Wordpress because I've had issues with hackers, and I'm looking at options for another platform. It would be fantastic if you could point me in the direction of a good platform.</description>
		<content:encoded><![CDATA[<p>Hey! I know this is kind of off topic, but I was wondering which blog platform you are using for this site. I&#8217;m getting sick and tired of Wordpress because I&#8217;ve had issues with hackers, and I&#8217;m looking at options for another platform. It would be fantastic if you could point me in the direction of a good platform.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: James X. Li</title>
		<link>http://voyagememoirs.com/pharmine/2008/08/29/visumap-part-2/#comment-382</link>
		<dc:creator>James X. Li</dc:creator>
		<pubDate>Fri, 05 Sep 2008 01:26:25 +0000</pubDate>
		<guid>http://voyagememoirs.com/pharmine/2008/08/29/visumap-part-2/#comment-382</guid>
		<description>Thanks for posting the dataset. After some examination, we found a defect in the implementation of the Dice dissimilarity metric, which is available as a free plugin module for VisuMap.

The defect has since been corrected, and the new version is available for download on our web site. With the corrected Dice metric, the RPM method produces maps similar to those obtained with the other metrics.

It is not too surprising that the differences between these binary metrics are not apparent to the human eye. But such fine differences could be significant for automated search algorithms.

The similarity metric as a whole depends on how you calculate the fingerprints, which in turn depends on how you select the fragments used to produce them. Finding an appropriate similarity (or dissimilarity) metric is a domain- and problem-specific process. VisuMap can also offer some help in this regard. Please see my blog (http://jamesxli.blogspot.com/2008/09/on-similarity-metrics-for-chemical.html) for more comments.</description>
		<content:encoded><![CDATA[<p>Thanks for posting the dataset. After some examination, we found a defect in the implementation of the Dice dissimilarity metric, which is available as a free plugin module for VisuMap.</p>
<p>The defect has since been corrected, and the new version is available for download on our web site. With the corrected Dice metric, the RPM method produces maps similar to those obtained with the other metrics.</p>
<p>It is not too surprising that the differences between these binary metrics are not apparent to the human eye. But such fine differences could be significant for automated search algorithms.</p>
<p>The similarity metric as a whole depends on how you calculate the fingerprints, which in turn depends on how you select the fragments used to produce them. Finding an appropriate similarity (or dissimilarity) metric is a domain- and problem-specific process. VisuMap can also offer some help in this regard. Please see my blog (http://jamesxli.blogspot.com/2008/09/on-similarity-metrics-for-chemical.html) for more comments.</p>
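<p>As a rough illustration of how fragment selection feeds into the metric, here is a minimal Python sketch. The fragment names, the two molecules and both vocabularies are hypothetical placeholders (not taken from the posted dataset), and a fingerprint is modelled simply as the set of vocabulary fragments a molecule contains.</p>

```python
def fingerprint(fragments, vocabulary):
    """Sparse binary fingerprint: the set of vocabulary fragments present in a molecule."""
    return frozenset(f for f in vocabulary if f in fragments)


def dice_dissimilarity(a, b):
    """1 - 2*|A intersect B| / (|A| + |B|); defined as 0 for two empty fingerprints."""
    total = len(a) + len(b)
    if total == 0:
        return 0.0
    return 1.0 - 2.0 * len(a.intersection(b)) / total


# Hypothetical fragment sets for two molecules (placeholders, not real substructure data).
mol_x = {"beta_lactam", "carboxylic_acid", "thiazolidine", "phenyl"}
mol_y = {"beta_lactam", "carboxylic_acid", "dihydrothiazine", "amine"}

# Two hypothetical fragment vocabularies.
vocab_broad = ["beta_lactam", "carboxylic_acid", "thiazolidine",
               "dihydrothiazine", "phenyl", "amine"]
vocab_core = ["beta_lactam", "carboxylic_acid"]

for name, vocab in [("broad", vocab_broad), ("core", vocab_core)]:
    fx, fy = fingerprint(mol_x, vocab), fingerprint(mol_y, vocab)
    print(name, round(dice_dissimilarity(fx, fy), 3))
```

<p>With the broader vocabulary the two hypothetical molecules are at a Dice dissimilarity of 0.5; restricted to the two shared core fragments they become indistinguishable (0.0). The same pair of molecules can therefore look very different depending on which fragments are chosen.</p>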
]]></content:encoded>
	</item>
	<item>
		<title>By: Yap Chun Wei</title>
		<link>http://voyagememoirs.com/pharmine/2008/08/29/visumap-part-2/#comment-324</link>
		<dc:creator>Yap Chun Wei</dc:creator>
		<pubDate>Tue, 02 Sep 2008 08:58:53 +0000</pubDate>
		<guid>http://voyagememoirs.com/pharmine/2008/08/29/visumap-part-2/#comment-324</guid>
		<description>Thank you so much for the useful information. I will experiment more and read more literature.

I have added the link to the dataset to the post (in the first paragraph). Ids 1 to 72 are cephalosporins, 73 to 111 are fluoroquinolones and 112 to 170 are penicillins.

I have experimented with t-SNE. The results are pretty similar to those of the Sammon and SMACOF methods for the three distance metrics.</description>
		<content:encoded><![CDATA[<p>Thank you so much for the useful information. I will experiment more and read more literature.</p>
<p>I have added the link to the dataset to the post (in the first paragraph). Ids 1 to 72 are cephalosporins, 73 to 111 are fluoroquinolones and 112 to 170 are penicillins.</p>
<p>I have experimented with t-SNE. The results are pretty similar to those of the Sammon and SMACOF methods for the three distance metrics.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: James X. Li</title>
		<link>http://voyagememoirs.com/pharmine/2008/08/29/visumap-part-2/#comment-285</link>
		<dc:creator>James X. Li</dc:creator>
		<pubDate>Sat, 30 Aug 2008 15:37:30 +0000</pubDate>
		<guid>http://voyagememoirs.com/pharmine/2008/08/29/visumap-part-2/#comment-285</guid>
		<description>I am a little surprised by the disappointing result of the Dice metric with RPM, as this metric is quite similar to the Jaccard metric. Would it be possible to release your dataset so that we can take a closer look? If you need to keep the data confidential, you can replace the labels &#38; names of the data points with anonymous strings (using the table editor).

If the cluster structure is your main interest, you should also try the t-SNE mapping method. This method, the latest addition to VisuMap, preserves cluster structure very well.

In general, when I explore a new dataset I first try the PCA method. PCA is the safest mapping algorithm: it does no data processing except shifting and rotating the coordinate system and then projecting onto the 3 coordinates with the most variance. PCA is less powerful than the non-linear mapping algorithms, since it does no unfolding, twisting, segmentation, etc. But in practice PCA often provides good results.

When applying the PCA method, you should check the eigenvalues of the principal components (via PCA Projection&#62;PCA Analyzer&#62;PCA Details). Larger eigenvalues mean more variance (and more information). If you see more than 3 relatively large eigenvalues, be careful with the results of the PCA map: since it only displays the projection onto 3 components, some relevant information may be invisible in the map.

Also note that with VisuMap you can select a cluster of data points and then apply PCA to the selected data (via the context menu "Show PCA View"). Thus, in addition to the global structure, you can easily explore the detailed relationships within clusters.

If PCA does not deliver satisfactory results (e.g. the map shows no clusters, no clear geometrical shapes or density gradients, or does not fit expectations, looking just like a random cloud), I would then try the Sammon or SMACOF method, and then more powerful methods like CCA, t-SNE or RPM. The Shepard diagram tool (view&#62;Shepard Diagram) helps assess how well a map reflects the original distance information.

The selection of a distance metric and, more generally, the preparation of the data (filtering, transformation, cleansing, etc.) are very domain-specific tasks. Searching the literature for knowledge about the different groups of distance metrics (dissimilarity measures) could offer some guidance.</description>
		<content:encoded><![CDATA[<p>I am a little surprised by the disappointing result of the Dice metric with RPM, as this metric is quite similar to the Jaccard metric. Would it be possible to release your dataset so that we can take a closer look? If you need to keep the data confidential, you can replace the labels &amp; names of the data points with anonymous strings (using the table editor).</p>
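<p>To make the Dice/Jaccard comparison concrete, the sketch below computes both dissimilarities for two made-up 8-bit fingerprints (the bit vectors are illustrative only, not from the posted dataset). For binary data the Dice similarity S_D and the Jaccard similarity S_J are related by S_D = 2 S_J / (1 + S_J), so the two metrics differ only through a monotonic rescaling of the distances.</p>

```python
def _counts(a, b):
    """Return (#bits set in both, #bits set in either) for equal-length 0/1 sequences."""
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same length")
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both, either


def jaccard_dissimilarity(a, b):
    """1 - |A intersect B| / |A union B|; 0 for two all-zero fingerprints."""
    both, either = _counts(a, b)
    return 0.0 if either == 0 else 1.0 - both / either


def dice_dissimilarity(a, b):
    """1 - 2|A intersect B| / (|A| + |B|); note |A| + |B| = |A intersect B| + |A union B|."""
    both, either = _counts(a, b)
    return 0.0 if either == 0 else 1.0 - 2.0 * both / (both + either)


fp1 = [1, 1, 0, 1, 0, 0, 1, 0]
fp2 = [1, 0, 0, 1, 1, 0, 1, 0]

d_jac = jaccard_dissimilarity(fp1, fp2)
d_dice = dice_dissimilarity(fp1, fp2)
s_jac = 1.0 - d_jac
# Dice recovered from Jaccard via S_D = 2 * S_J / (1 + S_J)
d_dice_from_jac = 1.0 - 2.0 * s_jac / (1.0 + s_jac)
print(round(d_jac, 3), round(d_dice, 3), round(d_dice_from_jac, 3))
```

<p>Because S_D is a strictly increasing function of S_J, a nearest-neighbour search ranks compounds identically under either metric, while a fixed similarity cut-off selects different sets of compounds. Distance-based mapping methods such as RPM or Sammon see the rescaled distances, so their maps may differ somewhat between the two metrics.</p>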
<p>If the cluster structure is your main interest, you should also try the t-SNE mapping method. This method, the latest addition to VisuMap, preserves cluster structure very well.</p>
<p>In general, when I explore a new dataset I first try the PCA method. PCA is the safest mapping algorithm: it does no data processing except shifting and rotating the coordinate system and then projecting onto the 3 coordinates with the most variance. PCA is less powerful than the non-linear mapping algorithms, since it does no unfolding, twisting, segmentation, etc. But in practice PCA often provides good results.</p>
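<p>The shift-rotate-project description of PCA can be written in a few lines of Python with NumPy. This is a generic sketch, not VisuMap's implementation; the random 170 x 10 matrix is a hypothetical stand-in for a real feature table.</p>

```python
import numpy as np


def pca_map(data, n_components=3):
    """Project the rows of `data` onto the n_components directions of largest variance.

    Centring is the shift, expressing the data in the principal-axis basis is the
    rotation, and keeping only the first n_components coordinates is the projection.
    """
    centred = data - data.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return centred @ vt[:n_components].T


rng = np.random.default_rng(0)
points = rng.normal(size=(170, 10))   # hypothetical stand-in: 170 compounds x 10 features
coords = pca_map(points)
print(coords.shape)                   # (170, 3)
```

<p>Keeping all components (n_components=10 here) only shifts and rotates the data, so pairwise distances are unchanged; dropping all but 3 components is the only lossy step.</p>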
<p>When applying the PCA method, you should check the eigenvalues of the principal components (via PCA Projection&gt;PCA Analyzer&gt;PCA Details). Larger eigenvalues mean more variance (and more information). If you see more than 3 relatively large eigenvalues, be careful with the results of the PCA map: since it only displays the projection onto 3 components, some relevant information may be invisible in the map.</p>
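<p>The eigenvalue check can be sketched the same way. The "relatively large" cut-off used here (10% of the largest eigenvalue) is an arbitrary assumption for illustration, and the data are synthetic with five strong directions, so a 3-component map would hide two of them.</p>

```python
import numpy as np


def pca_eigenvalues(data):
    """Eigenvalues of the sample covariance matrix, largest first."""
    cov = np.cov(data, rowvar=False)
    return np.linalg.eigvalsh(cov)[::-1]   # eigvalsh returns ascending order


def count_large(eigenvalues, rel_threshold=0.1):
    """Eigenvalues above rel_threshold * largest (the threshold is an assumption)."""
    return int(np.sum(eigenvalues > rel_threshold * eigenvalues[0]))


def top3_fraction(eigenvalues):
    """Share of the total variance that a 3-component PCA map can display."""
    return float(eigenvalues[:3].sum() / eigenvalues.sum())


rng = np.random.default_rng(1)
# Synthetic data: 5 high-variance columns plus 5 near-constant noise columns.
strong = rng.normal(size=(200, 5)) * np.array([5.0, 4.0, 3.0, 3.0, 2.5])
noise = 0.1 * rng.normal(size=(200, 5))
data = np.hstack([strong, noise])

eig = pca_eigenvalues(data)
print(count_large(eig), round(top3_fraction(eig), 2))
```

<p>Here five eigenvalues pass the cut-off and components 4 and 5 carry a noticeable share of the variance, which is exactly the situation in which a 3-D PCA map should be read with caution.</p>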
<p>Also note that with VisuMap you can select a cluster of data points and then apply PCA to the selected data (via the context menu &#8220;Show PCA View&#8221;). Thus, in addition to the global structure, you can easily explore the detailed relationships within clusters.</p>
<p>If PCA does not deliver satisfactory results (e.g. the map shows no clusters, no clear geometrical shapes or density gradients, or does not fit expectations, looking just like a random cloud), I would then try the Sammon or SMACOF method, and then more powerful methods like CCA, t-SNE or RPM. The Shepard diagram tool (view&gt;Shepard Diagram) helps assess how well a map reflects the original distance information.</p>
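<p>What a Shepard diagram plots can also be sketched directly: one point per pair of data items, with the original distance on one axis and the map distance on the other. The sketch below computes those pairs plus a simple normalised stress; the random data and the 2-D PCA map are hypothetical examples, and VisuMap's own Shepard diagram may be computed differently.</p>

```python
import numpy as np


def pairwise_distances(x):
    """Euclidean distances for all unordered pairs of rows (upper triangle, flattened)."""
    diff = x[:, None, :] - x[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    i, j = np.triu_indices(len(x), k=1)
    return dist[i, j]


def shepard_pairs(original, mapped):
    """(original distance, map distance) for every pair: the points of a Shepard diagram."""
    return pairwise_distances(original), pairwise_distances(mapped)


def normalised_stress(d_orig, d_map):
    """sqrt(sum (d_orig - d_map)^2 / sum d_orig^2); 0 means distances are reproduced exactly."""
    return float(np.sqrt(((d_orig - d_map) ** 2).sum() / (d_orig ** 2).sum()))


rng = np.random.default_rng(2)
data = rng.normal(size=(50, 6))                 # hypothetical 6-D data
centred = data - data.mean(axis=0)
_, _, vt = np.linalg.svd(centred, full_matrices=False)
map_2d = centred @ vt[:2].T                     # example map: a 2-D PCA projection

d_orig, d_map = shepard_pairs(data, map_2d)
print(len(d_orig), round(normalised_stress(d_orig, d_map), 3))
print(round(float(np.corrcoef(d_orig, d_map)[0, 1]), 3))
```

<p>Points lying close to the diagonal of such a plot (and a stress near 0) indicate that the map preserves the original distances well; a wide band of points indicates distortion.</p>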
<p>The selection of a distance metric and, more generally, the preparation of the data (filtering, transformation, cleansing, etc.) are very domain-specific tasks. Searching the literature for knowledge about the different groups of distance metrics (dissimilarity measures) could offer some guidance.</p>
]]></content:encoded>
	</item>
</channel>
</rss>

