pharmine http://voyagememoirs.com/pharmine Data mining in Pharmacy Tue, 17 Nov 2009 14:06:50 +0000 http://wordpress.org/?v=2.3.2 en PaDEL-ADV http://voyagememoirs.com/pharmine/2009/11/09/padel-adv/ http://voyagememoirs.com/pharmine/2009/11/09/padel-adv/#comments Mon, 09 Nov 2009 15:53:51 +0000 admin http://voyagememoirs.com/pharmine/2009/11/09/padel-autodockvina/ Introducing another new piece of software, PaDEL-ADV, a program for performing virtual screening with AutoDock Vina.

PaDEL-ADV reads a directory containing ligand files. For each ligand, the structure file is converted into a pdb file, if necessary, using The Chemistry Development Kit. The pdb file is then converted to pdbqt using the prepare_ligand4.py script provided by AutoDockTools. AutoDock Vina is then used to dock the ligand with the receptor. Individual binding modes are extracted from the output pdbqt file using vina_split. The pdbqt files are then converted to pdb files using the pdbqt_to_pdb.py script provided by AutoDockTools. Results for each binding mode are extracted from the log file and placed into the results CSV file. The log file and all the related pdb and pdbqt files are then compressed into a zip file.
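The step that extracts the results for each binding mode from the log file can be sketched roughly as follows. This is only an illustration and not PaDEL-ADV's actual code: it assumes the log contains the plain-text results table printed by AutoDock Vina 1.1-style runs (mode, affinity, RMSD lower bound, RMSD upper bound), and the function names, the CSV column names and the sample log are all hypothetical.

```python
import csv
import io
import re

# One result row of the (assumed) Vina log table:
# mode number, affinity in kcal/mol, RMSD lower bound, RMSD upper bound.
ROW = re.compile(r"^\s*(\d+)\s+(-?\d+\.\d+)\s+(\d+\.\d+)\s+(\d+\.\d+)\s*$")


def parse_vina_log(text):
    """Return one dict per binding mode found in the log text."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            rows.append({
                "mode": int(match.group(1)),
                "affinity_kcal_mol": float(match.group(2)),
                "rmsd_lb": float(match.group(3)),
                "rmsd_ub": float(match.group(4)),
            })
    return rows


def results_to_csv(ligand_name, rows):
    """Format the binding modes of one ligand as CSV text (hypothetical columns)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["ligand", "mode", "affinity_kcal_mol", "rmsd_lb", "rmsd_ub"])
    for row in rows:
        writer.writerow([ligand_name, row["mode"], row["affinity_kcal_mol"],
                         row["rmsd_lb"], row["rmsd_ub"]])
    return buffer.getvalue()


# Made-up log excerpt, for illustration only.
SAMPLE_LOG = """\
mode |   affinity | dist from best mode
     | (kcal/mol) | rmsd l.b.| rmsd u.b.
-----+------------+----------+----------
   1         -7.5      0.000      0.000
   2         -7.1      1.852      2.417
"""

if __name__ == "__main__":
    print(results_to_csv("ligand1", parse_vina_log(SAMPLE_LOG)))
```

Header and separator lines are skipped simply because they do not match the row pattern, so the same function can be run over the whole log file.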

Modern QSAR - Validation http://voyagememoirs.com/pharmine/2009/04/21/modern-qsar-validation/ http://voyagememoirs.com/pharmine/2009/04/21/modern-qsar-validation/#comments Tue, 21 Apr 2009 06:27:20 +0000 admin http://voyagememoirs.com/pharmine/2009/04/21/modern-qsar-validation/ modern-qsar-validation.jpg

Modern QSAR - Modeling methods http://voyagememoirs.com/pharmine/2009/03/23/modern-qsar-modeling-methods/ http://voyagememoirs.com/pharmine/2009/03/23/modern-qsar-modeling-methods/#comments Mon, 23 Mar 2009 04:56:53 +0000 admin http://voyagememoirs.com/pharmine/2009/03/23/modern-qsar-modeling-methods/ modern-qsar-modelingmethods.jpg

Modern QSAR - Descriptor http://voyagememoirs.com/pharmine/2009/02/06/modern-qsar-descriptor/ http://voyagememoirs.com/pharmine/2009/02/06/modern-qsar-descriptor/#comments Fri, 06 Feb 2009 03:54:57 +0000 admin http://voyagememoirs.com/pharmine/2009/02/06/modern-qsar-descriptor/ modern-qsar-descriptors.jpg

Modern QSAR - Dataset http://voyagememoirs.com/pharmine/2009/01/28/modern-qsar-dataset/ http://voyagememoirs.com/pharmine/2009/01/28/modern-qsar-dataset/#comments Wed, 28 Jan 2009 02:38:27 +0000 admin http://voyagememoirs.com/pharmine/2009/01/28/modern-qsar-dataset/ 32-modern-qsar-dataset.jpg

OECD Principles For The Validation, For Regulatory Purposes, Of (Quantitative) Structure-Activity Relationship Models http://voyagememoirs.com/pharmine/2008/12/16/oecd-principles-for-the-validation-for-regulatory-purposes-of-quantitative-structure-activity-relationship-models/ http://voyagememoirs.com/pharmine/2008/12/16/oecd-principles-for-the-validation-for-regulatory-purposes-of-quantitative-structure-activity-relationship-models/#comments Tue, 16 Dec 2008 09:29:03 +0000 admin http://voyagememoirs.com/pharmine/2008/12/16/oecd-principles-for-the-validation-for-regulatory-purposes-of-quantitative-structure-activity-relationship-models/ In 2004, the OECD came up with five principles for validating QSAR models. They are:

  1. a defined endpoint
  2. an unambiguous algorithm
  3. a defined domain of applicability
  4. appropriate measures of goodness-of-fit, robustness and predictivity
  5. a mechanistic interpretation, if possible

If you are working on QSAR models, it is worth knowing these principles and applying them in your work.

For more information on these principles, see the OECD website.

Health Discovery Corporation holds the patents to SVM and RFE http://voyagememoirs.com/pharmine/2008/11/20/health-discovery-corporation-holds-the-patents-to-svm-and-rfe/ http://voyagememoirs.com/pharmine/2008/11/20/health-discovery-corporation-holds-the-patents-to-svm-and-rfe/#comments Thu, 20 Nov 2008 08:08:19 +0000 admin http://voyagememoirs.com/pharmine/2008/11/20/health-discovery-corporation-holds-the-patents-to-svm-and-rfe/ While doing a literature search and some reading, I discovered that SVM and RFE are actually patented technologies. I am not sure what the implications of this are for researchers, but I don’t like the sound of it. Maybe it is time to look into other machine learning methods and hold off using SVM for the next 20 years.

Y-randomization in KNIME http://voyagememoirs.com/pharmine/2008/11/10/y-randomization-in-knime/ http://voyagememoirs.com/pharmine/2008/11/10/y-randomization-in-knime/#comments Mon, 10 Nov 2008 05:00:15 +0000 admin http://voyagememoirs.com/pharmine/2008/11/10/y-randomization-in-knime/ Previously, I wrote about how to perform y-randomization in Rapidminer. You can use the same basic concepts to do y-randomization in KNIME. Unlike the previous post, where I detailed the steps for an entire y-randomization experiment, in this post I will only show how to perform a single y-randomization on a dataset. Below is the basic workflow.

workflow1.jpg

“Column Filter” is used to remove all variables except the label. This is then passed to “Shuffle” to randomize the labels. An increasing row id number is then added to this randomized label dataset and the original dataset using “Math Formula”.

mathformula.jpg

“Row ID” is then used to replace the original row ids in both the original dataset and the randomized label dataset with the newly created row id.

rowid.jpg

Finally, “Joiner” is used to merge the two datasets together, creating a randomized dataset.

joiner.jpg
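For readers without KNIME at hand, the same four steps can be sketched in plain Python. This is a rough analogue and not a KNIME export: the function name `y_randomize`, the list-of-dicts dataset layout and the `seed` parameter are assumptions made for the illustration, and the comments only map each statement to the node it imitates.

```python
import random


def y_randomize(rows, label_key, seed=None):
    """Return a copy of rows in which only the label column has been shuffled.

    rows is a list of dicts, one dict per example (assumed layout).
    """
    rng = random.Random(seed)

    # "Column Filter": keep only the label column.
    labels = [row[label_key] for row in rows]

    # "Shuffle": randomize the order of the labels.
    rng.shuffle(labels)

    # "Math Formula" + "Row ID": give both tables the same increasing row id.
    original_by_id = {i: row for i, row in enumerate(rows)}
    shuffled_by_id = {i: label for i, label in enumerate(labels)}

    # "Joiner": merge on the row id, taking the descriptors from the original
    # table and the label from the shuffled table.
    return [
        {**original_by_id[i], label_key: shuffled_by_id[i]}
        for i in sorted(original_by_id)
    ]


if __name__ == "__main__":
    dataset = [
        {"logP": 1.2, "activity": "active"},
        {"logP": 3.4, "activity": "inactive"},
        {"logP": 0.7, "activity": "active"},
    ]
    for row in y_randomize(dataset, "activity", seed=42):
        print(row)
```

The descriptor values stay attached to their original rows; only the pairing between rows and labels changes, which is the point of the row-id join.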

Y-randomization in Rapidminer http://voyagememoirs.com/pharmine/2008/11/04/y-randomization-in-rapidminer/ http://voyagememoirs.com/pharmine/2008/11/04/y-randomization-in-rapidminer/#comments Tue, 04 Nov 2008 02:46:06 +0000 admin http://voyagememoirs.com/pharmine/2008/11/04/y-randomization-in-rapidminer/ I previously mentioned Y-randomization as one of the methods for checking whether a prediction model is overfitted. Recently, someone asked me about the Y-randomization that was implemented in my software, PHAKISO. PHAKISO was created during my PhD studies and, unfortunately, it has no automated way to repeat the Y-randomization experiment n times. I had always used the associated library, YMLL, to write a simple program to do the job, and thus never implemented such a feature in PHAKISO.

Since I am using Rapidminer for my research now, I thought it would be easy to create a Y-randomization process in it. Unfortunately, Rapidminer did not have a Y-randomization operator. However, through the solutions provided by the helpful moderators in the Rapidminer forum, I finally know how to do it in Rapidminer and in the process, learnt more about Rapidminer.

The basic process is shown in the following figure. operatortree.jpg

The basic idea in Y-randomization is to randomize the label of the dataset. So in Rapidminer, you would load a dataset, create a copy of it and remove all attributes, except the label, from the copy. Then randomly permute the examples in the copy and tag all the examples with a unique id. Select the original dataset, tag all its examples with a unique id and do a join between the original dataset and its copy, using the id as the key for joining. If you do it correctly, the labels in the original dataset will be skipped during the join and the permuted labels in the copy will be used for the joined dataset. To perform the entire Y-randomization experiment automatically, you will need to use the IteratingPerformanceAverage operator chain to enclose the Y-randomization portion and add a validation procedure after the Y-randomization portion, as shown in the figure.

The complete XML process is as follows:
<operator name="Root" class="Process" expanded="yes">
    <parameter key="random_seed" value="-1"/>
    <operator name="CSVExampleSource" class="CSVExampleSource">
    </operator>
    <operator name="IteratingPerformanceAverage" class="IteratingPerformanceAverage" expanded="yes">
        <parameter key="iterations" value="100"/>
        <operator name="IOMultiplier" class="IOMultiplier">
            <parameter key="io_object" value="ExampleSet"/>
        </operator>
        <operator name="AttributeSubsetPreprocessing" class="AttributeSubsetPreprocessing" expanded="yes">
            <parameter key="attribute_name_regex" value="label"/>
            <parameter key="condition_class" value="attribute_name_filter"/>
            <parameter key="keep_subset_only" value="true"/>
            <operator name="Permutation" class="Permutation">
            </operator>
            <operator name="IdTagging" class="IdTagging">
            </operator>
        </operator>
        <operator name="IOSelector" class="IOSelector">
            <parameter key="io_object" value="ExampleSet"/>
            <parameter key="select_which" value="2"/>
        </operator>
        <operator name="IdTagging (2)" class="IdTagging">
        </operator>
        <operator name="ExampleSetJoin" class="ExampleSetJoin">
        </operator>
        <operator name="XValidation" class="XValidation" expanded="yes">
            <parameter key="leave_one_out" value="true"/>
            <operator name="NearestNeighbors" class="NearestNeighbors">
                <parameter key="k" value="3"/>
            </operator>
            <operator name="OperatorChain" class="OperatorChain" expanded="yes">
                <operator name="ModelApplier" class="ModelApplier">
                    <list key="application_parameters">
                    </list>
                </operator>
                <operator name="ClassificationPerformance" class="ClassificationPerformance">
                    <parameter key="accuracy" value="true"/>
                    <list key="class_weights">
                    </list>
                </operator>
            </operator>
        </operator>
    </operator>
</operator>

Just enter your dataset file in CSVExampleSource and change the method in XValidation from NearestNeighbors to your desired modeling method.
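To make the logic of the process easier to follow outside Rapidminer, here is a minimal Python sketch of the same experiment: shuffle the labels, run a leave-one-out validation with a 3-nearest-neighbour classifier, and average the accuracy over the iterations. It is not a translation of the Rapidminer operators. The function names, the squared Euclidean distance and the tie-breaking rules are assumptions chosen for the illustration.

```python
import random


def loo_knn_accuracy(X, y, k=3):
    """Leave-one-out accuracy of a simple k-nearest-neighbour classifier."""
    correct = 0
    for i in range(len(X)):
        # Squared Euclidean distance from the held-out example to all others;
        # ties in distance are broken by the lower example index.
        neighbours = sorted(
            (sum((a - b) ** 2 for a, b in zip(X[i], X[j])), j)
            for j in range(len(X))
            if j != i
        )[:k]
        votes = {}
        for _, j in neighbours:
            votes[y[j]] = votes.get(y[j], 0) + 1
        # Majority vote; ties go to the label that sorts first.
        predicted = max(sorted(votes), key=votes.get)
        if predicted == y[i]:
            correct += 1
    return correct / len(X)


def y_randomization_experiment(X, y, iterations=100, k=3, seed=0):
    """Average leave-one-out accuracy over repeated label shuffles."""
    rng = random.Random(seed)
    scores = []
    for _ in range(iterations):
        shuffled = list(y)      # copy, so the real labels are left untouched
        rng.shuffle(shuffled)   # the Y-randomization step
        scores.append(loo_knn_accuracy(X, shuffled, k))
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy dataset: two well-separated clusters.
    X = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
    y = ["a"] * 4 + ["b"] * 4
    print("real labels:    ", loo_knn_accuracy(X, y))
    print("shuffled labels:", y_randomization_experiment(X, y, iterations=100))
```

A model that has learnt a real structure-activity relationship should score clearly better on the real labels than the average obtained with shuffled labels. If the two numbers are similar, the model is probably fitting noise.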

PaDEL-Crypt http://voyagememoirs.com/pharmine/2008/10/25/padel-crypt/ http://voyagememoirs.com/pharmine/2008/10/25/padel-crypt/#comments Sat, 25 Oct 2008 07:28:21 +0000 admin http://voyagememoirs.com/pharmine/2008/10/25/padel-crypt/ Announcing another product from my laboratory. PaDEL-Crypt is a portable program for encrypting and decrypting files. The encrypted files are stored in a vault and can only be viewed and decrypted with the correct password. The main intended use of PaDEL-Crypt is to encrypt files on portable devices like flash drives or portable hard disks, so as to maintain data confidentiality in the event of accidental loss of such devices. The targeted users are those who do not have much knowledge about encryption but want a simple solution for adding encryption to their portable devices.

At the moment, PaDEL-Crypt will run on any platform that has Java installed. Once I figure out how to run GCJ, I will make a native version for Windows so that Java will no longer need to be installed on the target machine. In the meantime, you can use the Java Portablizer to copy the Java JRE to your portable device and write a batch file that makes PaDEL-Crypt run with that copy.

It might seem strange that my laboratory, which focuses on applying knowledge discovery and data mining techniques to pharmaceutical and biomedical areas, would produce a product like PaDEL-Crypt. Well, data confidentiality is an important issue in many areas, including the pharmaceutical and biomedical areas. Flash drives are so popular nowadays that most of us carry at least one. However, most of us do not bother to encrypt the data stored on them, which is disturbing because flash drives are easily lost. Although most of the information that we put on flash drives is not confidential, some of it might be, and it is important to protect it.

Before developing PaDEL-Crypt, I looked at a few solutions on the market, including those that come with the flash drives. However, most of these solutions are usable only on Microsoft Windows and are not strictly portable (they leave information behind on the target machine). Thus I decided to develop my own using Java. PaDEL-Crypt is not meant to replace other encryption systems for hard disks, like TrueCrypt. Rather, it is meant to complement these systems, which are not portable because they require administrative privileges on the target machine to run. PaDEL-Crypt is an ideal solution for carrying or distributing your encrypted files via portable devices or email. As PaDEL-Crypt can also create self-decrypting archives, the target machine does not even require PaDEL-Crypt to be present in order to decrypt the files.

PaDEL-Crypt is basically free for all to use, with the exception of one group of people. Please see the license conditions at its homepage to find out more.
