changeset 23:1868005213a1

ChemicalToolBoX update.
author Bjoern Gruening <bjoern.gruening@gmail.com>
date Fri, 19 Jul 2013 16:28:05 +0200
parents 6c496b524b41
children dce673edc031
files chemfp_clustering/butina_clustering.xml chemfp_clustering/nxn_clustering.py chemfp_clustering/nxn_clustering.xml chemfp_mol2fps/mol2fps.xml chemfp_sdf2fps/sdf2fps.xml force_pre-commit_hook_temp_file
diffstat 5 files changed, 144 insertions(+), 52 deletions(-)
--- a/chemfp_clustering/butina_clustering.xml	Sun Jun 02 19:53:56 2013 +0200
+++ b/chemfp_clustering/butina_clustering.xml	Fri Jul 19 16:28:05 2013 +0200
@@ -27,18 +27,25 @@
     </tests>
 <help>
 
-**Note**. You need molecular fingerprints in FPS format. Open Babel Fastsearch index is not supported.
+
+.. class:: infomark
 
-**What it does**
-Clustering of molecule libraries using the Taylor-Butina algorithm. This tool is based on the chemfp_ project.
+**What this tool does**
+
+Unsupervised, non-hierarchical clustering based on the Taylor-Butina algorithm, which guarantees that every molecule in a cluster lies within the distance cutoff of the cluster's central molecule. This tool is based on the chemfp_ project.
 
 .. _chemfp: http://chemfp.com/
 
 -----
 
-**Example**
+.. class:: infomark
+
+**Input**
 
-* input::
+| Molecular fingerprints in FPS format.
+| Open Babel Fastsearch index is not supported.
+
+* Example::
 
 	-  fingerprints in FPS format
 
@@ -56,7 +63,13 @@
 
 	- Tanimoto threshold : 0.8 (between 0 and 1)
 
-* output::
+-----
+
+.. class:: infomark
+
+**Output**
+
+* Example::
 
 	0 true singletons
 	=> 
@@ -68,10 +81,13 @@
 	55091849 has 12 other members
 	=> 6499094 6485578 55079807 3153534 55102353 55091466 55091416 6485577 55169009 55091752 55091467 55168823
 
+-----
 
-**References**
+.. class:: infomark
 
-Please reference the chemfp_ project.
+**Cite**
+
+The chemfp_ project by Andrew Dalke.
 
 .. _chemfp: http://chemfp.com/
 
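The Taylor-Butina description above (every member lies within the cutoff of its cluster's central molecule) can be illustrated with a small pure-Python sketch. It is not chemfp's implementation: fingerprints are assumed to be plain Python integers used as bit sets, similarity is Tanimoto, and the names `tanimoto` and `butina` are hypothetical. The output example reports "true singletons"; the sketch also tracks molecules whose neighbours were all claimed by earlier clusters, which is assumed here to be what makes a singleton "false".

```python
from typing import Dict, List, Tuple


def tanimoto(a: int, b: int) -> float:
    """Tanimoto similarity of two fingerprints stored as integer bit sets."""
    union = bin(a | b).count("1")
    if union == 0:
        return 0.0
    return bin(a & b).count("1") / union


def butina(fps: Dict[str, int], threshold: float):
    """Taylor-Butina clustering on a Tanimoto similarity threshold (sketch).

    Returns (clusters, true_singletons, false_singletons), where each cluster
    is (centroid_id, [member_ids]).
    """
    ids = list(fps)
    # Neighbour list: all other molecules at or above the threshold.
    neighbours = {
        i: {j for j in ids if j != i and tanimoto(fps[i], fps[j]) >= threshold}
        for i in ids
    }
    # Candidate centroids are visited by decreasing neighbour count;
    # sorted() is stable, so ties keep the input order.
    order = sorted(ids, key=lambda i: len(neighbours[i]), reverse=True)

    assigned = set()
    clusters: List[Tuple[str, List[str]]] = []
    true_singletons: List[str] = []
    false_singletons: List[str] = []
    for centroid in order:
        if centroid in assigned:
            continue
        assigned.add(centroid)
        if not neighbours[centroid]:
            # No neighbour at all within the threshold.
            true_singletons.append(centroid)
            continue
        members = sorted(neighbours[centroid] - assigned)
        if members:
            assigned.update(members)
            clusters.append((centroid, members))
        else:
            # Every neighbour was already claimed by an earlier cluster.
            false_singletons.append(centroid)
    return clusters, true_singletons, false_singletons


if __name__ == "__main__":
    fps = {"a": 0b1111, "b": 0b1110, "c": 0b1100, "d": 0b110000000}
    print(butina(fps, 0.7))  # ([('a', ['b'])], ['c', 'd'], [])
```

Because centroids are chosen greedily by neighbour count, the result depends on tie-breaking; this sketch simply keeps the input order for ties.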
--- a/chemfp_clustering/nxn_clustering.py	Sun Jun 02 19:53:56 2013 +0200
+++ b/chemfp_clustering/nxn_clustering.py	Fri Jul 19 16:28:05 2013 +0200
@@ -47,14 +47,17 @@
                     required=True,
                     help="Path to the input file.")
 
-    parser.add_argument("-o", "--output", dest="output_path",
-                    help="Path to the output file.")
+    parser.add_argument("-c", "--cluster", dest="cluster_image",
+                    help="Path to the output cluster image.")
+
+    parser.add_argument("-s", "--smatrix", dest="similarity_matrix",
+                    help="Path to the similarity matrix output file.")
 
     parser.add_argument("-t", "--threshold", dest="tanimoto_threshold", 
                     type=float, default=0.0,
                     help="Tanimoto threshold [0.0]")
 
-    parser.add_argument("--oformat", default='png', help="Output format (png, svg).")
+    parser.add_argument("--oformat", default='png', help="Output format (png, svg)")
 
     parser.add_argument('-p', '--processors', type=int, 
         default=4)
@@ -64,9 +67,14 @@
     targets = chemfp.open( args.input_path, format='fps' )
     arena = chemfp.load_fingerprints( targets )
     distances  = distance_matrix( arena, args.tanimoto_threshold )
-    linkage = hcluster.linkage( distances, method="single", metric="euclidean" )
+
+    if args.similarity_matrix:
+        distances.tofile( args.similarity_matrix )
 
-    hcluster.dendrogram(linkage, labels=arena.ids)
+    if args.cluster_image:
+        linkage = hcluster.linkage( distances, method="single", metric="euclidean" )
 
-    pylab.savefig( args.output_path, format=args.oformat )
+        hcluster.dendrogram(linkage, labels=arena.ids)
 
+        pylab.savefig( args.cluster_image, format=args.oformat )
+
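With this change the script writes the distance matrix via `ndarray.tofile` when `--smatrix` is given, and only runs the `hcluster` linkage and dendrogram plotting when `--cluster` is given. `tofile` stores raw values only, without dtype or shape, so whoever reads the matrix back has to supply both. The sketch below assumes the matrix is square (N x N for N fingerprints) and `float64`; the diff does not show what `distance_matrix` returns, so verify that before relying on it. The helper names `save_matrix` and `load_matrix` are hypothetical.

```python
import os
import tempfile

import numpy as np


def save_matrix(matrix: np.ndarray, path: str) -> None:
    # Same call as the script: tofile() writes raw values only (C order),
    # with no header recording dtype or shape.
    matrix.tofile(path)


def load_matrix(path: str, n: int, dtype=np.float64) -> np.ndarray:
    # The reader must know the dtype and N (number of fingerprints) up front.
    flat = np.fromfile(path, dtype=dtype)
    if flat.size != n * n:
        raise ValueError(f"expected {n * n} values, found {flat.size}")
    return flat.reshape(n, n)


if __name__ == "__main__":
    # Hypothetical 3x3 distance matrix standing in for distance_matrix(...).
    distances = np.array([[0.0, 0.2, 0.9],
                          [0.2, 0.0, 0.7],
                          [0.9, 0.7, 0.0]])
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "smatrix.bin")
        save_matrix(distances, path)
        print(np.array_equal(load_matrix(path, n=3), distances))  # True
```

If a self-describing file were preferred, `numpy.save` writes a `.npy` header with dtype and shape; that would be a change to the tool's output format, not what the script currently does.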
--- a/chemfp_clustering/nxn_clustering.xml	Sun Jun 02 19:53:56 2013 +0200
+++ b/chemfp_clustering/nxn_clustering.xml	Fri Jul 19 16:28:05 2013 +0200
@@ -1,4 +1,4 @@
-<tool id="ctb_chemfp_nxn_clustering" name="NxN Clustering" version="0.1">
+<tool id="ctb_chemfp_nxn_clustering" name="NxN Clustering" version="0.2">
     <description>of molecular fingerprints</description>
     <requirements>
         <requirement type="package" version="1.7.0">numpy</requirement>
@@ -11,40 +11,53 @@
         nxn_clustering.py
             -i $infile
             -t $threshold
-            -o $outfile
+            #if str($output_files) in ['both', 'image']:
+                --cluster $image
+            #end if
+            #if str($output_files) in ['both', 'matrix']:
+                --smatrix $similarity_matrix
+            #end if
             --oformat $oformat
     </command>
     <inputs>
         <param name="infile" type="data" format="fps" label="Fingerprint dataset" help="Dataset missing? See TIP below"/>
         <param name='threshold' type='float' value='0.0' />
-
         <param name='oformat' type='select' format='text' label="Format of the resulting picture">
             <option value='png'>PNG</option>
             <option value='svg'>SVG</option>
         </param>
+        <param name='output_files' type='select' format='text' label="Output options">
+            <option value='both'>NxN matrix and Image</option>
+            <option value='image'>Image</option>
+            <option value='matrix'>NxN Matrix</option>
+        </param>
+
     </inputs>
     <outputs>
-        <data type="data" format="svg" name="outfile" label="${tool.name} on ${on_string}">
+        <data name="image" type="data" format="svg" label="${tool.name} on ${on_string} - Cluster Image">
+            <filter>output_files == "both" or output_files == "image"</filter>
             <change_format>
                 <when input="oformat" value="png" format="png"/>
             </change_format>
         </data>
+        <data name="similarity_matrix" format="binary" label="${tool.name} on ${on_string} - Similarity Matrix">
+            <filter>output_files == "both" or output_files == "matrix"</filter>
+        </data>
     </outputs>
     <tests>
         <test>
             <param name="infile" ftype="fps" value="q.fps" />
-            <param value='0.75' />
+            <param name='threshold' value='0.75' />
+            <param name='output_files' value='image' />
             <output ftype="svg" name="outfile" file='NxN_Clustering_on_q.svg' />
         </test>
     </tests>
     <help>
 
-**Note**. You need molecular fingerprints in FPS format. Open Babel Fastsearch index is not supported.
+.. class:: infomark
 
-**Note**. Currently, that tool can only be used with a small dataset.
+**What this tool does**
 
-
-**What it does**
 Generates hierarchical clusters and visualizes them as dendrograms.
 Fingerprint handling is done with the chemfp_ project; the clustering itself uses hcluster.
 
@@ -52,9 +65,21 @@
 
 -----
 
-**Example**
+.. class:: warningmark
+
+**Hint**
+
+Plotting the cluster image is only practical for small datasets.
+
+-----
 
-* input::
+.. class:: infomark
+
+**Input**
+
+Molecular fingerprints in FPS format. Open Babel Fastsearch index is not supported.
+
+* Example::
 
 	-  fingerprints in FPS format
 
@@ -72,11 +97,25 @@
 
 	- Tanimoto threshold : 0.8 (between 0 and 1)
 
-* output::
+-----
+
+.. class:: infomark
+
+**Output**
+
+* Example:
+
+	.. image:: $PATH_TO_IMAGES/NxN_clustering.png
 
-	clustring plot
+-----
+
+.. class:: infomark
 
-.. image:: $PATH_TO_IMAGES/NxN_clustering.png
+**Cite**
+
+The chemfp_ project by Andrew Dalke.
+
+.. _chemfp: http://chemfp.com/
 
     </help>
 
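The `<command>` template emits `--cluster` and `--smatrix` depending on the `output_files` selection, and each `<data>` output carries a matching `<filter>`. The following Python function mirrors that Cheetah logic to make the mapping explicit. It is an illustration only (Galaxy evaluates the template itself), and `build_command` with its arguments is hypothetical.

```python
from typing import List


def build_command(infile: str, threshold: float, output_files: str,
                  image: str, matrix: str, oformat: str = "png") -> List[str]:
    """Mirror the #if blocks of the <command> section (illustrative only)."""
    if output_files not in ("both", "image", "matrix"):
        raise ValueError(f"unknown output_files value: {output_files!r}")
    cmd = ["nxn_clustering.py", "-i", infile, "-t", str(threshold)]
    # Image output: requested for 'both' and 'image'.
    if output_files in ("both", "image"):
        cmd += ["--cluster", image]
    # Similarity matrix output: requested for 'both' and 'matrix'.
    if output_files in ("both", "matrix"):
        cmd += ["--smatrix", matrix]
    cmd += ["--oformat", oformat]
    return cmd


if __name__ == "__main__":
    print(" ".join(build_command("q.fps", 0.75, "image", "out.png", "out.bin")))
```

Selecting `matrix` alone skips the dendrogram entirely, which matters for larger datasets where plotting is impractical.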
--- a/chemfp_mol2fps/mol2fps.xml	Sun Jun 02 19:53:56 2013 +0200
+++ b/chemfp_mol2fps/mol2fps.xml	Fri Jul 19 16:28:05 2013 +0200
@@ -166,21 +166,26 @@
     </tests>
     <help>
 
+.. class:: infomark
 
-**What it does**
+**What this tool does**
 
-Generates different types of fingerprints from the `Open Babel`_ and RDkit_ project.
-This tool is using chemfp_. For more information please have a look at:
+This tool uses chemfp_ to calculate 10 different fingerprint types for molecules in common chemical file formats. chemfp relies on the `Open Babel`_, OpenEye_ and RDKit_ toolkits.
 
-    - http://code.google.com/p/rdkit/wiki/FingerprintsInTheRDKit
-    - http://openbabel.org/wiki/Tutorial:Fingerprints
+For more information, see:

+- http://code.google.com/p/rdkit/wiki/FingerprintsInTheRDKit
+- http://openbabel.org/wiki/Tutorial:Fingerprints
 
 -----
 
-**Example**
+.. class:: infomark
+
+**Input**
 
-* input::
+Molecules in a common chemical file format, e.g. an SD file.
+
+* Example::
 	
 	      - SDF File
 
@@ -230,7 +235,13 @@
 
 		- type : FP2
 
-* output::
+-----
+
+.. class:: infomark
+
+**Output**
+
+* Example::
 
 	#FPS1
 	#num_bits=1021
@@ -242,16 +253,20 @@
 	0010000000020600208008000008000000c000c02c00002000000c00000100000008001400c800001c0180000000300
 	10000000000080000000c0000060000c0000060810000010000000800102000000	28434379
 
+-----
 
-**References**
+.. class:: infomark
 
-Please reference the `Open Babel`_ or RDKit_ project and the chemfp_ project.
+**Cite**
 
-N M O'Boyle, M Banck, C A James, C Morley, T Vandermeersch, and G R Hutchison. "Open Babel: An open chemical toolbox." J. Cheminf. (2011), 3, 33. `DOI:10.1186/1758-2946-3-33`_
-The Open Babel Package http://openbabel.sourceforge.net/
+| `Open Babel`_ project
+| RDKit_ project
+| chemfp_ project
+| 
+| N M O'Boyle, M Banck, C A James, C Morley, T Vandermeersch and G R Hutchison. `Open Babel: An open chemical toolbox.`_ J. Cheminf. (2011), 3, 33.
 
-
-.. _DOI:10.1186/1758-2946-3-33: http://www.jcheminf.com/content/3/1/33
+.. _`Open Babel: An open chemical toolbox.`: http://www.jcheminf.com/content/3/1/33
+.. _OpenEye: http://www.eyesopen.com/
 .. _chemfp: http://chemfp.com/
 .. _RDKit: http://www.rdkit.org/
 .. _`Open Babel`: http://openbabel.org/
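Both fingerprint tools write FPS files like the output examples: `#`-prefixed header lines (`#FPS1`, `#num_bits=...`), then one line per molecule holding the hex-encoded fingerprint, a tab and the identifier. A minimal reader plus a Tanimoto comparison could look like this. It is a sketch, not chemfp's parser; it ignores any extra tab-separated fields, and the function names are hypothetical. For readability the example data uses a 16-bit fingerprint rather than the long lines shown above.

```python
from typing import Dict, List, Tuple


def read_fps(text: str) -> Tuple[Dict[str, str], List[Tuple[str, bytes]]]:
    """Parse FPS text into (header fields, [(id, fingerprint_bytes), ...])."""
    header: Dict[str, str] = {}
    records: List[Tuple[str, bytes]] = []
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.startswith("#"):
            # Header lines such as "#num_bits=881"; "#FPS1" has no value.
            if "=" in line:
                key, value = line[1:].split("=", 1)
                header[key] = value
            continue
        # Data lines: hex-encoded fingerprint, a tab, then the identifier.
        fields = line.split("\t")
        records.append((fields[1], bytes.fromhex(fields[0])))
    return header, records


def popcount(fp: bytes) -> int:
    return sum(bin(byte).count("1") for byte in fp)


def tanimoto(a: bytes, b: bytes) -> float:
    """Tanimoto similarity of two equal-length fingerprints."""
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same length")
    both = popcount(bytes(x & y for x, y in zip(a, b)))
    either = popcount(bytes(x | y for x, y in zip(a, b)))
    return both / either if either else 0.0


if __name__ == "__main__":
    header, recs = read_fps("#FPS1\n#num_bits=16\n0f00\tmol_a\n0700\tmol_b\n")
    print(header["num_bits"], tanimoto(recs[0][1], recs[1][1]))  # 16 0.75
```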
--- a/chemfp_sdf2fps/sdf2fps.xml	Sun Jun 02 19:53:56 2013 +0200
+++ b/chemfp_sdf2fps/sdf2fps.xml	Fri Jul 19 16:28:05 2013 +0200
@@ -18,18 +18,24 @@
     </tests>
     <help>
 
+.. class:: infomark
 
-**What it does**
+**What this tool does**
 
-Read a PubChem_ SD file and extract the fingerprints, to stores them in a FPS-file.
+Reads an SD file, extracts the fingerprints stored in it and writes them to an FPS file.
 
 -----
 
-**Example**
-	* input::
+.. class:: infomark
+
+**Input**
+
+`SD-Format`_
+
+.. _`SD-Format`: http://en.wikipedia.org/wiki/Chemical_table_file
+
+* Example::
 	
-		SDF File
-
 		28434379
 		  -OEChem-02031205132D
 
@@ -74,7 +80,13 @@
 
 			&gt;
 
-* output::
+-----
+
+.. class:: infomark
+
+**Output**
+
+* Example::
 
 	#FPS1
 	#num_bits=881
@@ -88,13 +100,15 @@
 	8b2924101609401b13e4080000000000010020000004008000
 	0010000002000000000000	28434379
 
+-----
 
-**References**
+.. class:: infomark
 
-Please reference the chemfp_ project.
+**Cite**
+
+chemfp_ project
 
 .. _chemfp: http://chemfp.com/
-.. _PubChem: http://pubchem.ncbi.nlm.nih.gov/
 
 
     </help>
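The sdf2fps tool reads fingerprints that are already stored as data items in the SD file. The sketch below only shows how such data items (a `> <TAG>` header line, value lines, a blank line, with each record closed by `$$$$`) can be pulled out of an SD file; decoding the fingerprint itself is left to chemfp. The tag `PUBCHEM_CACTVS_SUBSKEYS` in the usage example is the field PubChem uses for its substructure-key fingerprint, assumed here for illustration; the value shown is a placeholder, not a real fingerprint, and the function names are hypothetical.

```python
import re
from typing import Dict, Iterable, Iterator, List

# Data header lines look like "> <TAG>" (possibly with extra text after it).
TAG_RE = re.compile(r"^>.*?<([^>]+)>")


def parse_record(lines: List[str]) -> Dict[str, str]:
    """Return {'title': ..., TAG: value, ...} for one SD record."""
    record = {"title": lines[0].rstrip() if lines else ""}
    tag = None
    values: List[str] = []
    for line in lines[1:]:
        match = TAG_RE.match(line)
        if match:
            tag, values = match.group(1), []
        elif tag is not None:
            if line.strip() == "":
                # A blank line ends the current data item.
                record[tag] = "\n".join(values)
                tag = None
            else:
                values.append(line.rstrip())
    if tag is not None:  # data item not followed by a blank line
        record[tag] = "\n".join(values)
    return record


def iter_sd_records(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Split SD text into records at '$$$$' and parse each one."""
    buffer: List[str] = []
    for line in lines:
        if line.rstrip() == "$$$$":
            yield parse_record(buffer)
            buffer = []
        else:
            buffer.append(line)
    if any(entry.strip() for entry in buffer):
        yield parse_record(buffer)


if __name__ == "__main__":
    sd = ("28434379\n  -OEChem-02031205132D\n\n"
          "  0  0  0  0  0  0  0  0  0  0999 V2000\nM  END\n"
          "> <PUBCHEM_CACTVS_SUBSKEYS>\nAAADcQ==\n\n$$$$\n")
    for rec in iter_sd_records(sd.splitlines()):
        print(rec["title"], rec["PUBCHEM_CACTVS_SUBSKEYS"])
```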