changeset 0:7b96d8a3262f draft

Uploaded v0.0.0, wrappers for the CLCbio assembler and mapper only.
author peterjc
date Thu, 31 Oct 2013 07:57:41 -0400
parents
children 6c899e228df3
files tools/clc_assembly_cell/README.rst tools/clc_assembly_cell/clc_assembler.xml tools/clc_assembly_cell/clc_mapper.xml
diffstat 3 files changed, 400 insertions(+), 0 deletions(-)
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/tools/clc_assembly_cell/README.rst	Thu Oct 31 07:57:41 2013 -0400
@@ -0,0 +1,121 @@
+Galaxy wrapper for the CLC Assembly Cell suite from CLCbio
+==========================================================
+
+This wrapper is copyright 2013 by Peter Cock, The James Hutton Institute
+(formerly SCRI, Scottish Crop Research Institute), UK. All rights reserved.
+See the licence text below.
+
+CLC Assembly Cell is the commercial command line assembly suite from CLCbio.
+It uses SIMD instructions to parallelize and accelerate its assembly
+algorithms, and is also very memory efficient, making it an appealing choice
+for complex genomes where the RAM requirements exclude other popular tools.
+
+For more information:
+http://www.clcbio.com/products/clc-assembly-cell/
+
+You can download the CLC Assembly Cell User Manual (currently v4.2) here:
+http://www.clcbio.com/files/usermanuals/CLC_Assembly_Cell_User_Manual.pdf
+
+There is also an online manual here:
+http://clcsupport.com/clcassemblycell/current/index.php?manual=Introduction.html
+
+There is currently a free trial download here:
+http://www.clcbio.com/?action=transfer_user&productVersion=4.2&productID=6982&productName=CLC+Assembly+Cell&nonce=db842e3f95
+
+This wrapper is available from the Galaxy Tool Shed at:
+http://toolshed.g2.bx.psu.edu/view/peterjc/clc_assembly_cell
+
+This Galaxy wrapper was written and tested using CLC Assembly Cell
+version 4.10.86742
+
+
+Automated Installation
+======================
+
+This should be straightforward: Galaxy should automatically download and
+install the wrapper from the Galaxy Tool Shed. However, you will need to
+manually install the CLC Assembly Cell software, and set the environment
+variable ``$CLC_ASSEMBLY_CELL`` to the directory containing the binaries
+(in particular, the ``clc_assembler`` binary). For example::
+
+    $ export CLC_ASSEMBLY_CELL=/opt/clcbio/clc-assembly-cell-4.1.0-linux_64/
+
+
+Manual Installation
+===================
+
+First install the CLC Assembly Cell software as described above.
+
+To install the wrapper, copy or move the following files under the Galaxy tools
+folder, e.g. in a ``tools/clc_assembly_cell`` folder:
+
+* clc_assembler.xml (Galaxy tool definition)
+* clc_mapper.xml (Galaxy tool definition)
+* README.rst (this file)
+
+You will also need to modify the tool_conf.xml file to tell Galaxy to offer the
+tools. Just add these lines, for example next to other assembly tools::
+
+  <tool file="clc_assembly_cell/clc_assembler.xml" />
+  <tool file="clc_assembly_cell/clc_mapper.xml" />
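+
+As a minimal sketch of where these lines might sit, assuming your
+``tool_conf.xml`` already groups assembly tools in a section (the ``id`` and
+``name`` values here are hypothetical, so reuse whatever your Galaxy instance
+already defines)::
+
+  <section id="assembly" name="Assembly">
+    <!-- ...any existing assembly tools... -->
+    <tool file="clc_assembly_cell/clc_assembler.xml" />
+    <tool file="clc_assembly_cell/clc_mapper.xml" />
+  </section>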
+
+If you wish to run the unit tests, also add these lines to tool_conf.xml.sample
+and move/copy the test-data files under Galaxy's test-data folder. Then::
+
+    $ ./run_functional_tests.sh -id clc_assembler
+
+That's it.
+
+
+History
+=======
+
+======= ======================================================================
+Version Changes
+------- ----------------------------------------------------------------------
+v0.0.1  - Initial public release
+======= ======================================================================
+
+
+Developers
+==========
+
+Development is on this GitHub repository:
+https://github.com/peterjc/pico_galaxy/tree/master/tools/clc_assembly_cell
+
+For making the "Galaxy Tool Shed" http://toolshed.g2.bx.psu.edu/ tarball use
+the following command from the Galaxy root folder::
+
+    $ tar -czf clcbio.tar.gz tools/clc_assembly_cell/README.rst tools/clc_assembly_cell/clc_assembler.xml tools/clc_assembly_cell/clc_mapper.xml
+
+Check this worked::
+
+    $ tar -tzf clcbio.tar.gz
+    tools/clc_assembly_cell/README.rst
+    tools/clc_assembly_cell/clc_assembler.xml
+    tools/clc_assembly_cell/clc_mapper.xml
+
+
+Licence (MIT)
+=============
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in
+all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+THE SOFTWARE.
+
+NOTE: This is the licence for the Galaxy Wrapper only. The CLCbio tools are
+commercial, and are available and licenced separately.
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/tools/clc_assembly_cell/clc_assembler.xml	Thu Oct 31 07:57:41 2013 -0400
@@ -0,0 +1,122 @@
+<tool id="clc_assembler" name="CLC assembler" version="0.0.1">
+    <description>Assembles reads giving a FASTA file</description>
+    <requirements>
+        <requirement type="binary">clc_assembler</requirement>
+    </requirements>
+    <version_command>/mnt/apps/clcBio/clc-assembly-cell-4.1.0-linux_64/clc_assembler | grep -i version</version_command>
+    <command>/mnt/apps/clcBio/clc-assembly-cell-4.1.0-linux_64/clc_assembler
+#for $rg in $read_group
+##--------------------------------------
+#if str($rg.segments.type) == "paired"
+-p $rg.segments.placement $rg.segments.dist_mode $rg.segments.min_size $rg.segments.max_size -q -i "$rg.segments.filename1" "$rg.segments.filename2"
+#end if
+##--------------------------------------
+#if str($rg.segments.type) == "interleaved"
+-p $rg.segments.placement $rg.segments.dist_mode $rg.segments.min_size $rg.segments.max_size -q "$rg.segments.filename"
+#end if
+##--------------------------------------
+#if str($rg.segments.type) == "none"
+-p no -q
+#for $f in $rg.segments.filenames
+"$f"
+#end for
+#end if
+##--------------------------------------
+#end for
+-o "$out_fasta"
+--cpus \$GALAXY_SLOTS
+-v | grep -v "^Progress: "</command>
+    <stdio>
+        <!-- Assume anything other than zero is an error -->
+        <exit_code range="1:" />
+        <exit_code range=":-1" />
+    </stdio>
+    <inputs>
+        <repeat name="read_group" title="Read Group" min="1">
+            <conditional name="segments">
+                <param name="type" type="select" label="Are these paired reads?">
+                    <option value="paired">Paired reads (as two files)</option>
+                    <option value="interleaved">Paired reads (as one interleaved file)</option>
+                    <option value="none">Unpaired reads (single or orphan reads)</option>
+                </param>
+                <when value="paired">
+                    <param name="placement" type="select" label="Pairing type (segment placing)">
+                        <option value="fb">---&gt; &lt;--- (e.g. Sanger capillary or Solexa/Illumina paired-end library)</option>
+                        <option value="bf">&lt;--- ---&gt; (e.g. Solexa/Illumina mate-pair library)</option>
+                        <option value="ff">---&gt; ---&gt;</option>
+                        <option value="bb">&lt;--- &lt;---</option>
+                    </param>
+                    <param name="dist_mode" type="select" label="How is the fragment distance measured?">
+                        <option value="ss">Start to start (e.g. Sanger capillary or Solexa/Illumina libraries)</option>
+                        <option value="se">Start to end</option>
+                        <option value="es">End to start</option>
+                        <option value="ee">End to end</option>
+                    </param>
+                    <!-- TODO - min/max validation done via the <code> tag? -->
+                    <param name="min_size" type="integer" optional="false" min="0" value=""
+                           label="Minimum size of 'good' DNA templates in the library preparation" />
+                    <param name="max_size" type="integer" optional="false" min="0" value=""
+                           label="Maximum size of 'good' DNA templates in the library preparation" />
+                    <param name="filename1" type="data" format="fastq,fasta" required="true" label="Read file one"/>
+                    <param name="filename2" type="data" format="fastq,fasta" required="true" label="Read file two"/>
+                </when>
+                <when value="interleaved">
+                    <param name="placement" type="select" label="Pairing type (segment placing)">
+                        <option value="fb">---&gt; &lt;--- (e.g. Sanger capillary or Solexa/Illumina paired-end library)</option>
+                        <option value="bf">&lt;--- ---&gt; (e.g. Solexa/Illumina mate-pair library)</option>
+                        <option value="ff">---&gt; ---&gt;</option>
+                        <option value="bb">&lt;--- &lt;---</option>
+                    </param>
+                    <param name="dist_mode" type="select" label="How is the fragment distance measured?">
+                        <option value="ss">Start to start (e.g. Sanger capillary or Solexa/Illumina libraries)</option>
+                        <option value="se">Start to end</option>
+                        <option value="es">End to start</option>
+                        <option value="ee">End to end</option>
+                    </param>
+                    <!-- TODO - min/max validation done via the <code> tag? -->
+                    <param name="min_size" type="integer" optional="false" min="0" value=""
+                           label="Minimum size of 'good' DNA templates in the library preparation" />
+                    <param name="max_size" type="integer" optional="false" min="0" value=""
+                           label="Maximum size of 'good' DNA templates in the library preparation" />
+                    <param name="filename" type="data" format="fastq,fasta" required="true" label="Interleaved read file"/>
+                </when>
+                <when value="none">
+                    <param name="filenames" type="data" format="fastq,fasta" multiple="true" required="true" label="Read file(s)"
+                           help="Multiple files allowed, for example several files of orphan reads." />
+                </when>
+            </conditional>
+        </repeat>
+        <!-- Word size? -->
+        <!-- Bubble size? -->
+        <!-- Scaffolding options? -->
+        <!-- Minimum contig length? -->
+        <!-- AGP / GFF output? -->
+    </inputs>
+    <!-- min/max validation? <code file="clc_validator.py" /> -->
+    <outputs>
+        <data name="out_fasta" format="fasta" label="CLCbio assembler contigs (FASTA)" />
+    </outputs>
+    <tests>
+        <!-- TODO -->
+    </tests>
+    <help>
+
+**What it does**
+
+Runs the ``clc_assembler`` tool giving a FASTA output file. You would then
+typically map the same set of reads onto this assembly using ``clc_mapper``
+to perform any downstream analysis using the mapped reads.
+
+
+**Citation**
+
+If you use this Galaxy tool in work leading to a scientific publication please
+cite this wrapper as:
+
+Peter J.A. Cock (2013), Galaxy wrapper for the CLC Assembly Cell suite from CLCbio
+http://toolshed.g2.bx.psu.edu/view/peterjc/clc_assembly_cell
+
+This wrapper is available to install into other Galaxy Instances via the Galaxy
+Tool Shed at http://toolshed.g2.bx.psu.edu/view/peterjc/clc_assembly_cell
+    </help>
+</tool>
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/tools/clc_assembly_cell/clc_mapper.xml	Thu Oct 31 07:57:41 2013 -0400
@@ -0,0 +1,157 @@
+<tool id="clc_mapper" name="CLC Mapper" version="0.0.1">
+    <description>Maps reads giving a SAM/BAM file</description>
+    <requirements>
+        <requirement type="binary">clc_mapper</requirement>
+        <requirement type="binary">clc_cas_to_sam</requirement>
+        <requirement type="binary">samtools</requirement>
+        <requirement type="package" version="0.1.19">samtools</requirement>
+    </requirements>
+    <version_command>/mnt/apps/clcBio/clc-assembly-cell-4.1.0-linux_64/clc_mapper | grep -i version</version_command>
+    <command>echo Mapping reads with clc_mapper...
+&amp;&amp; /mnt/apps/clcBio/clc-assembly-cell-4.1.0-linux_64/clc_mapper
+#for $ref in $references
+#if str($ref.type)=="circular"
+-d -z "$ref.ref_file"
+#else
+-d "$ref.ref_file"
+#end if
+#end for
+#for $rg in $read_group
+##--------------------------------------
+#if str($rg.segments.type) == "paired"
+-p $rg.segments.placement $rg.segments.dist_mode $rg.segments.min_size $rg.segments.max_size -q -i "$rg.segments.filename1" "$rg.segments.filename2"
+#end if
+##--------------------------------------
+#if str($rg.segments.type) == "interleaved"
+-p $rg.segments.placement $rg.segments.dist_mode $rg.segments.min_size $rg.segments.max_size -q "$rg.segments.filename"
+#end if
+##--------------------------------------
+#if str($rg.segments.type) == "none"
+-p no -q
+#for $f in $rg.segments.filenames
+"$f"
+#end for
+#end if
+##--------------------------------------
+#end for
+-o "temp_job.cas"
+--cpus \$GALAXY_SLOTS
+## TODO - filtering out the progress lines seems to mess up the multiple commands
+## | grep -v "^Progress: "
+##===========================================
+## TODO - I've required all the input in Sanger FASTQ format (or FASTA) so can
+## use the offset 33, rather than the CLCbio default of 64 which is only for
+## obsolete Illumina FASTQ files. Really need this option per input file...
+&amp;&amp; echo Converting CAS file to BAM with clc_cas_to_sam...
+&amp;&amp; /mnt/apps/clcBio/clc-assembly-cell-4.1.0-linux_64/clc_cas_to_sam --cas "temp_job.cas" -o "temp_job.bam" --no-progress --qualityoffset 33
+&amp;&amp; rm "temp_job.cas"
+##===========================================
+&amp;&amp; echo Sorting BAM file with samtools...
+&amp;&amp; samtools sort "temp_job.bam" "temp_sorted"
+&amp;&amp; mv "temp_sorted.bam" "$out_bam"
+&amp;&amp; echo Indexing BAM file with samtools...
+&amp;&amp; samtools index "$out_bam"</command>
+    <stdio>
+        <!-- Assume anything other than zero is an error -->
+        <exit_code range="1:" />
+        <exit_code range=":-1" />
+    </stdio>
+    <!-- Job splitting with merge via clc_join_mappings? -->
+    <inputs>
+        <!-- Support linear and circular references (-z) -->
+        <repeat name="references" title="Reference Sequence" min="1">
+            <param name="ref_file" type="data" format="fasta" required="true" label="Reference sequence(s) (FASTA)" />
+            <param name="type" type="select" label="Reference type">
+                <option value="linear">Linear (e.g. most chromosomes)</option>
+                <option value="circular">Circular (e.g. bacterial chromosomes, mitochondria)</option>
+            </param>
+        </repeat>
+        <repeat name="read_group" title="Read Group" min="1">
+            <conditional name="segments">
+                <param name="type" type="select" label="Are these paired reads?">
+                    <option value="paired">Paired reads (as two files)</option>
+                    <option value="interleaved">Paired reads (as one interleaved file)</option>
+                    <option value="none">Unpaired reads (single or orphan reads)</option>
+                </param>
+                <when value="paired">
+                    <param name="placement" type="select" label="Pairing type (segment placing)">
+                        <option value="fb">---&gt; &lt;--- (e.g. Sanger capillary or Solexa/Illumina paired-end library)</option>
+                        <option value="bf">&lt;--- ---&gt; (e.g. Solexa/Illumina mate-pair library)</option>
+                        <option value="ff">---&gt; ---&gt;</option>
+                        <option value="bb">&lt;--- &lt;---</option>
+                    </param>
+                    <param name="dist_mode" type="select" label="How is the fragment distance measured?">
+                        <option value="ss">Start to start (e.g. Sanger capillary or Solexa/Illumina libraries)</option>
+                        <option value="se">Start to end</option>
+                        <option value="es">End to start</option>
+                        <option value="ee">End to end</option>
+                    </param>
+                    <!-- TODO - min/max validation done via the <code> tag? -->
+                    <param name="min_size" type="integer" optional="false" min="0" value=""
+                           label="Minimum size of 'good' DNA templates in the library preparation" />
+                    <param name="max_size" type="integer" optional="false" min="0" value=""
+                           label="Maximum size of 'good' DNA templates in the library preparation" />
+                    <param name="filename1" type="data" format="fastqsanger,fasta" required="true" label="Read file one"
+                           help="FASTA or Sanger FASTQ accepted." />
+                    <param name="filename2" type="data" format="fastqsanger,fasta" required="true" label="Read file two"
+                           help="FASTA or Sanger FASTQ accepted." />
+                </when>
+                <when value="interleaved">
+                    <param name="placement" type="select" label="Pairing type (segment placing)">
+                        <option value="fb">---&gt; &lt;--- (e.g. Sanger capillary or Solexa/Illumina paired-end library)</option>
+                        <option value="bf">&lt;--- ---&gt; (e.g. Solexa/Illumina mate-pair library)</option>
+                        <option value="ff">---&gt; ---&gt;</option>
+                        <option value="bb">&lt;--- &lt;---</option>
+                    </param>
+                    <param name="dist_mode" type="select" label="How is the fragment distance measured?">
+                        <option value="ss">Start to start (e.g. Sanger capillary or Solexa/Illumina libraries)</option>
+                        <option value="se">Start to end</option>
+                        <option value="es">End to start</option>
+                        <option value="ee">End to end</option>
+                    </param>
+                    <!-- TODO - min/max validation done via the <code> tag? -->
+                    <param name="min_size" type="integer" optional="false" min="0" value=""
+                           label="Minimum size of 'good' DNA templates in the library preparation" />
+                    <param name="max_size" type="integer" optional="false" min="0" value=""
+                           label="Maximum size of 'good' DNA templates in the library preparation" />
+                    <param name="filename" type="data" format="fastqsanger,fasta" required="true" label="Interleaved read file"
+                           help="FASTA or Sanger FASTQ accepted."/>
+                </when>
+                <when value="none">
+                    <param name="filenames" type="data" format="fastqsanger,fasta" multiple="true" required="true" label="Read file(s)"
+                           help="Multiple files allowed, for example several files of orphan reads. FASTA or Sanger FASTQ accepted." />
+                </when>
+            </conditional>
+        </repeat>
+        <!-- Length fraction (-l), default 0.5 -->
+        <!-- Similarity (-s), default 0.8 -->
+        <!-- Option for unmapped reads via clc_unmapped_reads ? -->
+    </inputs>
+    <outputs>
+        <data name="out_bam" format="bam" label="CLCbio mapping (BAM)" />
+    </outputs>
+    <tests>
+        <!-- TODO -->
+    </tests>
+    <help>
+
+**What it does**
+
+Runs the CLCbio tool ``clc_mapper``, which produces a proprietary binary
+CAS format file. This is immediately converted using ``clc_cas_to_sam``
+into a self-contained standard BAM file, which is then sorted and
+indexed using ``samtools``.
+
+
+**Citation**
+
+If you use this Galaxy tool in work leading to a scientific publication please
+cite this wrapper as:
+
+Peter J.A. Cock (2013), Galaxy wrapper for the CLC Assembly Cell suite from CLCbio
+http://toolshed.g2.bx.psu.edu/view/peterjc/clc_assembly_cell
+
+This wrapper is available to install into other Galaxy Instances via the Galaxy
+Tool Shed at http://toolshed.g2.bx.psu.edu/view/peterjc/clc_assembly_cell
+    </help>
+</tool>