--- a/tools/mira4/README.rst	Fri Oct 11 04:28:45 2013 -0400
+++ b/tools/mira4/README.rst	Tue Oct 15 12:07:34 2013 -0400
@@ -1,5 +1,5 @@
-Galaxy tool to wrap the MIRA sequence assembly program (v4.0)
-=============================================================
+Galaxy wrapper for the MIRA assembly program (v4.0)
+===================================================

 This tool is copyright 2011-2013 by Peter Cock, The James Hutton Institute
 (formerly SCRI, Scottish Crop Research Institute), UK. All rights reserved.
@@ -11,6 +11,11 @@
 It is available from the Galaxy Tool Shed at:
 http://toolshed.g2.bx.psu.edu/view/peterjc/mira4_assembler

+It uses a Galaxy datatype definition 'mira' for the MIRA Assembly Format,
+http://toolshed.g2.bx.psu.edu/view/peterjc/mira_datatypes
+
+A separate wrapper for MIRA v3.4 is available from the Galaxy Tool Shed at:
+http://toolshed.g2.bx.psu.edu/view/peterjc/mira_assembler

 Automated Installation
 ======================
@@ -23,9 +28,7 @@
 cluster settings for de novo usage (high RAM) and mapping (lower RAM).
 Consult the Galaxy administration documentation for your cluster setup.

-WARNING: This tool was developed to construct viral genome assembly and
-mapping pipelines, for which the run time and memory requirements are
-negligible. For larger tasks, be aware that MIRA can require vast amounts
+WARNING: For larger tasks, be aware that MIRA can require vast amounts
 of RAM and run-times of over a week are possible. This tool wrapper makes
 no attempt to spot and reject such large jobs.

@@ -50,7 +53,7 @@
   <tool file="mira4/mira4_de_novo.xml" />
   <tool file="mira4/mira4_mapping.xml" />

-You will also need to install MIRA, we used version 4.0 RC3. See:
+You will also need to install MIRA; we used version 4.0 RC4. See:

 * http://chevreux.org/projects_mira.html
 * http://sourceforge.net/projects/mira-assembler/
@@ -65,7 +68,7 @@
 ======= ======================================================================
 Version Changes
 ------- ----------------------------------------------------------------------
-v0.0.1  - Initial version (prototype for MIRA 4.0 RC3, based on wrapper for v3.4)
+v0.0.1  - Initial version (prototype for MIRA 4.0 RC4, based on wrapper for v3.4)
 ======= ======================================================================
--- a/tools/mira4/mira4.py	Fri Oct 11 04:28:45 2013 -0400
+++ b/tools/mira4/mira4.py	Tue Oct 15 12:07:34 2013 -0400
@@ -31,14 +31,13 @@
     return ver.split("\n", 1)[0]


-os.environ["PATH"] = "/mnt/galaxy/downloads/mira_4.0rc3_linux-gnu_x86_64_static/bin/:%s" % os.environ["PATH"]
+os.environ["PATH"] = "/mnt/galaxy/downloads/mira_4.0rc4_linux-gnu_x86_64_static/bin/:%s" % os.environ["PATH"]
 mira_binary = "mira"
 mira_ver = get_version(mira_binary)
 if not mira_ver.strip().startswith("4.0"):
     stop_err("This wrapper is for MIRA V4.0, not:\n%s" % mira_ver)
-if "-v" in sys.argv:
-    print "MIRA wrapper version %s," % WRAPPER_VER
-    print mira_ver
+if "-v" in sys.argv or "--version" in sys.argv:
+    print "%s, MIRA wrapper version %s" % (mira_ver, WRAPPER_VER)
     sys.exit(0)
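
The version handling changed in the hunk above can be sketched in isolation as follows. This is a minimal Python 3 illustration, not the wrapper's actual code: the helper names `wants_version` and `version_banner` are hypothetical, the `WRAPPER_VER` value is assumed from the README history table, and the MIRA version string is a placeholder for what the real script obtains from the `mira` binary.

```python
# Assumed value, taken from the README history table (v0.0.1).
WRAPPER_VER = "0.0.1"


def wants_version(argv):
    """Return True if either the short or the long version flag was given."""
    return "-v" in argv or "--version" in argv


def version_banner(mira_ver, wrapper_ver=WRAPPER_VER):
    """Build the single-line banner, MIRA version first, then the wrapper version."""
    return "%s, MIRA wrapper version %s" % (mira_ver, wrapper_ver)


# "4.0rc4" is a placeholder; the real wrapper asks the mira binary itself.
if wants_version(["--version"]):
    print(version_banner("4.0rc4"))
```

Reporting both versions on one line is presumably convenient for Galaxy's `version_command`, which records the command's output alongside each job.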
--- a/tools/mira4/mira4_de_novo.xml	Fri Oct 11 04:28:45 2013 -0400
+++ b/tools/mira4/mira4_de_novo.xml	Tue Oct 15 12:07:34 2013 -0400
@@ -5,7 +5,7 @@
         <requirement type="binary">mira</requirement>
         <requirement type="package" version="4.0">MIRA</requirement>
     </requirements>
-    <version_command interpreter="python">mira4.py -v</version_command>
+    <version_command interpreter="python">mira4.py --version</version_command>
     <command interpreter="python">
 mira4.py $manifest $out_maf $out_fasta $out_log
     </command>
@@ -29,6 +29,13 @@
                 <option value="text">Synthetic reads (database entries, consensus sequences, artificial reads, etc)</option>
 		<!-- TODO reference/backbone as an entry here? -->
             </param>
+            <param name="segment_placement" type="select" label="Pairing type (segment placing)">
+                <option value="">None (e.g. single end sequencing)</option>
+                <option value="FR">---&gt; &lt;--- (e.g. Sanger capillary or Solexa/Illumina paired-end library)</option>
+                <option value="RF">&lt;--- ---&gt; (e.g. Solexa/Illumina mate-pair library)</option>
+                <option value="SB">2---&gt; 1---&gt; (e.g. Roche 454 paired-end libraries or IonTorrent long-mate; see note)</option>
+                <option value="?">Unknown or not relevant (e.g. primer walking with Sanger capillary sequencing)</option>
+            </param>
 	    <param name="filenames" type="data" format="fastq,mira" multiple="true" required="true" label="Read file(s)"
 		   help="Multiple files allowed, for example paired reads can be given as two files (MIRA looks at read names to identify pairs)." />
         </repeat>
@@ -62,6 +69,10 @@
 technology = ${rg.technology}
+#if str($rg.segment_placement) != ""
+##Record the segment placement (if any)
+segmentplacement = ${rg.segment_placement}
+#end if
 ##MIRA will accept multiple filenames on one data line, or multiple data lines
 #for $f in $rg.filenames
 ##Must now map Galaxy datatypes to MIRA file types...
 #if $f.ext.startswith("fastq")
 ##MIRA doesn't like fastqsanger etc, just plain old fastq:
@@ -109,6 +120,19 @@

 It is particularly suited to small genomes such as bacteria.

+**Notes**
+
+.. class:: warningmark
+
+Note that Roche 454 and Ion Torrent paired-end libraries are made by
+sequencing a circularised fragment, so each raw read starts with the end
+of the fragment, then a linker, then the start of the fragment. This means
+both the start and end are sequenced from the same strand, and thus should
+be given to MIRA as orientation "2---&gt; 1---&gt;". However, so that the
+data works with traditional tools, which expect Sanger capillary style
+"---&gt; &lt;---" libraries, your FASTQ files may have been pre-processed
+to mimic this by reverse complementing one read of each pair.
+
 **Citation**

 If you use this Galaxy tool in work leading to a scientific publication please
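
The manifest logic this changeset adds to the Cheetah template can be approximated in plain Python as below. This is only a sketch under stated assumptions: `readgroup_lines` and `VALID_PLACEMENTS` are hypothetical names, the `readgroup` and `data = ...` lines are assumed from the MIRA manifest format rather than shown in this diff, and the real template additionally maps Galaxy datatypes to MIRA file types.

```python
# Segment placement codes offered by the select parameter in this changeset.
VALID_PLACEMENTS = {"", "FR", "RF", "SB", "?"}


def readgroup_lines(technology, segment_placement, filenames):
    """Return manifest lines for one read group.

    The segmentplacement line is written at most once, and only when a
    pairing type was chosen (an empty string means no pairing).
    """
    if segment_placement not in VALID_PLACEMENTS:
        raise ValueError("Unexpected segment placement: %r" % segment_placement)
    lines = ["readgroup", "technology = %s" % technology]
    if segment_placement != "":
        lines.append("segmentplacement = %s" % segment_placement)
    for name in filenames:
        # Assumed manifest syntax: one data line per input file.
        lines.append("data = %s" % name)
    return lines


# Example: an Illumina paired-end library supplied as two FASTQ files.
print("\n".join(readgroup_lines("solexa", "FR", ["reads_1.fastq", "reads_2.fastq"])))
```

Keeping the placement outside the per-file loop reflects that it describes the read group as a whole, not an individual file.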
--- a/tools/mira4/mira4_mapping.xml	Fri Oct 11 04:28:45 2013 -0400
+++ b/tools/mira4/mira4_mapping.xml	Tue Oct 15 12:07:34 2013 -0400
@@ -5,7 +5,7 @@
         <requirement type="binary">mira</requirement>
         <requirement type="package" version="4.0">MIRA</requirement>
     </requirements>
-    <version_command interpreter="python">mira4.py -v</version_command>
+    <version_command interpreter="python">mira4.py --version</version_command>
     <command interpreter="python">
 mira4.py $manifest $out_maf $out_fasta $out_log
     </command>
@@ -38,6 +38,13 @@
                 <option value="pcbiohq">PacBio high quality (corrected)</option>
                 <option value="text">Synthetic reads (database entries, consensus sequences, artificial reads, etc)</option>
             </param>
+            <param name="segment_placement" type="select" label="Pairing type (segment placing)">
+                <option value="">None (e.g. single end sequencing)</option>
+                <option value="FR">---&gt; &lt;--- (e.g. Sanger capillary or Solexa/Illumina paired-end library)</option>
+                <option value="RF">&lt;--- ---&gt; (e.g. Solexa/Illumina mate-pair library)</option>
+                <option value="SB">2---&gt; 1---&gt; (e.g. Roche 454 paired-end libraries or IonTorrent long-mate; see note)</option>
+                <option value="?">Unknown or not relevant (e.g. primer walking with Sanger capillary sequencing)</option>
+            </param>
             <param name="filenames" type="data" format="fastq,mira" multiple="true" required="true" label="Read file(s)"
                    help="Multiple files allowed, for example paired reads can be given as two files (MIRA looks at read names to identify pairs)." />
         </repeat>
@@ -97,6 +104,10 @@
 ##This is perhaps redundant as MIRA defaults to StrainX for the reads:
 strain = StrainX
 #end if
+#if str($rg.segment_placement) != ""
+##Record the segment placement (if any)
+segmentplacement = ${rg.segment_placement}
+#end if
 ##MIRA will accept multiple filenames on one data line, or multiple data lines
 #for $f in $rg.filenames
 ##Must now map Galaxy datatypes to MIRA file types...
@@ -149,6 +160,19 @@

 It is particularly suited to small genomes such as bacteria.

+**Notes**
+
+.. class:: warningmark
+
+Note that Roche 454 and Ion Torrent paired-end libraries are made by
+sequencing a circularised fragment, so each raw read starts with the end
+of the fragment, then a linker, then the start of the fragment. This means
+both the start and end are sequenced from the same strand, and thus should
+be given to MIRA as orientation "2---&gt; 1---&gt;". However, so that the
+data works with traditional tools, which expect Sanger capillary style
+"---&gt; &lt;---" libraries, your FASTQ files may have been pre-processed
+to mimic this by reverse complementing one read of each pair.
+
 **Citation**

 If you use this Galaxy tool in work leading to a scientific publication please
--- a/tools/mira4/tool_dependencies.xml	Fri Oct 11 04:28:45 2013 -0400
+++ b/tools/mira4/tool_dependencies.xml	Tue Oct 15 12:07:34 2013 -0400
@@ -3,9 +3,9 @@
     <package name="MIRA" version="4.0">
         <install version="1.0">
             <actions>
-                <action type="download_by_url">https://downloads.sourceforge.net/project/mira-assembler/MIRA/stable/mira_4.0rc3_linux-gnu_x86_64_static.tar.bz2</action>
+                <action type="download_by_url">https://downloads.sourceforge.net/project/mira-assembler/MIRA/stable/mira_4.0rc4_linux-gnu_x86_64_static.tar.bz2</action>
                 <action type="move_directory_files">
-                    <source_directory>mira_4.0rc3_linux-gnu_x86_64_static/bin</source_directory>
+                    <source_directory>mira_4.0rc4_linux-gnu_x86_64_static/bin</source_directory>
                     <destination_directory>$INSTALL_DIR</destination_directory>
                 </action>
                 <action type="set_environment">