changeset 9:302d13490b23 draft

Uploaded v0.0.2 preview 1, BAM output
author peterjc
date Thu, 28 Nov 2013 05:07:59 -0500
parents 2ab1d6f6a8ec
children 79759fdec6cb
files tools/mira4/README.rst tools/mira4/mira4.py tools/mira4/mira4_bait.py tools/mira4/mira4_bait.xml tools/mira4/mira4_de_novo.xml tools/mira4/mira4_make_bam.py tools/mira4/mira4_mapping.xml tools/mira4/tool_dependencies.xml
diffstat 8 files changed, 442 insertions(+), 43 deletions(-) [+]
line wrap: on
line diff
--- a/tools/mira4/README.rst	Mon Oct 28 05:44:46 2013 -0400
+++ b/tools/mira4/README.rst	Thu Nov 28 05:07:59 2013 -0500
@@ -21,8 +21,8 @@
 ======================
 
 This should be straightforward. Via the Tool Shed, Galaxy should automatically
-install the 'mira' datatype, download and install the precompiled binary for
-MIRA v4.0 for the Galaxy wrapper, and run any tests.
+install the 'mira' datatype, samtools, and download and install the precompiled
+binary for MIRA v4.0 for the Galaxy wrapper, and run any tests.
 
 For MIRA 4, the Galaxy wrapper has been split in two, allowing separate
 cluster settings for de novo usage (high RAM) and mapping (lower RAM).
@@ -42,20 +42,22 @@
 
 There are four Galaxy files to install:
 
-* mira4.py (the Python wrapper script)
-* mira4_validator.py (the Python parameter validation script)
-* mira4_de_novo.xml (the Galaxy tool definition for de novo usage)
-* mira4_mapping.xml (the Galaxy tool definition for mapping usage)
+* ``mira4_de_novo.xml`` (the Galaxy tool definition for de novo usage)
+* ``mira4_mapping.xml`` (the Galaxy tool definition for mapping usage)
+* ``mira4_bait.xml`` (the Galaxy tool definition for mirabait)
+* ``mira4.py`` (the Python wrapper script)
+* ``mira_bait.py`` (the Python wrapper script for mirabait)
+* ``mira4_validator.py`` (the XML parameter validation script)
 
-The suggested location is a new tools/mira4 folder. You will also need to
-modify the tools_conf.xml file to tell Galaxy to offer the tool, and also do
-this to tools_conf.xml.sample in order to run any tests::
+The suggested location is a new ``tools/mira4`` folder. You will also need to
+modify the ``tools_conf.xml`` file to tell Galaxy to offer the tool, and also do
+this to ``tools_conf.xml.sample`` in order to run the tests::
 
   <tool file="mira4/mira4_de_novo.xml" />
   <tool file="mira4/mira4_mapping.xml" />
 
 You will also need to install MIRA, we used version 4.0 RC4, and define the
-environment variable $MIRA4 pointing at the folder containing the binaries.
+environment variable ``$MIRA4`` pointing at the folder containing the binaries.
 See:
 
 * http://chevreux.org/projects_mira.html
@@ -64,6 +66,16 @@
 You may wish to use different cluster setups for the de novo and mapping
 tools, see above.
 
+You will also need to install samtools (for generating a BAM file from MIRA's
+SAM output).
+
+After copying (or symlinking) the ``test-data`` files under Galaxy's ``test-data``
+folder, you can run the tests with::
+
+    $ ./run_functional_tests.sh -id mira_4_0_bait
+    $ ./run_functional_tests.sh -id mira_4_0_de_novo
+    $ ./run_functional_tests.sh -id mira_4_0_mapping
+
 
 History
 =======
@@ -72,6 +84,7 @@
 Version Changes
 ------- ----------------------------------------------------------------------
 v0.0.1  - Initial version (prototype for MIRA 4.0 RC4, based on wrapper for v3.4)
+v0.0.2  - Include BAM output (using ``miraconvert`` and ``samtools``).
 ======= ======================================================================
 
 
@@ -84,7 +97,7 @@
 For making the "Galaxy Tool Shed" http://toolshed.g2.bx.psu.edu/ tarball use
 the following command from the Galaxy root folder::
 
-    $ tar -czf mira4_wrapper.tar.gz tools/mira4/README.rst tools/mira4/mira4_de_novo.xml tools/mira4/mira4_mapping.xml tools/mira4/mira4.py tools/mira4/mira4_validator.py tools/mira4/tool_dependencies.xml test-data/tvc_mini.fastq test-data/tvc_contigs.fasta test-data/tvc_map_ref_strain.fasta test-data/tvc_map_same_strain.fasta test-data/tvc_bait.fasta test-data/tvc_mini_bait_pos.fastq test-data/tvc_mini_bait_strict.fastq test-data/tvc_mini_bait_neg.fastq
+    $ tar -czf mira4_wrapper.tar.gz tools/mira4/README.rst tools/mira4/mira4_de_novo.xml tools/mira4/mira4_mapping.xml tools/mira4/mira4_bait.xml tools/mira4/mira4.py tools/mira4/mira4_make_bam.py tools/mira4/mira4_validator.py tools/mira4/mira4_bait.py tools/mira4/tool_dependencies.xml test-data/tvc_mini.fastq test-data/tvc_contigs.fasta test-data/tvc_map_ref_strain.fasta test-data/tvc_map_same_strain.fasta test-data/tvc_bait.fasta test-data/tvc_mini_bait_pos.fastq test-data/tvc_mini_bait_strict.fastq test-data/tvc_mini_bait_neg.fastq
 
 Check this worked::
 
@@ -92,8 +105,11 @@
     tools/mira4/README.rst
     tools/mira4/mira4_de_novo.xml
     tools/mira4/mira4_mapping.xml
+    tools/mira4/mira4_bait.xml
     tools/mira4/mira4.py
+    tools/mira4/mira4_make_bam.py
     tools/mira4/mira4_validator.py
+    tools/mira4/mira4_bait.py
     tools/mira4/tool_dependencies.xml
     test-data/tvc_mini.fastq
     test-data/tvc_contigs.fasta
--- a/tools/mira4/mira4.py	Mon Oct 28 05:44:46 2013 -0400
+++ b/tools/mira4/mira4.py	Thu Nov 28 05:07:59 2013 -0500
@@ -7,6 +7,9 @@
 import shutil
 import time
 
+#Do we need any PYTHONPATH magic?
+from mira4_make_bam import make_bam
+
 WRAPPER_VER = "0.0.1" #Keep in sync with the XML file
 
 def stop_err(msg, err=1):
@@ -28,7 +31,7 @@
         sys.exit(1)
     ver, tmp = child.communicate()
     del child
-    return ver.split("\n", 1)[0]
+    return ver.split("\n", 1)[0].strip()
 
 
 try:
@@ -38,12 +41,20 @@
 mira_binary = os.path.join(mira_path, "mira")
 if not os.path.isfile(mira_binary):
     stop_err("Missing mira under $MIRA4, %r" % mira_binary)
+mira_convert = os.path.join(mira_path, "miraconvert")
+if not os.path.isfile(mira_convert):
+    stop_err("Missing miraconvert under $MIRA4, %r" % mira_convert)
 
 mira_ver = get_version(mira_binary)
 if not mira_ver.strip().startswith("4.0"):
-    stop_err("This wrapper is for MIRA V4.0, not:\n%s" % mira_ver)
+    stop_err("This wrapper is for MIRA V4.0, not:\n%s\n%s" % (mira_ver, mira_binary))
+mira_convert_ver = get_version(mira_convert)
+if not mira_convert_ver.strip().startswith("4.0"):
+    stop_err("This wrapper is for MIRA V4.0, not:\n%s\n%s" % (mira_ver, mira_convert))
 if "-v" in sys.argv or "--version" in sys.argv:
     print "%s, MIRA wrapper version %s" % (mira_ver, WRAPPER_VER)
+    if mira_ver != mira_convert_ver:
+        print "WARNING: miraconvert %s" % mira_convert_ver
     sys.exit(0)
 
 def fix_threads(manifest):
@@ -78,6 +89,7 @@
 
 
 def collect_output(temp, name, handle):
+    """Moves files to the output filenames (global variables)."""
     n3 = (temp, name, name, name)
     f = "%s/%s_assembly/%s_d_results" % (temp, name, name)
     if not os.path.isdir(f):
@@ -95,12 +107,15 @@
 
     #De novo or single strain mapping,
     old_fasta = "%s/%s_out.unpadded.fasta" % (f, name)
+    ref_fasta = "%s/%s_out.padded.fasta" % (f, name)
     if not os.path.isfile(old_fasta):
-        #Mapping (currently StrainX versus reference)
+        #Mapping (StrainX versus reference) or de novo
         old_fasta = "%s/%s_out_StrainX.unpadded.fasta" % (f, name)
+        ref_fasta = "%s/%s_out_StrainX.padded.fasta" % (f, name)
     if not os.path.isfile(old_fasta):
-        #Triggered extractLargeContigs.sh?
-        old_fasta = "%s/%s_LargeContigs_out.fasta" % (f, name)
+        old_fasta = "%s/%s_out_ReferenceStrain.unpadded.fasta" % (f, name)
+        ref_fasta = "%s/%s_out_ReferenceStrain.padded.fasta" % (f, name)
+        
 
     missing = False
     for old, new in [(old_maf, out_maf),
@@ -116,6 +131,12 @@
         for filename in sorted(os.listdir(f)):
             sys.stderr.write("%s\n" % filename)
 
+    #For mapping mode, probably most people would expect a BAM file
+    #using the reference FASTA file...
+    msg = make_bam(mira_convert, out_maf, ref_fasta, out_bam, handle)
+    if msg:
+        stop_err(msg)
+
 def clean_up(temp, name):
     folder = "%s/%s_assembly" % (temp, name)
     if os.path.isdir(folder):
@@ -127,7 +148,7 @@
 temp = "."
 #name, out_fasta, out_qual, out_ace, out_caf, out_wig, out_log = sys.argv[1:8]
 name = "MIRA"
-manifest, out_maf, out_fasta, out_log = sys.argv[1:5]
+manifest, out_maf, out_bam, out_fasta, out_log = sys.argv[1:]
 
 fix_threads(manifest)
 
@@ -180,8 +201,8 @@
 handle.write("\n")
 handle.write("============================ MIRA has finished ===============================\n")
 handle.write("MIRA took %0.2f hours\n" % (run_time / 3600.0))
-print "MIRA took %0.2f hours" % (run_time / 3600.0)
 if return_code:
+    print "MIRA took %0.2f hours" % (run_time / 3600.0)
     handle.write("Return error code %i from command:\n" % return_code)
     handle.write(cmd + "\n")
     handle.close()
@@ -202,7 +223,11 @@
     handle.flush()
 
 #print "Collecting output..."
+start_time = time.time()
 collect_output(temp, name, handle)
+collect_time = time.time() - start_time
+handle.write("MIRA took %0.2f hours; collecting output %0.2f minutes\n" % (run_time / 3600.0, collect_time / 60.0))
+print("MIRA took %0.2f hours; collecting output %0.2f minutes\n" % (run_time / 3600.0, collect_time / 60.0))
 
 if os.path.isfile("MIRA_assembly/MIRA_d_results/ec.log"):
     #Treat as an error, but doing this AFTER collect_output
@@ -214,5 +239,6 @@
 #print "Cleaning up..."
 clean_up(temp, name)
 
+handle.write("\nDone\n")
 handle.close()
 print("Done")
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/tools/mira4/mira4_bait.py	Thu Nov 28 05:07:59 2013 -0500
@@ -0,0 +1,114 @@
+#!/usr/bin/env python
+"""A simple wrapper script to call MIRA4's mirabait and collect its output.
+"""
+import os
+import sys
+import subprocess
+import shutil
+import time
+
+WRAPPER_VER = "0.0.1" #Keep in sync with the XML file
+
+def stop_err(msg, err=1):
+    sys.stderr.write(msg+"\n")
+    sys.exit(err)
+
+
+def get_version(mira_binary):
+    """Run MIRA to find its version number"""
+    # At the commend line I would use: mira -v | head -n 1
+    # however there is some pipe error when doing that here.
+    cmd = [mira_binary, "-v"]
+    try:
+        child = subprocess.Popen(cmd,
+                                 stdout=subprocess.PIPE,
+                                 stderr=subprocess.STDOUT)
+    except Exception, err:
+        sys.stderr.write("Error invoking command:\n%s\n\n%s\n" % (" ".join(cmd), err))
+        sys.exit(1)
+    ver, tmp = child.communicate()
+    del child
+    #Workaround for -v not working in mirabait 4.0RC4
+    if "invalid option" in ver.split("\n", 1)[0]:
+        for line in ver.split("\n", 1):
+            if " version " in line:
+                line = line.split()
+                return line[line.index("version")+1].rstrip(")")
+        stop_err("Could not determine MIRA version:\n%s" % ver)
+    return ver.split("\n", 1)[0]
+
+try:
+    mira_path = os.environ["MIRA4"]
+except ImportError:
+    stop_err("Environment variable $MIRA4 not set")
+mira_binary = os.path.join(mira_path, "mirabait")
+if not os.path.isfile(mira_binary):
+    stop_err("Missing mirabait under $MIRA4, %r" % mira_binary)
+mira_ver = get_version(mira_binary)
+if not mira_ver.strip().startswith("4.0"):
+    stop_err("This wrapper is for MIRA V4.0, not:\n%s" % mira_ver)
+if "-v" in sys.argv or "--version" in sys.argv:
+    print "%s, MIRA wrapper version %s" % (mira_ver, WRAPPER_VER)
+    sys.exit(0)
+
+
+format, output_choice, strand_choice, kmer_length, min_occurance, bait_file, in_file, out_file = sys.argv[1:]
+
+if format.startswith("fastq"):
+    format = "fastq"
+elif format == "mira":
+    format = "maf"
+elif format != "fasta":
+    stop_err("Was not expected format %r" % format)
+
+assert out_file.endswith(".dat")
+out_file_stem = out_file[:-4]
+
+cmd_list = [mira_binary, "-f", format, "-t", format,
+            "-k", kmer_length, "-n", min_occurance,
+            bait_file, in_file, out_file_stem]
+if output_choice == "pos":
+    pass
+elif output_choice == "neg":
+    #Invert the selection...
+    cmd_list.insert(1, "-i")
+else:
+    stop_err("Output choice should be 'pos' or 'neg', not %r" % output_choice)
+if strand_choice == "both":
+    pass
+elif strand_choice == "fwd":
+    #Ingore reverse strand...
+    cmd_list.insert(1, "-r")
+else:
+    stop_err("Strand choice should be 'both' or 'fwd', not %r" % strand_choice)
+
+cmd = " ".join(cmd_list)
+#print cmd
+start_time = time.time()
+try:
+    #Run MIRA
+    child = subprocess.Popen(cmd_list,
+                             stdout=subprocess.PIPE,
+                             stderr=subprocess.STDOUT)
+except Exception, err:
+    log_manifest(manifest)
+    sys.stderr.write("Error invoking command:\n%s\n\n%s\n" % (cmd, err))
+    sys.exit(1)
+#Use .communicate as can get deadlocks with .wait(),
+stdout, stderr = child.communicate()
+assert stderr is None # Due to way we ran with subprocess
+run_time = time.time() - start_time
+return_code = child.returncode
+print "mirabait took %0.2f minutes" % (run_time / 60.0)
+
+if return_code:
+    sys.stderr.write(stdout)
+    stop_err("Return error code %i from command:\n%s" % (return_code, cmd),
+             return_code)
+
+#Capture output
+out_tmp = out_file_stem + "." + format
+if not os.path.isfile(out_tmp):
+    sys.stderr.write(stdout)
+    stop_err("Missing output file from mirabait: %s" % out_tmp)
+shutil.move(out_tmp, out_file)
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/tools/mira4/mira4_bait.xml	Thu Nov 28 05:07:59 2013 -0500
@@ -0,0 +1,120 @@
+<tool id="mira_4_0_bait" name="MIRA v4.0 mirabait" version="0.0.1">
+    <description>Filter reads using kmer matches</description>
+    <requirements>
+        <requirement type="binary">mirabait</requirement>
+        <requirement type="package" version="4.0">MIRA</requirement>
+    </requirements>
+    <version_command interpreter="python">mira4_bait.py --version</version_command>
+    <command interpreter="python">
+mira4_bait.py $input_reads.ext $output_choice $strand_choice $kmer_length $min_occurence "$bait_file" "$input_reads" "$output_reads"
+    </command>
+    <stdio>
+        <!-- Assume anything other than zero is an error -->
+        <exit_code range="1:" />
+        <exit_code range=":-1" />
+    </stdio>
+    <inputs>
+	<param name="bait_file" type="data" format="fasta,fastq,mira" required="true" label="Bait file (what to look for)" />
+	<param name="input_reads" type="data" format="fasta,fastq,mira" required="true" label="Reads to search" />
+        <param name="output_choice" type="select" label="Output positive matches, or negative matches?">
+            <option value="pos">Just positive matches</option>
+            <option value="neg">Just negative matches</option>
+        </param>
+        <param name="strand_choice" type="select" label="Check for matches on both strands?">
+            <option value="both">Check both strands</option>
+            <option value="fwd">Just forward strand</option>
+        </param>
+	<param name="kmer_length" type="integer" value="31" min="1" max="32"
+	       label="k-mer length" help="Maximum 32" />
+	<param name="min_occurence" type="integer" value="1" min="1"
+	       label="Minimum k-mer occurence"
+	       help="How many k-mer matches do you want per read? Minimum one" />
+    </inputs>
+    <outputs>
+        <data name="output_reads" format="fasta" label="$input_reads.name #if str($output_choice)=='pos' then 'matching' else 'excluding matches to' # $bait_file.name">
+            <!-- TODO - Replace this with format="input:input_reads" if/when that works -->
+            <change_format>
+                <when input_dataset="input_reads" attribute="extension" value="fastq" format="fastq" />
+                <when input_dataset="input_reads" attribute="extension" value="fastqsanger" format="fastqsanger" />
+                <when input_dataset="input_reads" attribute="extension" value="fastqsolexa" format="fastqsolexa" />
+                <when input_dataset="input_reads" attribute="extension" value="fastqillumina" format="fastqillumina" />
+                <when input_dataset="input_reads" attribute="extension" value="fastqcssanger" format="fastqcssanger" />
+            </change_format>
+	</data>
+    </outputs>
+    <tests>
+        <test>
+            <param name="bait_file" value="tvc_bait.fasta" ftype="fasta" />
+            <param name="input_reads" value="tvc_mini.fastq" ftype="fastqsanger" />
+            <output name="output_reads" file="tvc_mini_bait_pos.fastq" ftype="fastqsanger" />
+	</test>
+        <test>
+            <param name="bait_file" value="tvc_bait.fasta" ftype="fasta" />
+            <param name="input_reads" value="tvc_mini.fastq" ftype="fastqsanger" />
+            <param name="kmer_length" value="32" />
+            <param name="min_occurence" value="50" />
+            <output name="output_reads" file="tvc_mini_bait_strict.fastq" ftype="fastqsanger" />
+        </test>
+        <test>
+            <param name="bait_file" value="tvc_bait.fasta" ftype="fasta" />
+            <param name="input_reads" value="tvc_mini.fastq" ftype="fastqsanger" />
+            <param name="output_choice" value="neg" />
+            <output name="output_reads" file="tvc_mini_bait_neg.fastq" ftype="fastqsanger" />
+        </test>
+    </tests>
+    <help>
+**What it does**
+
+Runs the ``mirabait`` utility from MIRA v4.0 to filter your input reads
+according to whether or not they contain perfect kmer matches to your
+bait file. By default this looks for 31-mers (kmers or *k*-mers where
+the fragment length *k* is 31), and only requires a single matching kmer.
+
+The ``mirabait`` utility is useful in many applications and pipelines
+outside of using the main MIRA tool for assembly or mapping.
+
+.. class:: warningmark
+
+Note ``mirabait`` cannot be used on protein (amino acid) sequences.
+
+**Example Usage**
+
+To remove over abundant entries like rRNA sequences, run ``mirabait`` with
+known rRNA sequences as the bait and select the *negative* matches.
+
+To do targeted assembly by fishing out reads belonging to a gene and just
+assemble these, run ``mirabait`` with the gene of interest as the bait and
+select the *positive* matches.
+
+To iteratively reconstruct mitochondria you could start by fishing out reads
+matching any known mitochondrial sequence, assembly those, and repeat.
+
+
+**Notes on paired read**
+
+.. class:: warningmark
+
+While MIRA4 is aware of many read naming conventions to identify paired read
+partners, the ``mirabait`` tool considers each read in isolation. Applying
+it to paired read files may leave you with orphaned reads.
+
+
+**Citation**
+
+If you use this Galaxy tool in work leading to a scientific publication please
+cite the following papers:
+
+Peter J.A. Cock, Björn A. Grüning, Konrad Paszkiewicz and Leighton Pritchard (2013).
+Galaxy tools and workflows for sequence analysis with applications
+in molecular plant pathology. PeerJ 1:e167
+http://dx.doi.org/10.7717/peerj.167
+
+Bastien Chevreux, Thomas Wetter and Sándor Suhai (1999).
+Genome Sequence Assembly Using Trace Signals and Additional Sequence Information.
+Computer Science and Biology: Proceedings of the German Conference on Bioinformatics (GCB) 99, pp. 45-56.
+http://www.bioinfo.de/isb/gcb99/talks/chevreux/main.html
+
+This wrapper is available to install into other Galaxy Instances via the Galaxy
+Tool Shed at http://toolshed.g2.bx.psu.edu/view/peterjc/mira4_assembler
+    </help>
+</tool>
--- a/tools/mira4/mira4_de_novo.xml	Mon Oct 28 05:44:46 2013 -0400
+++ b/tools/mira4/mira4_de_novo.xml	Thu Nov 28 05:07:59 2013 -0500
@@ -1,13 +1,21 @@
-<tool id="mira_4_0_de_novo" name="MIRA v4.0 de novo assember" version="0.0.1">
+<tool id="mira_4_0_de_novo" name="MIRA v4.0 de novo assember" version="0.0.2">
     <description>Takes Sanger, Roche 454, Solexa/Illumina, Ion Torrent and PacBio reads</description>
     <requirements>
         <requirement type="binary">mira</requirement>
+        <requirement type="binary">miraconvert</requirement>
         <requirement type="package" version="4.0">MIRA</requirement>
+        <requirement type="binary">samtools</requirement>
+        <requirement type="package" version="0.1.19">samtools</requirement>
     </requirements>
     <version_command interpreter="python">mira4.py --version</version_command>
     <command interpreter="python">
-mira4.py $manifest $out_maf $out_fasta $out_log
+mira4.py "$manifest" "$out_maf" "$out_bam" "$out_fasta" "$out_log"
     </command>
+    <stdio>
+        <!-- Assume anything other than zero is an error -->
+        <exit_code range="1:" />
+        <exit_code range=":-1" />
+    </stdio>
     <inputs>
         <param name="job_type" type="select" label="Assembly type">
             <option value="genome">Genome</option>
@@ -63,6 +71,7 @@
     <code file="mira4_validator.py" />
     <outputs>
         <data name="out_fasta" format="fasta" label="MIRA de novo contigs (FASTA)" />
+        <data name="out_bam" format="bam" label="MIRA de novo assembly (BAM)" />
         <data name="out_maf" format="mira" label="MIRA de novo assembly" />
         <data name="out_log" format="txt" label="MIRA de novo log" />
     </outputs>
@@ -118,30 +127,47 @@
         </configfile>
     </configfiles>
     <tests>
-            <!-- Based on the MIRA v3.4.1.1 bundled minidemo/estdemo2 which uses
-                 strain data and miraSearchESTSNPs. Here we just assemble it. --> 
-<!--
-Commenting out test until Galaxy framework is fixed,
-https://trello.com/c/zSTrfDOB/820-disambiguated-conditional-parameters-not-supported-in-unit-tests
+        <!-- Tiger mitochondria, selected paired end Illumina reads from SRR639755
+             Note we're using just one repeat group, and only the filenames parameter
+             within it, so this should work with current test framework limitations:
+        <test>
+            <param name="job_type" value="genome" />
+            <param name="job_quality" value="accurate" />
+            <param name="filenames" value="SRR639755_mito_pairs.fastq.gz" ftype="fastqsanger" />
+            <output name="out_fasta" file="SRR639755_mito_pairs.mira4_de_novo.fasta" ftype="fasta" />
+        </test>
+        -->
+        <!-- Simple assembly based on MIRA's minidemo/demo4 example
+             Note we're using just one repeat group,
+             but several parameters with the repeat
+             (in order to specify sanger reads, no pairing)
         <test>
-            <param name="job_method" value="denovo" />
-            <param name="job_type" value="est" />
-            <param name="job_qual" value="accurate" />
-            <param name="condBackbone.use" value="false" />
-            <param name="condSanger.use" value="true" />
-            <param name="condSanger.filename" value="tvc_mini.fastq" ftype="fastq" />
-            <param name="condRoche.use" value="false" />
-            <param name="condIllumina.use" value="false" /> 
-            <param name="condIonTorrent.use" value="false" />
-            <output name="out_fasta" file="tvc_contigs.fasta" ftype="fasta" />
-	</test>
--->
+            <param name="job_type" value="genome" />
+            <param name="job_quality" value="accurate" />
+            <param name="technology" value="sanger" />
+            <param name="type" value="none" />
+            <param name="filenames" value="U13small_m.fastq" ftype="fastqsanger" />
+            <output name="out_fasta" file="U13small_m.mira4_de_novo.fasta" ftype="fasta" />
+        </test>
+        -->
+	<!-- Simple assembly based on MIRA's minidemo/solexa1 example
+             Note we're using just one repeat group,
+             but two parameters within the repeat (filename, no pairing)
+        <test>
+            <param name="job_type" value="genome" />
+            <param name="job_quality" value="accurate" />
+            <param name="type" value="none" />
+            <param name="filenames" value="ecoli.fastq" ftype="fastqsanger" />
+            <output name="out_fasta" file="ecoli.mira4_de_novo.fasta" ftype="fasta" />
+        </test>
+        -->
     </tests>
     <help>
 
 **What it does**
 
-Runs MIRA v4.0 in de novo mode, collects the output, and throws away all the temporary files.
+Runs MIRA v4.0 in de novo mode, collects the output, generates a sorted BAM
+file, and then throws away all the temporary files.
 
 MIRA is an open source assembly tool capable of handling sequence data from
 a range of platforms (Sanger capillary, Solexa/Illumina, Roche 454, Ion Torrent
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/tools/mira4/mira4_make_bam.py	Thu Nov 28 05:07:59 2013 -0500
@@ -0,0 +1,85 @@
+#!/usr/bin/env python
+"""Wrapper script using miraconvert & samtools to get BAM from MIRA.
+"""
+import os
+import sys
+import shutil
+import subprocess
+import tempfile
+
+def stop_err(msg, err=1):
+    sys.stderr.write(msg+"\n")
+    sys.exit(err)
+
+def run(cmd, log_handle):
+    try:
+        child = subprocess.Popen(cmd, shell=True,
+                                 stdout=subprocess.PIPE,
+                                 stderr=subprocess.STDOUT)
+    except Exception, err:
+        sys.stderr.write("Error invoking command:\n%s\n\n%s\n" % (cmd, err))
+        #TODO - call clean up?
+        log_handle.write("Error invoking command:\n%s\n\n%s\n" % (cmd, err))
+        sys.exit(1)
+    #Use .communicate as can get deadlocks with .wait(),
+    stdout, stderr = child.communicate()
+    assert not stderr #Should be empty as sent to stdout
+    if len(stdout) > 10000:
+        #miraconvert can be very verbose (is holding stdout in RAM a problem?)
+        stdout = stdout.split("\n")
+        stdout = stdout[:10] + ["...", "<snip>", "..."] + stdout[-10:]
+        stdout = "\n".join(stdout)
+    log_handle.write(stdout)
+    return child.returncode
+
+def make_bam(mira_convert, maf_file, fasta_file, bam_file, log_handle):
+    if not os.path.isfile(mira_convert):
+        return "Missing binary %r" % mira_convert
+    if not os.path.isfile(maf_file):
+        return "Missing input MIRA file: %r" % maf_file
+    if not os.path.isfile(fasta_file):
+        return "Missing padded FASTA file: %r" % fasta_file
+
+    log_handle.write("\n====================== Converting MIRA assembly to SAM =======================\n")
+    tmp_dir = tempfile.mkdtemp()
+    sam_file = os.path.join(tmp_dir, "x.sam")
+
+    # Note add nbb to the template name, possible MIRA 4.0 RC4 bug
+    cmd = '"%s" -f maf -t samnbb "%s" "%snbb"' % (mira_convert, maf_file, sam_file)
+    return_code = run(cmd, log_handle)
+    if return_code:
+        return "Error %i from command:\n%s" % (return_code, cmd)
+    if not os.path.isfile(sam_file):
+        return "Conversion from MIRA to SAM failed"
+
+    log_handle.write("\n================= Converting MIRA assembly from SAM to BAM ===================\n")
+    #Also doing SAM to (uncompressed) BAM during depad
+    bam_stem = bam_file + ".tmp" # Have write permissions and want final file in this folder
+    cmd = 'samtools depad -S -u -T "%s" "%s" | samtools sort - "%s"' % (fasta_file, sam_file, bam_stem)
+    return_code = run(cmd, log_handle)
+    if return_code:
+        return "Error %i from command:\n%s" % (return_code, cmd)
+    if not os.path.isfile(bam_stem + ".bam"):
+        return "samtools depad or sort failed to produce BAM file"
+
+    os.remove(sam_file)
+    os.rmdir(tmp_dir)
+
+    log_handle.write("\n====================== Indexing MIRA assembly BAM file =======================\n")
+    cmd = 'samtools index "%s.bam"' % bam_stem
+    return_code = run(cmd, log_handle)
+    if return_code:
+        return "Error %i from command:\n%s" % (return_code, cmd)
+    if not os.path.isfile(bam_stem + ".bam.bai"):
+        return "samtools indexing of BAM file failed to produce BAI file"
+
+    shutil.move(bam_stem + ".bam", bam_file)
+    os.remove(bam_stem + ".bam.bai") #Let Galaxy handle that...
+
+    return None #Good :)
+
+if __name__ == "__main__":
+    mira_convert, maf_file, fasta_file, bam_file = sys.argv[1:]
+    msg = make_bam(mira_convert, maf_file, fasta_file, bam_file, sys.stdout)
+    if msg:
+        stop_err(msg)
--- a/tools/mira4/mira4_mapping.xml	Mon Oct 28 05:44:46 2013 -0400
+++ b/tools/mira4/mira4_mapping.xml	Thu Nov 28 05:07:59 2013 -0500
@@ -1,13 +1,21 @@
-<tool id="mira_4_0_mapping" name="MIRA v4.0 mapping" version="0.0.1">
+<tool id="mira_4_0_mapping" name="MIRA v4.0 mapping" version="0.0.2">
     <description>Maps Sanger, Roche 454, Solexa/Illumina, Ion Torrent and PacBio reads</description>
     <requirements>
         <requirement type="binary">mira</requirement>
+        <requirement type="binary">miraconvert</requirement>
         <requirement type="package" version="4.0">MIRA</requirement>
+        <requirement type="binary">samtools</requirement>
+        <requirement type="package" version="0.1.19">samtools</requirement>
     </requirements>
     <version_command interpreter="python">mira4.py --version</version_command>
     <command interpreter="python">
-mira4.py $manifest $out_maf $out_fasta $out_log
+mira4.py "$manifest" "$out_maf" "$out_bam" "$out_fasta" "$out_log"
     </command>
+    <stdio>
+        <!-- Assume anything other than zero is an error -->
+        <exit_code range="1:" />
+        <exit_code range=":-1" />
+    </stdio>
     <inputs>
         <param name="job_type" type="select" label="Assembly type">
             <option value="genome">Genome</option>
@@ -64,6 +72,7 @@
     </inputs>
     <outputs>
         <data name="out_fasta" format="fasta" label="MIRA #if str($strain_setup)=='same' then 'same strain' else 'reference' # mapping contigs (FASTA)" />
+        <data name="out_bam" format="bam" label="MIRA #if str($strain_setup)=='same' then 'same strain' else 'reference' # mapping assembly (BAM)" />
         <data name="out_maf" format="mira" label="MIRA #if str($strain_setup)=='same' then 'same strain' else 'reference' # mapping assembly" />
         <data name="out_log" format="txt" label="MIRA #if str($strain_setup)=='same' then 'same strain' else 'reference' # mapping log" />
     </outputs>
@@ -169,7 +178,8 @@
 
 **What it does**
 
-Runs MIRA v4.0 in mapping mode, collects the output, and throws away all the temporary files.
+Runs MIRA v4.0 in mapping mode, collects the output, generates a sorted BAM
+file, and throws away all the temporary files.
 
 MIRA is an open source assembly tool capable of handling sequence data from
 a range of platforms (Sanger capillary, Solexa/Illumina, Roche 454, Ion Torrent
--- a/tools/mira4/tool_dependencies.xml	Mon Oct 28 05:44:46 2013 -0400
+++ b/tools/mira4/tool_dependencies.xml	Thu Nov 28 05:07:59 2013 -0500
@@ -1,5 +1,8 @@
 <?xml version="1.0"?>
 <tool_dependency>
+    <package name="samtools" version="0.1.19">
+        <repository changeset_revision="54195f1d4b0f" name="package_samtools_0_1_19" owner="iuc" toolshed="http://testtoolshed.g2.bx.psu.edu" />
+    </package>
     <package name="MIRA" version="4.0">
         <install version="1.0">
             <actions>
@@ -9,7 +12,7 @@
                     <destination_directory>$INSTALL_DIR</destination_directory>
                 </action>
                 <action type="set_environment">
-                    <environment_variable name="MIRA4" action="set_to">$INSTALL_DIR</environment_variable>
+                    <environment_variable action="set_to" name="MIRA4">$INSTALL_DIR</environment_variable>
                 </action>
             </actions>
         </install>
@@ -23,4 +26,3 @@
         </readme>
     </package>
 </tool_dependency>
-