changeset 25:55ae131c5862 draft

v0.0.8 - renamed folder, added note about mirabait
author peterjc
date Wed, 02 Sep 2015 07:45:07 -0400
parents 438d756f348b
children 8aeb92e76159
files tools/mira4_0/README.rst tools/mira4_0/mira4.py tools/mira4_0/mira4_bait.py tools/mira4_0/mira4_convert.py tools/mira4_0/mira4_de_novo.xml tools/mira4_0/mira4_make_bam.py tools/mira4_0/mira4_mapping.xml tools/mira4_0/mira4_validator.py tools/mira4_0/repository_dependencies.xml tools/mira4_0/tool_dependencies.xml tools/mira4_assembler/README.rst tools/mira4_assembler/mira4.py tools/mira4_assembler/mira4_bait.py tools/mira4_assembler/mira4_convert.py tools/mira4_assembler/mira4_de_novo.xml tools/mira4_assembler/mira4_make_bam.py tools/mira4_assembler/mira4_mapping.xml tools/mira4_assembler/mira4_validator.py tools/mira4_assembler/repository_dependencies.xml tools/mira4_assembler/tool_dependencies.xml
diffstat 20 files changed, 1533 insertions(+), 1553 deletions(-)
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/tools/mira4_0/README.rst	Wed Sep 02 07:45:07 2015 -0400
@@ -0,0 +1,157 @@
+Galaxy wrapper for the MIRA assembly program (v4.0)
+===================================================
+
+This tool is copyright 2011-2015 by Peter Cock, The James Hutton Institute
+(formerly SCRI, Scottish Crop Research Institute), UK. All rights reserved.
+See the licence text below (MIT licence).
+
+This tool is a short Python script (to collect the MIRA output and move it
+to where Galaxy expects the files) and associated Galaxy wrapper XML file.
+
+It is available from the Galaxy Tool Shed at:
+http://toolshed.g2.bx.psu.edu/view/peterjc/mira4_assembler 
+
+It uses a Galaxy datatype definition 'mira' for the MIRA Assembly Format,
+http://toolshed.g2.bx.psu.edu/view/peterjc/mira_datatypes
+
+A separate wrapper for MIRA v3.4 is available from the Galaxy Tool Shed at:
+http://toolshed.g2.bx.psu.edu/view/peterjc/mira_assembler
+
+Automated Installation
+======================
+
+This should be straightforward. Via the Tool Shed, Galaxy should automatically
+install the 'mira' datatype and samtools, download and install the precompiled
+binary for MIRA v4.0.2 used by the Galaxy wrapper, and run any tests.
+
+For MIRA 4, the Galaxy wrapper has been split in two, allowing separate
+cluster settings for de novo usage (high RAM) and mapping (lower RAM).
+Consult the Galaxy administration documentation for your cluster setup.
+
+WARNING: For larger tasks, be aware that MIRA can require vast amounts
+of RAM and run-times of over a week are possible. This tool wrapper makes
+no attempt to spot and reject such large jobs.
+
+
+Manual Installation
+===================
+
+First install the 'mira' datatype for Galaxy, available here:
+
+* http://toolshed.g2.bx.psu.edu/view/peterjc/mira_datatypes 
+
+There are various Python and XML files to install into Galaxy:
+
+* ``mira4_de_novo.xml`` (the Galaxy tool definition for de novo usage)
+* ``mira4_mapping.xml`` (the Galaxy tool definition for mapping usage)
+* ``mira4_convert.xml`` (the Galaxy tool definition for converting MIRA files)
+* ``mira4_bait.xml`` (the Galaxy tool definition for mirabait)
+* ``mira4.py`` (the Python wrapper script)
+* ``mira4_convert.py`` (the Python wrapper script for miraconvert)
+* ``mira4_bait.py`` (the Python wrapper script for mirabait)
+* ``mira4_validator.py`` (the XML parameter validation script)
+
+The suggested location is a new ``tools/mira4_0`` folder. You will also need to
+modify the ``tool_conf.xml`` file to tell Galaxy to offer the tools::
+
+  <tool file="mira4_0/mira4_de_novo.xml" />
+  <tool file="mira4_0/mira4_mapping.xml" />
+  ...
+
+You will also need to install MIRA (we used version 4.0.2) and define the
+environment variable ``$MIRA4`` pointing at the folder containing the binaries.
+See:
+
+* http://chevreux.org/projects_mira.html
+* http://sourceforge.net/projects/mira-assembler/
+
+You may wish to use different cluster setups for the de novo and mapping
+tools, see above.
+
+You will also need to install samtools (for generating a BAM file from MIRA's
+SAM output).
+
+If you wish to run the unit tests, also move/copy the ``test-data/`` files
+under Galaxy's ``test-data/`` folder. Then::
+
+    $ ./run_tests.sh -id mira_4_0_bait
+    $ ./run_tests.sh -id mira_4_0_de_novo
+    $ ./run_tests.sh -id mira_4_0_mapping
+    $ ./run_tests.sh -id mira_4_0_convert
+
+
+History
+=======
+
+======= ======================================================================
+Version Changes
+------- ----------------------------------------------------------------------
+v0.0.1  - Initial version (prototype for MIRA 4.0 RC4, based on wrapper for v3.4)
+v0.0.2  - Include BAM output (using ``miraconvert`` and ``samtools``).
+        - Updated to target MIRA 4.0.1
+        - Simplified XML to apply input format to output data.
+        - Sets temporary folder at run time to respect environment variables
+          (``$TMPDIR``, ``$TEMP``, or ``$TMP`` in that order). This was
+          previously hard coded as ``/tmp``.
+v0.0.3  - Updated to target MIRA 4.0.2
+v0.0.4  - Using ``optparse`` for the Python wrapper script API
+        - Made MAF and BAM outputs optional
+        - Include wrapper for ``miraconvert``
+v0.0.5  - Tool definition now embeds citation information.
+v0.0.6  - Fixed error handling in ``mira4_convert.py``.
+v0.0.7  - Renamed folder (internal change only).
+        - Reorder XML elements (internal change only).
+        - Use the ``format_source=...`` tag in the MIRA bait wrapper.
+        - Planemo for Tool Shed upload (``.shed.yml``, internal change only).
+        - MIRA 4.0.2 dependency now declared via dedicated Tool Shed package.
+v0.0.8  - Renamed folder now that we have a MIRA 4.9.x wrapper (internal change only).
+======= ======================================================================
+
+
+Developers
+==========
+
+Development is on a dedicated GitHub repository:
+https://github.com/peterjc/pico_galaxy/tree/master/tools/mira_4_0
+
+For pushing a release to the test or main "Galaxy Tool Shed", use the following
+Planemo commands (which requires you have set your Tool Shed access details in
+``~/.planemo.yml`` and that you have access rights on the Tool Shed)::
+
+    $ planemo shed_update -t testtoolshed --check_diff ~/repositories/pico_galaxy/tools/mira4_0/
+    ...
+
+or::
+
+    $ planemo shed_update -t toolshed --check_diff ~/repositories/pico_galaxy/tools/mira4_0/
+    ...
+
+To just build and check the tar ball, use::
+
+    $ planemo shed_upload --tar_only  ~/repositories/pico_galaxy/tools/mira4_0/
+    ...
+    $ tar -tzf shed_upload.tar.gz 
+    test-data/U13small_m.fastq
+    ...
+
+
+Licence (MIT)
+=============
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in
+all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+THE SOFTWARE.
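The manual installation steps in the README above ask for a ``$MIRA4`` environment variable pointing at the folder that holds the MIRA binaries. As a rough illustration only, the check below sketches how an administrator might confirm that setup before registering the tools. The helper name ``find_mira_binaries`` and the choice to pass the environment in as a dictionary are assumptions made for this example; they are not part of the wrapper. The demonstration uses empty placeholder files rather than a real MIRA install.

```python
import os
import shutil
import tempfile


def find_mira_binaries(env, names=("mira", "miraconvert", "mirabait")):
    """Return {name: path} for each binary found under env["MIRA4"].

    Hypothetical helper: raises ValueError instead of exiting, so it can
    be reused outside a command line script.
    """
    try:
        mira_path = env["MIRA4"]
    except KeyError:
        raise ValueError("Environment variable $MIRA4 not set")
    found = {}
    for name in names:
        path = os.path.join(mira_path, name)
        if not os.path.isfile(path):
            raise ValueError("Missing %s under $MIRA4, %r" % (name, path))
        found[name] = path
    return found


# Demonstration against a throw-away folder of empty placeholder files.
demo_dir = tempfile.mkdtemp()
for demo_name in ("mira", "miraconvert", "mirabait"):
    open(os.path.join(demo_dir, demo_name), "w").close()
binaries = find_mira_binaries({"MIRA4": demo_dir})
shutil.rmtree(demo_dir)
```

On a real server the call would be ``find_mira_binaries(os.environ)``, which fails early with a readable message if the variable is unset or a binary is missing.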
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/tools/mira4_0/mira4.py	Wed Sep 02 07:45:07 2015 -0400
@@ -0,0 +1,313 @@
+#!/usr/bin/env python
+"""A simple wrapper script to call MIRA and collect its output.
+"""
+import os
+import sys
+import subprocess
+import shutil
+import time
+import tempfile
+from optparse import OptionParser
+
+#Do we need any PYTHONPATH magic?
+from mira4_make_bam import make_bam
+
+WRAPPER_VER = "0.0.4" #Keep in sync with the XML file
+
+def sys_exit(msg, err=1):
+    sys.stderr.write(msg+"\n")
+    sys.exit(err)
+
+
+def get_version(mira_binary):
+    """Run MIRA to find its version number"""
+    # At the command line I would use: mira -v | head -n 1
+    # however there is some pipe error when doing that here.
+    cmd = [mira_binary, "-v"]
+    try:
+        child = subprocess.Popen(cmd,
+                                 stdout=subprocess.PIPE,
+                                 stderr=subprocess.STDOUT)
+    except Exception, err:
+        sys.stderr.write("Error invoking command:\n%s\n\n%s\n" % (" ".join(cmd), err))
+        sys.exit(1)
+    ver, tmp = child.communicate()
+    del child
+    return ver.split("\n", 1)[0].strip()
+
+#Parse Command Line
+usage = """Galaxy MIRA4 wrapper script v%s - use as follows:
+
+$ python mira4.py ...
+
+This will run the MIRA binary and collect its output files as directed.
+""" % WRAPPER_VER
+parser = OptionParser(usage=usage)
+parser.add_option("-m", "--manifest", dest="manifest",
+                  default=None, metavar="FILE",
+                  help="MIRA manifest filename")
+parser.add_option("--maf", dest="maf",
+                  default="-", metavar="FILE",
+                  help="MIRA MAF output filename")
+parser.add_option("--bam", dest="bam",
+                  default="-", metavar="FILE",
+                  help="Unpadded BAM output filename")
+parser.add_option("--fasta", dest="fasta",
+                  default="-", metavar="FILE",
+                  help="Unpadded FASTA output filename")
+parser.add_option("--log", dest="log",
+                  default="-", metavar="FILE",
+                  help="MIRA logging output filename")
+parser.add_option("-v", "--version", dest="version",
+                  default=False, action="store_true",
+                  help="Show version and quit")
+options, args = parser.parse_args()
+manifest = options.manifest
+out_maf = options.maf
+out_bam = options.bam
+out_fasta = options.fasta
+out_log = options.log
+
+try:
+    mira_path = os.environ["MIRA4"]
+except KeyError:
+    sys_exit("Environment variable $MIRA4 not set")
+mira_binary = os.path.join(mira_path, "mira")
+if not os.path.isfile(mira_binary):
+    sys_exit("Missing mira under $MIRA4, %r\nFolder contained: %s"
+             % (mira_binary, ", ".join(os.listdir(mira_path))))
+mira_convert = os.path.join(mira_path, "miraconvert")
+if not os.path.isfile(mira_convert):
+    sys_exit("Missing miraconvert under $MIRA4, %r\nFolder contained: %s"
+             % (mira_convert, ", ".join(os.listdir(mira_path))))
+
+mira_ver = get_version(mira_binary)
+if not mira_ver.strip().startswith("4.0"):
+    sys_exit("This wrapper is for MIRA V4.0, not:\n%s\n%s" % (mira_ver, mira_binary))
+mira_convert_ver = get_version(mira_convert)
+if not mira_convert_ver.strip().startswith("4.0"):
+    sys_exit("This wrapper is for MIRA V4.0, not:\n%s\n%s" % (mira_convert_ver, mira_convert))
+if options.version:
+    print "%s, MIRA wrapper version %s" % (mira_ver, WRAPPER_VER)
+    if mira_ver != mira_convert_ver:
+        print "WARNING: miraconvert %s" % mira_convert_ver
+    sys.exit(0)
+
+if not manifest:
+    sys_exit("Manifest is required")
+elif not os.path.isfile(manifest):
+    sys_exit("Missing input MIRA manifest file: %r" % manifest)
+
+
+try:
+    threads = int(os.environ.get("GALAXY_SLOTS", "1"))
+except ValueError:
+    threads = 1
+assert 1 <= threads, threads
+
+
+def override_temp(manifest):
+    """Override ``-DI:trt=/tmp`` in manifest with environment variable.
+
+    Currently MIRA 4 does not allow environment variables like ``$TMP``
+    inside the manifest, which is a problem if you need to override
+    the default at run time.
+
+    The tool XML will use ``/tmp`` and we replace that here with
+    ``tempfile.gettempdir()`` which will respect $TMPDIR, $TEMP, $TMP
+    as explained in the Python standard library documentation:
+    http://docs.python.org/2/library/tempfile.html#tempfile.tempdir
+
+    By default MIRA 4 would write its temporary files within the output
+    folder, which is a problem if that is a network drive.
+    """
+    handle = open(manifest, "r")
+    text = handle.read()
+    handle.close()
+
+    #At time of writing, this is at the end of a file,
+    #but could be followed by a space in future...
+    text = text.replace("-DI:trt=/tmp", "-DI:trt=" + tempfile.gettempdir())
+
+    #Want to try to ensure this gets written to disk before MIRA attempts
+    #to open it - any networked file system may impose a delay...
+    handle = open(manifest, "w")
+    handle.write(text)
+    handle.flush()
+    os.fsync(handle.fileno())
+    handle.close()
+
+
+def log_manifest(manifest):
+    """Write the manifest file to stderr."""
+    sys.stderr.write("\n%s\nManifest file\n%s\n" % ("="*60, "="*60))
+    with open(manifest) as h:
+        for line in h:
+            sys.stderr.write(line)
+    sys.stderr.write("\n%s\nEnd of manifest\n%s\n" % ("="*60, "="*60))
+
+
+def collect_output(temp, name, handle):
+    """Moves files to the output filenames (global variables)."""
+    n3 = (temp, name, name, name)
+    f = "%s/%s_assembly/%s_d_results" % (temp, name, name)
+    if not os.path.isdir(f):
+        log_manifest(manifest)
+        sys_exit("Missing output folder")
+    if not os.listdir(f):
+        log_manifest(manifest)
+        sys_exit("Empty output folder")
+    missing = []
+
+    old_maf = "%s/%s_out.maf" % (f, name)
+    if not os.path.isfile(old_maf):
+        #Triggered extractLargeContigs.sh?
+        old_maf = "%s/%s_LargeContigs_out.maf" % (f, name)
+
+    #De novo or single strain mapping,
+    old_fasta = "%s/%s_out.unpadded.fasta" % (f, name)
+    ref_fasta = "%s/%s_out.padded.fasta" % (f, name)
+    if not os.path.isfile(old_fasta):
+        #Mapping (StrainX versus reference) or de novo
+        old_fasta = "%s/%s_out_StrainX.unpadded.fasta" % (f, name)
+        ref_fasta = "%s/%s_out_StrainX.padded.fasta" % (f, name)
+    if not os.path.isfile(old_fasta):
+        old_fasta = "%s/%s_out_ReferenceStrain.unpadded.fasta" % (f, name)
+        ref_fasta = "%s/%s_out_ReferenceStrain.padded.fasta" % (f, name)
+        
+
+    missing = False
+    for old, new in [(old_maf, out_maf),
+                     (old_fasta, out_fasta)]:
+        if not os.path.isfile(old):
+            missing = True
+        elif not new or new == "-":
+            handle.write("Ignoring %s\n" % old)
+        else:
+            handle.write("Capturing %s\n" % old)
+            shutil.move(old, new)
+    if missing:
+        log_manifest(manifest)
+        sys.stderr.write("Contents of %r:\n" % f)
+        for filename in sorted(os.listdir(f)):
+            sys.stderr.write("%s\n" % filename)
+
+    #For mapping mode, probably most people would expect a BAM file
+    #using the reference FASTA file...
+    if out_bam and out_bam != "-":
+        if out_maf and out_maf != "-":
+            msg = make_bam(mira_convert, out_maf, ref_fasta, out_bam, handle)
+        else:
+            #Not collecting the MAF file, use original location        
+            msg = make_bam(mira_convert, old_maf, ref_fasta, out_bam, handle)
+        if msg:
+            sys_exit(msg)
+
+def clean_up(temp, name):
+    folder = "%s/%s_assembly" % (temp, name)
+    if os.path.isdir(folder):
+        shutil.rmtree(folder)
+
+#TODO - Run MIRA in /tmp or a configurable directory?
+#Currently Galaxy puts us somewhere safe like:
+#/opt/galaxy-dist/database/job_working_directory/846/
+temp = "."
+
+name = "MIRA"
+
+override_temp(manifest)
+
+start_time = time.time()
+cmd_list = [mira_binary, "-t", str(threads), manifest]
+cmd = " ".join(cmd_list)
+
+assert os.path.isdir(temp)
+d = "%s_assembly" % name
+#This can fail on my development machine if stale folders exist
+#under Galaxy's .../database/job_working_directory/ tree:
+assert not os.path.isdir(d), "Path %r already exists:\n%s" % (d, os.path.abspath(d))
+try:
+    #Check path access
+    os.mkdir(d)
+except Exception, err:
+    log_manifest(manifest)
+    sys.stderr.write("Error making directory %s\n%s" % (d, err))
+    sys.exit(1)
+
+#print os.path.abspath(".")
+#print cmd
+
+if out_log and out_log != "-":
+    handle = open(out_log, "w")
+else:
+    handle = open(os.devnull, "w")
+handle.write("======================== MIRA manifest (instructions) ========================\n")
+m = open(manifest, "rU")
+for line in m:
+    handle.write(line)
+m.close()
+del m
+handle.write("\n")
+handle.write("============================ Starting MIRA now ===============================\n")
+handle.flush()
+try:
+    #Run MIRA
+    child = subprocess.Popen(cmd_list,
+                             stdout=handle,
+                             stderr=subprocess.STDOUT)
+except Exception, err:
+    log_manifest(manifest)
+    sys.stderr.write("Error invoking command:\n%s\n\n%s\n" % (cmd, err))
+    #TODO - call clean up?
+    handle.write("Error invoking command:\n%s\n\n%s\n" % (cmd, err))
+    handle.close()
+    sys.exit(1)
+#Use .communicate as can get deadlocks with .wait(),
+stdout, stderr = child.communicate()
+assert not stdout and not stderr #Should be empty as sent to handle
+run_time = time.time() - start_time
+return_code = child.returncode
+handle.write("\n")
+handle.write("============================ MIRA has finished ===============================\n")
+handle.write("MIRA took %0.2f hours\n" % (run_time / 3600.0))
+if return_code:
+    print "MIRA took %0.2f hours" % (run_time / 3600.0)
+    handle.write("Return error code %i from command:\n" % return_code)
+    handle.write(cmd + "\n")
+    handle.close()
+    clean_up(temp, name)
+    log_manifest(manifest)
+    sys_exit("Return error code %i from command:\n%s" % (return_code, cmd),
+             return_code)
+handle.flush()
+
+if os.path.isfile("MIRA_assembly/MIRA_d_results/ec.log"):
+    handle.write("\n")
+    handle.write("====================== Extract Large Contigs failed ==========================\n")
+    e = open("MIRA_assembly/MIRA_d_results/ec.log", "rU")
+    for line in e:
+        handle.write(line)
+    e.close()
+    handle.write("============================ (end of ec.log) =================================\n")
+    handle.flush()
+
+#print "Collecting output..."
+start_time = time.time()
+collect_output(temp, name, handle)
+collect_time = time.time() - start_time
+handle.write("MIRA took %0.2f hours; collecting output %0.2f minutes\n" % (run_time / 3600.0, collect_time / 60.0))
+print("MIRA took %0.2f hours; collecting output %0.2f minutes\n" % (run_time / 3600.0, collect_time / 60.0))
+
+if os.path.isfile("MIRA_assembly/MIRA_d_results/ec.log"):
+    #Treat as an error, but doing this AFTER collect_output
+    sys.stderr.write("Extract Large Contigs failed\n")
+    handle.write("Extract Large Contigs failed\n")
+    handle.close()
+    sys.exit(1)
+
+#print "Cleaning up..."
+clean_up(temp, name)
+
+handle.write("\nDone\n")
+handle.close()
+print("Done")
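The ``override_temp`` step in ``mira4.py`` above rewrites the ``-DI:trt=/tmp`` placeholder in the manifest before MIRA starts. The same substitution can be sketched as a pure function on the manifest text, which is easier to try out in isolation. The function name ``redirect_manifest_temp`` and the ``/scratch/job42`` path are made up for this example. As in the script, it is a plain string replacement, so it assumes the placeholder is not a prefix of some longer path in the manifest.

```python
import tempfile


def redirect_manifest_temp(text, tmp_dir=None, placeholder="/tmp"):
    """Replace the folder given to ``-DI:trt=`` in manifest text.

    Hypothetical helper mirroring override_temp(), but working on a string
    rather than editing the manifest file in place.
    """
    if tmp_dir is None:
        # tempfile.gettempdir() checks $TMPDIR, $TEMP and $TMP in that order
        tmp_dir = tempfile.gettempdir()
    return text.replace("-DI:trt=" + placeholder, "-DI:trt=" + tmp_dir)


manifest_text = "parameters = -NW:cmrnl=no -DI:trt=/tmp -OUT:orc=no\n"
updated = redirect_manifest_temp(manifest_text, tmp_dir="/scratch/job42")
```

Keeping the file reading and writing separate from the substitution means the rewrite can be checked without a manifest on disk.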
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/tools/mira4_0/mira4_bait.py	Wed Sep 02 07:45:07 2015 -0400
@@ -0,0 +1,114 @@
+#!/usr/bin/env python
+"""A simple wrapper script to call MIRA4's mirabait and collect its output.
+"""
+import os
+import sys
+import subprocess
+import shutil
+import time
+
+WRAPPER_VER = "0.0.5" #Keep in sync with the XML file
+
+def sys_exit(msg, err=1):
+    sys.stderr.write(msg+"\n")
+    sys.exit(err)
+
+
+def get_version(mira_binary):
+    """Run MIRA to find its version number"""
+    # At the command line I would use: mira -v | head -n 1
+    # however there is some pipe error when doing that here.
+    cmd = [mira_binary, "-v"]
+    try:
+        child = subprocess.Popen(cmd,
+                                 stdout=subprocess.PIPE,
+                                 stderr=subprocess.STDOUT)
+    except Exception, err:
+        sys.stderr.write("Error invoking command:\n%s\n\n%s\n" % (" ".join(cmd), err))
+        sys.exit(1)
+    ver, tmp = child.communicate()
+    del child
+    #Workaround for -v not working in mirabait 4.0RC4
+    if "invalid option" in ver.split("\n", 1)[0]:
+        for line in ver.split("\n", 1):
+            if " version " in line:
+                line = line.split()
+                return line[line.index("version")+1].rstrip(")")
+        sys_exit("Could not determine MIRA version:\n%s" % ver)
+    return ver.split("\n", 1)[0]
+
+try:
+    mira_path = os.environ["MIRA4"]
+except KeyError:
+    sys_exit("Environment variable $MIRA4 not set")
+mira_binary = os.path.join(mira_path, "mirabait")
+if not os.path.isfile(mira_binary):
+    sys_exit("Missing mirabait under $MIRA4, %r\nFolder contained: %s"
+             % (mira_binary, ", ".join(os.listdir(mira_path))))
+mira_ver = get_version(mira_binary)
+if not mira_ver.strip().startswith("4.0"):
+    sys_exit("This wrapper is for MIRA V4.0, not:\n%s" % mira_ver)
+if "-v" in sys.argv or "--version" in sys.argv:
+    print "%s, MIRA wrapper version %s" % (mira_ver, WRAPPER_VER)
+    sys.exit(0)
+
+
+format, output_choice, strand_choice, kmer_length, min_occurance, bait_file, in_file, out_file = sys.argv[1:]
+
+if format.startswith("fastq"):
+    format = "fastq"
+elif format == "mira":
+    format = "maf"
+elif format != "fasta":
+    sys_exit("Was not expecting format %r" % format)
+
+assert out_file.endswith(".dat")
+out_file_stem = out_file[:-4]
+
+cmd_list = [mira_binary, "-f", format, "-t", format,
+            "-k", kmer_length, "-n", min_occurance,
+            bait_file, in_file, out_file_stem]
+if output_choice == "pos":
+    pass
+elif output_choice == "neg":
+    #Invert the selection...
+    cmd_list.insert(1, "-i")
+else:
+    sys_exit("Output choice should be 'pos' or 'neg', not %r" % output_choice)
+if strand_choice == "both":
+    pass
+elif strand_choice == "fwd":
+    #Ignore reverse strand...
+    cmd_list.insert(1, "-r")
+else:
+    sys_exit("Strand choice should be 'both' or 'fwd', not %r" % strand_choice)
+
+cmd = " ".join(cmd_list)
+#print cmd
+start_time = time.time()
+try:
+    #Run MIRA
+    child = subprocess.Popen(cmd_list,
+                             stdout=subprocess.PIPE,
+                             stderr=subprocess.STDOUT)
+except Exception, err:
+    sys.stderr.write("Error invoking command:\n%s\n\n%s\n" % (cmd, err))
+    sys.exit(1)
+#Use .communicate as can get deadlocks with .wait(),
+stdout, stderr = child.communicate()
+assert stderr is None # Due to way we ran with subprocess
+run_time = time.time() - start_time
+return_code = child.returncode
+print "mirabait took %0.2f minutes" % (run_time / 60.0)
+
+if return_code:
+    sys.stderr.write(stdout)
+    sys_exit("Return error code %i from command:\n%s" % (return_code, cmd),
+             return_code)
+
+#Capture output
+out_tmp = out_file_stem + "." + format
+if not os.path.isfile(out_tmp):
+    sys.stderr.write(stdout)
+    sys_exit("Missing output file from mirabait: %s" % out_tmp)
+shutil.move(out_tmp, out_file)
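``mira4_bait.py`` above turns the Galaxy parameters into a ``mirabait`` argument list, mapping the Galaxy format name to a MIRA one and inserting ``-i`` or ``-r`` for the inverted and forward-only modes. A minimal sketch of that mapping as a standalone function follows. The name ``build_bait_command`` and the example file names are assumptions for illustration; the flags themselves are the ones the script already uses.

```python
def build_bait_command(binary, fmt, output_choice, strand_choice,
                       kmer_length, min_occurrence,
                       bait_file, in_file, out_stem):
    """Return the mirabait argument list for the given Galaxy choices.

    Hypothetical helper: raises ValueError on unexpected values instead of
    exiting, but otherwise follows the logic of mira4_bait.py.
    """
    # Galaxy has several FASTQ variants and calls MAF files "mira"
    if fmt.startswith("fastq"):
        fmt = "fastq"
    elif fmt == "mira":
        fmt = "maf"
    elif fmt != "fasta":
        raise ValueError("Was not expecting format %r" % fmt)
    cmd = [binary, "-f", fmt, "-t", fmt,
           "-k", str(kmer_length), "-n", str(min_occurrence),
           bait_file, in_file, out_stem]
    if output_choice == "neg":
        cmd.insert(1, "-i")  # invert the selection
    elif output_choice != "pos":
        raise ValueError("Output choice should be 'pos' or 'neg', not %r"
                         % output_choice)
    if strand_choice == "fwd":
        cmd.insert(1, "-r")  # ignore the reverse strand
    elif strand_choice != "both":
        raise ValueError("Strand choice should be 'both' or 'fwd', not %r"
                         % strand_choice)
    return cmd


bait_cmd = build_bait_command("mirabait", "fastqsanger", "neg", "fwd",
                              31, 1, "bait.fasta", "reads.fastq", "out")
```

Returning a list rather than a joined string matches how the script calls ``subprocess.Popen`` without ``shell=True``.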
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/tools/mira4_0/mira4_convert.py	Wed Sep 02 07:45:07 2015 -0400
@@ -0,0 +1,226 @@
+#!/usr/bin/env python
+"""A simple wrapper script to call MIRA and collect its output.
+
+This focuses on the miraconvert binary.
+"""
+import os
+import sys
+import subprocess
+import shutil
+import time
+import tempfile
+from optparse import OptionParser
+try:
+    from io import BytesIO
+except ImportError:
+    #Should we worry about Python 2.5 or older?
+    from StringIO import StringIO as BytesIO
+
+#Do we need any PYTHONPATH magic?
+from mira4_make_bam import depad
+
+WRAPPER_VER = "0.0.7"  # Keep in sync with the XML file
+
+def sys_exit(msg, err=1):
+    sys.stderr.write(msg+"\n")
+    sys.exit(err)
+
+def run(cmd):
+    #Avoid using shell=True when we call subprocess to ensure if the Python
+    #script is killed, so too is the child process.
+    try:
+        child = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
+    except Exception, err:
+        sys_exit("Error invoking command:\n%s\n\n%s\n" % (" ".join(cmd), err))
+    #Use .communicate as can get deadlocks with .wait(),
+    stdout, stderr = child.communicate()
+    return_code = child.returncode
+    if return_code:
+        cmd_str = " ".join(cmd)  # doesn't quote spaces etc
+        if stderr and stdout:
+            sys_exit("Return code %i from command:\n%s\n\n%s\n\n%s" % (return_code, cmd_str, stdout, stderr))
+        else:
+            sys_exit("Return code %i from command:\n%s\n%s" % (return_code, cmd_str, stderr))
+
+def get_version(mira_binary):
+    """Run MIRA to find its version number"""
+    # At the command line I would use: mira -v | head -n 1
+    # however there is some pipe error when doing that here.
+    cmd = [mira_binary, "-v"]
+    try:
+        child = subprocess.Popen(cmd,
+                                 stdout=subprocess.PIPE,
+                                 stderr=subprocess.STDOUT)
+    except Exception, err:
+        sys.stderr.write("Error invoking command:\n%s\n\n%s\n" % (" ".join(cmd), err))
+        sys.exit(1)
+    ver, tmp = child.communicate()
+    del child
+    return ver.split("\n", 1)[0].strip()
+
+#Parse Command Line
+usage = """Galaxy MIRA4 wrapper script v%s - use as follows:
+
+$ python mira4_convert.py ...
+
+This will run the MIRA miraconvert binary and collect its output files as directed.
+""" % WRAPPER_VER
+parser = OptionParser(usage=usage)
+parser.add_option("--input", dest="input",
+                  default=None, metavar="FILE",
+                  help="MIRA input filename")
+parser.add_option("-x", "--min_length", dest="min_length",
+                  default="0",
+                  help="Minimum contig length")
+parser.add_option("-y", "--min_cover", dest="min_cover",
+                  default="0",
+                  help="Minimum average contig coverage")
+parser.add_option("-z", "--min_reads", dest="min_reads",
+                  default="0",
+                  help="Minimum reads per contig")
+parser.add_option("--maf", dest="maf",
+                  default="", metavar="FILE",
+                  help="MIRA MAF output filename")
+parser.add_option("--ace", dest="ace",
+                  default="", metavar="FILE",
+                  help="ACE output filename")
+parser.add_option("--bam", dest="bam",
+                  default="", metavar="FILE",
+                  help="Unpadded BAM output filename")
+parser.add_option("--fasta", dest="fasta",
+                  default="", metavar="FILE",
+                  help="Unpadded FASTA output filename")
+parser.add_option("--cstats", dest="cstats",
+                  default="", metavar="FILE",
+                  help="Contig statistics filename")
+parser.add_option("-v", "--version", dest="version",
+                  default=False, action="store_true",
+                  help="Show version and quit")
+options, args = parser.parse_args()
+if args:
+    sys_exit("Expected options (e.g. --input example.maf), not arguments")
+
+input_maf = options.input
+out_maf = options.maf
+out_bam = options.bam
+out_fasta = options.fasta
+out_ace = options.ace
+out_cstats = options.cstats
+
+try:
+    mira_path = os.environ["MIRA4"]
+except KeyError:
+    sys_exit("Environment variable $MIRA4 not set")
+mira_convert = os.path.join(mira_path, "miraconvert")
+if not os.path.isfile(mira_convert):
+    sys_exit("Missing miraconvert under $MIRA4, %r\nFolder contained: %s"
+             % (mira_convert, ", ".join(os.listdir(mira_path))))
+
+mira_convert_ver = get_version(mira_convert)
+if not mira_convert_ver.strip().startswith("4.0"):
+    sys_exit("This wrapper is for MIRA V4.0, not:\n%s\n%s" % (mira_convert_ver, mira_convert))
+if options.version:
+    print("%s, MIRA wrapper version %s" % (mira_convert_ver, WRAPPER_VER))
+    sys.exit(0)
+
+if not input_maf:
+    sys_exit("Input MIRA file is required")
+elif not os.path.isfile(input_maf):
+    sys_exit("Missing input MIRA file: %r" % input_maf)
+
+if not (out_maf or out_bam or out_fasta or out_ace or out_cstats):
+    sys_exit("No output requested")
+
+
+def check_min_int(value, name):
+    try:
+        i = int(value)
+    except ValueError:
+        sys_exit("Bad %s setting, %r" % (name, value))
+    if i < 0:
+        sys_exit("Negative %s setting, %r" % (name, value))
+    return i
+
+min_length = check_min_int(options.min_length, "minimum length")
+min_cover = check_min_int(options.min_cover, "minimum cover")
+min_reads = check_min_int(options.min_reads, "minimum reads")
+
+#TODO - Run MIRA in /tmp or a configurable directory?
+#Currently Galaxy puts us somewhere safe like:
+#/opt/galaxy-dist/database/job_working_directory/846/
+temp = "."
+
+
+cmd_list = [mira_convert]
+if min_length:
+    cmd_list.extend(["-x", str(min_length)])
+if min_cover:
+    cmd_list.extend(["-y", str(min_cover)])
+if min_reads:
+    cmd_list.extend(["-z", str(min_reads)])
+cmd_list.extend(["-f", "maf", input_maf, os.path.join(temp, "converted")])
+if out_maf:
+    cmd_list.append("maf")
+if out_bam:
+    cmd_list.append("samnbb")
+    if not out_fasta:
+        #Need this for samtools depad
+        out_fasta = os.path.join(temp, "depadded.fasta")
+if out_fasta:
+    cmd_list.append("fasta")
+if out_ace:
+    cmd_list.append("ace")
+if out_cstats:
+    cmd_list.append("cstats")
+run(cmd_list)
+
+def collect(old, new):
+    if not os.path.isfile(old):
+        sys_exit("Missing expected output file %s" % old)
+    shutil.move(old, new)
+
+if out_maf:
+    collect(os.path.join(temp, "converted.maf"), out_maf)
+if out_fasta:
+    #Can we look at the MAF file to see if there are multiple strains?
+    old = os.path.join(temp, "converted_AllStrains.unpadded.fasta")
+    if os.path.isfile(old):
+        collect(old, out_fasta)
+    else:
+        #Might the output be filtered down to zero contigs?
+        old = os.path.join(temp, "converted.fasta")
+        if not os.path.isfile(old):
+            sys_exit("Missing expected output FASTA file")
+        elif os.path.getsize(old) == 0:
+            print("Warning - no contigs (harsh filters?)")
+            collect(old, out_fasta)
+        else:
+            sys_exit("Missing expected output FASTA file (only generic file present)")
+if out_ace:
+    collect(os.path.join(temp, "converted.ace"), out_ace)
+if out_cstats:
+    collect(os.path.join(temp, "converted_info_contigstats.txt"), out_cstats)
+
+if out_bam:
+    assert os.path.isfile(out_fasta)
+    old = os.path.join(temp, "converted.samnbb")
+    if not os.path.isfile(old):
+        old = os.path.join(temp, "converted.sam")
+    if not os.path.isfile(old):
+        sys_exit("Missing expected intermediate file %s" % old)
+    h = BytesIO()
+    msg = depad(out_fasta, old, out_bam, h)
+    if msg:
+        print(msg)
+        print(h.getvalue())
+        h.close()
+        sys.exit(1)
+    h.close()
+    if out_fasta == os.path.join(temp, "depadded.fasta"):
+        #Not asked for by Galaxy, no longer needed
+        os.remove(out_fasta)
+
+if min_length or min_cover or min_reads:
+    print("Filtered.")
+else:
+    print("Converted.")
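``mira4_convert.py`` above validates the three minimum-value filters and only passes ``-x``, ``-y`` and ``-z`` to ``miraconvert`` when they are non-zero, then appends one target name per requested output. The sketch below gathers that logic into two small functions so it can be exercised without running MIRA. The name ``build_convert_command`` and its keyword arguments are illustrative assumptions; ``check_min_int`` follows the script's function but raises ``ValueError`` instead of exiting.

```python
def check_min_int(value, name):
    """Return value as a non-negative int, or raise ValueError."""
    try:
        i = int(value)
    except (TypeError, ValueError):
        raise ValueError("Bad %s setting, %r" % (name, value))
    if i < 0:
        raise ValueError("Negative %s setting, %r" % (name, value))
    return i


def build_convert_command(binary, input_maf, out_stem,
                          min_length=0, min_cover=0, min_reads=0,
                          targets=("maf",)):
    """Return the miraconvert argument list (hypothetical helper)."""
    cmd = [binary]
    filters = [("-x", min_length, "minimum length"),
               ("-y", min_cover, "minimum cover"),
               ("-z", min_reads, "minimum reads")]
    for flag, value, name in filters:
        value = check_min_int(value, name)
        if value:
            # A zero filter is the default, so the switch is left out
            cmd.extend([flag, str(value)])
    cmd.extend(["-f", "maf", input_maf, out_stem])
    cmd.extend(targets)
    return cmd


convert_cmd = build_convert_command("miraconvert", "in.maf", "converted",
                                    min_length="500",
                                    targets=("fasta", "cstats"))
```

The target names ``fasta`` and ``cstats`` are taken from the script; which output files each target produces is left to ``miraconvert`` itself.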
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/tools/mira4_0/mira4_de_novo.xml	Wed Sep 02 07:45:07 2015 -0400
@@ -0,0 +1,275 @@
+<tool id="mira_4_0_de_novo" name="MIRA v4.0 de novo assembler" version="0.0.8">
+    <description>Takes Sanger, Roche 454, Solexa/Illumina, Ion Torrent and PacBio reads</description>
+    <requirements>
+        <requirement type="binary">mira</requirement>
+        <requirement type="binary">miraconvert</requirement>
+        <requirement type="package" version="4.0.2">MIRA</requirement>
+        <requirement type="binary">samtools</requirement>
+        <requirement type="package" version="0.1.19">samtools</requirement>
+    </requirements>
+    <code file="mira4_validator.py" />
+    <stdio>
+        <!-- Assume anything other than zero is an error -->
+        <exit_code range="1:" />
+        <exit_code range=":-1" />
+    </stdio>
+    <version_command interpreter="python">mira4.py --version</version_command>
+    <command interpreter="python">mira4.py
+--manifest "$manifest"
+#if str($maf_wanted)=="true":
+--maf "$out_maf"
+#end if
+#if str($bam_wanted)=="true":
+--bam "$out_bam"
+#end if
+--fasta "$out_fasta"
+--log "$out_log"
+    </command>
+    <configfiles>
+        <configfile name="manifest">
+project = MIRA
+job = denovo,${job_type},${job_quality}
+parameters = -NW:cmrnl=no -DI:trt=/tmp -OUT:orc=no
+## -GE:not is short for -GENERAL:number_of_threads and using one (1)
+## can be useful for repeatability of assemblies and bug hunting.
+## This is overridden by the command line -t switch which is easier
+## to set from within Galaxy.
+##
+## -NW:cmrnl is short for -NAG_AND_WARN:check_maxreadnamelength
+## and without this MIRA aborts with read names over 40 characters
+## due to limitations of some downstream tools.
+##
+## -DI:trt is short for -DIRECTORY:tmp_redirected_to and should
+## point to a local hard drive (not something like NFS on network).
+## We replace /tmp with an environment variable via mira4.py
+##
+## -OUT:orc=no is short for -OUTPUT:output_result_caf=no
+## which turns off an output file we don't want anyway.
+
+#for $rg in $read_group
+
+##This bar goes into the manifest as a comment line
+#------------------------------------------------------------------------------
+
+readgroup
+technology = ${rg.technology}
+##Record the segment placement (if any)
+#if str($rg.segments.type) == "paired"
+segment_placement = ${rg.segments.placement}
+segment_naming = ${rg.segments.naming}
+#if str($rg.segments.min_size) != "" or str($rg.segments.max_size) != ""
+##If our min/max validation failed I trust MIRA to give an error message...
+template_size = $rg.segments.min_size $rg.segments.max_size
+#end if
+#end if
+##if str($rg.segments.type) == "none"
+##MIRA4 manual says use segment_placement = unknown or ? for unpaired data
+##but this stopped working in MIRA 4.0 RC5 and 4.0 (final). See:
+##http://www.freelists.org/post/mira_talk/Unpaired-reads-and-segment-placement--or-unknown
+##segment_placement = ?
+##end if
+##MIRA will accept multiple filenames on one data line, or multiple data lines
+#for $f in $rg.filenames
+##Must now map Galaxy datatypes to MIRA file types...
+#if $f.ext.startswith("fastq")
+##MIRA doesn't like fastqsanger etc, just plain old fastq:
+data = fastq::$f
+#elif $f.ext == "mira"
+##We're calling *.maf the "mira" format in Galaxy (name space collision)
+data = maf::$f
+#else
+##MIRA is happy with fasta as the file type name:
+data = ${f.ext}::$f
+#end if
+#end for
+#end for
+        </configfile>
+    </configfiles>
+    <inputs>
+        <param name="job_type" type="select" label="Assembly type">
+            <option value="genome">Genome</option>
+            <option value="est">EST (transcriptome)</option>
+        </param>
+        <param name="job_quality" type="select" label="Assembly quality grade">
+            <option value="accurate">Accurate</option>
+            <option value="draft">Draft</option>
+        </param>
+        <repeat name="read_group" title="Read Group" min="1">
+            <param name="technology" type="select" label="Read technology">
+                <option value="solexa">Solexa/Illumina</option>
+                <option value="sanger">Sanger capillary sequencing</option>
+                <option value="454">Roche 454</option>
+                <option value="iontor">Ion Torrent</option>
+                <option value="pcbiolq">PacBio low quality (raw)</option>
+                <option value="pcbiohq">PacBio high quality (corrected)</option>
+                <option value="text">Synthetic reads (database entries, consensus sequences, artificial reads, etc)</option>
+                <!-- TODO reference/backbone as an entry here? -->
+            </param>
+            <conditional name="segments">
+                <param name="type" type="select" label="Are these paired reads?">
+                    <option value="paired">Paired reads</option>
+                    <option value="none">Single reads or not relevant (e.g. primer walking with Sanger capillary sequencing)</option>
+                </param>
+                <when value="paired">
+                    <param name="placement" type="select" label="Pairing type (segment placing)">
+                        <option value="FR">---&gt; &lt;--- (e.g. Sanger capillary or Solexa/Illumina paired-end library)</option>
+                        <option value="RF">&lt;--- ---&gt; (e.g. Solexa/Illumina mate-pair library)</option>
+                        <option value="SB">2---&gt; 1---&gt; (e.g. Roche 454 paired-end libraries or IonTorrent long-mate; see note)</option>
+                    </param>
+                    <!-- min/max validation is done via the <code> tag -->
+                    <param name="min_size" type="integer" optional="true" min="0" value=""
+                           label="Minimum size of 'good' DNA templates in the library preparation"
+                           help="Optional, but if used you must also supply a maximum value." />
+                    <param name="max_size" type="integer" optional="true" min="0" value=""
+                           label="Maximum size of 'good' DNA templates in the library preparation"
+                           help="Optional, but if used you must also supply a minimum value." />
+                    <param name="naming" type="select" label="Pair naming convention">
+                        <option value="solexa">Solexa/Illumina (using '/1' and '/2' suffixes, or later Illumina colon system)</option>
+                        <option value="FR">Forward/Reverse scheme (using '.f*' and '.r*' suffixes)</option>
+                        <option value="tigr">TIGR scheme (using 'TF*' and 'TR*' suffixes)</option>
+                        <option value="sanger">Sanger scheme (see notes)</option>
+                        <option value="stlouis">St. Louis scheme (see notes)</option>
+                    </param>
+                </when>
+                <when value="none" /><!-- no further questions -->
+            </conditional>
+            <param name="filenames" type="data" format="fastq,mira" multiple="true" required="true" label="Read file(s)"
+                  help="Multiple files allowed, for example paired reads can be given as two files (MIRA looks at read names to identify pairs)." />
+        </repeat>
+        <param name="maf_wanted" type="boolean" label="Output assembly in MIRA's own format?" checked="False" />
+        <param name="bam_wanted" type="boolean" label="Convert assembly into BAM format?" checked="True" />
+    </inputs>
+    <outputs>
+        <data name="out_fasta" format="fasta" label="MIRA de novo contigs (FASTA)" />
+        <data name="out_bam" format="bam" label="MIRA de novo assembly (BAM)">
+            <filter>bam_wanted is True</filter>
+        </data>
+        <data name="out_maf" format="mira" label="MIRA de novo assembly">
+            <filter>maf_wanted is True</filter>
+        </data>
+        <!-- TODO?
+        <data name="out_contigstats" format="tabular" label="MIRA contig stats" />
+        -->
+        <data name="out_log" format="txt" label="MIRA de novo log" />
+    </outputs>
+    <tests>
+        <!-- Tiger mitochondria, selected paired end Illumina reads from SRR639755
+             Note we're using just one repeat group, and only the filenames parameter
+             within it, so this should work with current test framework limitations:
+             TODO: Revise example and/or -NW:cac=warn and -NW:acv=80 settings
+             MIRA 4.0 complains as coverage is about x93 which is over 80 limit.
+             Also MIRA 4.0 gives three contigs as output.
+        <test>
+            <param name="job_type" value="genome" />
+            <param name="job_quality" value="accurate" />
+            <param name="filenames" value="SRR639755_mito_pairs.fastq.gz" ftype="fastqsanger" />
+            <output name="out_fasta" file="SRR639755_mito_pairs.mira4_de_novo.fasta" ftype="fasta" />
+        </test>
+        -->
+        <!-- Simple assembly based on MIRA's minidemo/demo4 example
+             Note we're using just one repeat group,
+             but several parameters within the repeat
+        -->
+        <test>
+            <param name="job_type" value="genome" />
+            <param name="job_quality" value="accurate" />
+            <param name="technology" value="sanger" />
+            <param name="type" value="none" />
+            <param name="filenames" value="U13small_m.fastq" ftype="fastqsanger" />
+            <param name="maf_wanted" value="true"/>
+            <param name="bam_wanted" value="true"/>
+            <output name="out_fasta" file="U13small_m.mira4_de_novo.fasta" ftype="fasta" />
+            <output name="out_bam" file="empty_file.dat" compare="contains" />
+            <!-- TODO: Suggest startswith as a compare method? -->
+            <output name="out_maf" file="header.mira" compare="contains" />
+            <output name="out_log" file="empty_file.dat" compare="contains" />
+        </test>
+        <!-- Simple assembly based on MIRA's minidemo/solexa1 example
+             Note we're using just one repeat group,
+             but two parameters within the repeat (filename, no pairing)
+         -->
+        <test>
+            <param name="job_type" value="genome" />
+            <param name="job_quality" value="accurate" />
+            <param name="type" value="none" />
+            <param name="filenames" value="ecoli.fastq" ftype="fastqsanger" />
+            <param name="maf_wanted" value="false"/>
+            <param name="bam_wanted" value="false"/>
+            <output name="out_fasta" file="ecoli.mira4_de_novo.fasta" ftype="fasta" />
+            <output name="out_log" file="empty_file.dat" compare="contains" />
+        </test>
+    </tests>
+    <help>
+
+**What it does**
+
+Runs MIRA v4.0 in de novo mode, collects the output, generates a sorted BAM
+file, and then throws away all the temporary files.
+
+MIRA is an open source assembly tool capable of handling sequence data from
+a range of platforms (Sanger capillary, Solexa/Illumina, Roche 454, Ion Torrent
+and also PacBio).
+
+It is particularly suited to small genomes such as bacteria.
+
+
+**Notes on paired reads**
+
+.. class:: warningmark
+
+MIRA uses read naming conventions to identify paired read partners
+(and does not care about their order in the input files). In most cases,
+the Solexa/Illumina setting is fine. For Sanger capillary sequencing,
+you may need to rename your reads to match one of the standard conventions
+supported by MIRA. For Roche 454 or Ion Torrent the appropriate settings
+depend on how the FASTQ file was produced:
+
+* If using Roche's ``sffinfo`` or older versions of ``sff_extract``
+  to convert SFF files to FASTQ, your reads will probably have the
+  ``---&gt; &lt;---`` orientation and use the ``.f`` and ``.r``
+  suffixes (FR naming).
+
+* If using a recent version of ``sff_extract``, then the ``/1`` and ``/2``
+  suffixes are used (Solexa/Illumina style naming) and the original
+  ``2---&gt; 1---&gt;`` orientation is preserved.
+
+The reason for this is that Roche 454 and Ion Torrent paired-end protocols
+sequence a circularised fragment, such that each raw read begins
+with the end of the fragment, a linker, then the start of the fragment.
+This means both the start and end are sequenced from the same strand, and
+have the orientation ``2---&gt; 1---&gt;``. However, in order to use the data
+with traditional tools expecting Sanger capillary style ``---&gt; &lt;---``
+orientation it was common to reverse complement one of the pair to mimic this.
+
+
+**Citation**
+
+If you use this Galaxy tool in work leading to a scientific publication please
+cite the following papers:
+
+Peter J.A. Cock, Björn A. Grüning, Konrad Paszkiewicz and Leighton Pritchard (2013).
+Galaxy tools and workflows for sequence analysis with applications
+in molecular plant pathology. PeerJ 1:e167
+http://dx.doi.org/10.7717/peerj.167
+
+Bastien Chevreux, Thomas Wetter and Sándor Suhai (1999).
+Genome Sequence Assembly Using Trace Signals and Additional Sequence Information.
+Computer Science and Biology: Proceedings of the German Conference on Bioinformatics (GCB) 99, pp. 45-56.
+http://www.bioinfo.de/isb/gcb99/talks/chevreux/main.html
+
+This wrapper is available to install into other Galaxy Instances via the Galaxy
+Tool Shed at http://toolshed.g2.bx.psu.edu/view/peterjc/mira4_assembler
+    </help>
+    <citations>
+        <citation type="doi">10.7717/peerj.167</citation>
+        <citation type="bibtex">@ARTICLE{Chevreux1999-mira3,
+        author = {B. Chevreux and T. Wetter and S. Suhai},
+        year = {1999},
+        title = {Genome Sequence Assembly Using Trace Signals and Additional Sequence Information},
+        journal = {Computer Science and Biology: Proceedings of the German Conference on Bioinformatics (GCB)},
+        volume = {99},
+        pages = {45-56},
+        url = {http://www.bioinfo.de/isb/gcb99/talks/chevreux/main.html}
+        }</citation>
+    </citations>
+</tool>
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/tools/mira4_0/mira4_make_bam.py	Wed Sep 02 07:45:07 2015 -0400
@@ -0,0 +1,92 @@
+#!/usr/bin/env python
+"""Wrapper script using miraconvert & samtools to get BAM from MIRA.
+"""
+import os
+import sys
+import shutil
+import subprocess
+import tempfile
+
+def sys_exit(msg, err=1):
+    sys.stderr.write(msg+"\n")
+    sys.exit(err)
+
+def run(cmd, log_handle):
+    try:
+        child = subprocess.Popen(cmd, shell=True,
+                                 stdout=subprocess.PIPE,
+                                 stderr=subprocess.STDOUT)
+    except Exception as err:
+        sys.stderr.write("Error invoking command:\n%s\n\n%s\n" % (cmd, err))
+        #TODO - call clean up?
+        log_handle.write("Error invoking command:\n%s\n\n%s\n" % (cmd, err))
+        sys.exit(1)
+    #Use .communicate() as .wait() can deadlock when stdout is piped
+    stdout, stderr = child.communicate()
+    assert not stderr #Should be empty as sent to stdout
+    if len(stdout) > 10000:
+        #miraconvert can be very verbose (is holding stdout in RAM a problem?)
+        stdout = stdout.split("\n")
+        stdout = stdout[:10] + ["...", "<snip>", "..."] + stdout[-10:]
+        stdout = "\n".join(stdout)
+    log_handle.write(stdout)
+    return child.returncode
+
+def depad(fasta_file, sam_file, bam_file, log_handle):
+    log_handle.write("\n================= Converting MIRA assembly from SAM to BAM ===================\n")
+    #Also doing SAM to (uncompressed) BAM during depad
+    bam_stem = bam_file + ".tmp" # Have write permissions and want final file in this folder
+    cmd = 'samtools depad -S -u -T "%s" "%s" | samtools sort - "%s"' % (fasta_file, sam_file, bam_stem)
+    return_code = run(cmd, log_handle)
+    if return_code:
+        return "Error %i from command:\n%s" % (return_code, cmd)
+    if not os.path.isfile(bam_stem + ".bam"):
+        return "samtools depad or sort failed to produce BAM file"
+
+    log_handle.write("\n====================== Indexing MIRA assembly BAM file =======================\n")
+    cmd = 'samtools index "%s.bam"' % bam_stem
+    return_code = run(cmd, log_handle)
+    if return_code:
+        return "Error %i from command:\n%s" % (return_code, cmd)
+    if not os.path.isfile(bam_stem + ".bam.bai"):
+        return "samtools indexing of BAM file failed to produce BAI file"
+
+    shutil.move(bam_stem + ".bam", bam_file)
+    os.remove(bam_stem + ".bam.bai") #Let Galaxy handle that...
+
+
+def make_bam(mira_convert, maf_file, fasta_file, bam_file, log_handle):
+    if not os.path.isfile(mira_convert):
+        return "Missing binary %r" % mira_convert
+    if not os.path.isfile(maf_file):
+        return "Missing input MIRA file: %r" % maf_file
+    if not os.path.isfile(fasta_file):
+        return "Missing padded FASTA file: %r" % fasta_file
+
+    log_handle.write("\n====================== Converting MIRA assembly to SAM =======================\n")
+    tmp_dir = tempfile.mkdtemp()
+    sam_file = os.path.join(tmp_dir, "x.sam")
+
+    # Note add nbb to the template name, possible MIRA 4.0 RC4 bug
+    cmd = '"%s" -f maf -t samnbb "%s" "%snbb"' % (mira_convert, maf_file, sam_file)
+    return_code = run(cmd, log_handle)
+    if return_code:
+        return "Error %i from command:\n%s" % (return_code, cmd)
+    if not os.path.isfile(sam_file):
+        return "Conversion from MIRA to SAM failed"
+
+    #Also doing SAM to (uncompressed) BAM during depad
+    msg = depad(fasta_file, sam_file, bam_file, log_handle)
+    if msg:
+        return msg
+
+    os.remove(sam_file)
+    os.rmdir(tmp_dir)
+
+    return None #Good :)
+
+if __name__ == "__main__":
+    mira_convert, maf_file, fasta_file, bam_file = sys.argv[1:]
+    msg = make_bam(mira_convert, maf_file, fasta_file, bam_file, sys.stdout)
+    if msg:
+        sys_exit(msg)
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/tools/mira4_0/mira4_mapping.xml	Wed Sep 02 07:45:07 2015 -0400
@@ -0,0 +1,279 @@
+<tool id="mira_4_0_mapping" name="MIRA v4.0 mapping" version="0.0.8">
+    <description>Maps Sanger, Roche 454, Solexa/Illumina, Ion Torrent and PacBio reads</description>
+    <requirements>
+        <requirement type="binary">mira</requirement>
+        <requirement type="binary">miraconvert</requirement>
+        <requirement type="package" version="4.0.2">MIRA</requirement>
+        <requirement type="binary">samtools</requirement>
+        <requirement type="package" version="0.1.19">samtools</requirement>
+    </requirements>
+    <stdio>
+        <!-- Assume anything other than zero is an error -->
+        <exit_code range="1:" />
+        <exit_code range=":-1" />
+    </stdio>
+    <version_command interpreter="python">mira4.py --version</version_command>
+    <command interpreter="python">mira4.py
+--manifest "$manifest"
+#if str($maf_wanted) == "true":
+--maf "$out_maf"
+#end if
+#if str($bam_wanted) == "true":
+--bam "$out_bam"
+#end if
+--fasta "$out_fasta"
+--log "$out_log"
+    </command>
+    <configfiles>
+        <configfile name="manifest">
+project = MIRA
+job = mapping,${job_type},${job_quality}
+parameters = -NW:cmrnl=no -DI:trt=/tmp -OUT:orc=no
+## -GE:not is short for -GENERAL:number_of_threads and using one (1)
+## can be useful for repeatability of assemblies and bug hunting.
+## This is overridden by the command line -t switch which is easier
+## to set from within Galaxy.
+##
+## -NW:cmrnl is short for -NAG_AND_WARN:check_maxreadnamelength
+## and without this MIRA aborts with read names over 40 characters
+## due to limitations of some downstream tools.
+##
+## -DI:trt is short for -DIRECTORY:tmp_redirected_to and should
+## point to a local hard drive (not something like NFS on network).
+## We replace /tmp with an environment variable via mira4.py
+##
+## -OUT:orc=no is short for -OUTPUT:output_result_caf=no
+## which turns off an output file we don't want anyway.
+
+##This bar goes into the manifest as a comment line
+#------------------------------------------------------------------------------
+
+readgroup
+is_reference
+#if str($strain_setup)=="same"
+strain = StrainX
+#end if
+#for $f in $references
+##Must now map Galaxy datatypes to MIRA file types...
+#if $f.ext.startswith("fastq")
+##MIRA doesn't like fastqsanger etc, just plain old fastq:
+data = fastq::$f
+#elif $f.ext == "mira"
+##We're calling *.maf the "mira" format in Galaxy (name space collision)
+data = maf::$f
+#elif $f.ext == "fasta"
+##We're calling MIRA with the file type as "fna" as otherwise it wants quals
+data = fna::$f
+#else
+##Currently don't expect anything else...
+data = ${f.ext}::$f
+#end if
+#end for
+#for $rg in $read_group
+
+##This bar goes into the manifest as a comment line
+#------------------------------------------------------------------------------
+
+readgroup
+technology = ${rg.technology}
+#if str($strain_setup)=="same"
+##This is perhaps redundant as MIRA defaults to StrainX for the reads:
+strain = StrainX
+#end if
+##Record the segment placement (if any)
+#if str($rg.segments.type) == "paired"
+segment_placement = ${rg.segments.placement}
+segment_naming = ${rg.segments.naming}
+#end if
+##if str($rg.segments.type) == "none"
+##MIRA4 manual says use segment_placement = unknown or ? for unpaired data
+##but this stopped working in MIRA 4.0 RC5 and 4.0 (final). See:
+##http://www.freelists.org/post/mira_talk/Unpaired-reads-and-segment-placement--or-unknown
+##segment_placement = ?
+##end if
+##MIRA will accept multiple filenames on one data line, or multiple data lines
+#for $f in $rg.filenames
+##Must now map Galaxy datatypes to MIRA file types...
+#if $f.ext.startswith("fastq")
+##MIRA doesn't like fastqsanger etc, just plain old fastq:
+data = fastq::$f
+#elif $f.ext == "mira"
+##We're calling *.maf the "mira" format in Galaxy (name space collision)
+data = maf::$f
+#else
+##Currently don't expect anything else...
+data = ${f.ext}::$f
+#end if
+#end for
+#end for
+        </configfile>
+    </configfiles>
+    <inputs>
+        <param name="job_type" type="select" label="Assembly type">
+            <option value="genome">Genome</option>
+            <option value="est">EST (transcriptome)</option>
+        </param>
+        <param name="job_quality" type="select" label="Assembly quality grade">
+            <option value="accurate">Accurate</option>
+            <option value="draft">Draft</option>
+        </param>
+        <!-- TODO? Allow technology type for references? -->
+        <!-- TODO? Allow strain settings for reference(s) and reads? -->
+        <!-- TODO? Use a repeat to allow for multi-strain references? -->
+        <!-- TODO? Add strain to the mapping read groups? -->
+        <param name="references" type="data" format="fasta,fastq,mira" multiple="true" required="true" label="Backbone reference file(s)"
+               help="Multiple files allowed, for example one FASTA file per chromosome or plasmid." />
+        <param name="strain_setup" type="select" label="Strain configuration (reference vs reads)">
+            <option value="default">Different strains - mapping reads onto a related reference ('StrainX' vs 'ReferenceStrain')</option>
+            <option value="same">Same strain - mapping reads from same reference (all 'StrainX')</option>
+        </param>
+        <repeat name="read_group" title="Read Group" min="1">
+            <param name="technology" type="select" label="Read technology">
+                <option value="solexa">Solexa/Illumina</option>
+                <option value="sanger">Sanger capillary sequencing</option>
+                <option value="454">Roche 454</option>
+                <option value="iontor">Ion Torrent</option>
+                <option value="pcbiolq">PacBio low quality (raw)</option>
+                <option value="pcbiohq">PacBio high quality (corrected)</option>
+                <option value="text">Synthetic reads (database entries, consensus sequences, artificial reads, etc)</option>
+            </param>
+            <conditional name="segments">
+                <param name="type" type="select" label="Are these paired reads?">
+                    <option value="paired">Paired reads</option>
+                    <option value="none">Single reads or not relevant (e.g. primer walking with Sanger capillary sequencing)</option>
+                </param>
+                <when value="paired">
+                    <param name="placement" type="select" label="Pairing type (segment placing)">
+                        <option value="FR">---&gt; &lt;--- (e.g. Sanger capillary or Solexa/Illumina paired-end library)</option>
+                        <option value="RF">&lt;--- ---&gt; (e.g. Solexa/Illumina mate-pair library)</option>
+                        <option value="SB">2---&gt; 1---&gt; (e.g. Roche 454 paired-end libraries or IonTorrent long-mate; see note)</option>
+                    </param>
+                    <param name="naming" type="select" label="Pair naming convention">
+                        <option value="solexa">Solexa/Illumina (using '/1' and '/2' suffixes, or later Illumina colon system)</option>
+                        <option value="FR">Forward/Reverse scheme (using '.f*' and '.r*' suffixes)</option>
+                        <option value="tigr">TIGR scheme (using 'TF*' and 'TR*' suffixes)</option>
+                        <option value="sanger">Sanger scheme (see notes)</option>
+                        <option value="stlouis">St. Louis scheme (see notes)</option>
+                    </param>
+                </when>
+                <when value="none" /><!-- no further questions -->
+            </conditional>
+            <param name="filenames" type="data" format="fastq,mira" multiple="true" required="true" label="Read file(s)"
+                   help="Multiple files allowed, for example paired reads can be given as two files (MIRA looks at read names to identify pairs)." />
+        </repeat>
+        <param name="maf_wanted" type="boolean" label="Output mapping in MIRA's own format?" checked="False" />
+        <param name="bam_wanted" type="boolean" label="Convert mapping into BAM format?" checked="True" />
+    </inputs>
+    <outputs>
+        <data name="out_fasta" format="fasta" label="MIRA #if str($strain_setup)=='same' then 'same strain' else 'reference' # mapping contigs (FASTA)" />
+        <data name="out_bam" format="bam" label="MIRA #if str($strain_setup)=='same' then 'same strain' else 'reference' # mapping assembly (BAM)">
+            <filter>bam_wanted is True</filter>
+        </data>
+        <data name="out_maf" format="mira" label="MIRA #if str($strain_setup)=='same' then 'same strain' else 'reference' # mapping assembly">
+            <filter>maf_wanted is True</filter>
+        </data>
+        <data name="out_log" format="txt" label="MIRA #if str($strain_setup)=='same' then 'same strain' else 'reference' # mapping log" />
+    </outputs>
+    <tests>
+        <test>
+            <param name="job_type" value="genome" />
+            <param name="job_quality" value="accurate" />
+            <param name="references" value="tvc_contigs.fasta" ftype="fasta" />
+            <param name="strain_setup" value="default" />
+            <param name="type" value="none" />
+            <param name="filenames" value="tvc_mini.fastq" ftype="fastqsanger" />
+            <param name="maf_wanted" value="true"/>
+            <param name="bam_wanted" value="true"/>
+            <output name="out_fasta" file="tvc_map_ref_strain.fasta" ftype="fasta" />
+            <output name="out_bam" file="empty_file.dat" compare="contains" />
+            <!-- TODO: Suggest startswith as a compare method? -->
+            <output name="out_maf" file="header.mira" compare="contains" />
+            <output name="out_log" file="empty_file.dat" compare="contains" />
+        </test>
+        <test>
+            <param name="job_type" value="genome" />
+            <param name="job_quality" value="accurate" />
+            <param name="references" value="tvc_contigs.fasta" ftype="fasta" />
+            <param name="strain_setup" value="same" />
+            <param name="type" value="none" />
+            <param name="filenames" value="tvc_mini.fastq" ftype="fastqsanger" />
+            <param name="maf_wanted" value="false"/>
+            <param name="bam_wanted" value="false"/>
+            <output name="out_fasta" file="tvc_map_same_strain.fasta" ftype="fasta" />
+            <output name="out_log" file="empty_file.dat" compare="contains" />
+        </test>
+    </tests>
+    <help>
+
+**What it does**
+
+Runs MIRA v4.0 in mapping mode, collects the output, generates a sorted BAM
+file, and throws away all the temporary files.
+
+MIRA is an open source assembly tool capable of handling sequence data from
+a range of platforms (Sanger capillary, Solexa/Illumina, Roche 454, Ion Torrent
+and also PacBio).
+
+It is particularly suited to small genomes such as bacteria.
+
+
+**Notes on paired reads**
+
+.. class:: warningmark
+
+MIRA uses read naming conventions to identify paired read partners
+(and does not care about their order in the input files). In most cases,
+the Solexa/Illumina setting is fine. For Sanger capillary sequencing,
+you may need to rename your reads to match one of the standard conventions
+supported by MIRA. For Roche 454 or Ion Torrent the appropriate settings
+depend on how the FASTQ file was produced:
+
+* If using Roche's ``sffinfo`` or older versions of ``sff_extract``
+  to convert SFF files to FASTQ, your reads will probably have the
+  ``---&gt; &lt;---`` orientation and use the ``.f`` and ``.r``
+  suffixes (FR naming).
+
+* If using a recent version of ``sff_extract``, then the ``/1`` and ``/2``
+  suffixes are used (Solexa/Illumina style naming) and the original
+  ``2---&gt; 1---&gt;`` orientation is preserved.
+
+The reason for this is that Roche 454 and Ion Torrent paired-end protocols
+sequence a circularised fragment, such that each raw read begins
+with the end of the fragment, a linker, then the start of the fragment.
+This means both the start and end are sequenced from the same strand, and
+have the orientation ``2---&gt; 1---&gt;``. However, in order to use the data
+with traditional tools expecting Sanger capillary style ``---&gt; &lt;---``
+orientation it was common to reverse complement one of the pair to mimic this.
+
+
+**Citation**
+
+If you use this Galaxy tool in work leading to a scientific publication please
+cite the following papers:
+
+Peter J.A. Cock, Björn A. Grüning, Konrad Paszkiewicz and Leighton Pritchard (2013).
+Galaxy tools and workflows for sequence analysis with applications
+in molecular plant pathology. PeerJ 1:e167
+http://dx.doi.org/10.7717/peerj.167
+
+Bastien Chevreux, Thomas Wetter and Sándor Suhai (1999).
+Genome Sequence Assembly Using Trace Signals and Additional Sequence Information.
+Computer Science and Biology: Proceedings of the German Conference on Bioinformatics (GCB) 99, pp. 45-56.
+http://www.bioinfo.de/isb/gcb99/talks/chevreux/main.html
+
+This wrapper is available to install into other Galaxy Instances via the Galaxy
+Tool Shed at http://toolshed.g2.bx.psu.edu/view/peterjc/mira4_assembler
+    </help>
+    <citations>
+        <citation type="doi">10.7717/peerj.167</citation>
+        <citation type="bibtex">@ARTICLE{Chevreux1999-mira3,
+        author = {B. Chevreux and T. Wetter and S. Suhai},
+        year = {1999},
+        title = {Genome Sequence Assembly Using Trace Signals and Additional Sequence Information},
+        journal = {Computer Science and Biology: Proceedings of the German Conference on Bioinformatics (GCB)},
+        volume = {99},
+        pages = {45-56},
+        url = {http://www.bioinfo.de/isb/gcb99/talks/chevreux/main.html}
+        }</citation>
+    </citations>
+</tool>
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/tools/mira4_0/mira4_validator.py	Wed Sep 02 07:45:07 2015 -0400
@@ -0,0 +1,64 @@
+#Called from the Galaxy Tool XML file
+#import sys
+
+def validate_input(trans, error_map, param_values, page_param_map):
+    """Validates the min_size/max_size user input, before execution."""
+    err_list = []
+    for read_group in param_values["read_group"]:
+        err = dict()
+        segments = read_group["segments"]
+        if str(segments["type"]) != "paired":
+            err_list.append(dict())
+            continue
+
+        min_size = str(segments["min_size"]).strip()
+        max_size = str(segments["max_size"]).strip()
+        #sys.stderr.write("DEBUG min_size=%r, max_size=%r\n" % (min_size, max_size))
+
+        #Somehow Galaxy seems to turn an empty field into string "None"...
+        if min_size=="None":
+            min_size = ""
+        if max_size=="None":
+            max_size = ""
+
+        if min_size=="" and max_size=="":
+            #Both missing is good
+            pass
+        elif min_size=="":
+            err["min_size"] = "Minimum size required if maximum size given"
+        elif max_size=="":
+            err["max_size"] = "Maximum size required if minimum size given"
+            
+        if min_size:
+            try:
+                min_size_int = int(min_size)
+                if min_size_int < 0:
+                    err["min_size"] = "Minimum size must not be negative (%i)" % min_size_int
+                    min_size = None # Avoid doing comparison below
+            except ValueError:
+                err["min_size"] = "Minimum size is not an integer (%s)" % min_size
+                min_size = None # Avoid doing comparison below
+
+        if max_size:
+            try:
+                max_size_int = int(max_size)
+                if max_size_int < 0:
+                    err["max_size"] = "Maximum size must not be negative (%i)" % max_size_int
+                    max_size = None # Avoid doing comparison below
+            except ValueError:
+                err["max_size"] = "Maximum size is not an integer (%s)" % max_size
+                max_size = None # Avoid doing comparison below
+
+        if min_size and max_size and min_size_int > max_size_int:
+            msg = "Minimum size must not exceed maximum size (%i vs %i)" % (min_size_int, max_size_int)
+            err["min_size"] = msg
+            err["max_size"] = msg
+
+        if err:
+            err_list.append({"segments":err})
+        else:
+            err_list.append(dict())
+
+    if any(err_list):
+        #Return an error map only if any readgroup gave errors
+        error_map["read_group"] = err_list
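
The size checks in ``validate_input`` above can be restated for a single read group as a small standalone function. This is only a sketch: ``check_size_range`` is a hypothetical helper, not part of the wrapper or of Galaxy, and its messages are simplified. It assumes, as the validator does, that Galaxy may pass an empty field as the string "None".

```python
def check_size_range(min_size, max_size):
    """Return a dict mapping field name to error message (empty if valid).

    Hypothetical helper mirroring the rules in validate_input for one
    read group; it is not called by Galaxy or by the wrapper.
    """
    errors = {}
    values = {}
    blank = set()
    for field, raw in (("min_size", min_size), ("max_size", max_size)):
        text = str(raw).strip()
        # Galaxy may turn an empty field into the string "None"
        if text in ("", "None"):
            blank.add(field)
            continue
        try:
            number = int(text)
        except ValueError:
            errors[field] = "%s is not an integer (%s)" % (field, text)
            continue
        if number < 0:
            errors[field] = "%s must not be negative (%i)" % (field, number)
            continue
        values[field] = number

    # Both blank is fine; exactly one blank is an error on the blank field
    if len(blank) == 1:
        missing = blank.pop()
        errors[missing] = "%s is required if the other size is given" % missing

    # Only compare when both values parsed cleanly; equal values are allowed
    if len(values) == 2 and values["min_size"] > values["max_size"]:
        msg = "min_size exceeds max_size (%i vs %i)" % (
            values["min_size"], values["max_size"])
        errors["min_size"] = msg
        errors["max_size"] = msg
    return errors
```

An empty result means the pair would be accepted; any key in the result names the form field that would be flagged.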
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/tools/mira4_0/repository_dependencies.xml	Wed Sep 02 07:45:07 2015 -0400
@@ -0,0 +1,4 @@
+<?xml version="1.0"?>
+<repositories description="This requires the MIRA datatype definitions (e.g. the MIRA Assembly Format).">
+    <repository changeset_revision="29ff2a2b6eaa" name="mira_datatypes" owner="peterjc" toolshed="https://testtoolshed.g2.bx.psu.edu" />
+</repositories>
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/tools/mira4_0/tool_dependencies.xml	Wed Sep 02 07:45:07 2015 -0400
@@ -0,0 +1,9 @@
+<?xml version="1.0"?>
+<tool_dependency>
+    <package name="samtools" version="0.1.19">
+        <repository changeset_revision="0e56e4dac6e7" name="package_samtools_0_1_19" owner="iuc" toolshed="https://testtoolshed.g2.bx.psu.edu" />
+    </package>
+    <package name="MIRA" version="4.0.2">
+        <repository changeset_revision="abd3e33f604f" name="package_mira_4_0_2" owner="peterjc" toolshed="https://testtoolshed.g2.bx.psu.edu" />
+    </package>
+</tool_dependency>
--- a/tools/mira4_assembler/README.rst	Thu May 21 11:23:43 2015 -0400
+++ /dev/null	Thu Jan 01 00:00:00 1970 +0000
@@ -1,177 +0,0 @@
-Galaxy wrapper for the MIRA assembly program (v4.0)
-===================================================
-
-This tool is copyright 2011-2014 by Peter Cock, The James Hutton Institute
-(formerly SCRI, Scottish Crop Research Institute), UK. All rights reserved.
-See the licence text below (MIT licence).
-
-This tool is a short Python script (to collect the MIRA output and move it
-to where Galaxy expects the files) and associated Galaxy wrapper XML file.
-
-It is available from the Galaxy Tool Shed at:
-http://toolshed.g2.bx.psu.edu/view/peterjc/mira4_assembler 
-
-It uses a Galaxy datatype definition 'mira' for the MIRA Assembly Format,
-http://toolshed.g2.bx.psu.edu/view/peterjc/mira_datatypes
-
-A separate wrapper for MIRA v3.4 is available from the Galaxy Tool Shed at:
-http://toolshed.g2.bx.psu.edu/view/peterjc/mira_assembler
-
-Automated Installation
-======================
-
-This should be straightforward. Via the Tool Shed, Galaxy should automatically
-install the 'mira' datatype, samtools, and download and install the precompiled
-binary for MIRA v4.0.2 for the Galaxy wrapper, and run any tests.
-
-For MIRA 4, the Galaxy wrapper has been split in two, allowing separate
-cluster settings for de novo usage (high RAM) and mapping (lower RAM).
-Consult the Galaxy administration documentation for your cluster setup.
-
-WARNING: For larger tasks, be aware that MIRA can require vast amounts
-of RAM and run-times of over a week are possible. This tool wrapper makes
-no attempt to spot and reject such large jobs.
-
-
-Manual Installation
-===================
-
-First install the 'mira' datatype for Galaxy, available here:
-
-* http://toolshed.g2.bx.psu.edu/view/peterjc/mira_datatypes 
-
-There are eight Galaxy files to install:
-
-* ``mira4_de_novo.xml`` (the Galaxy tool definition for de novo usage)
-* ``mira4_mapping.xml`` (the Galaxy tool definition for mapping usage)
-* ``mira4_convert.xml`` (the Galaxy tool definition for converting MIRA files)
-* ``mira4_bait.xml`` (the Galaxy tool definition for mirabait)
-* ``mira4.py`` (the Python wrapper script)
-* ``mira4_convert.py`` (the Python wrapper script for miraconvert)
-* ``mira4_bait.py`` (the Python wrapper script for mirabait)
-* ``mira4_validator.py`` (the XML parameter validation script)
-
-The suggested location is a new ``tools/mira4`` folder. You will also need to
-modify the ``tool_conf.xml`` file to tell Galaxy to offer the tool::
-
-  <tool file="mira4/mira4_de_novo.xml" />
-  <tool file="mira4/mira4_mapping.xml" />
-
-You will also need to install MIRA (we used version 4.0.2) and define the
-environment variable ``$MIRA4`` pointing at the folder containing the binaries.
-See:
-
-* http://chevreux.org/projects_mira.html
-* http://sourceforge.net/projects/mira-assembler/
-
-You may wish to use different cluster setups for the de novo and mapping
-tools, see above.
-
-You will also need to install samtools (for generating a BAM file from MIRA's
-SAM output).
-
-If you wish to run the unit tests, also move/copy the ``test-data/`` files
-under Galaxy's ``test-data/`` folder. Then::
-
-    $ ./run_tests.sh -id mira_4_0_bait
-    $ ./run_tests.sh -id mira_4_0_de_novo
-    $ ./run_tests.sh -id mira_4_0_mapping
-    $ ./run_tests.sh -id mira_4_0_convert
-
-
-History
-=======
-
-======= ======================================================================
-Version Changes
-------- ----------------------------------------------------------------------
-v0.0.1  - Initial version (prototype for MIRA 4.0 RC4, based on wrapper for v3.4)
-v0.0.2  - Include BAM output (using ``miraconvert`` and ``samtools``).
-        - Updated to target MIRA 4.0.1
-        - Simplified XML to apply input format to output data.
-        - Sets temporary folder at run time to respect environment variables
-          (``$TMPDIR``, ``$TEMP``, or ``$TMP`` in that order). This was
-          previously hard coded as ``/tmp``.
-v0.0.3  - Updated to target MIRA 4.0.2
-v0.0.4  - Using ``optparse`` for the Python wrapper script API
-        - Made MAF and BAM outputs optional
-        - Include wrapper for ``miraconvert``
-v0.0.5  - Tool definition now embeds citation information.
-v0.0.6  - Fixed error handling in ``mira4_convert.py``.
-v0.0.7  - Renamed folder (internal change only).
-        - Reorder XML elements (internal change only).
-        - Use the ``format_source=...`` tag in the MIRA bait wrapper.
-        - Planemo for Tool Shed upload (``.shed.yml``, internal change only).
-        - MIRA 4.0.2 dependency now declared via dedicated Tool Shed package.
-======= ======================================================================
-
-
-Developers
-==========
-
-Development is on a dedicated GitHub repository:
-https://github.com/peterjc/pico_galaxy/tree/master/tools/mira4_assembler
-
-For pushing a release to the test or main "Galaxy Tool Shed", use the following
-Planemo commands (which requires you have set your Tool Shed access details in
-``~/.planemo.yml`` and that you have access rights on the Tool Shed)::
-
-    $ planemo shed_update --shed_target testtoolshed --check_diff ~/repositories/pico_galaxy/tools/mira4_assembler/
-    ...
-
-or::
-
-    $ planemo shed_update --shed_target toolshed --check_diff ~/repositories/pico_galaxy/tools/mira4_assembler/
-    ...
-
-To just build and check the tar ball, use::
-
-    $ planemo shed_upload --tar_only  ~/repositories/pico_galaxy/tools/mira4_assembler/
-    ...
-    $ tar -tzf shed_upload.tar.gz 
-    test-data/U13small_m.fastq
-    test-data/U13small_m.mira4_de_novo.fasta
-    test-data/ecoli.fastq
-    test-data/ecoli.mira4_de_novo.fasta
-    test-data/empty_file.dat
-    test-data/header.mira
-    test-data/tvc_mini.fastq
-    test-data/tvc_contigs.fasta
-    test-data/tvc_map_ref_strain.fasta
-    test-data/tvc_map_same_strain.fasta
-    test-data/tvc_bait.fasta
-    test-data/tvc_mini_bait_neg.fastq
-    test-data/tvc_mini_bait_pos.fastq
-    test-data/tvc_mini_bait_strict.fastq
-    tools/mira4_assembler/README.rst
-    tools/mira4_assembler/mira4.py
-    tools/mira4_assembler/mira4_bait.py
-    tools/mira4_assembler/mira4_convert.py
-    tools/mira4_assembler/mira4_de_novo.xml
-    tools/mira4_assembler/mira4_make_bam.py
-    tools/mira4_assembler/mira4_mapping.xml
-    tools/mira4_assembler/mira4_validator.py
-    tools/mira4_assembler/repository_dependencies.xml
-    tools/mira4_assembler/tool_dependencies.xml
-
-
-Licence (MIT)
-=============
-
-Permission is hereby granted, free of charge, to any person obtaining a copy
-of this software and associated documentation files (the "Software"), to deal
-in the Software without restriction, including without limitation the rights
-to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
-copies of the Software, and to permit persons to whom the Software is
-furnished to do so, subject to the following conditions:
-
-The above copyright notice and this permission notice shall be included in
-all copies or substantial portions of the Software.
-
-THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
-IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
-FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
-AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
-LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
-OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
-THE SOFTWARE.
--- a/tools/mira4_assembler/mira4.py	Thu May 21 11:23:43 2015 -0400
+++ /dev/null	Thu Jan 01 00:00:00 1970 +0000
@@ -1,313 +0,0 @@
-#!/usr/bin/env python
-"""A simple wrapper script to call MIRA and collect its output.
-"""
-import os
-import sys
-import subprocess
-import shutil
-import time
-import tempfile
-from optparse import OptionParser
-
-#Do we need any PYTHONPATH magic?
-from mira4_make_bam import make_bam
-
-WRAPPER_VER = "0.0.4" #Keep in sync with the XML file
-
-def sys_exit(msg, err=1):
-    sys.stderr.write(msg+"\n")
-    sys.exit(err)
-
-
-def get_version(mira_binary):
-    """Run MIRA to find its version number"""
-    # At the command line I would use: mira -v | head -n 1
-    # however there is some pipe error when doing that here.
-    cmd = [mira_binary, "-v"]
-    try:
-        child = subprocess.Popen(cmd,
-                                 stdout=subprocess.PIPE,
-                                 stderr=subprocess.STDOUT)
-    except Exception, err:
-        sys.stderr.write("Error invoking command:\n%s\n\n%s\n" % (" ".join(cmd), err))
-        sys.exit(1)
-    ver, tmp = child.communicate()
-    del child
-    return ver.split("\n", 1)[0].strip()
-
-#Parse Command Line
-usage = """Galaxy MIRA4 wrapper script v%s - use as follows:
-
-$ python mira4.py ...
-
-This will run the MIRA binary and collect its output files as directed.
-""" % WRAPPER_VER
-parser = OptionParser(usage=usage)
-parser.add_option("-m", "--manifest", dest="manifest",
-                  default=None, metavar="FILE",
-                  help="MIRA manifest filename")
-parser.add_option("--maf", dest="maf",
-                  default="-", metavar="FILE",
-                  help="MIRA MAF output filename")
-parser.add_option("--bam", dest="bam",
-                  default="-", metavar="FILE",
-                  help="Unpadded BAM output filename")
-parser.add_option("--fasta", dest="fasta",
-                  default="-", metavar="FILE",
-                  help="Unpadded FASTA output filename")
-parser.add_option("--log", dest="log",
-                  default="-", metavar="FILE",
-                  help="MIRA logging output filename")
-parser.add_option("-v", "--version", dest="version",
-                  default=False, action="store_true",
-                  help="Show version and quit")
-options, args = parser.parse_args()
-manifest = options.manifest
-out_maf = options.maf
-out_bam = options.bam
-out_fasta = options.fasta
-out_log = options.log
-
-try:
-    mira_path = os.environ["MIRA4"]
-except KeyError:
-    sys_exit("Environment variable $MIRA4 not set")
-mira_binary = os.path.join(mira_path, "mira")
-if not os.path.isfile(mira_binary):
-    sys_exit("Missing mira under $MIRA4, %r\nFolder contained: %s"
-             % (mira_binary, ", ".join(os.listdir(mira_path))))
-mira_convert = os.path.join(mira_path, "miraconvert")
-if not os.path.isfile(mira_convert):
-    sys_exit("Missing miraconvert under $MIRA4, %r\nFolder contained: %s"
-             % (mira_convert, ", ".join(os.listdir(mira_path))))
-
-mira_ver = get_version(mira_binary)
-if not mira_ver.strip().startswith("4.0"):
-    sys_exit("This wrapper is for MIRA V4.0, not:\n%s\n%s" % (mira_ver, mira_binary))
-mira_convert_ver = get_version(mira_convert)
-if not mira_convert_ver.strip().startswith("4.0"):
-    sys_exit("This wrapper is for MIRA V4.0, not:\n%s\n%s" % (mira_convert_ver, mira_convert))
-if options.version:
-    print "%s, MIRA wrapper version %s" % (mira_ver, WRAPPER_VER)
-    if mira_ver != mira_convert_ver:
-        print "WARNING: miraconvert %s" % mira_convert_ver
-    sys.exit(0)
-
-if not manifest:
-    sys_exit("Manifest is required")
-elif not os.path.isfile(manifest):
-    sys_exit("Missing input MIRA manifest file: %r" % manifest)
-
-
-try:
-    threads = int(os.environ.get("GALAXY_SLOTS", "1"))
-except ValueError:
-    threads = 1
-assert 1 <= threads, threads
-
-
-def override_temp(manifest):
-    """Override ``-DI:trt=/tmp`` in manifest with environment variable.
-
-    Currently MIRA 4 does not allow environment variables like ``$TMP``
-    inside the manifest, which is a problem if you need to override
-    the default at run time.
-
-    The tool XML will use ``/tmp`` and we replace that here with
-    ``tempfile.gettempdir()`` which will respect $TMPDIR, $TEMP, $TMP
-    as explained in the Python standard library documentation:
-    http://docs.python.org/2/library/tempfile.html#tempfile.tempdir
-
-    By default MIRA 4 would write its temporary files within the output
-    folder, which is a problem if that is a network drive.
-    """
-    handle = open(manifest, "r")
-    text = handle.read()
-    handle.close()
-
-    #At time of writing, this is at the end of a file,
-    #but could be followed by a space in future...
-    text = text.replace("-DI:trt=/tmp", "-DI:trt=" + tempfile.gettempdir())
-
-    #Want to try to ensure this gets written to disk before MIRA attempts
-    #to open it - any networked file system may impose a delay...
-    handle = open(manifest, "w")
-    handle.write(text)
-    handle.flush()
-    os.fsync(handle.fileno())
-    handle.close()
-
-
-def log_manifest(manifest):
-    """Write the manifest file to stderr."""
-    sys.stderr.write("\n%s\nManifest file\n%s\n" % ("="*60, "="*60))
-    with open(manifest) as h:
-        for line in h:
-            sys.stderr.write(line)
-    sys.stderr.write("\n%s\nEnd of manifest\n%s\n" % ("="*60, "="*60))
-
-
-def collect_output(temp, name, handle):
-    """Moves files to the output filenames (global variables)."""
-    n3 = (temp, name, name, name)
-    f = "%s/%s_assembly/%s_d_results" % (temp, name, name)
-    if not os.path.isdir(f):
-        log_manifest(manifest)
-        sys_exit("Missing output folder")
-    if not os.listdir(f):
-        log_manifest(manifest)
-        sys_exit("Empty output folder")
-    missing = []
-
-    old_maf = "%s/%s_out.maf" % (f, name)
-    if not os.path.isfile(old_maf):
-        #Triggered extractLargeContigs.sh?
-        old_maf = "%s/%s_LargeContigs_out.maf" % (f, name)
-
-    #De novo or single strain mapping,
-    old_fasta = "%s/%s_out.unpadded.fasta" % (f, name)
-    ref_fasta = "%s/%s_out.padded.fasta" % (f, name)
-    if not os.path.isfile(old_fasta):
-        #Mapping (StrainX versus reference) or de novo
-        old_fasta = "%s/%s_out_StrainX.unpadded.fasta" % (f, name)
-        ref_fasta = "%s/%s_out_StrainX.padded.fasta" % (f, name)
-    if not os.path.isfile(old_fasta):
-        old_fasta = "%s/%s_out_ReferenceStrain.unpadded.fasta" % (f, name)
-        ref_fasta = "%s/%s_out_ReferenceStrain.padded.fasta" % (f, name)
-        
-
-    missing = False
-    for old, new in [(old_maf, out_maf),
-                     (old_fasta, out_fasta)]:
-        if not os.path.isfile(old):
-            missing = True
-        elif not new or new == "-":
-            handle.write("Ignoring %s\n" % old)
-        else:
-            handle.write("Capturing %s\n" % old)
-            shutil.move(old, new)
-    if missing:
-        log_manifest(manifest)
-        sys.stderr.write("Contents of %r:\n" % f)
-        for filename in sorted(os.listdir(f)):
-            sys.stderr.write("%s\n" % filename)
-
-    #For mapping mode, probably most people would expect a BAM file
-    #using the reference FASTA file...
-    if out_bam and out_bam != "-":
-        if out_maf and out_maf != "-":
-            msg = make_bam(mira_convert, out_maf, ref_fasta, out_bam, handle)
-        else:
-            #Not collecting the MAF file, use original location        
-            msg = make_bam(mira_convert, old_maf, ref_fasta, out_bam, handle)
-        if msg:
-            sys_exit(msg)
-
-def clean_up(temp, name):
-    folder = "%s/%s_assembly" % (temp, name)
-    if os.path.isdir(folder):
-        shutil.rmtree(folder)
-
-#TODO - Run MIRA in /tmp or a configurable directory?
-#Currently Galaxy puts us somewhere safe like:
-#/opt/galaxy-dist/database/job_working_directory/846/
-temp = "."
-
-name = "MIRA"
-
-override_temp(manifest)
-
-start_time = time.time()
-cmd_list = [mira_binary, "-t", str(threads), manifest]
-cmd = " ".join(cmd_list)
-
-assert os.path.isdir(temp)
-d = "%s_assembly" % name
-#This can fail on my development machine if stale folders exist
-#under Galaxy's .../database/job_working_directory/ tree:
-assert not os.path.isdir(d), "Path %r already exists:\n%s" % (d, os.path.abspath(d))
-try:
-    #Check path access
-    os.mkdir(d)
-except Exception, err:
-    log_manifest(manifest)
-    sys.stderr.write("Error making directory %s\n%s" % (d, err))
-    sys.exit(1)
-
-#print os.path.abspath(".")
-#print cmd
-
-if out_log and out_log != "-":
-    handle = open(out_log, "w")
-else:
-    handle = open(os.devnull, "w")
-handle.write("======================== MIRA manifest (instructions) ========================\n")
-m = open(manifest, "rU")
-for line in m:
-    handle.write(line)
-m.close()
-del m
-handle.write("\n")
-handle.write("============================ Starting MIRA now ===============================\n")
-handle.flush()
-try:
-    #Run MIRA
-    child = subprocess.Popen(cmd_list,
-                             stdout=handle,
-                             stderr=subprocess.STDOUT)
-except Exception, err:
-    log_manifest(manifest)
-    sys.stderr.write("Error invoking command:\n%s\n\n%s\n" % (cmd, err))
-    #TODO - call clean up?
-    handle.write("Error invoking command:\n%s\n\n%s\n" % (cmd, err))
-    handle.close()
-    sys.exit(1)
-#Use .communicate as can get deadlocks with .wait(),
-stdout, stderr = child.communicate()
-assert not stdout and not stderr #Should be empty as sent to handle
-run_time = time.time() - start_time
-return_code = child.returncode
-handle.write("\n")
-handle.write("============================ MIRA has finished ===============================\n")
-handle.write("MIRA took %0.2f hours\n" % (run_time / 3600.0))
-if return_code:
-    print "MIRA took %0.2f hours" % (run_time / 3600.0)
-    handle.write("Return error code %i from command:\n" % return_code)
-    handle.write(cmd + "\n")
-    handle.close()
-    clean_up(temp, name)
-    log_manifest(manifest)
-    sys_exit("Return error code %i from command:\n%s" % (return_code, cmd),
-             return_code)
-handle.flush()
-
-if os.path.isfile("MIRA_assembly/MIRA_d_results/ec.log"):
-    handle.write("\n")
-    handle.write("====================== Extract Large Contigs failed ==========================\n")
-    e = open("MIRA_assembly/MIRA_d_results/ec.log", "rU")
-    for line in e:
-        handle.write(line)
-    e.close()
-    handle.write("============================ (end of ec.log) =================================\n")
-    handle.flush()
-
-#print "Collecting output..."
-start_time = time.time()
-collect_output(temp, name, handle)
-collect_time = time.time() - start_time
-handle.write("MIRA took %0.2f hours; collecting output %0.2f minutes\n" % (run_time / 3600.0, collect_time / 60.0))
-print("MIRA took %0.2f hours; collecting output %0.2f minutes\n" % (run_time / 3600.0, collect_time / 60.0))
-
-if os.path.isfile("MIRA_assembly/MIRA_d_results/ec.log"):
-    #Treat as an error, but doing this AFTER collect_output
-    sys.stderr.write("Extract Large Contigs failed\n")
-    handle.write("Extract Large Contigs failed\n")
-    handle.close()
-    sys.exit(1)
-
-#print "Cleaning up..."
-clean_up(temp, name)
-
-handle.write("\nDone\n")
-handle.close()
-print("Done")
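
The substitution performed by ``override_temp`` in the script above can be tried in isolation on a string. This is a minimal sketch: ``override_temp_text`` and its ``tmp_dir`` argument are hypothetical names introduced here so the replacement can be checked without touching a manifest file, and the sample manifest line is illustrative only.

```python
import tempfile


def override_temp_text(text, tmp_dir=None):
    """Replace the hard coded ``-DI:trt=/tmp`` setting in manifest text.

    Hypothetical helper: with no tmp_dir given it falls back to
    tempfile.gettempdir(), which respects $TMPDIR, $TEMP and $TMP.
    """
    if tmp_dir is None:
        tmp_dir = tempfile.gettempdir()
    return text.replace("-DI:trt=/tmp", "-DI:trt=" + tmp_dir)


# Illustrative manifest line (not taken from a real job)
sample = "parameters = -DI:trt=/tmp\n"
updated = override_temp_text(sample, "/scratch/galaxy")
```

Passing ``tmp_dir`` explicitly keeps the example deterministic; the wrapper itself always uses ``tempfile.gettempdir()`` and rewrites the manifest file in place.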
--- a/tools/mira4_assembler/mira4_bait.py	Thu May 21 11:23:43 2015 -0400
+++ /dev/null	Thu Jan 01 00:00:00 1970 +0000
@@ -1,114 +0,0 @@
-#!/usr/bin/env python
-"""A simple wrapper script to call MIRA4's mirabait and collect its output.
-"""
-import os
-import sys
-import subprocess
-import shutil
-import time
-
-WRAPPER_VER = "0.0.5" #Keep in sync with the XML file
-
-def sys_exit(msg, err=1):
-    sys.stderr.write(msg+"\n")
-    sys.exit(err)
-
-
-def get_version(mira_binary):
-    """Run MIRA to find its version number"""
-    # At the command line I would use: mira -v | head -n 1
-    # however there is some pipe error when doing that here.
-    cmd = [mira_binary, "-v"]
-    try:
-        child = subprocess.Popen(cmd,
-                                 stdout=subprocess.PIPE,
-                                 stderr=subprocess.STDOUT)
-    except Exception, err:
-        sys.stderr.write("Error invoking command:\n%s\n\n%s\n" % (" ".join(cmd), err))
-        sys.exit(1)
-    ver, tmp = child.communicate()
-    del child
-    #Workaround for -v not working in mirabait 4.0RC4
-    if "invalid option" in ver.split("\n", 1)[0]:
-        for line in ver.split("\n", 1):
-            if " version " in line:
-                line = line.split()
-                return line[line.index("version")+1].rstrip(")")
-        sys_exit("Could not determine MIRA version:\n%s" % ver)
-    return ver.split("\n", 1)[0]
-
-try:
-    mira_path = os.environ["MIRA4"]
-except KeyError:
-    sys_exit("Environment variable $MIRA4 not set")
-mira_binary = os.path.join(mira_path, "mirabait")
-if not os.path.isfile(mira_binary):
-    sys_exit("Missing mirabait under $MIRA4, %r\nFolder contained: %s"
-             % (mira_binary, ", ".join(os.listdir(mira_path))))
-mira_ver = get_version(mira_binary)
-if not mira_ver.strip().startswith("4.0"):
-    sys_exit("This wrapper is for MIRA V4.0, not:\n%s" % mira_ver)
-if "-v" in sys.argv or "--version" in sys.argv:
-    print "%s, MIRA wrapper version %s" % (mira_ver, WRAPPER_VER)
-    sys.exit(0)
-
-
-format, output_choice, strand_choice, kmer_length, min_occurance, bait_file, in_file, out_file = sys.argv[1:]
-
-if format.startswith("fastq"):
-    format = "fastq"
-elif format == "mira":
-    format = "maf"
-elif format != "fasta":
-    sys_exit("Unexpected format %r" % format)
-
-assert out_file.endswith(".dat")
-out_file_stem = out_file[:-4]
-
-cmd_list = [mira_binary, "-f", format, "-t", format,
-            "-k", kmer_length, "-n", min_occurance,
-            bait_file, in_file, out_file_stem]
-if output_choice == "pos":
-    pass
-elif output_choice == "neg":
-    #Invert the selection...
-    cmd_list.insert(1, "-i")
-else:
-    sys_exit("Output choice should be 'pos' or 'neg', not %r" % output_choice)
-if strand_choice == "both":
-    pass
-elif strand_choice == "fwd":
-    #Ignore reverse strand...
-    cmd_list.insert(1, "-r")
-else:
-    sys_exit("Strand choice should be 'both' or 'fwd', not %r" % strand_choice)
-
-cmd = " ".join(cmd_list)
-#print cmd
-start_time = time.time()
-try:
-    #Run MIRA
-    child = subprocess.Popen(cmd_list,
-                             stdout=subprocess.PIPE,
-                             stderr=subprocess.STDOUT)
-except Exception, err:
-    sys.stderr.write("Error invoking command:\n%s\n\n%s\n" % (cmd, err))
-    sys.exit(1)
-#Use .communicate as can get deadlocks with .wait(),
-stdout, stderr = child.communicate()
-assert stderr is None # Due to way we ran with subprocess
-run_time = time.time() - start_time
-return_code = child.returncode
-print "mirabait took %0.2f minutes" % (run_time / 60.0)
-
-if return_code:
-    sys.stderr.write(stdout)
-    sys_exit("Return error code %i from command:\n%s" % (return_code, cmd),
-             return_code)
-
-#Capture output
-out_tmp = out_file_stem + "." + format
-if not os.path.isfile(out_tmp):
-    sys.stderr.write(stdout)
-    sys_exit("Missing output file from mirabait: %s" % out_tmp)
-shutil.move(out_tmp, out_file)
--- a/tools/mira4_assembler/mira4_convert.py	Thu May 21 11:23:43 2015 -0400
+++ /dev/null	Thu Jan 01 00:00:00 1970 +0000
@@ -1,226 +0,0 @@
-#!/usr/bin/env python
-"""A simple wrapper script to call MIRA and collect its output.
-
-This focuses on the miraconvert binary.
-"""
-import os
-import sys
-import subprocess
-import shutil
-import time
-import tempfile
-from optparse import OptionParser
-try:
-    from io import BytesIO
-except ImportError:
-    #Should we worry about Python 2.5 or older?
-    from StringIO import StringIO as BytesIO
-
-#Do we need any PYTHONPATH magic?
-from mira4_make_bam import depad
-
-WRAPPER_VER = "0.0.7"  # Keep in sync with the XML file
-
-def sys_exit(msg, err=1):
-    sys.stderr.write(msg+"\n")
-    sys.exit(err)
-
-def run(cmd):
-    #Avoid using shell=True when we call subprocess to ensure if the Python
-    #script is killed, so too is the child process.
-    try:
-        child = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
-    except Exception, err:
-        sys_exit("Error invoking command:\n%s\n\n%s\n" % (" ".join(cmd), err))
-    #Use .communicate as can get deadlocks with .wait(),
-    stdout, stderr = child.communicate()
-    return_code = child.returncode
-    if return_code:
-        cmd_str = " ".join(cmd)  # doesn't quote spaces etc
-        if stderr and stdout:
-            sys_exit("Return code %i from command:\n%s\n\n%s\n\n%s" % (return_code, cmd_str, stdout, stderr))
-        else:
-            sys_exit("Return code %i from command:\n%s\n%s" % (return_code, cmd_str, stderr))
-
-def get_version(mira_binary):
-    """Run MIRA to find its version number"""
-    # At the command line I would use: mira -v | head -n 1
-    # however there is some pipe error when doing that here.
-    cmd = [mira_binary, "-v"]
-    try:
-        child = subprocess.Popen(cmd,
-                                 stdout=subprocess.PIPE,
-                                 stderr=subprocess.STDOUT)
-    except Exception, err:
-        sys.stderr.write("Error invoking command:\n%s\n\n%s\n" % (" ".join(cmd), err))
-        sys.exit(1)
-    ver, tmp = child.communicate()
-    del child
-    return ver.split("\n", 1)[0].strip()
-
-#Parse Command Line
-usage = """Galaxy MIRA4 wrapper script v%s - use as follows:
-
-$ python mira4_convert.py ...
-
-This will run the MIRA miraconvert binary and collect its output files as directed.
-""" % WRAPPER_VER
-parser = OptionParser(usage=usage)
-parser.add_option("--input", dest="input",
-                  default=None, metavar="FILE",
-                  help="MIRA input filename")
-parser.add_option("-x", "--min_length", dest="min_length",
-                  default="0",
-                  help="Minimum contig length")
-parser.add_option("-y", "--min_cover", dest="min_cover",
-                  default="0",
-                  help="Minimum average contig coverage")
-parser.add_option("-z", "--min_reads", dest="min_reads",
-                  default="0",
-                  help="Minimum reads per contig")
-parser.add_option("--maf", dest="maf",
-                  default="", metavar="FILE",
-                  help="MIRA MAF output filename")
-parser.add_option("--ace", dest="ace",
-                  default="", metavar="FILE",
-                  help="ACE output filename")
-parser.add_option("--bam", dest="bam",
-                  default="", metavar="FILE",
-                  help="Unpadded BAM output filename")
-parser.add_option("--fasta", dest="fasta",
-                  default="", metavar="FILE",
-                  help="Unpadded FASTA output filename")
-parser.add_option("--cstats", dest="cstats",
-                  default="", metavar="FILE",
-                  help="Contig statistics filename")
-parser.add_option("-v", "--version", dest="version",
-                  default=False, action="store_true",
-                  help="Show version and quit")
-options, args = parser.parse_args()
-if args:
-    sys_exit("Expected options (e.g. --input example.maf), not arguments")
-
-input_maf = options.input
-out_maf = options.maf
-out_bam = options.bam
-out_fasta = options.fasta
-out_ace = options.ace
-out_cstats = options.cstats
-
-try:
-    mira_path = os.environ["MIRA4"]
-except KeyError:
-    sys_exit("Environment variable $MIRA4 not set")
-mira_convert = os.path.join(mira_path, "miraconvert")
-if not os.path.isfile(mira_convert):
-    sys_exit("Missing miraconvert under $MIRA4, %r\nFolder contained: %s"
-             % (mira_convert, ", ".join(os.listdir(mira_path))))
-
-mira_convert_ver = get_version(mira_convert)
-if not mira_convert_ver.strip().startswith("4.0"):
-    sys_exit("This wrapper is for MIRA V4.0, not:\n%s\n%s" % (mira_convert_ver, mira_convert))
-if options.version:
-    print("%s, MIRA wrapper version %s" % (mira_convert_ver, WRAPPER_VER))
-    sys.exit(0)
-
-if not input_maf:
-    sys_exit("Input MIRA file is required")
-elif not os.path.isfile(input_maf):
-    sys_exit("Missing input MIRA file: %r" % input_maf)
-
-if not (out_maf or out_bam or out_fasta or out_ace or out_cstats):
-    sys_exit("No output requested")
-
-
-def check_min_int(value, name):
-    try:
-        i = int(value)
-    except ValueError:
-        sys_exit("Bad %s setting, %r" % (name, value))
-    if i < 0:
-        sys_exit("Negative %s setting, %r" % (name, value))
-    return i
-
-min_length = check_min_int(options.min_length, "minimum length")
-min_cover = check_min_int(options.min_cover, "minimum cover")
-min_reads = check_min_int(options.min_reads, "minimum reads")
-
-#TODO - Run MIRA in /tmp or a configurable directory?
-#Currently Galaxy puts us somewhere safe like:
-#/opt/galaxy-dist/database/job_working_directory/846/
-temp = "."
-
-
-cmd_list = [mira_convert]
-if min_length:
-    cmd_list.extend(["-x", str(min_length)])
-if min_cover:
-    cmd_list.extend(["-y", str(min_cover)])
-if min_reads:
-    cmd_list.extend(["-z", str(min_reads)])
-cmd_list.extend(["-f", "maf", input_maf, os.path.join(temp, "converted")])
-if out_maf:
-    cmd_list.append("maf")
-if out_bam:
-    cmd_list.append("samnbb")
-    if not out_fasta:
-        #Need this for samtools depad
-        out_fasta = os.path.join(temp, "depadded.fasta")
-if out_fasta:
-    cmd_list.append("fasta")
-if out_ace:
-    cmd_list.append("ace")
-if out_cstats:
-    cmd_list.append("cstats")
-run(cmd_list)
-
-def collect(old, new):
-    if not os.path.isfile(old):
-        sys_exit("Missing expected output file %s" % old)
-    shutil.move(old, new)
-
-if out_maf:
-    collect(os.path.join(temp, "converted.maf"), out_maf)
-if out_fasta:
-    #Can we look at the MAF file to see if there are multiple strains?
-    old = os.path.join(temp, "converted_AllStrains.unpadded.fasta")
-    if os.path.isfile(old):
-        collect(old, out_fasta)
-    else:
-        #Might the output be filtered down to zero contigs?
-        old = os.path.join(temp, "converted.fasta")
-        if not os.path.isfile(old):
-            sys_exit("Missing expected output FASTA file")
-        elif os.path.getsize(old) == 0:
-            print("Warning - no contigs (harsh filters?)")
-            collect(old, out_fasta)
-        else:
-            sys_exit("Missing expected output FASTA file (only generic file present)")
-if out_ace:
-    collect(os.path.join(temp, "converted.ace"), out_ace)
-if out_cstats:
-    collect(os.path.join(temp, "converted_info_contigstats.txt"), out_cstats)
-
-if out_bam:
-    assert os.path.isfile(out_fasta)
-    old = os.path.join(temp, "converted.samnbb")
-    if not os.path.isfile(old):
-        old = os.path.join(temp, "converted.sam")
-    if not os.path.isfile(old):
-        sys_exit("Missing expected intermediate file %s" % old)
-    h = BytesIO()
-    msg = depad(out_fasta, old, out_bam, h)
-    if msg:
-        print(msg)
-        print(h.getvalue())
-        h.close()
-        sys.exit(1)
-    h.close()
-    if out_fasta == os.path.join(temp, "depadded.fasta"):
-        #Not asked for by Galaxy, no longer needed
-        os.remove(out_fasta)
-
-if min_length or min_cover or min_reads:
-    print("Filtered.")
-else:
-    print("Converted.")
--- a/tools/mira4_assembler/mira4_de_novo.xml	Thu May 21 11:23:43 2015 -0400
+++ /dev/null	Thu Jan 01 00:00:00 1970 +0000
@@ -1,275 +0,0 @@
-<tool id="mira_4_0_de_novo" name="MIRA v4.0 de novo assembler" version="0.0.7">
-    <description>Takes Sanger, Roche 454, Solexa/Illumina, Ion Torrent and PacBio reads</description>
-    <requirements>
-        <requirement type="binary">mira</requirement>
-        <requirement type="binary">miraconvert</requirement>
-        <requirement type="package" version="4.0.2">MIRA</requirement>
-        <requirement type="binary">samtools</requirement>
-        <requirement type="package" version="0.1.19">samtools</requirement>
-    </requirements>
-    <code file="mira4_validator.py" />
-    <stdio>
-        <!-- Assume anything other than zero is an error -->
-        <exit_code range="1:" />
-        <exit_code range=":-1" />
-    </stdio>
-    <version_command interpreter="python">mira4.py --version</version_command>
-    <command interpreter="python">mira4.py
---manifest "$manifest"
-#if str($maf_wanted)=="true":
---maf "$out_maf"
-#end if
-#if str($bam_wanted)=="true":
---bam "$out_bam"
-#end if
---fasta "$out_fasta"
---log "$out_log"
-    </command>
-    <configfiles>
-        <configfile name="manifest">
-project = MIRA
-job = denovo,${job_type},${job_quality}
-parameters = -NW:cmrnl=no -DI:trt=/tmp -OUT:orc=no
-## -GE:not is short for -GENERAL:number_of_threads and using one (1)
-## can be useful for repeatability of assemblies and bug hunting.
-## This is overridden by the command line -t switch which is easier
-## to set from within Galaxy.
-##
-## -NW:cmrnl is short for -NAG_AND_WARN:check_maxreadnamelength
-## and without this MIRA aborts with read names over 40 characters
-## due to limitations of some downstream tools.
-##
-## -DI:trt is short for -DIRECTORY:tmp_redirected_to and should
-## point to a local hard drive (not something like NFS on network).
-## We replace /tmp with an environment variable via mira4.py
-##
-## -OUT:orc=no is short for -OUTPUT:output_result_caf=no 
-## which turns off an output file we don't want anyway.
-
-#for $rg in $read_group
-
-##This bar goes into the manifest as a comment line
-#------------------------------------------------------------------------------
-
-readgroup
-technology = ${rg.technology}
-##Record the segment placement (if any)
-#if str($rg.segments.type) == "paired"
-segment_placement = ${rg.segments.placement}
-segment_naming = ${rg.segments.naming}
-#if str($rg.segments.min_size) != "" or str($rg.segments.max_size) != ""
-##If our min/max validation failed I trust MIRA to give an error message...
-template_size = $rg.segments.min_size $rg.segments.max_size
-#end if
-#end if
-##if str($rg.segments.type) == "none"
-##MIRA4 manual says use segment_placement = unknown or ? for unpaired data
-##but this stopped working in MIRA 4.0 RC5 and 4.0 (final). See:
-##http://www.freelists.org/post/mira_talk/Unpaired-reads-and-segment-placement--or-unknown
-##segment_placement = ?
-##end if
-##MIRA will accept multiple filenames on one data line, or multiple data lines
-#for $f in $rg.filenames
-##Must now map Galaxy datatypes to MIRA file types...
-#if $f.ext.startswith("fastq")
-##MIRA doesn't like fastqsanger etc, just plain old fastq:
-data = fastq::$f
-#elif $f.ext == "mira"
-##We're calling *.maf the "mira" format in Galaxy (name space collision)
-data = maf::$f
-#else
-##MIRA is happy with fasta as the file type name.
-data = ${f.ext}::$f
-#end if
-#end for
-#end for
-        </configfile>
-    </configfiles>
-    <inputs>
-        <param name="job_type" type="select" label="Assembly type">
-            <option value="genome">Genome</option>
-            <option value="est">EST (transcriptome)</option>
-        </param>
-        <param name="job_quality" type="select" label="Assembly quality grade">
-            <option value="accurate">Accurate</option>
-            <option value="draft">Draft</option>
-        </param>
-        <repeat name="read_group" title="Read Group" min="1">
-            <param name="technology" type="select" label="Read technology">
-                <option value="solexa">Solexa/Illumina</option>
-                <option value="sanger">Sanger capillary sequencing</option>
-                <option value="454">Roche 454</option>
-                <option value="iontor">Ion Torrent</option>
-                <option value="pcbiolq">PacBio low quality (raw)</option>
-                <option value="pcbiohq">PacBio high quality (corrected)</option>
-                <option value="text">Synthetic reads (database entries, consensus sequences, artificial reads, etc)</option>
-                <!-- TODO reference/backbone as an entry here? -->
-            </param>
-            <conditional name="segments">
-                <param name="type" type="select" label="Are these paired reads?">
-                    <option value="paired">Paired reads</option>
-                    <option value="none">Single reads or not relevant (e.g. primer walking with Sanger capillary sequencing)</option>
-                </param>
-                <when value="paired">
-                    <param name="placement" type="select" label="Pairing type (segment placing)">
-                        <option value="FR">---&gt; &lt;--- (e.g. Sanger capillary or Solexa/Illumina paired-end library)</option>
-                        <option value="RF">&lt;--- ---&gt; (e.g. Solexa/Illumina mate-pair library)</option>
-                        <option value="SB">2---&gt; 1---&gt; (e.g. Roche 454 paired-end libraries or IonTorrent long-mate; see note)</option>
-                    </param>
-                    <!-- min/max validation is done via the <code> tag -->
-                    <param name="min_size" type="integer" optional="true" min="0" value=""
-                           label="Minimum size of 'good' DNA templates in the library preparation"
-                           help="Optional, but if used you must also supply a maximum value." />
-                    <param name="max_size" type="integer" optional="true" min="0" value=""
-                           label="Maximum size of 'good' DNA templates in the library preparation"
-                           help="Optional, but if used you must also supply a minimum value." />
-                    <param name="naming" type="select" label="Pair naming convention">
-                        <option value="solexa">Solexa/Illumina (using '/1' and '/2' suffixes, or later Illumina colon system)</option>
-                        <option value="FR">Forward/Reverse scheme (using '.f*' and '.r*' suffixes)</option>
-                        <option value="tigr">TIGR scheme (using 'TF*' and 'TR*' suffixes)</option>
-                        <option value="sanger">Sanger scheme (see notes)</option>
-                        <option value="stlouis">St. Louis scheme (see notes)</option>
-                    </param>
-                </when>
-                <when value="none" /><!-- no further questions -->
-            </conditional>
-            <param name="filenames" type="data" format="fastq,mira" multiple="true" required="true" label="Read file(s)"
-                  help="Multiple files allowed, for example paired reads can be given as two files (MIRA looks at read names to identify pairs)." />
-        </repeat>
-        <param name="maf_wanted" type="boolean" label="Output assembly in MIRA's own format?" checked="False" />
-        <param name="bam_wanted" type="boolean" label="Convert assembly into BAM format?" checked="True" />
-    </inputs>
-    <outputs>
-        <data name="out_fasta" format="fasta" label="MIRA de novo contigs (FASTA)" />
-        <data name="out_bam" format="bam" label="MIRA de novo assembly (BAM)">
-            <filter>bam_wanted is True</filter>
-        </data>
-        <data name="out_maf" format="mira" label="MIRA de novo assembly">
-            <filter>maf_wanted is True</filter>
-        </data>
-        <!-- TODO?
-        <data name="out_contigstats" format="tabular" label="MIRA contig stats" />
-        -->
-        <data name="out_log" format="txt" label="MIRA de novo log" />
-    </outputs>
-    <tests>
-        <!-- Tiger mitochondria, selected paired end Illumina reads from SRR639755
-             Note we're using just one repeat group, and only the filenames parameter
-             within it, so this should work with current test framework limitations:
-             TODO: Revise example and/or -NW:cac=warn and -NW:acv=80 settings
-             MIRA 4.0 complains as coverage is about x93 which is over 80 limit.
-             Also MIRA 4.0 gives three contigs as output.
-        <test>
-            <param name="job_type" value="genome" />
-            <param name="job_quality" value="accurate" />
-            <param name="filenames" value="SRR639755_mito_pairs.fastq.gz" ftype="fastqsanger" />
-            <output name="out_fasta" file="SRR639755_mito_pairs.mira4_de_novo.fasta" ftype="fasta" />
-        </test>
-        -->
-        <!-- Simple assembly based on MIRA's minidemo/demo4 example
-             Note we're using just one repeat group,
-             but several parameters within the repeat
-        -->
-        <test>
-            <param name="job_type" value="genome" />
-            <param name="job_quality" value="accurate" />
-            <param name="technology" value="sanger" />
-            <param name="type" value="none" />
-            <param name="filenames" value="U13small_m.fastq" ftype="fastqsanger" />
-            <param name="maf_wanted" value="true"/>
-            <param name="bam_wanted" value="true"/>
-            <output name="out_fasta" file="U13small_m.mira4_de_novo.fasta" ftype="fasta" />
-            <output name="out_bam" file="empty_file.dat" compare="contains" />
-            <!-- TODO: Suggest startswith as a compare method? -->
-            <output name="out_maf" file="header.mira" compare="contains" />
-            <output name="out_log" file="empty_file.dat" compare="contains" />
-        </test>
-        <!-- Simple assembly based on MIRA's minidemo/solexa1 example
-             Note we're using just one repeat group,
-             but two parameters within the repeat (filename, no pairing)
-         -->
-        <test>
-            <param name="job_type" value="genome" />
-            <param name="job_quality" value="accurate" />
-            <param name="type" value="none" />
-            <param name="filenames" value="ecoli.fastq" ftype="fastqsanger" />
-            <param name="maf_wanted" value="false"/>
-            <param name="bam_wanted" value="false"/>
-            <output name="out_fasta" file="ecoli.mira4_de_novo.fasta" ftype="fasta" />
-            <output name="out_log" file="empty_file.dat" compare="contains" />
-        </test>
-    </tests>
-    <help>
-
-**What it does**
-
-Runs MIRA v4.0 in de novo mode, collects the output, generates a sorted BAM
-file, and then throws away all the temporary files.
-
-MIRA is an open source assembly tool capable of handling sequence data from
-a range of platforms (Sanger capillary, Solexa/Illumina, Roche 454, Ion Torrent
-and also PacBio).
-
-It is particularly suited to small genomes such as bacteria.
-
-
-**Notes on paired reads**
-
-.. class:: warningmark
-
-MIRA uses read naming conventions to identify paired read partners
-(and does not care about their order in the input files). In most cases,
-the Solexa/Illumina setting is fine. For Sanger capillary sequencing,
-you may need to rename your reads to match one of the standard conventions
-supported by MIRA. For Roche 454 or Ion Torrent the appropriate settings
-depend on how the FASTQ file was produced:
-
-* If using Roche's ``sffinfo`` or older versions of ``sff_extract``
-  to convert SFF files to FASTQ, your reads will probably have the
-  ``---&gt; &lt;---`` orientation and use the ``.f`` and ``.r``
-  suffixes (FR naming).
-
-* If using a recent version of ``sff_extract``, then the ``/1`` and ``/2``
-  suffixes are used (Solexa/Illumina style naming) and the original
-  ``2---&gt; 1---&gt;`` orientation is preserved.
-
-The reason for this is that Roche 454 and Ion Torrent paired-end libraries
-sequence a circularised fragment, so the raw data begins with the end of
-the fragment, then a linker, then the start of the fragment.
-This means both the start and end are sequenced from the same strand, and
-have the orientation ``2---&gt; 1---&gt;``. However, in order to use the data
-with traditional tools expecting Sanger capillary style ``---&gt; &lt;---``
-orientation it was common to reverse complement one of the pair to mimic this.
-
-
-**Citation**
-
-If you use this Galaxy tool in work leading to a scientific publication please
-cite the following papers:
-
-Peter J.A. Cock, Björn A. Grüning, Konrad Paszkiewicz and Leighton Pritchard (2013).
-Galaxy tools and workflows for sequence analysis with applications
-in molecular plant pathology. PeerJ 1:e167
-http://dx.doi.org/10.7717/peerj.167
-
-Bastien Chevreux, Thomas Wetter and Sándor Suhai (1999).
-Genome Sequence Assembly Using Trace Signals and Additional Sequence Information.
-Computer Science and Biology: Proceedings of the German Conference on Bioinformatics (GCB) 99, pp. 45-56.
-http://www.bioinfo.de/isb/gcb99/talks/chevreux/main.html
-
-This wrapper is available to install into other Galaxy Instances via the Galaxy
-Tool Shed at http://toolshed.g2.bx.psu.edu/view/peterjc/mira4_assembler
-    </help>
-    <citations>
-        <citation type="doi">10.7717/peerj.167</citation>
-        <citation type="bibtex">@ARTICLE{Chevreux1999-mira3,
-        author = {B. Chevreux and T. Wetter and S. Suhai},
-        year = {1999},
-        title = {Genome Sequence Assembly Using Trace Signals and Additional Sequence Information},
-        journal = {Computer Science and Biology: Proceedings of the German Conference on Bioinformatics (GCB)},
-        volume = {99},
-        pages = {45-56},
-        url = {http://www.bioinfo.de/isb/gcb99/talks/chevreux/main.html}
-        }</citation>
-    </citations>
-</tool>
--- a/tools/mira4_assembler/mira4_make_bam.py	Thu May 21 11:23:43 2015 -0400
+++ /dev/null	Thu Jan 01 00:00:00 1970 +0000
@@ -1,92 +0,0 @@
-#!/usr/bin/env python
-"""Wrapper script using miraconvert & samtools to get BAM from MIRA.
-"""
-import os
-import sys
-import shutil
-import subprocess
-import tempfile
-
-def sys_exit(msg, err=1):
-    sys.stderr.write(msg+"\n")
-    sys.exit(err)
-
-def run(cmd, log_handle):
-    try:
-        child = subprocess.Popen(cmd, shell=True,
-                                 stdout=subprocess.PIPE,
-                                 stderr=subprocess.STDOUT)
-    except Exception as err:
-        sys.stderr.write("Error invoking command:\n%s\n\n%s\n" % (cmd, err))
-        #TODO - call clean up?
-        log_handle.write("Error invoking command:\n%s\n\n%s\n" % (cmd, err))
-        sys.exit(1)
-    #Use .communicate() as .wait() can deadlock when the pipe buffer fills.
-    stdout, stderr = child.communicate()
-    assert not stderr #Should be empty as sent to stdout
-    if len(stdout) > 10000:
-        #miraconvert can be very verbose (is holding stdout in RAM a problem?)
-        stdout = stdout.split("\n")
-        stdout = stdout[:10] + ["...", "<snip>", "..."] + stdout[-10:]
-        stdout = "\n".join(stdout)
-    log_handle.write(stdout)
-    return child.returncode
-
-def depad(fasta_file, sam_file, bam_file, log_handle):
-    log_handle.write("\n================= Converting MIRA assembly from SAM to BAM ===================\n")
-    #Also doing SAM to (uncompressed) BAM during depad
-    bam_stem = bam_file + ".tmp" # Have write permissions and want final file in this folder
-    cmd = 'samtools depad -S -u -T "%s" "%s" | samtools sort - "%s"' % (fasta_file, sam_file, bam_stem)
-    return_code = run(cmd, log_handle)
-    if return_code:
-        return "Error %i from command:\n%s" % (return_code, cmd)
-    if not os.path.isfile(bam_stem + ".bam"):
-        return "samtools depad or sort failed to produce BAM file"
-
-    log_handle.write("\n====================== Indexing MIRA assembly BAM file =======================\n")
-    cmd = 'samtools index "%s.bam"' % bam_stem
-    return_code = run(cmd, log_handle)
-    if return_code:
-        return "Error %i from command:\n%s" % (return_code, cmd)
-    if not os.path.isfile(bam_stem + ".bam.bai"):
-        return "samtools indexing of BAM file failed to produce BAI file"
-
-    shutil.move(bam_stem + ".bam", bam_file)
-    os.remove(bam_stem + ".bam.bai") #Let Galaxy handle that...
-
-
-def make_bam(mira_convert, maf_file, fasta_file, bam_file, log_handle):
-    if not os.path.isfile(mira_convert):
-        return "Missing binary %r" % mira_convert
-    if not os.path.isfile(maf_file):
-        return "Missing input MIRA file: %r" % maf_file
-    if not os.path.isfile(fasta_file):
-        return "Missing padded FASTA file: %r" % fasta_file
-
-    log_handle.write("\n====================== Converting MIRA assembly to SAM =======================\n")
-    tmp_dir = tempfile.mkdtemp()
-    sam_file = os.path.join(tmp_dir, "x.sam")
-
-    # Note add nbb to the template name, possible MIRA 4.0 RC4 bug
-    cmd = '"%s" -f maf -t samnbb "%s" "%snbb"' % (mira_convert, maf_file, sam_file)
-    return_code = run(cmd, log_handle)
-    if return_code:
-        return "Error %i from command:\n%s" % (return_code, cmd)
-    if not os.path.isfile(sam_file):
-        return "Conversion from MIRA to SAM failed"
-
-    #Also doing SAM to (uncompressed) BAM during depad
-    msg = depad(fasta_file, sam_file, bam_file, log_handle)
-    if msg:
-        return msg
-
-    os.remove(sam_file)
-    os.rmdir(tmp_dir)
-
-    return None #Good :)
-
-if __name__ == "__main__":
-    mira_convert, maf_file, fasta_file, bam_file = sys.argv[1:]
-    msg = make_bam(mira_convert, maf_file, fasta_file, bam_file, sys.stdout)
-    if msg:
-        sys_exit(msg)
--- a/tools/mira4_assembler/mira4_mapping.xml	Thu May 21 11:23:43 2015 -0400
+++ /dev/null	Thu Jan 01 00:00:00 1970 +0000
@@ -1,279 +0,0 @@
-<tool id="mira_4_0_mapping" name="MIRA v4.0 mapping" version="0.0.7">
-    <description>Maps Sanger, Roche 454, Solexa/Illumina, Ion Torrent and PacBio reads</description>
-    <requirements>
-        <requirement type="binary">mira</requirement>
-        <requirement type="binary">miraconvert</requirement>
-        <requirement type="package" version="4.0.2">MIRA</requirement>
-        <requirement type="binary">samtools</requirement>
-        <requirement type="package" version="0.1.19">samtools</requirement>
-    </requirements>
-    <stdio>
-        <!-- Assume anything other than zero is an error -->
-        <exit_code range="1:" />
-        <exit_code range=":-1" />
-    </stdio>
-    <version_command interpreter="python">mira4.py --version</version_command>
-    <command interpreter="python">mira4.py
---manifest "$manifest"
-#if str($maf_wanted) == "true":
---maf "$out_maf"
-#end if
-#if str($bam_wanted) == "true":
---bam "$out_bam"
-#end if
---fasta "$out_fasta"
---log "$out_log"
-    </command>
-    <configfiles>
-        <configfile name="manifest">
-project = MIRA
-job = mapping,${job_type},${job_quality}
-parameters = -NW:cmrnl=no -DI:trt=/tmp -OUT:orc=no
-## -GE:not is short for -GENERAL:number_of_threads and using one (1)
-## can be useful for repeatability of assemblies and bug hunting.
-## This is overridden by the command line -t switch which is easier
-## to set from within Galaxy.
-##
-## -NW:cmrnl is short for -NAG_AND_WARN:check_maxreadnamelength
-## and without this MIRA aborts with read names over 40 characters
-## due to limitations of some downstream tools.
-##
-## -DI:trt is short for -DIRECTORY:tmp_redirected_to and should
-## point to a local hard drive (not something like NFS on network).
-## We replace /tmp with an environment variable via mira4.py
-##
-## -OUT:orc=no is short for -OUTPUT:output_result_caf=no
-## which turns off an output file we don't want anyway.
-
-##This bar goes into the manifest as a comment line
-#------------------------------------------------------------------------------
-
-readgroup
-is_reference
-#if str($strain_setup)=="same"
-strain = StrainX
-#end if
-#for $f in $references
-##Must now map Galaxy datatypes to MIRA file types...
-#if $f.ext.startswith("fastq")
-##MIRA doesn't like fastqsanger etc, just plain old fastq:
-data = fastq::$f
-#elif $f.ext == "mira"
-##We're calling *.maf the "mira" format in Galaxy (name space collision)
-data = maf::$f
-#elif $f.ext == "fasta"
-##We're calling MIRA with the file type as "fna" as otherwise it wants quals
-data = fna::$f
-#else
-##Currently don't expect anything else...
-data = ${f.ext}::$f
-#end if
-#end for
-#for $rg in $read_group
-
-##This bar goes into the manifest as a comment line
-#------------------------------------------------------------------------------
-
-readgroup
-technology = ${rg.technology}
-#if str($strain_setup)=="same"
-##This is perhaps redundant as MIRA defaults to StrainX for the reads:
-strain = StrainX
-#end if
-##Record the segment placement (if any)
-#if str($rg.segments.type) == "paired"
-segment_placement = ${rg.segments.placement}
-segment_naming = ${rg.segments.naming}
-#end if
-##if str($rg.segments.type) == "none"
-##MIRA4 manual says use segment_placement = unknown or ? for unpaired data
-##but this stopped working in MIRA 4.0 RC5 and 4.0 (final). See:
-##http://www.freelists.org/post/mira_talk/Unpaired-reads-and-segment-placement--or-unknown
-##segment_placement = ?
-##end if
-##MIRA will accept multiple filenames on one data line, or multiple data lines
-#for $f in $rg.filenames
-##Must now map Galaxy datatypes to MIRA file types...
-#if $f.ext.startswith("fastq")
-##MIRA doesn't like fastqsanger etc, just plain old fastq:
-data = fastq::$f
-#elif $f.ext == "mira"
-##We're calling *.maf the "mira" format in Galaxy (name space collision)
-data = maf::$f
-#else
-##Currently don't expect anything else...
-data = ${f.ext}::$f
-#end if
-#end for
-#end for
-        </configfile>
-    </configfiles>
-    <inputs>
-        <param name="job_type" type="select" label="Assembly type">
-            <option value="genome">Genome</option>
-            <option value="est">EST (transcriptome)</option>
-        </param>
-        <param name="job_quality" type="select" label="Assembly quality grade">
-            <option value="accurate">Accurate</option>
-            <option value="draft">Draft</option>
-        </param>
-        <!-- TODO? Allow technology type for references? -->
-        <!-- TODO? Allow strain settings for reference(s) and reads? -->
-        <!-- TODO? Use a repeat to allow for multi-strain references? -->
-        <!-- TODO? Add strain to the mapping read groups? -->
-        <param name="references" type="data" format="fasta,fastq,mira" multiple="true" required="true" label="Backbone reference file(s)"
-               help="Multiple files allowed, for example one FASTA file per chromosome or plasmid." />
-        <param name="strain_setup" type="select" label="Strain configuration (reference vs reads)">
-            <option value="default">Different strains - mapping reads onto a related reference ('StrainX' vs 'ReferenceStrain')</option>
-            <option value="same">Same strain - mapping reads from same reference (all 'StrainX')</option>
-        </param>
-        <repeat name="read_group" title="Read Group" min="1">
-            <param name="technology" type="select" label="Read technology">
-                <option value="solexa">Solexa/Illumina</option>
-                <option value="sanger">Sanger capillary sequencing</option>
-                <option value="454">Roche 454</option>
-                <option value="iontor">Ion Torrent</option>
-                <option value="pcbiolq">PacBio low quality (raw)</option>
-                <option value="pcbiohq">PacBio high quality (corrected)</option>
-                <option value="text">Synthetic reads (database entries, consensus sequences, artificial reads, etc)</option>
-            </param>
-            <conditional name="segments">
-                <param name="type" type="select" label="Are these paired reads?">
-                    <option value="paired">Paired reads</option>
-                    <option value="none">Single reads or not relevant (e.g. primer walking with Sanger capillary sequencing)</option>
-                </param>
-                <when value="paired">
-                    <param name="placement" type="select" label="Pairing type (segment placing)">
-                        <option value="FR">---&gt; &lt;--- (e.g. Sanger capillary or Solexa/Illumina paired-end library)</option>
-                        <option value="RF">&lt;--- ---&gt; (e.g. Solexa/Illumina mate-pair library)</option>
-                        <option value="SB">2---&gt; 1---&gt; (e.g. Roche 454 paired-end libraries or IonTorrent long-mate; see note)</option>
-                    </param>
-                    <param name="naming" type="select" label="Pair naming convention">
-                        <option value="solexa">Solexa/Illumina (using '/1' and '/2' suffixes, or later Illumina colon system)</option>
-                        <option value="FR">Forward/Reverse scheme (using '.f*' and '.r*' suffixes)</option>
-                        <option value="tigr">TIGR scheme (using 'TF*' and 'TR*' suffixes)</option>
-                        <option value="sanger">Sanger scheme (see notes)</option>
-                        <option value="stlouis">St. Louis scheme (see notes)</option>
-                    </param>
-                </when>
-                <when value="none" /><!-- no further questions -->
-            </conditional>
-            <param name="filenames" type="data" format="fastq,mira" multiple="true" required="true" label="Read file(s)"
-                   help="Multiple files allowed, for example paired reads can be given as two files (MIRA looks at read names to identify pairs)." />
-        </repeat>
-        <param name="maf_wanted" type="boolean" label="Output mapping in MIRA's own format?" checked="False" />
-        <param name="bam_wanted" type="boolean" label="Convert mapping into BAM format?" checked="True" />
-    </inputs>
-    <outputs>
-        <data name="out_fasta" format="fasta" label="MIRA #if str($strain_setup)=='same' then 'same strain' else 'reference' # mapping contigs (FASTA)" />
-        <data name="out_bam" format="bam" label="MIRA #if str($strain_setup)=='same' then 'same strain' else 'reference' # mapping assembly (BAM)">
-            <filter>bam_wanted is True</filter>
-        </data>
-        <data name="out_maf" format="mira" label="MIRA #if str($strain_setup)=='same' then 'same strain' else 'reference' # mapping assembly">
-            <filter>maf_wanted is True</filter>
-        </data>
-        <data name="out_log" format="txt" label="MIRA #if str($strain_setup)=='same' then 'same strain' else 'reference' # mapping log" />
-    </outputs>
-    <tests>
-        <test>
-            <param name="job_type" value="genome" />
-            <param name="job_quality" value="accurate" />
-            <param name="references" value="tvc_contigs.fasta" ftype="fasta" />
-            <param name="strain_setup" value="default" />
-            <param name="type" value="none" />
-            <param name="filenames" value="tvc_mini.fastq" ftype="fastqsanger" />
-            <param name="maf_wanted" value="true"/>
-            <param name="bam_wanted" value="true"/>
-            <output name="out_fasta" file="tvc_map_ref_strain.fasta" ftype="fasta" />
-            <output name="out_bam" file="empty_file.dat" compare="contains" />
-            <!-- TODO: Suggest startswith as a compare method? -->
-            <output name="out_maf" file="header.mira" compare="contains" />
-            <output name="out_log" file="empty_file.dat" compare="contains" />
-        </test>
-        <test>
-            <param name="job_type" value="genome" />
-            <param name="job_quality" value="accurate" />
-            <param name="references" value="tvc_contigs.fasta" ftype="fasta" />
-            <param name="strain_setup" value="same" />
-            <param name="type" value="none" />
-            <param name="filenames" value="tvc_mini.fastq" ftype="fastqsanger" />
-            <param name="maf_wanted" value="false"/>
-            <param name="bam_wanted" value="false"/>
-            <output name="out_fasta" file="tvc_map_same_strain.fasta" ftype="fasta" />
-            <output name="out_log" file="empty_file.dat" compare="contains" />
-        </test>
-    </tests>
-    <help>
-
-**What it does**
-
-Runs MIRA v4.0 in mapping mode, collects the output, generates a sorted BAM
-file, and throws away all the temporary files.
-
-MIRA is an open source assembly tool capable of handling sequence data from
-a range of platforms (Sanger capillary, Solexa/Illumina, Roche 454, Ion Torrent
-and also PacBio).
-
-It is particularly suited to small genomes such as bacteria.
-
-
-**Notes on paired reads**
-
-.. class:: warningmark
-
-MIRA uses read naming conventions to identify paired read partners
-(and does not care about their order in the input files). In most cases,
-the Solexa/Illumina setting is fine. For Sanger capillary sequencing,
-you may need to rename your reads to match one of the standard conventions
-supported by MIRA. For Roche 454 or Ion Torrent the appropriate settings
-depend on how the FASTQ file was produced:
-
-* If using Roche's ``sffinfo`` or older versions of ``sff_extract``
-  to convert SFF files to FASTQ, your reads will probably have the
-  ``---&gt; &lt;---`` orientation and use the ``.f`` and ``.r``
-  suffixes (FR naming).
-
-* If using a recent version of ``sff_extract``, then the ``/1`` and ``/2``
-  suffixes are used (Solexa/Illumina style naming) and the original
-  ``2---&gt; 1---&gt;`` orientation is preserved.
-
-The reason for this is that Roche 454 and Ion Torrent paired-end libraries
-are sequenced from a circularised fragment, so that each raw read begins
-with the end of the fragment, then a linker, then the start of the fragment.
-This means both the start and end are sequenced from the same strand, and
-have the orientation ``2---&gt; 1---&gt;``. However, in order to use the data
-with traditional tools expecting Sanger capillary style ``---&gt; &lt;---``
-orientation, it was common to reverse complement one of the pair to mimic this.
-
-
-**Citation**
-
-If you use this Galaxy tool in work leading to a scientific publication please
-cite the following papers:
-
-Peter J.A. Cock, Björn A. Grüning, Konrad Paszkiewicz and Leighton Pritchard (2013).
-Galaxy tools and workflows for sequence analysis with applications
-in molecular plant pathology. PeerJ 1:e167
-http://dx.doi.org/10.7717/peerj.167
-
-Bastien Chevreux, Thomas Wetter and Sándor Suhai (1999).
-Genome Sequence Assembly Using Trace Signals and Additional Sequence Information.
-Computer Science and Biology: Proceedings of the German Conference on Bioinformatics (GCB) 99, pp. 45-56.
-http://www.bioinfo.de/isb/gcb99/talks/chevreux/main.html
-
-This wrapper is available to install into other Galaxy Instances via the Galaxy
-Tool Shed at http://toolshed.g2.bx.psu.edu/view/peterjc/mira4_assembler
-    </help>
-    <citations>
-        <citation type="doi">10.7717/peerj.167</citation>
-        <citation type="bibtex">@ARTICLE{Chevreux1999-mira3,
-        author = {B. Chevreux and T. Wetter and S. Suhai},
-        year = {1999},
-        title = {Genome Sequence Assembly Using Trace Signals and Additional Sequence Information},
-        journal = {Computer Science and Biology: Proceedings of the German Conference on Bioinformatics (GCB)},
-        volume = {99},
-        pages = {45-56},
-        url = {http://www.bioinfo.de/isb/gcb99/talks/chevreux/main.html}
-        }</citation>
-    </citations>
-</tool>
--- a/tools/mira4_assembler/mira4_validator.py	Thu May 21 11:23:43 2015 -0400
+++ /dev/null	Thu Jan 01 00:00:00 1970 +0000
@@ -1,64 +0,0 @@
-#Called from the Galaxy Tool XML file
-#import sys
-
-def validate_input(trans, error_map, param_values, page_param_map):
-    """Validates the min_size/max_size user input, before execution."""
-    err_list = []
-    for read_group in param_values["read_group"]:
-        err = dict()
-        segments = read_group["segments"]
-        if str(segments["type"]) != "paired":
-            err_list.append(dict())
-            continue
-
-        min_size = str(segments["min_size"]).strip()
-        max_size = str(segments["max_size"]).strip()
-        #sys.stderr.write("DEBUG min_size=%r, max_size=%r\n" % (min_size, max_size))
-
-        #Somehow Galaxy seems to turn an empty field into string "None"...
-        if min_size=="None":
-            min_size = ""
-        if max_size=="None":
-            max_size = ""
-
-        if min_size=="" and max_size=="":
-            #Both missing is good
-            pass
-        elif min_size=="":
-            err["min_size"] = "Minimum size required if maximum size given"
-        elif max_size=="":
-            err["max_size"] = "Maximum size required if minimum size given"
-
-        if min_size:
-            try:
-                min_size_int = int(min_size)
-                if min_size_int < 0:
-                    err["min_size"] = "Minimum size must not be negative (%i)" % min_size_int
-                    min_size = None # Avoid doing comparison below
-            except ValueError:
-                err["min_size"] = "Minimum size is not an integer (%s)" % min_size
-                min_size = None # Avoid doing comparison below
-
-        if max_size:
-            try:
-                max_size_int = int(max_size)
-                if max_size_int < 0:
-                    err["max_size"] = "Maximum size must not be negative (%i)" % max_size_int
-                    max_size = None # Avoid doing comparison below
-            except ValueError:
-                err["max_size"] = "Maximum size is not an integer (%s)" % max_size
-                max_size = None # Avoid doing comparison below
-
-        if min_size and max_size and min_size_int > max_size_int:
-            msg = "Minimum size must not exceed maximum size (%i vs %i)" % (min_size_int, max_size_int)
-            err["min_size"] = msg
-            err["max_size"] = msg
-
-        if err:
-            err_list.append({"segments":err})
-        else:
-            err_list.append(dict())
-
-    if any(err_list):
-        #Return an error map only if any readgroup gave errors
-        error_map["read_group"] = err_list
--- a/tools/mira4_assembler/repository_dependencies.xml	Thu May 21 11:23:43 2015 -0400
+++ /dev/null	Thu Jan 01 00:00:00 1970 +0000
@@ -1,4 +0,0 @@
-<?xml version="1.0"?>
-<repositories description="This requires the MIRA datatype definitions (e.g. the MIRA Assembly Format).">
-    <repository changeset_revision="29ff2a2b6eaa" name="mira_datatypes" owner="peterjc" toolshed="https://testtoolshed.g2.bx.psu.edu" />
-</repositories>
--- a/tools/mira4_assembler/tool_dependencies.xml	Thu May 21 11:23:43 2015 -0400
+++ /dev/null	Thu Jan 01 00:00:00 1970 +0000
@@ -1,9 +0,0 @@
-<?xml version="1.0"?>
-<tool_dependency>
-    <package name="samtools" version="0.1.19">
-        <repository changeset_revision="0e56e4dac6e7" name="package_samtools_0_1_19" owner="iuc" toolshed="https://testtoolshed.g2.bx.psu.edu" />
-    </package>
-    <package name="MIRA" version="4.0.2">
-        <repository changeset_revision="4be5c2deb512" name="package_mira_4_0_2" owner="peterjc" toolshed="https://testtoolshed.g2.bx.psu.edu" />
-    </package>
-</tool_dependency>