# HG changeset patch
# User peterjc
# Date 1431510943 14400
# Node ID efeff76741b85421367b896a1f15f2181654764d
# Parent 4abe8d59a438648095ffe3687c7beb6338da2679
planemo upload for repository https://github.com/peterjc/pico_galaxy/tools/mira4_assembler/ commit ed415dd625233aa2c2880c14ee6ac4b73aa57a93-dirty
diff -r 4abe8d59a438 -r efeff76741b8 tools/mira4/README.rst
--- a/tools/mira4/README.rst Tue Oct 28 08:29:59 2014 -0400
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,165 +0,0 @@
-Galaxy wrapper for the MIRA assembly program (v4.0)
-===================================================
-
-This tool is copyright 2011-2014 by Peter Cock, The James Hutton Institute
-(formerly SCRI, Scottish Crop Research Institute), UK. All rights reserved.
-See the licence text below (MIT licence).
-
-This tool is a short Python script (to collect the MIRA output and move it
-to where Galaxy expects the files) and associated Galaxy wrapper XML file.
-
-It is available from the Galaxy Tool Shed at:
-http://toolshed.g2.bx.psu.edu/view/peterjc/mira4_assembler
-
-It uses a Galaxy datatype definition 'mira' for the MIRA Assembly Format,
-http://toolshed.g2.bx.psu.edu/view/peterjc/mira_datatypes
-
-A separate wrapper for MIRA v3.4 is available from the Galaxy Tool Shed at:
-http://toolshed.g2.bx.psu.edu/view/peterjc/mira_assembler
-
-Automated Installation
-======================
-
-This should be straightforward. Via the Tool Shed, Galaxy should automatically
-install the 'mira' datatype, samtools, and download and install the precompiled
-binary for MIRA v4.0.2 for the Galaxy wrapper, and run any tests.
-
-For MIRA 4, the Galaxy wrapper has been split in two, allowing separate
-cluster settings for de novo usage (high RAM) and mapping (lower RAM).
-Consult the Galaxy adminstration documentation for your cluster setup.
-
-WARNING: For larger tasks, be aware that MIRA can require vast amounts
-of RAM and run-times of over a week are possible. This tool wrapper makes
-no attempt to spot and reject such large jobs.
-
-
-Manual Installation
-===================
-
-First install the 'mira' datatype for Galaxy, available here:
-
-* http://toolshed.g2.bx.psu.edu/view/peterjc/mira_datatypes
-
-There are four Galaxy files to install:
-
-* ``mira4_de_novo.xml`` (the Galaxy tool definition for de novo usage)
-* ``mira4_mapping.xml`` (the Galaxy tool definition for mapping usage)
-* ``mira4_convert.xml`` (the Galaxy tool definition for converting MIRA files)
-* ``mira4_bait.xml`` (the Galaxy tool definition for mirabait)
-* ``mira4.py`` (the Python wrapper script)
-* ``mira4_convert.py`` (the Python wrapper script for miraconvert)
-* ``mira4_bait.py`` (the Python wrapper script for mirabait)
-* ``mira4_validator.py`` (the XML parameter validation script)
-
-The suggested location is a new ``tools/mira4`` folder. You will also need to
-modify the ``tools_conf.xml`` file to tell Galaxy to offer the tool, and also do
-this to ``tools_conf.xml.sample`` in order to run the tests::
-
-
-
-
-You will also need to install MIRA, we used version 4.0.2, and define the
-environment variable ``$MIRA4`` pointing at the folder containing the binaries.
-See:
-
-* http://chevreux.org/projects_mira.html
-* http://sourceforge.net/projects/mira-assembler/
-
-You may wish to use different cluster setups for the de novo and mapping
-tools, see above.
-
-You will also need to install samtools (for generating a BAM file from MIRA's
-SAM output).
-
-After copying (or symlinking) the ``test-data`` files under Galaxy's ``test-data``
-folder, you can run the tests with::
-
- $ ./run_functional_tests.sh -id mira_4_0_bait
- $ ./run_functional_tests.sh -id mira_4_0_de_novo
- $ ./run_functional_tests.sh -id mira_4_0_mapping
- $ ./run_functional_tests.sh -id mira_4_0_convert
-
-
-History
-=======
-
-======= ======================================================================
-Version Changes
-------- ----------------------------------------------------------------------
-v0.0.1 - Initial version (prototype for MIRA 4.0 RC4, based on wrapper for v3.4)
-v0.0.2 - Include BAM output (using ``miraconvert`` and ``samtools``).
- - Updated to target MIRA 4.0.1
- - Simplified XML to apply input format to output data.
- - Sets temporary folder at run time to respect environment variables
- (``$TMPDIR``, ``$TEMP``, or ``$TMP`` in that order). This was
- previously hard coded as ``/tmp``.
-v0.0.3 - Updated to target MIRA 4.0.2
-v0.0.4 - Using optparse for the Python wrapper script API
- - Made MAF and BAM outputs optional
- - Include wrapper for ``miraconvert``
-======= ======================================================================
-
-
-Developers
-==========
-
-Development is on a dedicated GitHub repository:
-https://github.com/peterjc/pico_galaxy/tree/master/tools/mira4
-
-For making the "Galaxy Tool Shed" http://toolshed.g2.bx.psu.edu/ tarball use
-the following command from the Galaxy root folder::
-
- $ tar -czf mira4_wrapper.tar.gz tools/mira4/README.rst tools/mira4/mira4_de_novo.xml tools/mira4/mira4_mapping.xml tools/mira4/mira4_bait.xml tools/mira4/mira4_convert.xml tools/mira4/mira4.py tools/mira4/mira4_make_bam.py tools/mira4/mira4_validator.py tools/mira4/mira4_convert.py tools/mira4/mira4_bait.py tools/mira4/tool_dependencies.xml tools/mira4/repository_dependencies.xml test-data/U13small_m.fastq test-data/U13small_m.mira4_de_novo.fasta test-data/tvc_mini.fastq test-data/tvc_contigs.fasta test-data/tvc_map_ref_strain.fasta test-data/tvc_map_same_strain.fasta test-data/tvc_bait.fasta test-data/tvc_mini_bait_pos.fastq test-data/tvc_mini_bait_strict.fastq test-data/tvc_mini_bait_neg.fastq test-data/ecoli.fastq test-data/ecoli.mira4_de_novo.fasta test-data/header.mira test-data/empty_file.dat
-
-Check this worked::
-
- $ tar -tzf mira4_wrapper.tar.gz
- tools/mira4/README.rst
- tools/mira4/mira4_de_novo.xml
- tools/mira4/mira4_mapping.xml
- tools/mira4/mira4_bait.xml
- tools/mira4/mira4_convert.xml
- tools/mira4/mira4.py
- tools/mira4/mira4_make_bam.py
- tools/mira4/mira4_validator.py
- tools/mira4/mira4_convert.py
- tools/mira4/mira4_bait.py
- tools/mira4/tool_dependencies.xml
- tools/mira4/repository_dependencies.xml
- test-data/U13small_m.fastq
- test-data/U13small_m.mira4_de_novo.fasta
- test-data/tvc_mini.fastq
- test-data/tvc_contigs.fasta
- test-data/tvc_map_ref_strain.fasta
- test-data/tvc_map_same_strain.fasta
- test-data/tvc_bait.fasta
- test-data/tvc_mini_bait_pos.fastq
- test-data/tvc_mini_bait_strict.fastq
- test-data/tvc_mini_bait_neg.fastq
- test-data/ecoli.fastq
- test-data/ecoli.mira4_de_novo.fasta
- test-data/header.mira
- test-data/empty_file.dat
-
-
-
-Licence (MIT)
-=============
-
-Permission is hereby granted, free of charge, to any person obtaining a copy
-of this software and associated documentation files (the "Software"), to deal
-in the Software without restriction, including without limitation the rights
-to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
-copies of the Software, and to permit persons to whom the Software is
-furnished to do so, subject to the following conditions:
-
-The above copyright notice and this permission notice shall be included in
-all copies or substantial portions of the Software.
-
-THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
-IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
-FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
-AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
-LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
-OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
-THE SOFTWARE.
diff -r 4abe8d59a438 -r efeff76741b8 tools/mira4/mira4.py
--- a/tools/mira4/mira4.py Tue Oct 28 08:29:59 2014 -0400
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,313 +0,0 @@
-#!/usr/bin/env python
-"""A simple wrapper script to call MIRA and collect its output.
-"""
-import os
-import sys
-import subprocess
-import shutil
-import time
-import tempfile
-from optparse import OptionParser
-
-#Do we need any PYTHONPATH magic?
-from mira4_make_bam import make_bam
-
-WRAPPER_VER = "0.0.4" #Keep in sync with the XML file
-
-def stop_err(msg, err=1):
- sys.stderr.write(msg+"\n")
- sys.exit(err)
-
-
-def get_version(mira_binary):
- """Run MIRA to find its version number"""
- # At the commend line I would use: mira -v | head -n 1
- # however there is some pipe error when doing that here.
- cmd = [mira_binary, "-v"]
- try:
- child = subprocess.Popen(cmd,
- stdout=subprocess.PIPE,
- stderr=subprocess.STDOUT)
- except Exception, err:
- sys.stderr.write("Error invoking command:\n%s\n\n%s\n" % (" ".join(cmd), err))
- sys.exit(1)
- ver, tmp = child.communicate()
- del child
- return ver.split("\n", 1)[0].strip()
-
-#Parse Command Line
-usage = """Galaxy MIRA4 wrapper script v%s - use as follows:
-
-$ python mira4.py ...
-
-This will run the MIRA binary and collect its output files as directed.
-""" % WRAPPER_VER
-parser = OptionParser(usage=usage)
-parser.add_option("-m", "--manifest", dest="manifest",
- default=None, metavar="FILE",
- help="MIRA manifest filename")
-parser.add_option("--maf", dest="maf",
- default="-", metavar="FILE",
- help="MIRA MAF output filename")
-parser.add_option("--bam", dest="bam",
- default="-", metavar="FILE",
- help="Unpadded BAM output filename")
-parser.add_option("--fasta", dest="fasta",
- default="-", metavar="FILE",
- help="Unpadded FASTA output filename")
-parser.add_option("--log", dest="log",
- default="-", metavar="FILE",
- help="MIRA logging output filename")
-parser.add_option("-v", "--version", dest="version",
- default=False, action="store_true",
- help="Show version and quit")
-options, args = parser.parse_args()
-manifest = options.manifest
-out_maf = options.maf
-out_bam = options.bam
-out_fasta = options.fasta
-out_log = options.log
-
-try:
- mira_path = os.environ["MIRA4"]
-except KeyError:
- stop_err("Environment variable $MIRA4 not set")
-mira_binary = os.path.join(mira_path, "mira")
-if not os.path.isfile(mira_binary):
- stop_err("Missing mira under $MIRA4, %r\nFolder contained: %s"
- % (mira_binary, ", ".join(os.listdir(mira_path))))
-mira_convert = os.path.join(mira_path, "miraconvert")
-if not os.path.isfile(mira_convert):
- stop_err("Missing miraconvert under $MIRA4, %r\nFolder contained: %s"
- % (mira_convert, ", ".join(os.listdir(mira_path))))
-
-mira_ver = get_version(mira_binary)
-if not mira_ver.strip().startswith("4.0"):
- stop_err("This wrapper is for MIRA V4.0, not:\n%s\n%s" % (mira_ver, mira_binary))
-mira_convert_ver = get_version(mira_convert)
-if not mira_convert_ver.strip().startswith("4.0"):
- stop_err("This wrapper is for MIRA V4.0, not:\n%s\n%s" % (mira_ver, mira_convert))
-if options.version:
- print "%s, MIRA wrapper version %s" % (mira_ver, WRAPPER_VER)
- if mira_ver != mira_convert_ver:
- print "WARNING: miraconvert %s" % mira_convert_ver
- sys.exit(0)
-
-if not manifest:
- stop_err("Manifest is required")
-elif not os.path.isfile(manifest):
- stop_err("Missing input MIRA manifest file: %r" % manifest)
-
-
-try:
- threads = int(os.environ.get("GALAXY_SLOTS", "1"))
-except ValueError:
- threads = 1
-assert 1 <= threads, threads
-
-
-def override_temp(manifest):
- """Override ``-DI:trt=/tmp`` in manifest with environment variable.
-
- Currently MIRA 4 does not allow envronment variables like ``$TMP``
- inside the manifest, which is a problem if you need to override
- the default at run time.
-
- The tool XML will ``/tmp`` and we replace that here with
- ``tempfile.gettempdir()`` which will respect $TMPDIR, $TEMP, $TMP
- as explained in the Python standard library documentation:
- http://docs.python.org/2/library/tempfile.html#tempfile.tempdir
-
- By default MIRA 4 would write its temporary files within the output
- folder, which is a problem if that is a network drive.
- """
- handle = open(manifest, "r")
- text = handle.read()
- handle.close()
-
- #At time of writing, this is at the end of a file,
- #but could be followed by a space in future...
- text = text.replace("-DI:trt=/tmp", "-DI:trt=" + tempfile.gettempdir())
-
- #Want to try to ensure this gets written to disk before MIRA attempts
- #to open it - any networked file system may impose a delay...
- handle = open(manifest, "w")
- handle.write(text)
- handle.flush()
- os.fsync(handle.fileno())
- handle.close()
-
-
-def log_manifest(manifest):
- """Write the manifest file to stderr."""
- sys.stderr.write("\n%s\nManifest file\n%s\n" % ("="*60, "="*60))
- with open(manifest) as h:
- for line in h:
- sys.stderr.write(line)
- sys.stderr.write("\n%s\nEnd of manifest\n%s\n" % ("="*60, "="*60))
-
-
-def collect_output(temp, name, handle):
- """Moves files to the output filenames (global variables)."""
- n3 = (temp, name, name, name)
- f = "%s/%s_assembly/%s_d_results" % (temp, name, name)
- if not os.path.isdir(f):
- log_manifest(manifest)
- stop_err("Missing output folder")
- if not os.listdir(f):
- log_manifest(manifest)
- stop_err("Empty output folder")
- missing = []
-
- old_maf = "%s/%s_out.maf" % (f, name)
- if not os.path.isfile(old_maf):
- #Triggered extractLargeContigs.sh?
- old_maf = "%s/%s_LargeContigs_out.maf" % (f, name)
-
- #De novo or single strain mapping,
- old_fasta = "%s/%s_out.unpadded.fasta" % (f, name)
- ref_fasta = "%s/%s_out.padded.fasta" % (f, name)
- if not os.path.isfile(old_fasta):
- #Mapping (StrainX versus reference) or de novo
- old_fasta = "%s/%s_out_StrainX.unpadded.fasta" % (f, name)
- ref_fasta = "%s/%s_out_StrainX.padded.fasta" % (f, name)
- if not os.path.isfile(old_fasta):
- old_fasta = "%s/%s_out_ReferenceStrain.unpadded.fasta" % (f, name)
- ref_fasta = "%s/%s_out_ReferenceStrain.padded.fasta" % (f, name)
-
-
- missing = False
- for old, new in [(old_maf, out_maf),
- (old_fasta, out_fasta)]:
- if not os.path.isfile(old):
- missing = True
- elif not new or new == "-":
- handle.write("Ignoring %s\n" % old)
- else:
- handle.write("Capturing %s\n" % old)
- shutil.move(old, new)
- if missing:
- log_manifest(manifest)
- sys.stderr.write("Contents of %r:\n" % f)
- for filename in sorted(os.listdir(f)):
- sys.stderr.write("%s\n" % filename)
-
- #For mapping mode, probably most people would expect a BAM file
- #using the reference FASTA file...
- if out_bam and out_bam != "-":
- if out_maf and out_maf != "-":
- msg = make_bam(mira_convert, out_maf, ref_fasta, out_bam, handle)
- else:
- #Not collecting the MAF file, use original location
- msg = make_bam(mira_convert, old_maf, ref_fasta, out_bam, handle)
- if msg:
- stop_err(msg)
-
-def clean_up(temp, name):
- folder = "%s/%s_assembly" % (temp, name)
- if os.path.isdir(folder):
- shutil.rmtree(folder)
-
-#TODO - Run MIRA in /tmp or a configurable directory?
-#Currently Galaxy puts us somewhere safe like:
-#/opt/galaxy-dist/database/job_working_directory/846/
-temp = "."
-
-name = "MIRA"
-
-override_temp(manifest)
-
-start_time = time.time()
-cmd_list = [mira_binary, "-t", str(threads), manifest]
-cmd = " ".join(cmd_list)
-
-assert os.path.isdir(temp)
-d = "%s_assembly" % name
-#This can fail on my development machine if stale folders exist
-#under Galaxy's .../database/job_working_directory/ tree:
-assert not os.path.isdir(d), "Path %r already exists:\n%s" % (d, os.path.abspath(d))
-try:
- #Check path access
- os.mkdir(d)
-except Exception, err:
- log_manifest(manifest)
- sys.stderr.write("Error making directory %s\n%s" % (d, err))
- sys.exit(1)
-
-#print os.path.abspath(".")
-#print cmd
-
-if out_log and out_log != "-":
- handle = open(out_log, "w")
-else:
- handle = open(os.devnull, "w")
-handle.write("======================== MIRA manifest (instructions) ========================\n")
-m = open(manifest, "rU")
-for line in m:
- handle.write(line)
-m.close()
-del m
-handle.write("\n")
-handle.write("============================ Starting MIRA now ===============================\n")
-handle.flush()
-try:
- #Run MIRA
- child = subprocess.Popen(cmd_list,
- stdout=handle,
- stderr=subprocess.STDOUT)
-except Exception, err:
- log_manifest(manifest)
- sys.stderr.write("Error invoking command:\n%s\n\n%s\n" % (cmd, err))
- #TODO - call clean up?
- handle.write("Error invoking command:\n%s\n\n%s\n" % (cmd, err))
- handle.close()
- sys.exit(1)
-#Use .communicate as can get deadlocks with .wait(),
-stdout, stderr = child.communicate()
-assert not stdout and not stderr #Should be empty as sent to handle
-run_time = time.time() - start_time
-return_code = child.returncode
-handle.write("\n")
-handle.write("============================ MIRA has finished ===============================\n")
-handle.write("MIRA took %0.2f hours\n" % (run_time / 3600.0))
-if return_code:
- print "MIRA took %0.2f hours" % (run_time / 3600.0)
- handle.write("Return error code %i from command:\n" % return_code)
- handle.write(cmd + "\n")
- handle.close()
- clean_up(temp, name)
- log_manifest(manifest)
- stop_err("Return error code %i from command:\n%s" % (return_code, cmd),
- return_code)
-handle.flush()
-
-if os.path.isfile("MIRA_assembly/MIRA_d_results/ec.log"):
- handle.write("\n")
- handle.write("====================== Extract Large Contigs failed ==========================\n")
- e = open("MIRA_assembly/MIRA_d_results/ec.log", "rU")
- for line in e:
- handle.write(line)
- e.close()
- handle.write("============================ (end of ec.log) =================================\n")
- handle.flush()
-
-#print "Collecting output..."
-start_time = time.time()
-collect_output(temp, name, handle)
-collect_time = time.time() - start_time
-handle.write("MIRA took %0.2f hours; collecting output %0.2f minutes\n" % (run_time / 3600.0, collect_time / 60.0))
-print("MIRA took %0.2f hours; collecting output %0.2f minutes\n" % (run_time / 3600.0, collect_time / 60.0))
-
-if os.path.isfile("MIRA_assembly/MIRA_d_results/ec.log"):
- #Treat as an error, but doing this AFTER collect_output
- sys.stderr.write("Extract Large Contigs failed\n")
- handle.write("Extract Large Contigs failed\n")
- handle.close()
- sys.exit(1)
-
-#print "Cleaning up..."
-clean_up(temp, name)
-
-handle.write("\nDone\n")
-handle.close()
-print("Done")
diff -r 4abe8d59a438 -r efeff76741b8 tools/mira4/mira4_bait.py
--- a/tools/mira4/mira4_bait.py Tue Oct 28 08:29:59 2014 -0400
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,115 +0,0 @@
-#!/usr/bin/env python
-"""A simple wrapper script to call MIRA4's mirabait and collect its output.
-"""
-import os
-import sys
-import subprocess
-import shutil
-import time
-
-WRAPPER_VER = "0.0.1" #Keep in sync with the XML file
-
-def stop_err(msg, err=1):
- sys.stderr.write(msg+"\n")
- sys.exit(err)
-
-
-def get_version(mira_binary):
- """Run MIRA to find its version number"""
- # At the commend line I would use: mira -v | head -n 1
- # however there is some pipe error when doing that here.
- cmd = [mira_binary, "-v"]
- try:
- child = subprocess.Popen(cmd,
- stdout=subprocess.PIPE,
- stderr=subprocess.STDOUT)
- except Exception, err:
- sys.stderr.write("Error invoking command:\n%s\n\n%s\n" % (" ".join(cmd), err))
- sys.exit(1)
- ver, tmp = child.communicate()
- del child
- #Workaround for -v not working in mirabait 4.0RC4
- if "invalid option" in ver.split("\n", 1)[0]:
- for line in ver.split("\n", 1):
- if " version " in line:
- line = line.split()
- return line[line.index("version")+1].rstrip(")")
- stop_err("Could not determine MIRA version:\n%s" % ver)
- return ver.split("\n", 1)[0]
-
-try:
- mira_path = os.environ["MIRA4"]
-except KeyError:
- stop_err("Environment variable $MIRA4 not set")
-mira_binary = os.path.join(mira_path, "mirabait")
-if not os.path.isfile(mira_binary):
- stop_err("Missing mirabait under $MIRA4, %r\nFolder contained: %s"
- % (mira_binary, ", ".join(os.listdir(mira_path))))
-mira_ver = get_version(mira_binary)
-if not mira_ver.strip().startswith("4.0"):
- stop_err("This wrapper is for MIRA V4.0, not:\n%s" % mira_ver)
-if "-v" in sys.argv or "--version" in sys.argv:
- print "%s, MIRA wrapper version %s" % (mira_ver, WRAPPER_VER)
- sys.exit(0)
-
-
-format, output_choice, strand_choice, kmer_length, min_occurance, bait_file, in_file, out_file = sys.argv[1:]
-
-if format.startswith("fastq"):
- format = "fastq"
-elif format == "mira":
- format = "maf"
-elif format != "fasta":
- stop_err("Was not expected format %r" % format)
-
-assert out_file.endswith(".dat")
-out_file_stem = out_file[:-4]
-
-cmd_list = [mira_binary, "-f", format, "-t", format,
- "-k", kmer_length, "-n", min_occurance,
- bait_file, in_file, out_file_stem]
-if output_choice == "pos":
- pass
-elif output_choice == "neg":
- #Invert the selection...
- cmd_list.insert(1, "-i")
-else:
- stop_err("Output choice should be 'pos' or 'neg', not %r" % output_choice)
-if strand_choice == "both":
- pass
-elif strand_choice == "fwd":
- #Ingore reverse strand...
- cmd_list.insert(1, "-r")
-else:
- stop_err("Strand choice should be 'both' or 'fwd', not %r" % strand_choice)
-
-cmd = " ".join(cmd_list)
-#print cmd
-start_time = time.time()
-try:
- #Run MIRA
- child = subprocess.Popen(cmd_list,
- stdout=subprocess.PIPE,
- stderr=subprocess.STDOUT)
-except Exception, err:
- log_manifest(manifest)
- sys.stderr.write("Error invoking command:\n%s\n\n%s\n" % (cmd, err))
- sys.exit(1)
-#Use .communicate as can get deadlocks with .wait(),
-stdout, stderr = child.communicate()
-assert stderr is None # Due to way we ran with subprocess
-run_time = time.time() - start_time
-return_code = child.returncode
-print "mirabait took %0.2f minutes" % (run_time / 60.0)
-
-if return_code:
- sys.stderr.write(stdout)
- stop_err("Return error code %i from command:\n%s" % (return_code, cmd),
- return_code)
-
-#Capture output
-out_tmp = out_file_stem + "." + format
-if not os.path.isfile(out_tmp):
- sys.stderr.write(stdout)
- stop_err("Missing output file from mirabait: %s" % out_tmp)
-shutil.move(out_tmp, out_file)
diff -r 4abe8d59a438 -r efeff76741b8 tools/mira4/mira4_bait.xml
--- a/tools/mira4/mira4_bait.xml Tue Oct 28 08:29:59 2014 -0400
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,112 +0,0 @@
-
- Filter reads using kmer matches
-
- mirabait
- MIRA
-
- mira4_bait.py --version
-
-mira4_bait.py $input_reads.ext $output_choice $strand_choice $kmer_length $min_occurence "$bait_file" "$input_reads" "$output_reads"
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-**What it does**
-
-Runs the ``mirabait`` utility from MIRA v4.0 to filter your input reads
-according to whether or not they contain perfect kmer matches to your
-bait file. By default this looks for 31-mers (kmers or *k*-mers where
-the fragment length *k* is 31), and only requires a single matching kmer.
-
-The ``mirabait`` utility is useful in many applications and pipelines
-outside of using the main MIRA tool for assembly or mapping.
-
-.. class:: warningmark
-
-Note ``mirabait`` cannot be used on protein (amino acid) sequences.
-
-**Example Usage**
-
-To remove over abundant entries like rRNA sequences, run ``mirabait`` with
-known rRNA sequences as the bait and select the *negative* matches.
-
-To do targeted assembly by fishing out reads belonging to a gene and just
-assemble these, run ``mirabait`` with the gene of interest as the bait and
-select the *positive* matches.
-
-To iteratively reconstruct mitochondria you could start by fishing out reads
-matching any known mitochondrial sequence, assembly those, and repeat.
-
-
-**Notes on paired read**
-
-.. class:: warningmark
-
-While MIRA4 is aware of many read naming conventions to identify paired read
-partners, the ``mirabait`` tool considers each read in isolation. Applying
-it to paired read files may leave you with orphaned reads.
-
-
-**Citation**
-
-If you use this Galaxy tool in work leading to a scientific publication please
-cite the following papers:
-
-Peter J.A. Cock, Björn A. Grüning, Konrad Paszkiewicz and Leighton Pritchard (2013).
-Galaxy tools and workflows for sequence analysis with applications
-in molecular plant pathology. PeerJ 1:e167
-http://dx.doi.org/10.7717/peerj.167
-
-Bastien Chevreux, Thomas Wetter and Sándor Suhai (1999).
-Genome Sequence Assembly Using Trace Signals and Additional Sequence Information.
-Computer Science and Biology: Proceedings of the German Conference on Bioinformatics (GCB) 99, pp. 45-56.
-http://www.bioinfo.de/isb/gcb99/talks/chevreux/main.html
-
-This wrapper is available to install into other Galaxy Instances via the Galaxy
-Tool Shed at http://toolshed.g2.bx.psu.edu/view/peterjc/mira4_assembler
-
-
diff -r 4abe8d59a438 -r efeff76741b8 tools/mira4/mira4_convert.py
--- a/tools/mira4/mira4_convert.py Tue Oct 28 08:29:59 2014 -0400
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,225 +0,0 @@
-#!/usr/bin/env python
-"""A simple wrapper script to call MIRA and collect its output.
-
-This focuses on the miraconvert binary.
-"""
-import os
-import sys
-import subprocess
-import shutil
-import time
-import tempfile
-from optparse import OptionParser
-try:
- from io import BytesIO
-except ImportError:
- #Should we worry about Python 2.5 or older?
- from StringIO import StringIO as BytesIO
-
-#Do we need any PYTHONPATH magic?
-from mira4_make_bam import depad
-
-WRAPPER_VER = "0.0.5" #Keep in sync with the XML file
-
-def stop_err(msg, err=1):
- sys.stderr.write(msg+"\n")
- sys.exit(err)
-
-def run(cmd):
- #Avoid using shell=True when we call subprocess to ensure if the Python
- #script is killed, so too is the child process.
- try:
- child = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
- except Exception, err:
- stop_err("Error invoking command:\n%s\n\n%s\n" % (" ".join(cmd), err))
- #Use .communicate as can get deadlocks with .wait(),
- stdout, stderr = child.communicate()
- return_code = child.returncode
- if return_code:
- if stderr and stdout:
- stop_err("Return code %i from command:\n%s\n\n%s\n\n%s" % (return_code, err, stdout, stderr))
- else:
- stop_err("Return code %i from command:\n%s\n%s" % (return_code, err, stderr))
-
-def get_version(mira_binary):
- """Run MIRA to find its version number"""
- # At the commend line I would use: mira -v | head -n 1
- # however there is some pipe error when doing that here.
- cmd = [mira_binary, "-v"]
- try:
- child = subprocess.Popen(cmd,
- stdout=subprocess.PIPE,
- stderr=subprocess.STDOUT)
- except Exception, err:
- sys.stderr.write("Error invoking command:\n%s\n\n%s\n" % (" ".join(cmd), err))
- sys.exit(1)
- ver, tmp = child.communicate()
- del child
- return ver.split("\n", 1)[0].strip()
-
-#Parse Command Line
-usage = """Galaxy MIRA4 wrapper script v%s - use as follows:
-
-$ python mira4_convert.py ...
-
-This will run the MIRA miraconvert binary and collect its output files as directed.
-""" % WRAPPER_VER
-parser = OptionParser(usage=usage)
-parser.add_option("--input", dest="input",
- default=None, metavar="FILE",
- help="MIRA input filename")
-parser.add_option("-x", "--min_length", dest="min_length",
- default="0",
- help="Minimum contig length")
-parser.add_option("-y", "--min_cover", dest="min_cover",
- default="0",
- help="Minimum average contig coverage")
-parser.add_option("-z", "--min_reads", dest="min_reads",
- default="0",
- help="Minimum reads per contig")
-parser.add_option("--maf", dest="maf",
- default="", metavar="FILE",
- help="MIRA MAF output filename")
-parser.add_option("--ace", dest="ace",
- default="", metavar="FILE",
- help="ACE output filename")
-parser.add_option("--bam", dest="bam",
- default="", metavar="FILE",
- help="Unpadded BAM output filename")
-parser.add_option("--fasta", dest="fasta",
- default="", metavar="FILE",
- help="Unpadded FASTA output filename")
-parser.add_option("--cstats", dest="cstats",
- default="", metavar="FILE",
- help="Contig statistics filename")
-parser.add_option("-v", "--version", dest="version",
- default=False, action="store_true",
- help="Show version and quit")
-options, args = parser.parse_args()
-if args:
- stop_err("Expected options (e.g. --input example.maf), not arguments")
-
-input_maf = options.input
-out_maf = options.maf
-out_bam = options.bam
-out_fasta = options.fasta
-out_ace = options.ace
-out_cstats = options.cstats
-
-try:
- mira_path = os.environ["MIRA4"]
-except KeyError:
- stop_err("Environment variable $MIRA4 not set")
-mira_convert = os.path.join(mira_path, "miraconvert")
-if not os.path.isfile(mira_convert):
- stop_err("Missing miraconvert under $MIRA4, %r\nFolder contained: %s"
- % (mira_convert, ", ".join(os.listdir(mira_path))))
-
-mira_convert_ver = get_version(mira_convert)
-if not mira_convert_ver.strip().startswith("4.0"):
- stop_err("This wrapper is for MIRA V4.0, not:\n%s\n%s" % (mira_ver, mira_convert))
-if options.version:
- print "%s, MIRA wrapper version %s" % (mira_convert_ver, WRAPPER_VER)
- sys.exit(0)
-
-if not input_maf:
- stop_err("Input MIRA file is required")
-elif not os.path.isfile(input_maf):
- stop_err("Missing input MIRA file: %r" % input_maf)
-
-if not (out_maf or out_bam or out_fasta or out_ace or out_cstats):
- stop_err("No output requested")
-
-
-def check_min_int(value, name):
- try:
- i = int(value)
- except:
- stop_err("Bad %s setting, %r" % (name, value))
- if i < 0:
- stop_err("Negative %s setting, %r" % (name, value))
- return i
-
-min_length = check_min_int(options.min_length, "minimum length")
-min_cover = check_min_int(options.min_cover, "minimum cover")
-min_reads = check_min_int(options.min_reads, "minimum reads")
-
-#TODO - Run MIRA in /tmp or a configurable directory?
-#Currently Galaxy puts us somewhere safe like:
-#/opt/galaxy-dist/database/job_working_directory/846/
-temp = "."
-
-
-cmd_list = [mira_convert]
-if min_length:
- cmd_list.extend(["-x", str(min_length)])
-if min_cover:
- cmd_list.extend(["-y", str(min_cover)])
-if min_reads:
- cmd_list.extend(["-z", str(min_reads)])
-cmd_list.extend(["-f", "maf", input_maf, os.path.join(temp, "converted")])
-if out_maf:
- cmd_list.append("maf")
-if out_bam:
- cmd_list.append("samnbb")
- if not out_fasta:
- #Need this for samtools depad
- out_fasta = os.path.join(temp, "depadded.fasta")
-if out_fasta:
- cmd_list.append("fasta")
-if out_ace:
- cmd_list.append("ace")
-if out_cstats:
- cmd_list.append("cstats")
-run(cmd_list)
-
-def collect(old, new):
- if not os.path.isfile(old):
- stop_err("Missing expected output file %s" % old)
- shutil.move(old, new)
-
-if out_maf:
- collect(os.path.join(temp, "converted.maf"), out_maf)
-if out_fasta:
- #Can we look at the MAF file to see if there are multiple strains?
- old = os.path.join(temp, "converted_AllStrains.unpadded.fasta")
- if os.path.isfile(old):
- collect(old, out_fasta)
- else:
- #Might the output be filtered down to zero contigs?
- old = os.path.join(temp, "converted.fasta")
- if not os.path.isfile(old):
- stop_err("Missing expected output FASTA file")
- elif os.path.getsize(old) == 0:
- print("Warning - no contigs (harsh filters?)")
- collect(old, out_fasta)
- else:
- stop_err("Missing expected output FASTA file (only generic file present)")
-if out_ace:
- collect(os.path.join(temp, "converted.maf"), out_ace)
-if out_cstats:
- collect(os.path.join(temp, "converted_info_contigstats.txt"), out_cstats)
-
-if out_bam:
- assert os.path.isfile(out_fasta)
- old = os.path.join(temp, "converted.samnbb")
- if not os.path.isfile(old):
- old = os.path.join(temp, "converted.sam")
- if not os.path.isfile(old):
- stop_err("Missing expected intermediate file %s" % old)
- h = BytesIO()
- msg = depad(out_fasta, old, out_bam, h)
- if msg:
- print(msg)
- print(h.getvalue())
- h.close()
- sys.exit(1)
- h.close()
- if out_fasta == os.path.join(temp, "depadded.fasta"):
- #Not asked for by Galaxy, no longer needed
- os.remove(out_fasta)
-
-if min_length or min_cover or min_reads:
- print("Filtered.")
-else:
- print("Converted.")
diff -r 4abe8d59a438 -r efeff76741b8 tools/mira4/mira4_convert.xml
--- a/tools/mira4/mira4_convert.xml Tue Oct 28 08:29:59 2014 -0400
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,114 +0,0 @@
-
- Convert MIRA assembly to FASTA/SAM/BAM
-
- miraconvert
- MIRA
-
- mira4_convert.py --version
-
-mira4_convert.py
---input "$mira_file"
---min_length $min_length
---min_cover $min_cover
---min_reads $min_reads
-#if str($maf_wanted)=="true":
---maf "$out_maf"
-#end if
-#if str($fasta_wanted)=="true":
---fasta "$out_fasta"
-#end if
-#if str($bam_wanted)=="true":
---bam "$out_bam"
-#end if
-##Don't yet have a Galaxy datatype defined for ace:
-## #if str($ace_wanted)=="true":
-## --ace "$out_ace"
-## #end if
-#if str($cstats_wanted)=="true":
---cstats "$out_cstats"
-#end if
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- maf_wanted is True
-
-
- fasta_wanted is True
-
-
- bam_wanted is True
-
-
-
- cstats_wanted is True
-
-
-
-
-
-
-**What it does**
-
-Runs the ``miraconvert`` utility from MIRA v4.0 to filter and/or convert
-a MIRA Assembly Format file produced by a *mapping* or *de novo* assembly.
-
-**Example Usage**
-
-You want to remove all the low coverage contigs from a transcriptome
-assembly to focus on those with higher coverage.
-
-You want to convert your MIRA assembly into SAM/BAM to run a standard
-SNP finding tool.
-
-You've lost the FASTA consensus from your MIRA assembly and need to
-regenerate it.
-
-
-**Citation**
-
-If you use this Galaxy tool in work leading to a scientific publication please
-cite the following papers:
-
-Peter J.A. Cock, Björn A. Grüning, Konrad Paszkiewicz and Leighton Pritchard (2013).
-Galaxy tools and workflows for sequence analysis with applications
-in molecular plant pathology. PeerJ 1:e167
-http://dx.doi.org/10.7717/peerj.167
-
-Bastien Chevreux, Thomas Wetter and Sándor Suhai (1999).
-Genome Sequence Assembly Using Trace Signals and Additional Sequence Information.
-Computer Science and Biology: Proceedings of the German Conference on Bioinformatics (GCB) 99, pp. 45-56.
-http://www.bioinfo.de/isb/gcb99/talks/chevreux/main.html
-
-This wrapper is available to install into other Galaxy Instances via the Galaxy
-Tool Shed at http://toolshed.g2.bx.psu.edu/view/peterjc/mira4_assembler
-
-
diff -r 4abe8d59a438 -r efeff76741b8 tools/mira4/mira4_de_novo.xml
--- a/tools/mira4/mira4_de_novo.xml Tue Oct 28 08:29:59 2014 -0400
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,263 +0,0 @@
-
- Takes Sanger, Roche 454, Solexa/Illumina, Ion Torrent and PacBio reads
-
- mira
- miraconvert
- MIRA
- samtools
- samtools
-
- mira4.py --version
- mira4.py
---manifest "$manifest"
-#if str($maf_wanted)=="true":
---maf "$out_maf"
-#end if
-#if str($bam_wanted)=="true":
---bam "$out_bam"
-#end if
---fasta "$out_fasta"
---log "$out_log"
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- bam_wanted is True
-
-
- maf_wanted is True
-
-
-
-
-
-
-project = MIRA
-job = denovo,${job_type},${job_quality}
-parameters = -NW:cmrnl=no -DI:trt=/tmp -OUT:orc=no
-## -GE:not is short for -GENERAL:number_of_threads and using one (1)
-## can be useful for repeatability of assemblies and bug hunting.
-## This is overriden by the command line -t switch which is easier
-## to set from within Galaxy.
-##
-## -NW:cmrnl is short for -NAG_AND_WARN:check_maxreadnamelength
-## and without this MIRA aborts with read names over 40 characters
-## due to limitations of some downstream tools.
-##
-## -DI:trt is short for -DIRECTORY:tmp_redirected_to and should
-## point to a local hard drive (not something like NFS on network).
-## We replace /tmp with an environment variable via mira4.py
-##
-## -OUT:orc=no is short for -OUTPUT:output_result_caf=no
-## which turns off an output file we don't want anyway.
-
-#for $rg in $read_group
-
-##This bar goes into the manifest as a comment line
-#------------------------------------------------------------------------------
-
-readgroup
-technology = ${rg.technology}
-##Record the segment placement (if any)
-#if str($rg.segments.type) == "paired"
-segment_placement = ${rg.segments.placement}
-segment_naming = ${rg.segments.naming}
-#if str($rg.segments.min_size) != "" or str($rg.segments.max_size) != ""
-##If our min/max validation failed I trust MIRA to give an error message...
-template_size = $rg.segments.min_size $rg.segments.max_size
-#end if
-#end if
-##if str($rg.segments.type) == "none"
-##MIRA4 manual says use segment_placement = unknown or ? for unpaired data
-##but this stopped working in MIRA 4.0 RC5 and 4.0 (final). See:
-##http://www.freelists.org/post/mira_talk/Unpaired-reads-and-segment-placement--or-unknown
-##segment_placement = ?
-##end if
-##MIRA will accept multiple filenames on one data line, or multiple data lines
-#for $f in $rg.filenames
-##Must now map Galaxy datatypes to MIRA file types...
-#if $f.ext.startswith("fastq")
-##MIRA doesn't like fastqsanger etc, just plain old fastq:
-data = fastq::$f
-#elif $f.ext == "mira"
-##We're calling *.maf the "mira" format in Galaxy (name space collision)
-data = maf::$f
-#else
-##MIRA is happy with fasta as name,
-data = ${f.ext}::$f
-#end if
-#end for
-#end for
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-**What it does**
-
-Runs MIRA v4.0 in de novo mode, collects the output, generates a sorted BAM
-file, and then throws away all the temporary files.
-
-MIRA is an open source assembly tool capable of handling sequence data from
-a range of platforms (Sanger capillary, Solexa/Illumina, Roche 454, Ion Torrent
-and also PacBio).
-
-It is particularly suited to small genomes such as bacteria.
-
-
-**Notes on paired reads**
-
-.. class:: warningmark
-
-MIRA uses read naming conventions to identify paired read partners
-(and does not care about their order in the input files). In most cases,
-the Solexa/Illumina setting is fine. For Sanger capillary sequencing,
-you may need to rename your reads to match one of the standard conventions
-supported by MIRA. For Roche 454 or Ion Torrent the appropriate settings
-depend on how the FASTQ file was produced:
-
-* If using Roche's ``sffinfo`` or older versions of ``sff_extract``
- to convert SFF files to FASTQ, your reads will probably have the
- ``---> <---`` orientation and use the ``.f`` and ``.r``
- suffixes (FR naming).
-
-* If using a recent version of ``sff_extract``, then the ``/1`` and ``/2``
- suffixes are used (Solexa/Illumina style naming) and the original
- ``2---> 1--->`` orientation is preserved.
-
-The reason for this is the raw data for Roche 454 and Ion Torrent paired-end
-libraries sequences a circularised fragment such that the raw data begins
-with the end of the fragment, a linker, then the start of the fragment.
-This means both the start and end are sequenced from the same strand, and
-have the orientation ``2---> 1--->``. However, in order to use the data
-with traditional tools expecting Sanger capillary style ``---> <---``
-orientation it was common to reverse complement one of the pair to mimic this.
-
-
-**Citation**
-
-If you use this Galaxy tool in work leading to a scientific publication please
-cite the following papers:
-
-Peter J.A. Cock, Björn A. Grüning, Konrad Paszkiewicz and Leighton Pritchard (2013).
-Galaxy tools and workflows for sequence analysis with applications
-in molecular plant pathology. PeerJ 1:e167
-http://dx.doi.org/10.7717/peerj.167
-
-Bastien Chevreux, Thomas Wetter and Sándor Suhai (1999).
-Genome Sequence Assembly Using Trace Signals and Additional Sequence Information.
-Computer Science and Biology: Proceedings of the German Conference on Bioinformatics (GCB) 99, pp. 45-56.
-http://www.bioinfo.de/isb/gcb99/talks/chevreux/main.html
-
-This wrapper is available to install into other Galaxy Instances via the Galaxy
-Tool Shed at http://toolshed.g2.bx.psu.edu/view/peterjc/mira4_assembler
-
-
diff -r 4abe8d59a438 -r efeff76741b8 tools/mira4/mira4_make_bam.py
--- a/tools/mira4/mira4_make_bam.py Tue Oct 28 08:29:59 2014 -0400
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,92 +0,0 @@
-#!/usr/bin/env python
-"""Wrapper script using miraconvert & samtools to get BAM from MIRA.
-"""
-import os
-import sys
-import shutil
-import subprocess
-import tempfile
-
-def stop_err(msg, err=1):
- sys.stderr.write(msg+"\n")
- sys.exit(err)
-
-def run(cmd, log_handle):
- try:
- child = subprocess.Popen(cmd, shell=True,
- stdout=subprocess.PIPE,
- stderr=subprocess.STDOUT)
- except Exception, err:
- sys.stderr.write("Error invoking command:\n%s\n\n%s\n" % (cmd, err))
- #TODO - call clean up?
- log_handle.write("Error invoking command:\n%s\n\n%s\n" % (cmd, err))
- sys.exit(1)
- #Use .communicate as can get deadlocks with .wait(),
- stdout, stderr = child.communicate()
- assert not stderr #Should be empty as sent to stdout
- if len(stdout) > 10000:
- #miraconvert can be very verbose (is holding stdout in RAM a problem?)
- stdout = stdout.split("\n")
- stdout = stdout[:10] + ["...", "", "..."] + stdout[-10:]
- stdout = "\n".join(stdout)
- log_handle.write(stdout)
- return child.returncode
-
-def depad(fasta_file, sam_file, bam_file, log_handle):
- log_handle.write("\n================= Converting MIRA assembly from SAM to BAM ===================\n")
- #Also doing SAM to (uncompressed) BAM during depad
- bam_stem = bam_file + ".tmp" # Have write permissions and want final file in this folder
- cmd = 'samtools depad -S -u -T "%s" "%s" | samtools sort - "%s"' % (fasta_file, sam_file, bam_stem)
- return_code = run(cmd, log_handle)
- if return_code:
- return "Error %i from command:\n%s" % (return_code, cmd)
- if not os.path.isfile(bam_stem + ".bam"):
- return "samtools depad or sort failed to produce BAM file"
-
- log_handle.write("\n====================== Indexing MIRA assembly BAM file =======================\n")
- cmd = 'samtools index "%s.bam"' % bam_stem
- return_code = run(cmd, log_handle)
- if return_code:
- return "Error %i from command:\n%s" % (return_code, cmd)
- if not os.path.isfile(bam_stem + ".bam.bai"):
- return "samtools indexing of BAM file failed to produce BAI file"
-
- shutil.move(bam_stem + ".bam", bam_file)
- os.remove(bam_stem + ".bam.bai") #Let Galaxy handle that...
-
-
-def make_bam(mira_convert, maf_file, fasta_file, bam_file, log_handle):
- if not os.path.isfile(mira_convert):
- return "Missing binary %r" % mira_convert
- if not os.path.isfile(maf_file):
- return "Missing input MIRA file: %r" % maf_file
- if not os.path.isfile(fasta_file):
- return "Missing padded FASTA file: %r" % fasta_file
-
- log_handle.write("\n====================== Converting MIRA assembly to SAM =======================\n")
- tmp_dir = tempfile.mkdtemp()
- sam_file = os.path.join(tmp_dir, "x.sam")
-
- # Note add nbb to the template name, possible MIRA 4.0 RC4 bug
- cmd = '"%s" -f maf -t samnbb "%s" "%snbb"' % (mira_convert, maf_file, sam_file)
- return_code = run(cmd, log_handle)
- if return_code:
- return "Error %i from command:\n%s" % (return_code, cmd)
- if not os.path.isfile(sam_file):
- return "Conversion from MIRA to SAM failed"
-
- #Also doing SAM to (uncompressed) BAM during depad
- msg = depad(fasta_file, sam_file, bam_file, log_handle)
- if msg:
- return msg
-
- os.remove(sam_file)
- os.rmdir(tmp_dir)
-
- return None #Good :)
-
-if __name__ == "__main__":
- mira_convert, maf_file, fasta_file, bam_file = sys.argv[1:]
- msg = make_bam(mira_convert, maf_file, fasta_file, bam_file, sys.stdout)
- if msg:
- stop_err(msg)
diff -r 4abe8d59a438 -r efeff76741b8 tools/mira4/mira4_mapping.xml
--- a/tools/mira4/mira4_mapping.xml Tue Oct 28 08:29:59 2014 -0400
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,267 +0,0 @@
-
- Maps Sanger, Roche 454, Solexa/Illumina, Ion Torrent and PacBio reads
-
- mira
- miraconvert
- MIRA
- samtools
- samtools
-
- mira4.py --version
- mira4.py
---manifest "$manifest"
-#if str($maf_wanted) == "true":
---maf "$out_maf"
-#end if
-#if str($bam_wanted) == "true":
---bam "$out_bam"
-#end if
---fasta "$out_fasta"
---log "$out_log"
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- bam_wanted is True
-
-
- maf_wanted is True
-
-
-
-
-
-project = MIRA
-job = mapping,${job_type},${job_quality}
-parameters = -NW:cmrnl=no -DI:trt=/tmp -OUT:orc=no
-## -GE:not is short for -GENERAL:number_of_threads and using one (1)
-## can be useful for repeatability of assemblies and bug hunting.
-## This is overriden by the command line -t switch which is easier
-## to set from within Galaxy.
-##
-## -NW:cmrnl is short for -NAG_AND_WARN:check_maxreadnamelength
-## and without this MIRA aborts with read names over 40 characters
-## due to limitations of some downstream tools.
-##
-## -DI:trt is short for -DIRECTORY:tmp_redirected_to and should
-## point to a local hard drive (not something like NFS on network).
-## We replace /tmp with an environment variable via mira4.py
-##
-## -OUT:orc=no is short for -OUTPUT:output_result_caf=no
-## which turns off an output file we don't want anyway.
-
-##This bar goes into the manifest as a comment line
-#------------------------------------------------------------------------------
-
-readgroup
-is_reference
-#if str($strain_setup)=="same"
-strain = StrainX
-#end if
-#for $f in $references
-##Must now map Galaxy datatypes to MIRA file types...
-#if $f.ext.startswith("fastq")
-##MIRA doesn't like fastqsanger etc, just plain old fastq:
-data = fastq::$f
-#elif $f.ext == "mira"
-##We're calling *.maf the "mira" format in Galaxy (name space collision)
-data = maf::$f
-#elif $f.ext == "fasta"
-##We're calling MIRA with the file type as "fna" as otherwise it wants quals
-data = fna::$f
-#else
-##Currently don't expect anything else...
-data = ${f.ext}::$f
-#end if
-#end for
-#for $rg in $read_group
-
-##This bar goes into the manifest as a comment line
-#------------------------------------------------------------------------------
-
-readgroup
-technology = ${rg.technology}
-#if str($strain_setup)=="same"
-##This is perhaps redundant as MIRA defaults to StrainX for the reads:
-strain = StrainX
-#end if
-##Record the segment placement (if any)
-#if str($rg.segments.type) == "paired"
-segment_placement = ${rg.segments.placement}
-segment_naming = ${rg.segments.naming}
-#end if
-##if str($rg.segments.type) == "none"
-##MIRA4 manual says use segment_placement = unknown or ? for unpaired data
-##but this stopped working in MIRA 4.0 RC5 and 4.0 (final). See:
-##http://www.freelists.org/post/mira_talk/Unpaired-reads-and-segment-placement--or-unknown
-##segment_placement = ?
-##end if
-##MIRA will accept multiple filenames on one data line, or multiple data lines
-#for $f in $rg.filenames
-##Must now map Galaxy datatypes to MIRA file types...
-#if $f.ext.startswith("fastq")
-##MIRA doesn't like fastqsanger etc, just plain old fastq:
-data = fastq::$f
-#elif $f.ext == "mira"
-##We're calling *.maf the "mira" format in Galaxy (name space collision)
-data = maf::$f
-#else
-##Currently don't expect anything else...
-data = ${f.ext}::$f
-#end if
-#end for
-#end for
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-**What it does**
-
-Runs MIRA v4.0 in mapping mode, collects the output, generates a sorted BAM
-file, and throws away all the temporary files.
-
-MIRA is an open source assembly tool capable of handling sequence data from
-a range of platforms (Sanger capillary, Solexa/Illumina, Roche 454, Ion Torrent
-and also PacBio).
-
-It is particularly suited to small genomes such as bacteria.
-
-
-**Notes on paired reads**
-
-.. class:: warningmark
-
-MIRA uses read naming conventions to identify paired read partners
-(and does not care about their order in the input files). In most cases,
-the Solexa/Illumina setting is fine. For Sanger capillary sequencing,
-you may need to rename your reads to match one of the standard conventions
-supported by MIRA. For Roche 454 or Ion Torrent the appropriate settings
-depend on how the FASTQ file was produced:
-
-* If using Roche's ``sffinfo`` or older versions of ``sff_extract``
- to convert SFF files to FASTQ, your reads will probably have the
- ``---> <---`` orientation and use the ``.f`` and ``.r``
- suffixes (FR naming).
-
-* If using a recent version of ``sff_extract``, then the ``/1`` and ``/2``
- suffixes are used (Solexa/Illumina style naming) and the original
- ``2---> 1--->`` orientation is preserved.
-
-The reason for this is the raw data for Roche 454 and Ion Torrent paired-end
-libraries sequences a circularised fragment such that the raw data begins
-with the end of the fragment, a linker, then the start of the fragment.
-This means both the start and end are sequenced from the same strand, and
-have the orientation ``2---> 1--->``. However, in order to use the data
-with traditional tools expecting Sanger capillary style ``---> <---``
-orientation it was common to reverse complement one of the pair to mimic this.
-
-
-**Citation**
-
-If you use this Galaxy tool in work leading to a scientific publication please
-cite the following papers:
-
-Peter J.A. Cock, Björn A. Grüning, Konrad Paszkiewicz and Leighton Pritchard (2013).
-Galaxy tools and workflows for sequence analysis with applications
-in molecular plant pathology. PeerJ 1:e167
-http://dx.doi.org/10.7717/peerj.167
-
-Bastien Chevreux, Thomas Wetter and Sándor Suhai (1999).
-Genome Sequence Assembly Using Trace Signals and Additional Sequence Information.
-Computer Science and Biology: Proceedings of the German Conference on Bioinformatics (GCB) 99, pp. 45-56.
-http://www.bioinfo.de/isb/gcb99/talks/chevreux/main.html
-
-This wrapper is available to install into other Galaxy Instances via the Galaxy
-Tool Shed at http://toolshed.g2.bx.psu.edu/view/peterjc/mira4_assembler
-
-
diff -r 4abe8d59a438 -r efeff76741b8 tools/mira4/mira4_validator.py
--- a/tools/mira4/mira4_validator.py Tue Oct 28 08:29:59 2014 -0400
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,64 +0,0 @@
-#Called from the Galaxy Tool XML file
-#import sys
-
-def validate_input(trans, error_map, param_values, page_param_map):
- """Validates the min_size/max_size user input, before execution."""
- err_list = []
- for read_group in param_values["read_group"]:
- err = dict()
- segments = read_group["segments"]
- if str(segments["type"]) != "paired":
- err_list.append(dict())
- continue
-
- min_size = str(segments["min_size"]).strip()
- max_size = str(segments["max_size"]).strip()
- #sys.stderr.write("DEBUG min_size=%r, max_size=%r\n" % (min_size, max_size))
-
- #Somehow Galaxy seems to turn an empty field into string "None"...
- if min_size=="None":
- min_size = ""
- if max_size=="None":
- max_size = ""
-
- if min_size=="" and max_size=="":
- #Both missing is good
- pass
- elif min_size=="":
- err["min_size"] = "Minimum size required if maximum size given"
- elif max_size=="":
- err["max_size"] = "Maximum size required if minimum size given"
-
- if min_size:
- try:
- min_size_int = int(min_size)
- if min_size_int < 0:
- err["min_size"] = "Minumum size must not be negative (%i)" % min_size_int
- min_size = None # Avoid doing comparison below
- except ValueError:
- err["min_size"] = "Minimum size is not an integer (%s)" % min_size
- min_size = None # Avoid doing comparison below
-
- if max_size:
- try:
- max_size_int = int(max_size)
- if max_size_int< 0:
- err["max_size"] = "Maximum size must not be negative (%i)" % max_size_int
- max_size = None # Avoid doing comparison below
- except ValueError:
- err["max_size"] = "Maximum size is not an integer (%s)" % max_size
- max_size = None # Avoid doing comparison below
-
- if min_size and max_size and min_size_int > max_size_int:
- msg = "Minimum size must be less than maximum size (%i vs %i)" % (min_size_int, max_size_int)
- err["min_size"] = msg
- err["max_size"] = msg
-
- if err:
- err_list.append({"segments":err})
- else:
- err_list.append(dict())
-
- if any(err_list):
- #Return an error map only if any readgroup gave errors
- error_map["read_group"] = err_list
diff -r 4abe8d59a438 -r efeff76741b8 tools/mira4/repository_dependencies.xml
--- a/tools/mira4/repository_dependencies.xml Tue Oct 28 08:29:59 2014 -0400
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,4 +0,0 @@
-
-
-
-
diff -r 4abe8d59a438 -r efeff76741b8 tools/mira4/tool_dependencies.xml
--- a/tools/mira4/tool_dependencies.xml Tue Oct 28 08:29:59 2014 -0400
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,55 +0,0 @@
-
-
-
-
-
-
-
-
-
-
- http://downloads.sourceforge.net/project/mira-assembler/MIRA/stable/mira_4.0.2_darwin13.1.0_x86_64_static.tar.bz2
-
- bin
- $INSTALL_DIR
-
-
-
-
- http://downloads.sourceforge.net/project/mira-assembler/MIRA/stable/mira_4.0.2_linux-gnu_x86_64_static.tar.bz2
-
- bin
- $INSTALL_DIR
-
-
-
-
- echo "ERROR: Automated installation on your operating system and CPU architecture combination is not yet supported."
- echo "Your machine details (the output from 'uname' and 'arch'):"
- uname
- arch
- echo "If pre-compiled MIRA binaries are now available for this, please report this"
- echo "via https://github.com/peterjc/pico_galaxyt/issues - thank you!"
- false
-
-
-
-
- $INSTALL_DIR
-
-
- $INSTALL_DIR
-
-
-
-
-Downloads MIRA v4.0.2 from Sourceforge, requesting Bastien's precompiled binaries
-for 64 bit (x86_64) Linux or Mac OS X. Other platforms where compilation from
-source would be required (e.g. 32 bit Linux) are not supported by this automated
-installation script.
-
-http://chevreux.org/projects_mira.html
-http://sourceforge.net/projects/mira-assembler/
-
-
-
diff -r 4abe8d59a438 -r efeff76741b8 tools/mira4_assembler/README.rst
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/tools/mira4_assembler/README.rst Wed May 13 05:55:43 2015 -0400
@@ -0,0 +1,176 @@
+Galaxy wrapper for the MIRA assembly program (v4.0)
+===================================================
+
+This tool is copyright 2011-2014 by Peter Cock, The James Hutton Institute
+(formerly SCRI, Scottish Crop Research Institute), UK. All rights reserved.
+See the licence text below (MIT licence).
+
+This tool is a short Python script (to collect the MIRA output and move it
+to where Galaxy expects the files) and associated Galaxy wrapper XML file.
+
+It is available from the Galaxy Tool Shed at:
+http://toolshed.g2.bx.psu.edu/view/peterjc/mira4_assembler
+
+It uses a Galaxy datatype definition 'mira' for the MIRA Assembly Format,
+http://toolshed.g2.bx.psu.edu/view/peterjc/mira_datatypes
+
+A separate wrapper for MIRA v3.4 is available from the Galaxy Tool Shed at:
+http://toolshed.g2.bx.psu.edu/view/peterjc/mira_assembler
+
+Automated Installation
+======================
+
+This should be straightforward. Via the Tool Shed, Galaxy should automatically
+install the 'mira' datatype, samtools, and download and install the precompiled
+binary for MIRA v4.0.2 for the Galaxy wrapper, and run any tests.
+
+For MIRA 4, the Galaxy wrapper has been split in two, allowing separate
+cluster settings for de novo usage (high RAM) and mapping (lower RAM).
+Consult the Galaxy adminstration documentation for your cluster setup.
+
+WARNING: For larger tasks, be aware that MIRA can require vast amounts
+of RAM and run-times of over a week are possible. This tool wrapper makes
+no attempt to spot and reject such large jobs.
+
+
+Manual Installation
+===================
+
+First install the 'mira' datatype for Galaxy, available here:
+
+* http://toolshed.g2.bx.psu.edu/view/peterjc/mira_datatypes
+
+There are four Galaxy files to install:
+
+* ``mira4_de_novo.xml`` (the Galaxy tool definition for de novo usage)
+* ``mira4_mapping.xml`` (the Galaxy tool definition for mapping usage)
+* ``mira4_convert.xml`` (the Galaxy tool definition for converting MIRA files)
+* ``mira4_bait.xml`` (the Galaxy tool definition for mirabait)
+* ``mira4.py`` (the Python wrapper script)
+* ``mira4_convert.py`` (the Python wrapper script for miraconvert)
+* ``mira4_bait.py`` (the Python wrapper script for mirabait)
+* ``mira4_validator.py`` (the XML parameter validation script)
+
+The suggested location is a new ``tools/mira4`` folder. You will also need to
+modify the ``tools_conf.xml`` file to tell Galaxy to offer the tool::
+
+
+
+
+You will also need to install MIRA, we used version 4.0.2, and define the
+environment variable ``$MIRA4`` pointing at the folder containing the binaries.
+See:
+
+* http://chevreux.org/projects_mira.html
+* http://sourceforge.net/projects/mira-assembler/
+
+You may wish to use different cluster setups for the de novo and mapping
+tools, see above.
+
+You will also need to install samtools (for generating a BAM file from MIRA's
+SAM output).
+
+If you wish to run the unit tests, also move/copy the ``test-data/`` files
+under Galaxy's ``test-data/`` folder. Then::
+
+ $ ./run_tests.sh -id mira_4_0_bait
+ $ ./run_tests.sh -id mira_4_0_de_novo
+ $ ./run_tests.sh -id mira_4_0_mapping
+ $ ./run_tests.sh -id mira_4_0_convert
+
+
+History
+=======
+
+======= ======================================================================
+Version Changes
+------- ----------------------------------------------------------------------
+v0.0.1 - Initial version (prototype for MIRA 4.0 RC4, based on wrapper for v3.4)
+v0.0.2 - Include BAM output (using ``miraconvert`` and ``samtools``).
+ - Updated to target MIRA 4.0.1
+ - Simplified XML to apply input format to output data.
+ - Sets temporary folder at run time to respect environment variables
+ (``$TMPDIR``, ``$TEMP``, or ``$TMP`` in that order). This was
+ previously hard coded as ``/tmp``.
+v0.0.3 - Updated to target MIRA 4.0.2
+v0.0.4 - Using ``optparse`` for the Python wrapper script API
+ - Made MAF and BAM outputs optional
+ - Include wrapper for ``miraconvert``
+v0.0.5 - Tool definition now embeds citation information.
+v0.0.6 - Fixed error handling in ``mira4_convert.py``.
+v0.0.7 - Renamed folder (internal change only).
+ - Reorder XML elements (internal change only).
+ - Use the ``format_source=...`` tag in the MIRA bait wrapper.
+ - Planemo for Tool Shed upload (``.shed.yml``, internal change only).
+======= ======================================================================
+
+
+Developers
+==========
+
+Development is on a dedicated GitHub repository:
+https://github.com/peterjc/pico_galaxy/tree/master/tools/mira4_assembler
+
+For pushing a release to the test or main "Galaxy Tool Shed", use the following
+Planemo commands (which requires you have set your Tool Shed access details in
+``~/.planemo.yml`` and that you have access rights on the Tool Shed)::
+
+ $ planemo shed_upload --shed_target testtoolshed --check_diff ~/repositories/pico_galaxy/tools/mira4_assembler/
+ ...
+
+or::
+
+ $ planemo shed_upload --shed_target toolshed --check_diff ~/repositories/pico_galaxy/tools/mira4_assembler/
+ ...
+
+To just build and check the tar ball, use::
+
+ $ planemo shed_upload --tar_only ~/repositories/pico_galaxy/tools/mira4_assembler/
+ ...
+ $ tar -tzf shed_upload.tar.gz
+ test-data/U13small_m.fastq
+ test-data/U13small_m.mira4_de_novo.fasta
+ test-data/ecoli.fastq
+ test-data/ecoli.mira4_de_novo.fasta
+ test-data/empty_file.dat
+ test-data/header.mira
+ test-data/tvc_mini.fastq
+ test-data/tvc_contigs.fasta
+ test-data/tvc_map_ref_strain.fasta
+ test-data/tvc_map_same_strain.fasta
+ test-data/tvc_bait.fasta
+ test-data/tvc_mini_bait_neg.fastq
+ test-data/tvc_mini_bait_pos.fastq
+ test-data/tvc_mini_bait_strict.fastq
+ tools/mira4_assembler/README.rst
+ tools/mira4_assembler/mira4.py
+ tools/mira4_assembler/mira4_bait.py
+ tools/mira4_assembler/mira4_convert.py
+ tools/mira4_assembler/mira4_de_novo.xml
+ tools/mira4_assembler/mira4_make_bam.py
+ tools/mira4_assembler/mira4_mapping.xml
+ tools/mira4_assembler/mira4_validator.py
+ tools/mira4_assembler/repository_dependencies.xml
+ tools/mira4_assembler/tool_dependencies.xml
+
+
+Licence (MIT)
+=============
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in
+all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+THE SOFTWARE.
diff -r 4abe8d59a438 -r efeff76741b8 tools/mira4_assembler/mira4.py
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/tools/mira4_assembler/mira4.py Wed May 13 05:55:43 2015 -0400
@@ -0,0 +1,313 @@
+#!/usr/bin/env python
+"""A simple wrapper script to call MIRA and collect its output.
+"""
+import os
+import sys
+import subprocess
+import shutil
+import time
+import tempfile
+from optparse import OptionParser
+
+#Do we need any PYTHONPATH magic?
+from mira4_make_bam import make_bam
+
+WRAPPER_VER = "0.0.4" #Keep in sync with the XML file
+
+def sys_exit(msg, err=1):
+ sys.stderr.write(msg+"\n")
+ sys.exit(err)
+
+
+def get_version(mira_binary):
+ """Run MIRA to find its version number"""
+ # At the commend line I would use: mira -v | head -n 1
+ # however there is some pipe error when doing that here.
+ cmd = [mira_binary, "-v"]
+ try:
+ child = subprocess.Popen(cmd,
+ stdout=subprocess.PIPE,
+ stderr=subprocess.STDOUT)
+ except Exception, err:
+ sys.stderr.write("Error invoking command:\n%s\n\n%s\n" % (" ".join(cmd), err))
+ sys.exit(1)
+ ver, tmp = child.communicate()
+ del child
+ return ver.split("\n", 1)[0].strip()
+
+#Parse Command Line
+usage = """Galaxy MIRA4 wrapper script v%s - use as follows:
+
+$ python mira4.py ...
+
+This will run the MIRA binary and collect its output files as directed.
+""" % WRAPPER_VER
+parser = OptionParser(usage=usage)
+parser.add_option("-m", "--manifest", dest="manifest",
+ default=None, metavar="FILE",
+ help="MIRA manifest filename")
+parser.add_option("--maf", dest="maf",
+ default="-", metavar="FILE",
+ help="MIRA MAF output filename")
+parser.add_option("--bam", dest="bam",
+ default="-", metavar="FILE",
+ help="Unpadded BAM output filename")
+parser.add_option("--fasta", dest="fasta",
+ default="-", metavar="FILE",
+ help="Unpadded FASTA output filename")
+parser.add_option("--log", dest="log",
+ default="-", metavar="FILE",
+ help="MIRA logging output filename")
+parser.add_option("-v", "--version", dest="version",
+ default=False, action="store_true",
+ help="Show version and quit")
+options, args = parser.parse_args()
+manifest = options.manifest
+out_maf = options.maf
+out_bam = options.bam
+out_fasta = options.fasta
+out_log = options.log
+
+try:
+ mira_path = os.environ["MIRA4"]
+except KeyError:
+ sys_exit("Environment variable $MIRA4 not set")
+mira_binary = os.path.join(mira_path, "mira")
+if not os.path.isfile(mira_binary):
+ sys_exit("Missing mira under $MIRA4, %r\nFolder contained: %s"
+ % (mira_binary, ", ".join(os.listdir(mira_path))))
+mira_convert = os.path.join(mira_path, "miraconvert")
+if not os.path.isfile(mira_convert):
+ sys_exit("Missing miraconvert under $MIRA4, %r\nFolder contained: %s"
+ % (mira_convert, ", ".join(os.listdir(mira_path))))
+
+mira_ver = get_version(mira_binary)
+if not mira_ver.strip().startswith("4.0"):
+ sys_exit("This wrapper is for MIRA V4.0, not:\n%s\n%s" % (mira_ver, mira_binary))
+mira_convert_ver = get_version(mira_convert)
+if not mira_convert_ver.strip().startswith("4.0"):
+ sys_exit("This wrapper is for MIRA V4.0, not:\n%s\n%s" % (mira_ver, mira_convert))
+if options.version:
+ print "%s, MIRA wrapper version %s" % (mira_ver, WRAPPER_VER)
+ if mira_ver != mira_convert_ver:
+ print "WARNING: miraconvert %s" % mira_convert_ver
+ sys.exit(0)
+
+if not manifest:
+ sys_exit("Manifest is required")
+elif not os.path.isfile(manifest):
+ sys_exit("Missing input MIRA manifest file: %r" % manifest)
+
+
+try:
+ threads = int(os.environ.get("GALAXY_SLOTS", "1"))
+except ValueError:
+ threads = 1
+assert 1 <= threads, threads
+
+
+def override_temp(manifest):
+ """Override ``-DI:trt=/tmp`` in manifest with environment variable.
+
+ Currently MIRA 4 does not allow envronment variables like ``$TMP``
+ inside the manifest, which is a problem if you need to override
+ the default at run time.
+
+ The tool XML will ``/tmp`` and we replace that here with
+ ``tempfile.gettempdir()`` which will respect $TMPDIR, $TEMP, $TMP
+ as explained in the Python standard library documentation:
+ http://docs.python.org/2/library/tempfile.html#tempfile.tempdir
+
+ By default MIRA 4 would write its temporary files within the output
+ folder, which is a problem if that is a network drive.
+ """
+ handle = open(manifest, "r")
+ text = handle.read()
+ handle.close()
+
+ #At time of writing, this is at the end of a file,
+ #but could be followed by a space in future...
+ text = text.replace("-DI:trt=/tmp", "-DI:trt=" + tempfile.gettempdir())
+
+ #Want to try to ensure this gets written to disk before MIRA attempts
+ #to open it - any networked file system may impose a delay...
+ handle = open(manifest, "w")
+ handle.write(text)
+ handle.flush()
+ os.fsync(handle.fileno())
+ handle.close()
+
+
+def log_manifest(manifest):
+ """Write the manifest file to stderr."""
+ sys.stderr.write("\n%s\nManifest file\n%s\n" % ("="*60, "="*60))
+ with open(manifest) as h:
+ for line in h:
+ sys.stderr.write(line)
+ sys.stderr.write("\n%s\nEnd of manifest\n%s\n" % ("="*60, "="*60))
+
+
+def collect_output(temp, name, handle):
+ """Moves files to the output filenames (global variables)."""
+ n3 = (temp, name, name, name)
+ f = "%s/%s_assembly/%s_d_results" % (temp, name, name)
+ if not os.path.isdir(f):
+ log_manifest(manifest)
+ sys_exit("Missing output folder")
+ if not os.listdir(f):
+ log_manifest(manifest)
+ sys_exit("Empty output folder")
+ missing = []
+
+ old_maf = "%s/%s_out.maf" % (f, name)
+ if not os.path.isfile(old_maf):
+ #Triggered extractLargeContigs.sh?
+ old_maf = "%s/%s_LargeContigs_out.maf" % (f, name)
+
+ #De novo or single strain mapping,
+ old_fasta = "%s/%s_out.unpadded.fasta" % (f, name)
+ ref_fasta = "%s/%s_out.padded.fasta" % (f, name)
+ if not os.path.isfile(old_fasta):
+ #Mapping (StrainX versus reference) or de novo
+ old_fasta = "%s/%s_out_StrainX.unpadded.fasta" % (f, name)
+ ref_fasta = "%s/%s_out_StrainX.padded.fasta" % (f, name)
+ if not os.path.isfile(old_fasta):
+ old_fasta = "%s/%s_out_ReferenceStrain.unpadded.fasta" % (f, name)
+ ref_fasta = "%s/%s_out_ReferenceStrain.padded.fasta" % (f, name)
+
+
+ missing = False
+ for old, new in [(old_maf, out_maf),
+ (old_fasta, out_fasta)]:
+ if not os.path.isfile(old):
+ missing = True
+ elif not new or new == "-":
+ handle.write("Ignoring %s\n" % old)
+ else:
+ handle.write("Capturing %s\n" % old)
+ shutil.move(old, new)
+ if missing:
+ log_manifest(manifest)
+ sys.stderr.write("Contents of %r:\n" % f)
+ for filename in sorted(os.listdir(f)):
+ sys.stderr.write("%s\n" % filename)
+
+ #For mapping mode, probably most people would expect a BAM file
+ #using the reference FASTA file...
+ if out_bam and out_bam != "-":
+ if out_maf and out_maf != "-":
+ msg = make_bam(mira_convert, out_maf, ref_fasta, out_bam, handle)
+ else:
+ #Not collecting the MAF file, use original location
+ msg = make_bam(mira_convert, old_maf, ref_fasta, out_bam, handle)
+ if msg:
+ sys_exit(msg)
+
+def clean_up(temp, name):
+ folder = "%s/%s_assembly" % (temp, name)
+ if os.path.isdir(folder):
+ shutil.rmtree(folder)
+
+#TODO - Run MIRA in /tmp or a configurable directory?
+#Currently Galaxy puts us somewhere safe like:
+#/opt/galaxy-dist/database/job_working_directory/846/
+temp = "."
+
+name = "MIRA"
+
+override_temp(manifest)
+
+start_time = time.time()
+cmd_list = [mira_binary, "-t", str(threads), manifest]
+cmd = " ".join(cmd_list)
+
+assert os.path.isdir(temp)
+d = "%s_assembly" % name
+#This can fail on my development machine if stale folders exist
+#under Galaxy's .../database/job_working_directory/ tree:
+assert not os.path.isdir(d), "Path %r already exists:\n%s" % (d, os.path.abspath(d))
+try:
+ #Check path access
+ os.mkdir(d)
+except Exception, err:
+ log_manifest(manifest)
+ sys.stderr.write("Error making directory %s\n%s" % (d, err))
+ sys.exit(1)
+
+#print os.path.abspath(".")
+#print cmd
+
+if out_log and out_log != "-":
+ handle = open(out_log, "w")
+else:
+ handle = open(os.devnull, "w")
+handle.write("======================== MIRA manifest (instructions) ========================\n")
+m = open(manifest, "rU")
+for line in m:
+ handle.write(line)
+m.close()
+del m
+handle.write("\n")
+handle.write("============================ Starting MIRA now ===============================\n")
+handle.flush()
+try:
+ #Run MIRA
+ child = subprocess.Popen(cmd_list,
+ stdout=handle,
+ stderr=subprocess.STDOUT)
+except Exception, err:
+ log_manifest(manifest)
+ sys.stderr.write("Error invoking command:\n%s\n\n%s\n" % (cmd, err))
+ #TODO - call clean up?
+ handle.write("Error invoking command:\n%s\n\n%s\n" % (cmd, err))
+ handle.close()
+ sys.exit(1)
+#Use .communicate as can get deadlocks with .wait(),
+stdout, stderr = child.communicate()
+assert not stdout and not stderr #Should be empty as sent to handle
+run_time = time.time() - start_time
+return_code = child.returncode
+handle.write("\n")
+handle.write("============================ MIRA has finished ===============================\n")
+handle.write("MIRA took %0.2f hours\n" % (run_time / 3600.0))
+if return_code:
+ print "MIRA took %0.2f hours" % (run_time / 3600.0)
+ handle.write("Return error code %i from command:\n" % return_code)
+ handle.write(cmd + "\n")
+ handle.close()
+ clean_up(temp, name)
+ log_manifest(manifest)
+ sys_exit("Return error code %i from command:\n%s" % (return_code, cmd),
+ return_code)
+handle.flush()
+
+if os.path.isfile("MIRA_assembly/MIRA_d_results/ec.log"):
+ handle.write("\n")
+ handle.write("====================== Extract Large Contigs failed ==========================\n")
+ e = open("MIRA_assembly/MIRA_d_results/ec.log", "rU")
+ for line in e:
+ handle.write(line)
+ e.close()
+ handle.write("============================ (end of ec.log) =================================\n")
+ handle.flush()
+
+#print "Collecting output..."
+start_time = time.time()
+collect_output(temp, name, handle)
+collect_time = time.time() - start_time
+handle.write("MIRA took %0.2f hours; collecting output %0.2f minutes\n" % (run_time / 3600.0, collect_time / 60.0))
+print("MIRA took %0.2f hours; collecting output %0.2f minutes\n" % (run_time / 3600.0, collect_time / 60.0))
+
+if os.path.isfile("MIRA_assembly/MIRA_d_results/ec.log"):
+ #Treat as an error, but doing this AFTER collect_output
+ sys.stderr.write("Extract Large Contigs failed\n")
+ handle.write("Extract Large Contigs failed\n")
+ handle.close()
+ sys.exit(1)
+
+#print "Cleaning up..."
+clean_up(temp, name)
+
+handle.write("\nDone\n")
+handle.close()
+print("Done")
diff -r 4abe8d59a438 -r efeff76741b8 tools/mira4_assembler/mira4_bait.py
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/tools/mira4_assembler/mira4_bait.py Wed May 13 05:55:43 2015 -0400
@@ -0,0 +1,114 @@
+#!/usr/bin/env python
+"""A simple wrapper script to call MIRA4's mirabait and collect its output.
+"""
+import os
+import sys
+import subprocess
+import shutil
+import time
+
+WRAPPER_VER = "0.0.5" #Keep in sync with the XML file
+
+def sys_exit(msg, err=1):
+ sys.stderr.write(msg+"\n")
+ sys.exit(err)
+
+
+def get_version(mira_binary):
+ """Run MIRA to find its version number"""
+ # At the commend line I would use: mira -v | head -n 1
+ # however there is some pipe error when doing that here.
+ cmd = [mira_binary, "-v"]
+ try:
+ child = subprocess.Popen(cmd,
+ stdout=subprocess.PIPE,
+ stderr=subprocess.STDOUT)
+ except Exception, err:
+ sys.stderr.write("Error invoking command:\n%s\n\n%s\n" % (" ".join(cmd), err))
+ sys.exit(1)
+ ver, tmp = child.communicate()
+ del child
+ #Workaround for -v not working in mirabait 4.0RC4
+ if "invalid option" in ver.split("\n", 1)[0]:
+ for line in ver.split("\n", 1):
+ if " version " in line:
+ line = line.split()
+ return line[line.index("version")+1].rstrip(")")
+ sys_exit("Could not determine MIRA version:\n%s" % ver)
+ return ver.split("\n", 1)[0]
+
+try:
+ mira_path = os.environ["MIRA4"]
+except KeyError:
+ sys_exit("Environment variable $MIRA4 not set")
+mira_binary = os.path.join(mira_path, "mirabait")
+if not os.path.isfile(mira_binary):
+ sys_exit("Missing mirabait under $MIRA4, %r\nFolder contained: %s"
+ % (mira_binary, ", ".join(os.listdir(mira_path))))
+mira_ver = get_version(mira_binary)
+if not mira_ver.strip().startswith("4.0"):
+ sys_exit("This wrapper is for MIRA V4.0, not:\n%s" % mira_ver)
+if "-v" in sys.argv or "--version" in sys.argv:
+ print "%s, MIRA wrapper version %s" % (mira_ver, WRAPPER_VER)
+ sys.exit(0)
+
+
+format, output_choice, strand_choice, kmer_length, min_occurance, bait_file, in_file, out_file = sys.argv[1:]
+
+if format.startswith("fastq"):
+ format = "fastq"
+elif format == "mira":
+ format = "maf"
+elif format != "fasta":
+ sys_exit("Was not expected format %r" % format)
+
+assert out_file.endswith(".dat")
+out_file_stem = out_file[:-4]
+
+cmd_list = [mira_binary, "-f", format, "-t", format,
+ "-k", kmer_length, "-n", min_occurance,
+ bait_file, in_file, out_file_stem]
+if output_choice == "pos":
+ pass
+elif output_choice == "neg":
+ #Invert the selection...
+ cmd_list.insert(1, "-i")
+else:
+ sys_exit("Output choice should be 'pos' or 'neg', not %r" % output_choice)
+if strand_choice == "both":
+ pass
+elif strand_choice == "fwd":
+ #Ingore reverse strand...
+ cmd_list.insert(1, "-r")
+else:
+ sys_exit("Strand choice should be 'both' or 'fwd', not %r" % strand_choice)
+
+cmd = " ".join(cmd_list)
+#print cmd
+start_time = time.time()
+try:
+ #Run MIRA
+ child = subprocess.Popen(cmd_list,
+ stdout=subprocess.PIPE,
+ stderr=subprocess.STDOUT)
+except Exception, err:
+ sys.stderr.write("Error invoking command:\n%s\n\n%s\n" % (cmd, err))
+ sys.exit(1)
+#Use .communicate as can get deadlocks with .wait(),
+stdout, stderr = child.communicate()
+assert stderr is None # Due to way we ran with subprocess
+run_time = time.time() - start_time
+return_code = child.returncode
+print "mirabait took %0.2f minutes" % (run_time / 60.0)
+
+if return_code:
+ sys.stderr.write(stdout)
+ sys_exit("Return error code %i from command:\n%s" % (return_code, cmd),
+ return_code)
+
+#Capture output
+out_tmp = out_file_stem + "." + format
+if not os.path.isfile(out_tmp):
+ sys.stderr.write(stdout)
+ sys_exit("Missing output file from mirabait: %s" % out_tmp)
+shutil.move(out_tmp, out_file)
diff -r 4abe8d59a438 -r efeff76741b8 tools/mira4_assembler/mira4_convert.py
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/tools/mira4_assembler/mira4_convert.py Wed May 13 05:55:43 2015 -0400
@@ -0,0 +1,226 @@
+#!/usr/bin/env python
+"""A simple wrapper script to call MIRA and collect its output.
+
+This focuses on the miraconvert binary.
+"""
+import os
+import sys
+import subprocess
+import shutil
+import time
+import tempfile
+from optparse import OptionParser
+try:
+ from io import BytesIO
+except ImportError:
+ #Should we worry about Python 2.5 or older?
+ from StringIO import StringIO as BytesIO
+
+#Do we need any PYTHONPATH magic?
+from mira4_make_bam import depad
+
+WRAPPER_VER = "0.0.7" # Keep in sync with the XML file
+
+def sys_exit(msg, err=1):
+ sys.stderr.write(msg+"\n")
+ sys.exit(err)
+
+def run(cmd):
+ #Avoid using shell=True when we call subprocess to ensure if the Python
+ #script is killed, so too is the child process.
+ try:
+ child = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
+ except Exception, err:
+ sys_exit("Error invoking command:\n%s\n\n%s\n" % (" ".join(cmd), err))
+ #Use .communicate as can get deadlocks with .wait(),
+ stdout, stderr = child.communicate()
+ return_code = child.returncode
+ if return_code:
+ cmd_str = " ".join(cmd) # doesn't quote spaces etc
+ if stderr and stdout:
+ sys_exit("Return code %i from command:\n%s\n\n%s\n\n%s" % (return_code, cmd_str, stdout, stderr))
+ else:
+ sys_exit("Return code %i from command:\n%s\n%s" % (return_code, cmd_str, stderr))
+
+def get_version(mira_binary):
+ """Run MIRA to find its version number"""
+ # At the commend line I would use: mira -v | head -n 1
+ # however there is some pipe error when doing that here.
+ cmd = [mira_binary, "-v"]
+ try:
+ child = subprocess.Popen(cmd,
+ stdout=subprocess.PIPE,
+ stderr=subprocess.STDOUT)
+ except Exception, err:
+ sys.stderr.write("Error invoking command:\n%s\n\n%s\n" % (" ".join(cmd), err))
+ sys.exit(1)
+ ver, tmp = child.communicate()
+ del child
+ return ver.split("\n", 1)[0].strip()
+
+#Parse Command Line
+usage = """Galaxy MIRA4 wrapper script v%s - use as follows:
+
+$ python mira4_convert.py ...
+
+This will run the MIRA miraconvert binary and collect its output files as directed.
+""" % WRAPPER_VER
+parser = OptionParser(usage=usage)
+parser.add_option("--input", dest="input",
+ default=None, metavar="FILE",
+ help="MIRA input filename")
+parser.add_option("-x", "--min_length", dest="min_length",
+ default="0",
+ help="Minimum contig length")
+parser.add_option("-y", "--min_cover", dest="min_cover",
+ default="0",
+ help="Minimum average contig coverage")
+parser.add_option("-z", "--min_reads", dest="min_reads",
+ default="0",
+ help="Minimum reads per contig")
+parser.add_option("--maf", dest="maf",
+ default="", metavar="FILE",
+ help="MIRA MAF output filename")
+parser.add_option("--ace", dest="ace",
+ default="", metavar="FILE",
+ help="ACE output filename")
+parser.add_option("--bam", dest="bam",
+ default="", metavar="FILE",
+ help="Unpadded BAM output filename")
+parser.add_option("--fasta", dest="fasta",
+ default="", metavar="FILE",
+ help="Unpadded FASTA output filename")
+parser.add_option("--cstats", dest="cstats",
+ default="", metavar="FILE",
+ help="Contig statistics filename")
+parser.add_option("-v", "--version", dest="version",
+ default=False, action="store_true",
+ help="Show version and quit")
+options, args = parser.parse_args()
+if args:
+ sys_exit("Expected options (e.g. --input example.maf), not arguments")
+
+input_maf = options.input
+out_maf = options.maf
+out_bam = options.bam
+out_fasta = options.fasta
+out_ace = options.ace
+out_cstats = options.cstats
+
+try:
+ mira_path = os.environ["MIRA4"]
+except KeyError:
+ sys_exit("Environment variable $MIRA4 not set")
+mira_convert = os.path.join(mira_path, "miraconvert")
+if not os.path.isfile(mira_convert):
+ sys_exit("Missing miraconvert under $MIRA4, %r\nFolder contained: %s"
+ % (mira_convert, ", ".join(os.listdir(mira_path))))
+
+mira_convert_ver = get_version(mira_convert)
+if not mira_convert_ver.strip().startswith("4.0"):
+ sys_exit("This wrapper is for MIRA V4.0, not:\n%s\n%s" % (mira_convert_ver, mira_convert))
+if options.version:
+ print("%s, MIRA wrapper version %s" % (mira_convert_ver, WRAPPER_VER))
+ sys.exit(0)
+
+if not input_maf:
+ sys_exit("Input MIRA file is required")
+elif not os.path.isfile(input_maf):
+ sys_exit("Missing input MIRA file: %r" % input_maf)
+
+if not (out_maf or out_bam or out_fasta or out_ace or out_cstats):
+ sys_exit("No output requested")
+
+
+def check_min_int(value, name):
+ try:
+ i = int(value)
+ except:
+ sys_exit("Bad %s setting, %r" % (name, value))
+ if i < 0:
+ sys_exit("Negative %s setting, %r" % (name, value))
+ return i
+
+min_length = check_min_int(options.min_length, "minimum length")
+min_cover = check_min_int(options.min_cover, "minimum cover")
+min_reads = check_min_int(options.min_reads, "minimum reads")
+
+#TODO - Run MIRA in /tmp or a configurable directory?
+#Currently Galaxy puts us somewhere safe like:
+#/opt/galaxy-dist/database/job_working_directory/846/
+temp = "."
+
+
+cmd_list = [mira_convert]
+if min_length:
+ cmd_list.extend(["-x", str(min_length)])
+if min_cover:
+ cmd_list.extend(["-y", str(min_cover)])
+if min_reads:
+ cmd_list.extend(["-z", str(min_reads)])
+cmd_list.extend(["-f", "maf", input_maf, os.path.join(temp, "converted")])
+if out_maf:
+ cmd_list.append("maf")
+if out_bam:
+ cmd_list.append("samnbb")
+ if not out_fasta:
+ #Need this for samtools depad
+ out_fasta = os.path.join(temp, "depadded.fasta")
+if out_fasta:
+ cmd_list.append("fasta")
+if out_ace:
+ cmd_list.append("ace")
+if out_cstats:
+ cmd_list.append("cstats")
+run(cmd_list)
+
+def collect(old, new):
+ if not os.path.isfile(old):
+ sys_exit("Missing expected output file %s" % old)
+ shutil.move(old, new)
+
+if out_maf:
+ collect(os.path.join(temp, "converted.maf"), out_maf)
+if out_fasta:
+ #Can we look at the MAF file to see if there are multiple strains?
+ old = os.path.join(temp, "converted_AllStrains.unpadded.fasta")
+ if os.path.isfile(old):
+ collect(old, out_fasta)
+ else:
+ #Might the output be filtered down to zero contigs?
+ old = os.path.join(temp, "converted.fasta")
+ if not os.path.isfile(old):
+ sys_exit("Missing expected output FASTA file")
+ elif os.path.getsize(old) == 0:
+ print("Warning - no contigs (harsh filters?)")
+ collect(old, out_fasta)
+ else:
+ sys_exit("Missing expected output FASTA file (only generic file present)")
+if out_ace:
+ collect(os.path.join(temp, "converted.maf"), out_ace)
+if out_cstats:
+ collect(os.path.join(temp, "converted_info_contigstats.txt"), out_cstats)
+
+if out_bam:
+ assert os.path.isfile(out_fasta)
+ old = os.path.join(temp, "converted.samnbb")
+ if not os.path.isfile(old):
+ old = os.path.join(temp, "converted.sam")
+ if not os.path.isfile(old):
+ sys_exit("Missing expected intermediate file %s" % old)
+ h = BytesIO()
+ msg = depad(out_fasta, old, out_bam, h)
+ if msg:
+ print(msg)
+ print(h.getvalue())
+ h.close()
+ sys.exit(1)
+ h.close()
+ if out_fasta == os.path.join(temp, "depadded.fasta"):
+ #Not asked for by Galaxy, no longer needed
+ os.remove(out_fasta)
+
+if min_length or min_cover or min_reads:
+ print("Filtered.")
+else:
+ print("Converted.")
diff -r 4abe8d59a438 -r efeff76741b8 tools/mira4_assembler/mira4_de_novo.xml
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/tools/mira4_assembler/mira4_de_novo.xml Wed May 13 05:55:43 2015 -0400
@@ -0,0 +1,275 @@
+
+ Takes Sanger, Roche 454, Solexa/Illumina, Ion Torrent and PacBio reads
+
+ mira
+ miraconvert
+ MIRA
+ samtools
+ samtools
+
+
+
+
+
+
+
+ mira4.py --version
+ mira4.py
+--manifest "$manifest"
+#if str($maf_wanted)=="true":
+--maf "$out_maf"
+#end if
+#if str($bam_wanted)=="true":
+--bam "$out_bam"
+#end if
+--fasta "$out_fasta"
+--log "$out_log"
+
+
+
+project = MIRA
+job = denovo,${job_type},${job_quality}
+parameters = -NW:cmrnl=no -DI:trt=/tmp -OUT:orc=no
+## -GE:not is short for -GENERAL:number_of_threads and using one (1)
+## can be useful for repeatability of assemblies and bug hunting.
+## This is overriden by the command line -t switch which is easier
+## to set from within Galaxy.
+##
+## -NW:cmrnl is short for -NAG_AND_WARN:check_maxreadnamelength
+## and without this MIRA aborts with read names over 40 characters
+## due to limitations of some downstream tools.
+##
+## -DI:trt is short for -DIRECTORY:tmp_redirected_to and should
+## point to a local hard drive (not something like NFS on network).
+## We replace /tmp with an environment variable via mira4.py
+##
+## -OUT:orc=no is short for -OUTPUT:output_result_caf=no
+## which turns off an output file we don't want anyway.
+
+#for $rg in $read_group
+
+##This bar goes into the manifest as a comment line
+#------------------------------------------------------------------------------
+
+readgroup
+technology = ${rg.technology}
+##Record the segment placement (if any)
+#if str($rg.segments.type) == "paired"
+segment_placement = ${rg.segments.placement}
+segment_naming = ${rg.segments.naming}
+#if str($rg.segments.min_size) != "" or str($rg.segments.max_size) != ""
+##If our min/max validation failed I trust MIRA to give an error message...
+template_size = $rg.segments.min_size $rg.segments.max_size
+#end if
+#end if
+##if str($rg.segments.type) == "none"
+##MIRA4 manual says use segment_placement = unknown or ? for unpaired data
+##but this stopped working in MIRA 4.0 RC5 and 4.0 (final). See:
+##http://www.freelists.org/post/mira_talk/Unpaired-reads-and-segment-placement--or-unknown
+##segment_placement = ?
+##end if
+##MIRA will accept multiple filenames on one data line, or multiple data lines
+#for $f in $rg.filenames
+##Must now map Galaxy datatypes to MIRA file types...
+#if $f.ext.startswith("fastq")
+##MIRA doesn't like fastqsanger etc, just plain old fastq:
+data = fastq::$f
+#elif $f.ext == "mira"
+##We're calling *.maf the "mira" format in Galaxy (name space collision)
+data = maf::$f
+#else
+##MIRA is happy with fasta as name,
+data = ${f.ext}::$f
+#end if
+#end for
+#end for
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ bam_wanted is True
+
+
+ maf_wanted is True
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+**What it does**
+
+Runs MIRA v4.0 in de novo mode, collects the output, generates a sorted BAM
+file, and then throws away all the temporary files.
+
+MIRA is an open source assembly tool capable of handling sequence data from
+a range of platforms (Sanger capillary, Solexa/Illumina, Roche 454, Ion Torrent
+and also PacBio).
+
+It is particularly suited to small genomes such as bacteria.
+
+
+**Notes on paired reads**
+
+.. class:: warningmark
+
+MIRA uses read naming conventions to identify paired read partners
+(and does not care about their order in the input files). In most cases,
+the Solexa/Illumina setting is fine. For Sanger capillary sequencing,
+you may need to rename your reads to match one of the standard conventions
+supported by MIRA. For Roche 454 or Ion Torrent the appropriate settings
+depend on how the FASTQ file was produced:
+
+* If using Roche's ``sffinfo`` or older versions of ``sff_extract``
+ to convert SFF files to FASTQ, your reads will probably have the
+ ``---> <---`` orientation and use the ``.f`` and ``.r``
+ suffixes (FR naming).
+
+* If using a recent version of ``sff_extract``, then the ``/1`` and ``/2``
+ suffixes are used (Solexa/Illumina style naming) and the original
+ ``2---> 1--->`` orientation is preserved.
+
+The reason for this is the raw data for Roche 454 and Ion Torrent paired-end
+libraries sequences a circularised fragment such that the raw data begins
+with the end of the fragment, a linker, then the start of the fragment.
+This means both the start and end are sequenced from the same strand, and
+have the orientation ``2---> 1--->``. However, in order to use the data
+with traditional tools expecting Sanger capillary style ``---> <---``
+orientation it was common to reverse complement one of the pair to mimic this.
+
+
+**Citation**
+
+If you use this Galaxy tool in work leading to a scientific publication please
+cite the following papers:
+
+Peter J.A. Cock, Björn A. Grüning, Konrad Paszkiewicz and Leighton Pritchard (2013).
+Galaxy tools and workflows for sequence analysis with applications
+in molecular plant pathology. PeerJ 1:e167
+http://dx.doi.org/10.7717/peerj.167
+
+Bastien Chevreux, Thomas Wetter and Sándor Suhai (1999).
+Genome Sequence Assembly Using Trace Signals and Additional Sequence Information.
+Computer Science and Biology: Proceedings of the German Conference on Bioinformatics (GCB) 99, pp. 45-56.
+http://www.bioinfo.de/isb/gcb99/talks/chevreux/main.html
+
+This wrapper is available to install into other Galaxy Instances via the Galaxy
+Tool Shed at http://toolshed.g2.bx.psu.edu/view/peterjc/mira4_assembler
+
+
+ 10.7717/peerj.167
+ @ARTICLE{Chevreux1999-mira3,
+ author = {B. Chevreux and T. Wetter and S. Suhai},
+ year = {1999},
+ title = {Genome Sequence Assembly Using Trace Signals and Additional Sequence Information},
+ journal = {Computer Science and Biology: Proceedings of the German Conference on Bioinformatics (GCB)}
+ volume = {99},
+ pages = {45-56},
+ url = {http://www.bioinfo.de/isb/gcb99/talks/chevreux/main.html}
+ }
+
+
diff -r 4abe8d59a438 -r efeff76741b8 tools/mira4_assembler/mira4_make_bam.py
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/tools/mira4_assembler/mira4_make_bam.py Wed May 13 05:55:43 2015 -0400
@@ -0,0 +1,92 @@
+#!/usr/bin/env python
+"""Wrapper script using miraconvert & samtools to get BAM from MIRA.
+"""
+import os
+import sys
+import shutil
+import subprocess
+import tempfile
+
+def sys_exit(msg, err=1):
+ sys.stderr.write(msg+"\n")
+ sys.exit(err)
+
+def run(cmd, log_handle):
+ try:
+ child = subprocess.Popen(cmd, shell=True,
+ stdout=subprocess.PIPE,
+ stderr=subprocess.STDOUT)
+ except Exception, err:
+ sys.stderr.write("Error invoking command:\n%s\n\n%s\n" % (cmd, err))
+ #TODO - call clean up?
+ log_handle.write("Error invoking command:\n%s\n\n%s\n" % (cmd, err))
+ sys.exit(1)
+ #Use .communicate as can get deadlocks with .wait(),
+ stdout, stderr = child.communicate()
+ assert not stderr #Should be empty as sent to stdout
+ if len(stdout) > 10000:
+ #miraconvert can be very verbose (is holding stdout in RAM a problem?)
+ stdout = stdout.split("\n")
+ stdout = stdout[:10] + ["...", "", "..."] + stdout[-10:]
+ stdout = "\n".join(stdout)
+ log_handle.write(stdout)
+ return child.returncode
+
+def depad(fasta_file, sam_file, bam_file, log_handle):
+ log_handle.write("\n================= Converting MIRA assembly from SAM to BAM ===================\n")
+ #Also doing SAM to (uncompressed) BAM during depad
+ bam_stem = bam_file + ".tmp" # Have write permissions and want final file in this folder
+ cmd = 'samtools depad -S -u -T "%s" "%s" | samtools sort - "%s"' % (fasta_file, sam_file, bam_stem)
+ return_code = run(cmd, log_handle)
+ if return_code:
+ return "Error %i from command:\n%s" % (return_code, cmd)
+ if not os.path.isfile(bam_stem + ".bam"):
+ return "samtools depad or sort failed to produce BAM file"
+
+ log_handle.write("\n====================== Indexing MIRA assembly BAM file =======================\n")
+ cmd = 'samtools index "%s.bam"' % bam_stem
+ return_code = run(cmd, log_handle)
+ if return_code:
+ return "Error %i from command:\n%s" % (return_code, cmd)
+ if not os.path.isfile(bam_stem + ".bam.bai"):
+ return "samtools indexing of BAM file failed to produce BAI file"
+
+ shutil.move(bam_stem + ".bam", bam_file)
+ os.remove(bam_stem + ".bam.bai") #Let Galaxy handle that...
+
+
+def make_bam(mira_convert, maf_file, fasta_file, bam_file, log_handle):
+ if not os.path.isfile(mira_convert):
+ return "Missing binary %r" % mira_convert
+ if not os.path.isfile(maf_file):
+ return "Missing input MIRA file: %r" % maf_file
+ if not os.path.isfile(fasta_file):
+ return "Missing padded FASTA file: %r" % fasta_file
+
+ log_handle.write("\n====================== Converting MIRA assembly to SAM =======================\n")
+ tmp_dir = tempfile.mkdtemp()
+ sam_file = os.path.join(tmp_dir, "x.sam")
+
+ # Note add nbb to the template name, possible MIRA 4.0 RC4 bug
+ cmd = '"%s" -f maf -t samnbb "%s" "%snbb"' % (mira_convert, maf_file, sam_file)
+ return_code = run(cmd, log_handle)
+ if return_code:
+ return "Error %i from command:\n%s" % (return_code, cmd)
+ if not os.path.isfile(sam_file):
+ return "Conversion from MIRA to SAM failed"
+
+ #Also doing SAM to (uncompressed) BAM during depad
+ msg = depad(fasta_file, sam_file, bam_file, log_handle)
+ if msg:
+ return msg
+
+ os.remove(sam_file)
+ os.rmdir(tmp_dir)
+
+ return None #Good :)
+
+if __name__ == "__main__":
+ mira_convert, maf_file, fasta_file, bam_file = sys.argv[1:]
+ msg = make_bam(mira_convert, maf_file, fasta_file, bam_file, sys.stdout)
+ if msg:
+ sys_exit(msg)
diff -r 4abe8d59a438 -r efeff76741b8 tools/mira4_assembler/mira4_mapping.xml
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/tools/mira4_assembler/mira4_mapping.xml Wed May 13 05:55:43 2015 -0400
@@ -0,0 +1,279 @@
+
+ Maps Sanger, Roche 454, Solexa/Illumina, Ion Torrent and PacBio reads
+
+ mira
+ miraconvert
+ MIRA
+ samtools
+ samtools
+
+
+
+
+
+
+ mira4.py --version
+ mira4.py
+--manifest "$manifest"
+#if str($maf_wanted) == "true":
+--maf "$out_maf"
+#end if
+#if str($bam_wanted) == "true":
+--bam "$out_bam"
+#end if
+--fasta "$out_fasta"
+--log "$out_log"
+
+
+
+project = MIRA
+job = mapping,${job_type},${job_quality}
+parameters = -NW:cmrnl=no -DI:trt=/tmp -OUT:orc=no
+## -GE:not is short for -GENERAL:number_of_threads and using one (1)
+## can be useful for repeatability of assemblies and bug hunting.
+## This is overriden by the command line -t switch which is easier
+## to set from within Galaxy.
+##
+## -NW:cmrnl is short for -NAG_AND_WARN:check_maxreadnamelength
+## and without this MIRA aborts with read names over 40 characters
+## due to limitations of some downstream tools.
+##
+## -DI:trt is short for -DIRECTORY:tmp_redirected_to and should
+## point to a local hard drive (not something like NFS on network).
+## We replace /tmp with an environment variable via mira4.py
+##
+## -OUT:orc=no is short for -OUTPUT:output_result_caf=no
+## which turns off an output file we don't want anyway.
+
+##This bar goes into the manifest as a comment line
+#------------------------------------------------------------------------------
+
+readgroup
+is_reference
+#if str($strain_setup)=="same"
+strain = StrainX
+#end if
+#for $f in $references
+##Must now map Galaxy datatypes to MIRA file types...
+#if $f.ext.startswith("fastq")
+##MIRA doesn't like fastqsanger etc, just plain old fastq:
+data = fastq::$f
+#elif $f.ext == "mira"
+##We're calling *.maf the "mira" format in Galaxy (name space collision)
+data = maf::$f
+#elif $f.ext == "fasta"
+##We're calling MIRA with the file type as "fna" as otherwise it wants quals
+data = fna::$f
+#else
+##Currently don't expect anything else...
+data = ${f.ext}::$f
+#end if
+#end for
+#for $rg in $read_group
+
+##This bar goes into the manifest as a comment line
+#------------------------------------------------------------------------------
+
+readgroup
+technology = ${rg.technology}
+#if str($strain_setup)=="same"
+##This is perhaps redundant as MIRA defaults to StrainX for the reads:
+strain = StrainX
+#end if
+##Record the segment placement (if any)
+#if str($rg.segments.type) == "paired"
+segment_placement = ${rg.segments.placement}
+segment_naming = ${rg.segments.naming}
+#end if
+##if str($rg.segments.type) == "none"
+##MIRA4 manual says use segment_placement = unknown or ? for unpaired data
+##but this stopped working in MIRA 4.0 RC5 and 4.0 (final). See:
+##http://www.freelists.org/post/mira_talk/Unpaired-reads-and-segment-placement--or-unknown
+##segment_placement = ?
+##end if
+##MIRA will accept multiple filenames on one data line, or multiple data lines
+#for $f in $rg.filenames
+##Must now map Galaxy datatypes to MIRA file types...
+#if $f.ext.startswith("fastq")
+##MIRA doesn't like fastqsanger etc, just plain old fastq:
+data = fastq::$f
+#elif $f.ext == "mira"
+##We're calling *.maf the "mira" format in Galaxy (name space collision)
+data = maf::$f
+#else
+##Currently don't expect anything else...
+data = ${f.ext}::$f
+#end if
+#end for
+#end for
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ bam_wanted is True
+
+
+ maf_wanted is True
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+**What it does**
+
+Runs MIRA v4.0 in mapping mode, collects the output, generates a sorted BAM
+file, and throws away all the temporary files.
+
+MIRA is an open source assembly tool capable of handling sequence data from
+a range of platforms (Sanger capillary, Solexa/Illumina, Roche 454, Ion Torrent
+and also PacBio).
+
+It is particularly suited to small genomes such as bacteria.
+
+
+**Notes on paired reads**
+
+.. class:: warningmark
+
+MIRA uses read naming conventions to identify paired read partners
+(and does not care about their order in the input files). In most cases,
+the Solexa/Illumina setting is fine. For Sanger capillary sequencing,
+you may need to rename your reads to match one of the standard conventions
+supported by MIRA. For Roche 454 or Ion Torrent the appropriate settings
+depend on how the FASTQ file was produced:
+
+* If using Roche's ``sffinfo`` or older versions of ``sff_extract``
+ to convert SFF files to FASTQ, your reads will probably have the
+ ``---> <---`` orientation and use the ``.f`` and ``.r``
+ suffixes (FR naming).
+
+* If using a recent version of ``sff_extract``, then the ``/1`` and ``/2``
+ suffixes are used (Solexa/Illumina style naming) and the original
+ ``2---> 1--->`` orientation is preserved.
+
+The reason for this is the raw data for Roche 454 and Ion Torrent paired-end
+libraries sequences a circularised fragment such that the raw data begins
+with the end of the fragment, a linker, then the start of the fragment.
+This means both the start and end are sequenced from the same strand, and
+have the orientation ``2---> 1--->``. However, in order to use the data
+with traditional tools expecting Sanger capillary style ``---> <---``
+orientation it was common to reverse complement one of the pair to mimic this.
+
+
+**Citation**
+
+If you use this Galaxy tool in work leading to a scientific publication please
+cite the following papers:
+
+Peter J.A. Cock, Björn A. Grüning, Konrad Paszkiewicz and Leighton Pritchard (2013).
+Galaxy tools and workflows for sequence analysis with applications
+in molecular plant pathology. PeerJ 1:e167
+http://dx.doi.org/10.7717/peerj.167
+
+Bastien Chevreux, Thomas Wetter and Sándor Suhai (1999).
+Genome Sequence Assembly Using Trace Signals and Additional Sequence Information.
+Computer Science and Biology: Proceedings of the German Conference on Bioinformatics (GCB) 99, pp. 45-56.
+http://www.bioinfo.de/isb/gcb99/talks/chevreux/main.html
+
+This wrapper is available to install into other Galaxy Instances via the Galaxy
+Tool Shed at http://toolshed.g2.bx.psu.edu/view/peterjc/mira4_assembler
+
+
+ 10.7717/peerj.167
+ @ARTICLE{Chevreux1999-mira3,
+ author = {B. Chevreux and T. Wetter and S. Suhai},
+ year = {1999},
+ title = {Genome Sequence Assembly Using Trace Signals and Additional Sequence Information},
+ journal = {Computer Science and Biology: Proceedings of the German Conference on Bioinformatics (GCB)}
+ volume = {99},
+ pages = {45-56},
+ url = {http://www.bioinfo.de/isb/gcb99/talks/chevreux/main.html}
+ }
+
+
diff -r 4abe8d59a438 -r efeff76741b8 tools/mira4_assembler/mira4_validator.py
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/tools/mira4_assembler/mira4_validator.py Wed May 13 05:55:43 2015 -0400
@@ -0,0 +1,64 @@
+#Called from the Galaxy Tool XML file
+#import sys
+
+def validate_input(trans, error_map, param_values, page_param_map):
+ """Validates the min_size/max_size user input, before execution."""
+ err_list = []
+ for read_group in param_values["read_group"]:
+ err = dict()
+ segments = read_group["segments"]
+ if str(segments["type"]) != "paired":
+ err_list.append(dict())
+ continue
+
+ min_size = str(segments["min_size"]).strip()
+ max_size = str(segments["max_size"]).strip()
+ #sys.stderr.write("DEBUG min_size=%r, max_size=%r\n" % (min_size, max_size))
+
+ #Somehow Galaxy seems to turn an empty field into string "None"...
+ if min_size=="None":
+ min_size = ""
+ if max_size=="None":
+ max_size = ""
+
+ if min_size=="" and max_size=="":
+ #Both missing is good
+ pass
+ elif min_size=="":
+ err["min_size"] = "Minimum size required if maximum size given"
+ elif max_size=="":
+ err["max_size"] = "Maximum size required if minimum size given"
+
+ if min_size:
+ try:
+ min_size_int = int(min_size)
+ if min_size_int < 0:
+ err["min_size"] = "Minumum size must not be negative (%i)" % min_size_int
+ min_size = None # Avoid doing comparison below
+ except ValueError:
+ err["min_size"] = "Minimum size is not an integer (%s)" % min_size
+ min_size = None # Avoid doing comparison below
+
+ if max_size:
+ try:
+ max_size_int = int(max_size)
+ if max_size_int< 0:
+ err["max_size"] = "Maximum size must not be negative (%i)" % max_size_int
+ max_size = None # Avoid doing comparison below
+ except ValueError:
+ err["max_size"] = "Maximum size is not an integer (%s)" % max_size
+ max_size = None # Avoid doing comparison below
+
+ if min_size and max_size and min_size_int > max_size_int:
+ msg = "Minimum size must be less than maximum size (%i vs %i)" % (min_size_int, max_size_int)
+ err["min_size"] = msg
+ err["max_size"] = msg
+
+ if err:
+ err_list.append({"segments":err})
+ else:
+ err_list.append(dict())
+
+ if any(err_list):
+ #Return an error map only if any readgroup gave errors
+ error_map["read_group"] = err_list
diff -r 4abe8d59a438 -r efeff76741b8 tools/mira4_assembler/repository_dependencies.xml
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/tools/mira4_assembler/repository_dependencies.xml Wed May 13 05:55:43 2015 -0400
@@ -0,0 +1,4 @@
+
+
+
+
diff -r 4abe8d59a438 -r efeff76741b8 tools/mira4_assembler/tool_dependencies.xml
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/tools/mira4_assembler/tool_dependencies.xml Wed May 13 05:55:43 2015 -0400
@@ -0,0 +1,55 @@
+
+
+
+
+
+
+
+
+
+
+ http://downloads.sourceforge.net/project/mira-assembler/MIRA/stable/mira_4.0.2_darwin13.1.0_x86_64_static.tar.bz2
+
+ bin
+ $INSTALL_DIR
+
+
+
+
+ http://downloads.sourceforge.net/project/mira-assembler/MIRA/stable/mira_4.0.2_linux-gnu_x86_64_static.tar.bz2
+
+ bin
+ $INSTALL_DIR
+
+
+
+
+ echo "ERROR: Automated installation on your operating system and CPU architecture combination is not yet supported."
+ echo "Your machine details (the output from 'uname' and 'arch'):"
+ uname
+ arch
+ echo "If pre-compiled MIRA binaries are now available for this, please report this"
+ echo "via https://github.com/peterjc/pico_galaxyt/issues - thank you!"
+ false
+
+
+
+
+ $INSTALL_DIR
+
+
+ $INSTALL_DIR
+
+
+
+
+Downloads MIRA v4.0.2 from Sourceforge, requesting Bastien's precompiled binaries
+for 64 bit (x86_64) Linux or Mac OS X. Other platforms where compilation from
+source would be required (e.g. 32 bit Linux) are not supported by this automated
+installation script.
+
+http://chevreux.org/projects_mira.html
+http://sourceforge.net/projects/mira-assembler/
+
+
+