changeset 5:ffefb87bd414 draft
Uploaded v0.0.1 preview 5, using MIRA 4.0 RC4, supports segment_placement (pairing type)
author | peterjc |
---|---|
date | Tue, 15 Oct 2013 12:07:34 -0400 |
parents | df86ed992a1b |
children | 626d5cfd01aa |
files | tools/mira4/README.rst tools/mira4/mira4.py tools/mira4/mira4_de_novo.xml tools/mira4/mira4_mapping.xml tools/mira4/tool_dependencies.xml |
diffstat | 5 files changed, 65 insertions(+), 15 deletions(-) |
--- a/tools/mira4/README.rst Fri Oct 11 04:28:45 2013 -0400
+++ b/tools/mira4/README.rst Tue Oct 15 12:07:34 2013 -0400
@@ -1,5 +1,5 @@
-Galaxy tool to wrap the MIRA sequence assembly program (v4.0)
-=============================================================
+Galaxy wrapper for the MIRA assembly program (v4.0)
+===================================================
 
 This tool is copyright 2011-2013 by Peter Cock, The James Hutton Institute
 (formerly SCRI, Scottish Crop Research Institute), UK. All rights reserved.
@@ -11,6 +11,11 @@
 It is available from the Galaxy Tool Shed at:
 http://toolshed.g2.bx.psu.edu/view/peterjc/mira4_assembler
 
+It uses a Galaxy datatype definition 'mira' for the MIRA Assembly Format,
+http://toolshed.g2.bx.psu.edu/view/peterjc/mira_datatypes
+
+A separate wrapper for MIRA v3.4 is available from the Galaxy Tool Shed at:
+http://toolshed.g2.bx.psu.edu/view/peterjc/mira_assembler
 
 Automated Installation
 ======================
@@ -23,9 +28,7 @@
 cluster settings for de novo usage (high RAM) and mapping (lower RAM).
 Consult the Galaxy adminstration documentation for your cluster setup.
 
-WARNING: This tool was developed to construct viral genome assembly and
-mapping pipelines, for which the run time and memory requirements are
-negligible. For larger tasks, be aware that MIRA can require vast amounts
+WARNING: For larger tasks, be aware that MIRA can require vast amounts
 of RAM and run-times of over a week are possible. This tool wrapper makes
 no attempt to spot and reject such large jobs.
 
@@ -50,7 +53,7 @@
 <tool file="mira4/mira4_de_novo.xml" />
 <tool file="mira4/mira4_mapping.xml" />
 
-You will also need to install MIRA, we used version 4.0 RC3. See:
+You will also need to install MIRA, we used version 4.0 RC4. See:
 
 * http://chevreux.org/projects_mira.html
 * http://sourceforge.net/projects/mira-assembler/
@@ -65,7 +68,7 @@
 ======= ======================================================================
 Version Changes
 ------- ----------------------------------------------------------------------
-v0.0.1 - Initial version (prototype for MIRA 4.0 RC3, based on wrapper for v3.4)
+v0.0.1 - Initial version (prototype for MIRA 4.0 RC4, based on wrapper for v3.4)
 ======= ======================================================================
--- a/tools/mira4/mira4.py Fri Oct 11 04:28:45 2013 -0400
+++ b/tools/mira4/mira4.py Tue Oct 15 12:07:34 2013 -0400
@@ -31,14 +31,13 @@
     return ver.split("\n", 1)[0]
 
 
-os.environ["PATH"] = "/mnt/galaxy/downloads/mira_4.0rc3_linux-gnu_x86_64_static/bin/:%s" % os.environ["PATH"]
+os.environ["PATH"] = "/mnt/galaxy/downloads/mira_4.0rc4_linux-gnu_x86_64_static/bin/:%s" % os.environ["PATH"]
 mira_binary = "mira"
 mira_ver = get_version(mira_binary)
 if not mira_ver.strip().startswith("4.0"):
     stop_err("This wrapper is for MIRA V4.0, not:\n%s" % mira_ver)
-if "-v" in sys.argv:
-    print "MIRA wrapper version %s," % WRAPPER_VER
-    print mira_ver
+if "-v" in sys.argv or "--version" in sys.argv:
+    print "%s, MIRA wrapper version %s" % (mira_ver, WRAPPER_VER)
     sys.exit(0)
 
 
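The revised check in mira4.py accepts either `-v` or `--version` and prints one combined line instead of two. A minimal Python 3 sketch of that logic follows (the wrapper itself uses Python 2 print statements). The helper names are hypothetical, and the `WRAPPER_VER` value and the sample MIRA version string are assumptions for illustration only.

```python
# Assumption: the wrapper version string; the README lists v0.0.1.
WRAPPER_VER = "0.0.1"


def wants_version(argv):
    """Return True if either version flag is present, as in the revised check."""
    return "-v" in argv or "--version" in argv


def version_banner(mira_ver, wrapper_ver=WRAPPER_VER):
    """Format the single-line banner in the same order as the new print."""
    return "%s, MIRA wrapper version %s" % (mira_ver, wrapper_ver)


# Hypothetical MIRA version string; the real one comes from get_version().
if wants_version(["mira4.py", "--version"]):
    print(version_banner("4.0rc4"))
```

Putting the MIRA version first keeps the most relevant information at the start of the single line that the `version_command` in the tool XML captures.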
--- a/tools/mira4/mira4_de_novo.xml Fri Oct 11 04:28:45 2013 -0400
+++ b/tools/mira4/mira4_de_novo.xml Tue Oct 15 12:07:34 2013 -0400
@@ -5,7 +5,7 @@
         <requirement type="binary">mira</requirement>
         <requirement type="package" version="4.0">MIRA</requirement>
     </requirements>
-    <version_command interpreter="python">mira4.py -v</version_command>
+    <version_command interpreter="python">mira4.py --version</version_command>
     <command interpreter="python">
 mira4.py $manifest $out_maf $out_fasta $out_log
     </command>
@@ -29,6 +29,13 @@
                 <option value="text">Synthetic reads (database entries, consensus sequences, artifical reads, etc)</option>
                 <!-- TODO reference/backbone as an entry here? -->
             </param>
+            <param name="segment_placement" type="select" label="Pairing type (segment placing)">
+                <option value="">None (e.g. single end sequencing)</option>
+                <option value="FR">---&gt; &lt;--- (e.g. Sanger capillary or Solexa/Illumina paired-end library)</option>
+                <option value="RF">&lt;--- ---&gt; (e.g. Solexa/Illumina mate-pair library)</option>
+                <option value="SB">2---&gt; 1---&gt; (e.g. Roche 454 paired-end libraries or IonTorrent long-mate; see note)</option>
+                <option value="?">Unknown or not relevant (e.g. primer walking with Sanger capillary sequencing)</option>
+            </param>
             <param name="filenames" type="data" format="fastq,mira" multiple="true" required="true" label="Read file(s)" help="Multiple files allowed, for example paired reads can be given as two files (MIRA looks at read names to identify pairs)." />
         </repeat>
 
@@ -62,6 +69,10 @@
 technology = ${rg.technology}
 ##MIRA will accept multiple filenames on one data line, or multiple data lines
 #for $f in $rg.filenames
+#if str($rg.segment_placement) != ""
+##Record the segment placement (if any)
+segmentplacement = ${rg.segment_placement}
+#end if
 ##Must now map Galaxy datatypes to MIRA file types...
 #if $f.ext.startswith("fastq")
 ##MIRA doesn't like fastqsanger etc, just plain old fastq:
@@ -109,6 +120,19 @@
 It is particularly suited to small genomes such as bacteria.
 
+**Notes**
+
+.. class:: warningmark
+
+Note that the raw data for Roche 454 and Ion Torrent paired-end libraries
+sequences a circularised fragment such that the raw data starts with the
+end of the fragment, a linker, then the start of the fragment. This means
+both the start and end are sequenced from the same strand, and thus should
+be given to MIRA as orientation "2---&gt; 1---&gt;". However, in order to
+use this data with traditional tools expecting Sanger capillary style
+libraries which expect "---&gt; &lt;---" your FASTQ files may have been
+pre-processed to mimic this by reverse complementing one of the pair.
+
 **Citation**
 
 If you use this Galaxy tool in work leading to a scientific publication please
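The note added to the help text says that 454 or Ion Torrent paired reads may have been pre-processed by reverse complementing one read of each pair. As a rough illustration of what that pre-processing does to a single read, here is a minimal Python sketch. The function name is hypothetical, and it assumes only unambiguous bases plus N; it is not part of the wrapper.

```python
# Complement table for unambiguous bases plus N (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement_read(seq, qual):
    """Reverse complement a read and reverse its quality string to match."""
    if len(seq) != len(qual):
        raise ValueError("sequence and quality lengths differ")
    return seq.translate(_COMPLEMENT)[::-1], qual[::-1]


new_seq, new_qual = reverse_complement_read("AACGT", "IIHG#")
print(new_seq, new_qual)
```

Reads that have been through a step like this no longer have the same-strand orientation of the raw data, which is why the pairing type chosen in the tool needs to reflect how the FASTQ files were prepared.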
--- a/tools/mira4/mira4_mapping.xml Fri Oct 11 04:28:45 2013 -0400
+++ b/tools/mira4/mira4_mapping.xml Tue Oct 15 12:07:34 2013 -0400
@@ -5,7 +5,7 @@
         <requirement type="binary">mira</requirement>
         <requirement type="package" version="4.0">MIRA</requirement>
     </requirements>
-    <version_command interpreter="python">mira4.py -v</version_command>
+    <version_command interpreter="python">mira4.py --version</version_command>
     <command interpreter="python">
 mira4.py $manifest $out_maf $out_fasta $out_log
     </command>
@@ -38,6 +38,13 @@
                 <option value="pcbiohq">PacBio high quality (corrected)</option>
                 <option value="text">Synthetic reads (database entries, consensus sequences, artifical reads, etc)</option>
             </param>
+            <param name="segment_placement" type="select" label="Pairing type (segment placing)">
+                <option value="">None (e.g. single end sequencing)</option>
+                <option value="FR">---&gt; &lt;--- (e.g. Sanger capillary or Solexa/Illumina paired-end library)</option>
+                <option value="RF">&lt;--- ---&gt; (e.g. Solexa/Illumina mate-pair library)</option>
+                <option value="SB">2---&gt; 1---&gt; (e.g. Roche 454 paired-end libraries or IonTorrent long-mate; see note)</option>
+                <option value="?">Unknown or not relevant (e.g. primer walking with Sanger capillary sequencing)</option>
+            </param>
             <param name="filenames" type="data" format="fastq,mira" multiple="true" required="true" label="Read file(s)" help="Multiple files allowed, for example paired reads can be given as two files (MIRA looks at read names to identify pairs)." />
         </repeat>
 
@@ -97,6 +104,10 @@
 ##This is perhaps redundant as MIRA defaults to StrainX for the reads:
 strain = StrainX
 #end if
+#if str($rg.segment_placement) != ""
+##Record the segment placement (if any)
+segmentplacement = ${rg.segment_placement}
+#end if
 ##MIRA will accept multiple filenames on one data line, or multiple data lines
 #for $f in $rg.filenames
 ##Must now map Galaxy datatypes to MIRA file types...
@@ -149,6 +160,19 @@
 It is particularly suited to small genomes such as bacteria.
 
+**Notes**
+
+.. class:: warningmark
+
+Note that the raw data for Roche 454 and Ion Torrent paired-end libraries
+sequences a circularised fragment such that the raw data starts with the
+end of the fragment, a linker, then the start of the fragment. This means
+both the start and end are sequenced from the same strand, and thus should
+be given to MIRA as orientation "2---&gt; 1---&gt;". However, in order to
+use this data with traditional tools expecting Sanger capillary style
+libraries which expect "---&gt; &lt;---" your FASTQ files may have been
+pre-processed to mimic this by reverse complementing one of the pair.
+
 **Citation**
 
 If you use this Galaxy tool in work leading to a scientific publication please
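The template change in both tool XML files writes a `segmentplacement` line to the MIRA manifest only when a pairing type other than "None" is selected. The conditional can be sketched in plain Python as below. The function name is hypothetical, only the two manifest keys visible in the diff are produced, and the `solexa` technology value is an assumed example.

```python
def readgroup_settings(technology, segment_placement):
    """Build the per-read-group manifest lines shown in the template diff."""
    lines = ["technology = %s" % technology]
    # Mirrors '#if str($rg.segment_placement) != ""' in the Cheetah template:
    # the empty option value ("None") emits no segmentplacement line at all.
    if str(segment_placement) != "":
        lines.append("segmentplacement = %s" % segment_placement)
    return lines


# Paired-end library: the pairing type is recorded.
print("\n".join(readgroup_settings("solexa", "FR")))
# Single-end reads: only the technology line is written.
print("\n".join(readgroup_settings("text", "")))
```

Note that the "Unknown or not relevant" option has the value `?`, which is not empty, so under this logic it is still written to the manifest.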
--- a/tools/mira4/tool_dependencies.xml Fri Oct 11 04:28:45 2013 -0400
+++ b/tools/mira4/tool_dependencies.xml Tue Oct 15 12:07:34 2013 -0400
@@ -3,9 +3,9 @@
     <package name="MIRA" version="4.0">
         <install version="1.0">
             <actions>
-                <action type="download_by_url">https://downloads.sourceforge.net/project/mira-assembler/MIRA/stable/mira_4.0rc3_linux-gnu_x86_64_static.tar.bz2</action>
+                <action type="download_by_url">https://downloads.sourceforge.net/project/mira-assembler/MIRA/stable/mira_4.0rc4_linux-gnu_x86_64_static.tar.bz2</action>
                 <action type="move_directory_files">
-                    <source_directory>mira_4.0rc3_linux-gnu_x86_64_static/bin</source_directory>
+                    <source_directory>mira_4.0rc4_linux-gnu_x86_64_static/bin</source_directory>
                     <destination_directory>$INSTALL_DIR</destination_directory>
                 </action>
                 <action type="set_environment">