comparison tools/mira4/mira4_de_novo.xml @ 6:626d5cfd01aa draft

Uploaded v0.0.1 preview 6, support for fragment length (using mira4_validator.py)
author peterjc
date Mon, 21 Oct 2013 12:01:47 -0400
parents ffefb87bd414
children 902f01c1084b
5:ffefb87bd414 6:626d5cfd01aa
1 <tool id="mira_4_0_de_novo" name="MIRA v4.0 de novo assember" version="0.0.1"> 1 <tool id="mira_4_0_de_novo" name="MIRA v4.0 de novo assember" version="0.0.1">
2 <description>Takes Sanger, Roche 454, Solexa/Illumina, Ion Torrent and PacBio reads</description> 2 <description>Takes Sanger, Roche 454, Solexa/Illumina, Ion Torrent and PacBio reads</description>
3 <requirements> 3 <requirements>
4 <requirement type="python-module">Bio</requirement>
5 <requirement type="binary">mira</requirement> 4 <requirement type="binary">mira</requirement>
6 <requirement type="package" version="4.0">MIRA</requirement> 5 <requirement type="package" version="4.0">MIRA</requirement>
7 </requirements> 6 </requirements>
8 <version_command interpreter="python">mira4.py --version</version_command> 7 <version_command interpreter="python">mira4.py --version</version_command>
9 <command interpreter="python"> 8 <command interpreter="python">
27 <option value="pcbiolq">PacBio low quality (raw)</option> 26 <option value="pcbiolq">PacBio low quality (raw)</option>
28 <option value="pcbiohq">PacBio high quality (corrected)</option> 27 <option value="pcbiohq">PacBio high quality (corrected)</option>
29 <option value="text">Synthetic reads (database entries, consensus sequences, artifical reads, etc)</option> 28 <option value="text">Synthetic reads (database entries, consensus sequences, artifical reads, etc)</option>
30 <!-- TODO reference/backbone as an entry here? --> 29 <!-- TODO reference/backbone as an entry here? -->
31 </param> 30 </param>
32 <param name="segment_placement" type="select" label="Pairing type (segment placing)"> 31 <conditional name="segments">
33 <option value="">None (e.g. single end sequencing)</option> 32 <param name="type" type="select" label="Are these paired reads?">
34 <option value="FR">---&gt; &lt;--- (e.g. Sanger capillary or Solexa/Illumina paired-end library)</option> 33 <option value="paired">Paired reads</option>
35 <option value="RF">&lt;--- ---&gt; (e.g. Solexa/Illumina mate-pair library)</option> 34 <option value="none">Single reads or not relevant (e.g. primer walking with Sanger capillary sequencing)</option>
36 <option value="SB">2---&gt; 1---&gt; (e.g. Roche 454 paired-end libraries or IonTorrent long-mate; see note)</option> 35 </param>
37 <option value="?">Unknown or not relevant (e.g. primer walking with Sanger capillary sequencing)</option> 36 <when value="paired">
38 </param> 37 <param name="placement" type="select" label="Pairing type (segment placing)">
38 <option value="FR">---&gt; &lt;--- (e.g. Sanger capillary or Solexa/Illumina paired-end library)</option>
39 <option value="RF">&lt;--- ---&gt; (e.g. Solexa/Illumina mate-pair library)</option>
40 <option value="SB">2---&gt; 1---&gt; (e.g. Roche 454 paired-end libraries or IonTorrent long-mate; see note)</option>
41 </param>
42 <!-- min/max validation is done via the <code> tag -->
43 <param name="min_size" type="integer" optional="true" min="0" value=""
44 label="Minimum size of 'good' DNA templates in the library preparation"
45 help="Optional, but if used you must also supply a maximum value." />
46 <param name="max_size" type="integer" optional="true" min="0" value=""
47 label="Maximum size of 'good' DNA templates in the library preparation"
48 help="Optional, but if used you must also supply a minimum value." />
49 <param name="naming" type="select" label="Pair naming convention">
50 <option value="solexa">Solexa/Illumina (using '/1' and '/2' suffixes)</option>
51 <option value="FR">Forward/Reverse scheme (using '.f*' and '.r*' suffixes)</option>
52 <option value="tigr">TIGR scheme (using 'TF*' and 'TR*' suffixes)</option>
53 <option value="sanger">Sanger scheme (see notes)</option>
54 <option value="stlouis">St. Louis scheme (see notes)</option>
55 </param>
56 </when>
57 <when value="none" /><!-- no further questions -->
58 </conditional>
39 <param name="filenames" type="data" format="fastq,mira" multiple="true" required="true" label="Read file(s)" 59 <param name="filenames" type="data" format="fastq,mira" multiple="true" required="true" label="Read file(s)"
40 help="Multiple files allowed, for example paired reads can be given as two files (MIRA looks at read names to identify pairs)." /> 60 help="Multiple files allowed, for example paired reads can be given as two files (MIRA looks at read names to identify pairs)." />
41 </repeat> 61 </repeat>
42 </inputs> 62 </inputs>
63 <code file="mira4_validator.py" />
43 <outputs> 64 <outputs>
44 <data name="out_fasta" format="fasta" label="MIRA de novo contigs (FASTA)" /> 65 <data name="out_fasta" format="fasta" label="MIRA de novo contigs (FASTA)" />
45 <data name="out_maf" format="mira" label="MIRA de novo assembly" /> 66 <data name="out_maf" format="mira" label="MIRA de novo assembly" />
46 <data name="out_log" format="txt" label="MIRA de novo log" /> 67 <data name="out_log" format="txt" label="MIRA de novo log" />
47 </outputs> 68 </outputs>
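
Note on the min/max validation introduced above: the contents of mira4_validator.py are not shown in this comparison, so the following is only a minimal sketch of the rule stated in the parameter help text (supply both sizes or neither). The function name `check_template_sizes` is hypothetical, and the extra check that the minimum does not exceed the maximum is an assumption rather than something the diff confirms.

```python
def check_template_sizes(min_size, max_size):
    """Return an error message, or None if the two values are acceptable.

    Hypothetical helper: None or an empty string means "not supplied",
    mirroring the optional integer parameters min_size and max_size.
    """
    def supplied(value):
        return value is not None and str(value).strip() != ""

    has_min, has_max = supplied(min_size), supplied(max_size)
    if not has_min and not has_max:
        return None  # both omitted, so no templatesize line is needed
    if has_min != has_max:
        return "Supply both a minimum and a maximum size, or neither."
    low, high = int(min_size), int(max_size)
    if low < 0 or high < 0:
        return "Sizes must not be negative."
    if low > high:
        # Assumption: the diff does not say whether this case is rejected.
        return "Minimum size must not exceed maximum size."
    return None
```

In a real Galaxy tool the result would need to be reported through whatever hook the validator file uses; that wiring is deliberately left out here.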
65 ##This bar goes into the manifest as a comment line 86 ##This bar goes into the manifest as a comment line
66 #------------------------------------------------------------------------------ 87 #------------------------------------------------------------------------------
67 88
68 readgroup 89 readgroup
69 technology = ${rg.technology} 90 technology = ${rg.technology}
91 ##Record the segment placement (if any)
92 #if str($rg.segments.type) == "paired"
93 segmentplacement = ${rg.segments.placement}
94 segmentnaming = ${rg.segments.naming}
95 #if str($rg.segments.min_size) != "" or str($rg.segments.max_size) != ""
96 ##If our min/max validation failed I trust MIRA to give an error message...
97 templatesize = $rg.segments.min_size $rg.segments.max_size
98 #end if
99 #end if
100 #if str($rg.segments.type) == "none"
101 segmentplacement = ?
102 #end if
70 ##MIRA will accept multiple filenames on one data line, or multiple data lines 103 ##MIRA will accept multiple filenames on one data line, or multiple data lines
71 #for $f in $rg.filenames 104 #for $f in $rg.filenames
72 #if str($rg.segment_placement) != ""
73 ##Record the segment placement (if any)
74 segmentplacement = ${rg.segment_placement}
75 #end if
76 ##Must now map Galaxy datatypes to MIRA file types... 105 ##Must now map Galaxy datatypes to MIRA file types...
77 #if $f.ext.startswith("fastq") 106 #if $f.ext.startswith("fastq")
78 ##MIRA doesn't like fastqsanger etc, just plain old fastq: 107 ##MIRA doesn't like fastqsanger etc, just plain old fastq:
79 data = fastq::$f 108 data = fastq::$f
80 #elif $f.ext == "mira" 109 #elif $f.ext == "mira"
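
Note on the template change above: the new Cheetah logic writes the segment settings once per read group instead of once per file. The Python paraphrase below is a rough sketch of that branching only. The function name `readgroup_lines` and its arguments are hypothetical, and whitespace in the real manifest may differ.

```python
def readgroup_lines(technology, paired, placement="", naming="",
                    min_size="", max_size=""):
    """Build the manifest lines for one read group (illustrative only)."""
    lines = ["readgroup", "technology = %s" % technology]
    if paired:
        # Corresponds to: #if str($rg.segments.type) == "paired"
        lines.append("segmentplacement = %s" % placement)
        lines.append("segmentnaming = %s" % naming)
        if str(min_size) != "" or str(max_size) != "":
            # As in the template, a half-filled pair is still written out
            # and left for MIRA to reject.
            lines.append("templatesize = %s %s" % (min_size, max_size))
    else:
        # Corresponds to: #if str($rg.segments.type) == "none"
        lines.append("segmentplacement = ?")
    return lines
```

For example, `readgroup_lines("solexa", False)` yields the three lines `readgroup`, `technology = solexa` and `segmentplacement = ?`.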
118 a range of platforms (Sanger capillary, Solexa/Illumina, Roche 454, Ion Torrent 147 a range of platforms (Sanger capillary, Solexa/Illumina, Roche 454, Ion Torrent
119 and also PacBio). 148 and also PacBio).
120 149
121 It is particularly suited to small genomes such as bacteria. 150 It is particularly suited to small genomes such as bacteria.
122 151
123 **Notes** 152
153 **Notes on paired reads**
124 154
125 .. class:: warningmark 155 .. class:: warningmark
126 156
127 Note that the raw data for Roche 454 and Ion Torrent paired-end libraries 157 MIRA uses read naming conventions to identify paired read partners
128 sequences a circularised fragment such that the raw data starts with the 158 (and does not care about their order in the input files). In most cases,
129 end of the fragment, a linker, then the start of the fragment. This means 159 the Solexa/Illumina setting is fine. For Sanger capillary sequencing,
130 both the start and end are sequenced from the same strand, and thus should 160 you may need to rename your reads to match one of the standard conventions
131 be given to MIRA as orientation "2---&gt; 1---&gt;". However, in order to 161 supported by MIRA. For Roche 454 or Ion Torrent the appropriate settings
132 use this data with traditional tools expecting Sanger capillary style 162 depend on how the FASTQ file was produced:
133 libraries which expect "---&gt; &lt;---" your FASTQ files may have been 163
134 pre-processed to mimic this by reverse complementing one of the pair. 164 * If using Roche's ``sffinfo`` or older versions of ``sff_extract``
165 to convert SFF files to FASTQ, your reads will probably have the
166 ``---&gt; &lt;---`` orientation and use the ``.f`` and ``.r``
167 suffixes (FR naming).
168
169 * If using a recent version of ``sff_extract``, then the ``/1`` and ``/2``
170 suffixes are used (Solexa/Illumina style naming) and the original
171 ``2---&gt; 1---&gt;`` orientation is preserved.
172
173 The reason for this is the raw data for Roche 454 and Ion Torrent paired-end
174 libraries sequences a circularised fragment such that the raw data begins
175 with the end of the fragment, a linker, then the start of the fragment.
176 This means both the start and end are sequenced from the same strand, and
177 have the orientation ``2---&gt; 1---&gt;``. However, in order to use the data
178 with traditional tools expecting Sanger capillary style ``---&gt; &lt;---``
179 orientation it was common to reverse complement one of the pair to mimic this.
180
135 181
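
Note on the naming conventions described in the help text above: the sketch below guesses a pair naming scheme from a single read name, using only the suffix descriptions given in the option labels ('/1' and '/2', '.f*' and '.r*', 'TF*' and 'TR*'). The function `guess_naming` and its regular expressions are assumptions made for illustration. MIRA's actual matching rules are not shown in this diff and may well be stricter.

```python
import re

# Rough approximations of the suffixes named in the option labels.
# These are hypothetical patterns, not MIRA's own rules.
PATTERNS = [
    ("solexa", re.compile(r"/[12]$")),       # e.g. read/1, read/2
    ("FR", re.compile(r"\.[fr][^.]*$")),     # e.g. read.f, read.r1
    ("tigr", re.compile(r"T[FR][^.]*$")),    # e.g. readTF, readTR
]


def guess_naming(read_name):
    """Return the first scheme whose pattern matches, or None."""
    for scheme, pattern in PATTERNS:
        if pattern.search(read_name):
            return scheme
    return None
```

The return values reuse the option values from the tool's "Pair naming convention" parameter. The Sanger and St. Louis schemes are left out because their rules are not described in this comparison.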
136 **Citation** 182 **Citation**
137 183
138 If you use this Galaxy tool in work leading to a scientific publication please 184 If you use this Galaxy tool in work leading to a scientific publication please
139 cite the following papers: 185 cite the following papers: