Mercurial repository peterjc/mira4_assembler, tools/mira4/mira4_de_novo.xml @ 6:626d5cfd01aa (draft)
Uploaded v0.0.1 preview 6, support for fragment length (using mira4_validator.py)
author | peterjc |
---|---|
date | Mon, 21 Oct 2013 12:01:47 -0400 |
parents | ffefb87bd414 |
children | 902f01c1084b |
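This revision makes the template size limits optional. A comment in the new XML says the min/max check is done through the `<code>` tag and `mira4_validator.py`, and the contents of that file are not part of this diff. The sketch below illustrates only the rule stated in the help text: give both values or neither. It assumes a hypothetical `check_template_size` helper that takes the two values as strings (empty or `None` when unset) and returns an error message or `None`. It is not the actual validator and does not follow Galaxy's `<code>` hook interface.

```python
def check_template_size(min_size, max_size):
    """Hypothetical check: return an error message, or None if acceptable.

    Values arrive as strings, as a web form would submit them;
    "" or None means "not set".
    """
    low_text = "" if min_size is None else str(min_size).strip()
    high_text = "" if max_size is None else str(max_size).strip()
    if not low_text and not high_text:
        return None  # neither given: no templatesize line is needed
    if not low_text:
        return "Maximum template size given without a minimum"
    if not high_text:
        return "Minimum template size given without a maximum"
    try:
        low, high = int(low_text), int(high_text)
    except ValueError:
        return "Template sizes must be whole numbers"
    if low < 0 or high < 0:
        # Mirrors min="0" on both integer params in the tool XML
        return "Template sizes cannot be negative"
    if low > high:
        return "Minimum template size %i exceeds maximum %i" % (low, high)
    return None


print(check_template_size("", ""))        # None
print(check_template_size("200", ""))     # Minimum template size given without a maximum
print(check_template_size("500", "200"))  # Minimum template size 500 exceeds maximum 200
```

Leaving both fields empty means no `templatesize` line is needed, which matches the `#if` guard around `templatesize` in the new Cheetah template.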
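The other main change moves the segment settings out of the per-file loop. As a result, each readgroup in the generated MIRA manifest now gets a single `segmentplacement` line instead of one per input file. Paired reads also get `segmentnaming` and, optionally, `templatesize`, while unpaired reads get `segmentplacement = ?`.

The Python sketch below mirrors that Cheetah logic for one readgroup. `readgroup_lines` is a hypothetical helper, and the `(path, extension)` input format is an assumption. Only the FASTQ branch is modelled, because the handling of the `mira` datatype is cut off in this excerpt.

```python
def readgroup_lines(technology, files, paired=False, placement=None,
                    naming=None, min_size="", max_size=""):
    """Hypothetical helper: manifest lines for one MIRA readgroup.

    files is a list of (path, galaxy_extension) tuples.
    """
    lines = ["readgroup", "technology = %s" % technology]
    if paired:
        # Written once per readgroup, not once per file (the bug fixed here)
        lines.append("segmentplacement = %s" % placement)
        lines.append("segmentnaming = %s" % naming)
        if min_size != "" or max_size != "":
            # Like the template, pass values through and let MIRA object
            lines.append("templatesize = %s %s" % (min_size, max_size))
    else:
        lines.append("segmentplacement = ?")
    for path, ext in files:
        if ext.startswith("fastq"):
            # fastqsanger, fastqillumina etc. all become plain "fastq" for MIRA
            lines.append("data = fastq::%s" % path)
        else:
            # The mira datatype branch is not shown in this excerpt
            raise ValueError("Format %r not handled in this sketch" % ext)
    return lines


print("\n".join(readgroup_lines(
    "solexa",
    [("reads_1.fastq", "fastqsanger"), ("reads_2.fastq", "fastqsanger")],
    paired=True, placement="FR", naming="solexa",
    min_size="200", max_size="800")))
```

Here the `solexa` technology value is assumed, since the earlier technology options are elided from this diff. The call prints `segmentplacement = FR`, `segmentnaming = solexa` and `templatesize = 200 800` once, followed by one `data = fastq::...` line per file.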
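The new notes explain that MIRA pairs reads by name. They also say that Roche 454 or Ion Torrent FASTQ converted with `sffinfo` or older `sff_extract` tends to use `.f`/`.r` suffixes, while recent `sff_extract` uses `/1` and `/2`.

A crude heuristic for choosing the "Pair naming convention" option from a sample of read names might look like the sketch below. `guess_naming` and its regular expressions are illustrative assumptions based only on the suffixes quoted in the option labels. The Sanger, St. Louis and TIGR schemes are not handled. The heuristic also says nothing about orientation, so the segment placement still has to be chosen separately.

```python
import re

# Assumed suffix patterns, taken from the option labels in the tool XML.
# The Solexa style is tried first, so "x.1/1" counts as Solexa naming.
PATTERNS = [
    ("solexa", re.compile(r"/[12]$")),   # e.g. read123/1, read123/2
    ("FR", re.compile(r"\.[fr][^./]*$")),  # e.g. E123.f, E123.r1
]


def guess_naming(read_names):
    """Return 'solexa' or 'FR' if every name fits that style, else None."""
    names = list(read_names)
    if not names:
        return None
    for label, pattern in PATTERNS:
        if all(pattern.search(name) for name in names):
            return label
    return None


print(guess_naming(["r1/1", "r1/2"]))      # solexa
print(guess_naming(["E123.f", "E123.r"]))  # FR
print(guess_naming(["r1/1", "E123.f"]))    # None
```

Requiring every name to match, rather than most of them, keeps the guess conservative. A mixed file returns `None`, and the choice is left to the user.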
5:ffefb87bd414 | 6:626d5cfd01aa |
---|---|
1 <tool id="mira_4_0_de_novo" name="MIRA v4.0 de novo assember" version="0.0.1"> | 1 <tool id="mira_4_0_de_novo" name="MIRA v4.0 de novo assember" version="0.0.1"> |
2 <description>Takes Sanger, Roche 454, Solexa/Illumina, Ion Torrent and PacBio reads</description> | 2 <description>Takes Sanger, Roche 454, Solexa/Illumina, Ion Torrent and PacBio reads</description> |
3 <requirements> | 3 <requirements> |
4 <requirement type="python-module">Bio</requirement> | |
5 <requirement type="binary">mira</requirement> | 4 <requirement type="binary">mira</requirement> |
6 <requirement type="package" version="4.0">MIRA</requirement> | 5 <requirement type="package" version="4.0">MIRA</requirement> |
7 </requirements> | 6 </requirements> |
8 <version_command interpreter="python">mira4.py --version</version_command> | 7 <version_command interpreter="python">mira4.py --version</version_command> |
9 <command interpreter="python"> | 8 <command interpreter="python"> |
27 <option value="pcbiolq">PacBio low quality (raw)</option> | 26 <option value="pcbiolq">PacBio low quality (raw)</option> |
28 <option value="pcbiohq">PacBio high quality (corrected)</option> | 27 <option value="pcbiohq">PacBio high quality (corrected)</option> |
29 <option value="text">Synthetic reads (database entries, consensus sequences, artifical reads, etc)</option> | 28 <option value="text">Synthetic reads (database entries, consensus sequences, artifical reads, etc)</option> |
30 <!-- TODO reference/backbone as an entry here? --> | 29 <!-- TODO reference/backbone as an entry here? --> |
31 </param> | 30 </param> |
32 <param name="segment_placement" type="select" label="Pairing type (segment placing)"> | 31 <conditional name="segments"> |
33 <option value="">None (e.g. single end sequencing)</option> | 32 <param name="type" type="select" label="Are these paired reads?"> |
34 <option value="FR">---> <--- (e.g. Sanger capillary or Solexa/Illumina paired-end library)</option> | 33 <option value="paired">Paired reads</option> |
35 <option value="RF"><--- ---> (e.g. Solexa/Illumina mate-pair library)</option> | 34 <option value="none">Single reads or not relevant (e.g. primer walking with Sanger capillary sequencing)</option> |
36 <option value="SB">2---> 1---> (e.g. Roche 454 paired-end libraries or IonTorrent long-mate; see note)</option> | 35 </param> |
37 <option value="?">Unknown or not relevant (e.g. primer walking with Sanger capillary sequencing)</option> | 36 <when value="paired"> |
38 </param> | 37 <param name="placement" type="select" label="Pairing type (segment placing)"> |
38 <option value="FR">---> <--- (e.g. Sanger capillary or Solexa/Illumina paired-end library)</option> | |
39 <option value="RF"><--- ---> (e.g. Solexa/Illumina mate-pair library)</option> | |
40 <option value="SB">2---> 1---> (e.g. Roche 454 paired-end libraries or IonTorrent long-mate; see note)</option> | |
41 </param> | |
42 <!-- min/max validation is done via the <code> tag --> | |
43 <param name="min_size" type="integer" optional="true" min="0" value="" | |
44 label="Minimum size of 'good' DNA templates in the library preparation" | |
45 help="Optional, but if used you must also supply a maximum value." /> | |
46 <param name="max_size" type="integer" optional="true" min="0" value="" | |
47 label="Maximum size of 'good' DNA templates in the library preparation" | |
48 help="Optional, but if used you must also supply a minimum value." /> | |
49 <param name="naming" type="select" label="Pair naming convention"> | |
50 <option value="solexa">Solexa/Illumina (using '/1' and '/2' suffixes)</option> | |
51 <option value="FR">Forward/Reverse scheme (using '.f*' and '.r*' suffixes)</option> | |
52 <option value="tigr">TIGR scheme (using 'TF*' and 'TR*' suffixes)</option> | |
53 <option value="sanger">Sanger scheme (see notes)</option> | |
54 <option value="stlouis">St. Louis scheme (see notes)</option> | |
55 </param> | |
56 </when> | |
57 <when value="none" /><!-- no further questions --> | |
58 </conditional> | |
39 <param name="filenames" type="data" format="fastq,mira" multiple="true" required="true" label="Read file(s)" | 59 <param name="filenames" type="data" format="fastq,mira" multiple="true" required="true" label="Read file(s)" |
40 help="Multiple files allowed, for example paired reads can be given as two files (MIRA looks at read names to identify pairs)." /> | 60 help="Multiple files allowed, for example paired reads can be given as two files (MIRA looks at read names to identify pairs)." /> |
41 </repeat> | 61 </repeat> |
42 </inputs> | 62 </inputs> |
63 <code file="mira4_validator.py" /> | |
43 <outputs> | 64 <outputs> |
44 <data name="out_fasta" format="fasta" label="MIRA de novo contigs (FASTA)" /> | 65 <data name="out_fasta" format="fasta" label="MIRA de novo contigs (FASTA)" /> |
45 <data name="out_maf" format="mira" label="MIRA de novo assembly" /> | 66 <data name="out_maf" format="mira" label="MIRA de novo assembly" /> |
46 <data name="out_log" format="txt" label="MIRA de novo log" /> | 67 <data name="out_log" format="txt" label="MIRA de novo log" /> |
47 </outputs> | 68 </outputs> |
65 ##This bar goes into the manifest as a comment line | 86 ##This bar goes into the manifest as a comment line |
66 #------------------------------------------------------------------------------ | 87 #------------------------------------------------------------------------------ |
67 | 88 |
68 readgroup | 89 readgroup |
69 technology = ${rg.technology} | 90 technology = ${rg.technology} |
91 ##Record the segment placement (if any) | |
92 #if str($rg.segments.type) == "paired" | |
93 segmentplacement = ${rg.segments.placement} | |
94 segmentnaming = ${rg.segments.naming} | |
95 #if str($rg.segments.min_size) != "" or str($rg.segments.max_size) != "" | |
96 ##If our min/max validation failed I trust MIRA to give an error message... | |
97 templatesize = $rg.segments.min_size $rg.segments.max_size | |
98 #end if | |
99 #end if | |
100 #if str($rg.segments.type) == "none" | |
101 segmentplacement = ? | |
102 #end if | |
70 ##MIRA will accept multiple filenames on one data line, or multiple data lines | 103 ##MIRA will accept multiple filenames on one data line, or multiple data lines |
71 #for $f in $rg.filenames | 104 #for $f in $rg.filenames |
72 #if str($rg.segment_placement) != "" | |
73 ##Record the segment placement (if any) | |
74 segmentplacement = ${rg.segment_placement} | |
75 #end if | |
76 ##Must now map Galaxy datatypes to MIRA file types... | 105 ##Must now map Galaxy datatypes to MIRA file types... |
77 #if $f.ext.startswith("fastq") | 106 #if $f.ext.startswith("fastq") |
78 ##MIRA doesn't like fastqsanger etc, just plain old fastq: | 107 ##MIRA doesn't like fastqsanger etc, just plain old fastq: |
79 data = fastq::$f | 108 data = fastq::$f |
80 #elif $f.ext == "mira" | 109 #elif $f.ext == "mira" |
118 a range of platforms (Sanger capillary, Solexa/Illumina, Roche 454, Ion Torrent | 147 a range of platforms (Sanger capillary, Solexa/Illumina, Roche 454, Ion Torrent |
119 and also PacBio). | 148 and also PacBio). |
120 | 149 |
121 It is particularly suited to small genomes such as bacteria. | 150 It is particularly suited to small genomes such as bacteria. |
122 | 151 |
123 **Notes** | 152 |
153 **Notes on paired reads** | |
124 | 154 |
125 .. class:: warningmark | 155 .. class:: warningmark |
126 | 156 |
127 Note that the raw data for Roche 454 and Ion Torrent paired-end libraries | 157 MIRA uses read naming conventions to identify paired read partners |
128 sequences a circularised fragment such that the raw data starts with the | 158 (and does not care about their order in the input files). In most cases, |
129 end of the fragment, a linker, then the start of the fragment. This means | 159 the Solexa/Illumina setting is fine. For Sanger capillary sequencing, |
130 both the start and end are sequenced from the same strand, and thus should | 160 you may need to rename your reads to match one of the standard conventions |
131 be given to MIRA as orientation "2---> 1--->". However, in order to | 161 supported by MIRA. For Roche 454 or Ion Torrent the appropriate settings |
132 use this data with traditional tools expecting Sanger capillary style | 162 depend on how the FASTQ file was produced: |
133 libraries which expect "---> <---" your FASTQ files may have been | 163 |
134 pre-processed to mimic this by reverse complementing one of the pair. | 164 * If using Roche's ``sffinfo`` or older versions of ``sff_extract`` |
165 to convert SFF files to FASTQ, your reads will probably have the | |
166 ``---> <---`` orientation and use the ``.f`` and ``.r`` | |
167 suffixes (FR naming). | |
168 | |
169 * If using a recent version of ``sff_extract``, then the ``/1`` and ``/2`` | |
170 suffixes are used (Solexa/Illumina style naming) and the original | |
171 ``2---> 1--->`` orientation is preserved. | |
172 | |
173 The reason for this is the raw data for Roche 454 and Ion Torrent paired-end | |
174 libraries sequences a circularised fragment such that the raw data begins | |
175 with the end of the fragment, a linker, then the start of the fragment. | |
176 This means both the start and end are sequenced from the same strand, and | |
177 have the orientation ``2---> 1--->``. However, in order to use the data | |
178 with traditional tools expecting Sanger capillary style ``---> <---`` | |
179 orientation it was common to reverse complement one of the pair to mimic this. | |
180 | |
135 | 181 |
136 **Citation** | 182 **Citation** |
137 | 183 |
138 If you use this Galaxy tool in work leading to a scientific publication please | 184 If you use this Galaxy tool in work leading to a scientific publication please |
139 cite the following papers: | 185 cite the following papers: |