annotate demultiplex.xml.v1 @ 0:da4101033e10 draft default tip

planemo upload
author oinizan
date Wed, 18 Oct 2017 05:30:40 -0400
parents
children
<?xml version="1.0"?>
<!--
# Copyright (C) 2015 INRA
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.
-->
<tool id="FROGS_demultiplex" name="FROGS Demultiplex reads" version="2.0.0">
    <description>Assign reads to samples according to their inner barcode.</description>
    <requirements>
        <requirement type="package" version="0.20">perl-io-gzip</requirement>
    </requirements>
    <command interpreter="python2.7">
        demultiplex.py
            #if str( $fastq_input.fastq_input_selector ) == "paired":
                --input-R1 "${fastq_input.fastq_input1}"
                --input-R2 "${fastq_input.fastq_input2}"
            #else:
                --input-R1 "${fastq_input.fastq_input1}"
            #end if
            --input-barcode $barcode_file
            --mismatches $mismatches
            --end $end
            --summary $summary
            --output-demultiplexed $demultiplexed_archive
            --output-excluded $undemultiplexed_archive
    </command>
    <inputs>
        <!-- Input file -->
        <param format="tabular" name="barcode_file" type="data" label="Barcode file" help="This file describes barcodes and samples (one line per sample, with the sample name and the barcode sequence(s) separated by tabs). See the Help section." optional="false" />

        <conditional name="fastq_input">
            <param name="fastq_input_selector" type="select" label="Single or Paired-end reads" help="Select between paired and single-end data">
                <option value="single">Single</option>
                <option value="paired">Paired</option>
            </param>
            <when value="paired">
                <param name="fastq_input1" type="data" format="fastq" label="Select first set of reads" help="Specify dataset of your forward reads"/>
                <param name="fastq_input2" type="data" format="fastq" label="Select second set of reads" help="Specify dataset of your reverse reads"/>
            </when>
            <when value="single">
                <param name="fastq_input1" type="data" format="fastq" label="Select fastq dataset" help="Specify dataset of your single end reads"/>
            </when>
        </conditional>

        <!-- Option -->
        <param name="mismatches" type="integer" label="Barcode mismatches" help="Number of mismatches allowed in each barcode" value="0" optional="false" />
        <param name="end" type="select" label="Barcode on which end?" help="Whether the barcode is placed at the beginning of the forward end, of the reverse end, or both.">
            <option value="bol" selected="true">Forward</option>
            <option value="eol">Reverse</option>
            <option value="both">Both ends</option>
        </param>
    </inputs>
    <outputs>
        <data format="tar.gz" name="demultiplexed_archive" label="${tool.name}: demultiplexed.tar.gz" from_work_dir="demultiplexed.tar.gz"/>
        <data format="tar.gz" name="undemultiplexed_archive" label="${tool.name}: undemultiplexed.tar.gz" from_work_dir="undemultiplexed.tar.gz"/>
        <data format="tabular" name="summary" label="${tool.name}: report" from_work_dir="report.tsv"/>
    </outputs>
    <tests>
        <test>
            <param name="fastq_input1" value="multiplex.fastq"/>
            <param name="barcode_file" value="barcode.tabular"/>
            <output name="demultiplexed_archive" file="FROGS_Demultiplex_reads__demultiplexed.tar.gz"/>
            <output name="undemultiplexed_archive" file="FROGS_Demultiplex_reads__undemultiplexed.tar.gz"/>
            <output name="summary" file="FROGS_Demultiplex_reads__report.tabular"/>
        </test>
    </tests>
    <help>
.. class:: infomark page-header h2

What it does

This tool classifies single or paired-end reads according to the forward or reverse barcode found in the first read or in both reads.

**Command line**::

   demultiplex.py --input-R1 *FQ_INPUT1* [--input-R2 *FQ_INPUT2*] --input-barcode *TXT_BARCODE* --mismatches *MISMATCH* --end *END* --summary *TXT_SUMMARY_OUTPUT* --output-demultiplexed *TARGZ_DEMULT_ARCHIVE_OUTPUT* --output-excluded *TARGZ_UNDEMULT_ARCHIVE_OUTPUT*

.. csv-table:: Inputs
   :header: "Input name", "Meaning"
   :widths: 20, 80
   :class: table table-striped

   "FQ_INPUT1", "Fastq input file for the first read (single-end or forward read of paired-end sequences)"
   "FQ_INPUT2", "Fastq input file for the second read (only for paired-end sequences)"
   "TXT_BARCODE", "Tab-delimited text file that describes the barcode sequences used to multiplex samples: SAMPLE_NAME BARCODE1 [BARCODE2]"

.. csv-table:: Options
   :header: "Option name", "Meaning"
   :widths: 20, 80
   :class: table table-striped

   "-m/--mismatches MISMATCH", "Number of mismatches allowed in each barcode"
   "-e/--end END", "End at which the barcode must be found: forward (beginning of the (first) read), reverse (end of the (second) read) or both"

.. csv-table:: Outputs
   :header: "Output name", "Meaning"
   :widths: 20, 80
   :class: table table-striped

   "TXT_SUMMARY_OUTPUT", "A tab-delimited text file which summarises the number of sequences (single or paired) for each sample"
   "TARGZ_DEMULT_ARCHIVE_OUTPUT", "A TAR.GZ archive that contains all fastq files for each sample"
   "TARGZ_UNDEMULT_ARCHIVE_OUTPUT", "A TAR.GZ archive that contains all fastq files for undemultiplexed reads"

.. class:: h3

Format

BARCODE_FILE :
This file is expected to be tab-delimited:

- the first column is the sample name

- the second column is the barcode sequence used

- the third column (optional) is the reverse barcode sequence

.. class:: warningmark

Give each barcode sequence as it appears on the strand of the read; you may therefore need to reverse complement the reverse barcode sequence.

.. class:: warningmark

All barcode sequences must have the same length.

Example of a barcode file. Here the sample is multiplexed at both fragment ends.

.. image:: ${static_path}/images/tools/frogs/demultiplex_barcode.png
   :height: 18
   :width: 286

FASTQ :
Text file describing biological sequences in a four-line format:

- the first line starts with "@", followed by the sequence identifier and optionally the sequence description

- the second line is the sequence itself

- the third line is a "+", optionally followed by the sequence identifier, depending on the version

- the fourth line is the quality string, one code per base. The encoding depends on the format version and the sequencer

`Click here for more details on the fastq format &lt;https://en.wikipedia.org/wiki/FASTQ_format&gt;`_

Example of a fastq read corresponding to the previous barcode file:

.. image:: ${static_path}/images/tools/frogs/demultiplex_fastq_ex.png
   :height: 57
   :width: 420


.. class:: infomark page-header h2

How it works

For each sequence or sequence pair, the sequence fragment at the beginning (forward multiplexing) of the (first) read or at the end (reverse multiplexing) of the (second) read is compared to all barcodes in the barcode file.

If this fragment matches one and only one barcode (within the mismatch threshold), the fragment is trimmed and the sequence is attributed to the corresponding sample.

Finally, the fastq files (or pairs of fastq files) for each sample are packed into an archive, and a report describing how many sequences were attributed to each sample is created.


.. class:: infomark page-header h2

Advice

Do not forget to give the barcode sequences as they really appear in the fastq sequence file, especially if you have multiplexed data via the reverse strand.

For the mismatch threshold, we advise leaving it at 0. If you are not satisfied with the result, try 1. The appropriate number of mismatches depends on the length of the barcode, but these sequences are frequently very short, so 1 mismatch is already more than the sequencing error rate.

If you have different barcode lengths, you must demultiplex your data in several steps, beginning with the longest barcode set. Then, to trim the shorter barcodes, run the tool again on the "unmatched" or "ambiguous" sequence file with the shorter barcodes, and so on.

If you have Roche 454 sequences in sff format, you must first convert them with a program such as `sff2fastq &lt;https://github.com/indraniel/sff2fastq&gt;`_ or sff_to_fastq (installable in Galaxy).


----

**Contact**

Contacts: frogs@inra.fr

Repository: https://github.com/geraldinepascal/FROGS

Please cite the FROGS Publication: *Escudie F., Auer L., Bernard M., Cauquil L., Vidal K., Maman S., Mariadassou M., Combes S., Hernandez-Raquet G., Pascal G., 2016. FROGS: Find Rapidly OTU with Galaxy Solution. In: ISME-2016 Montreal, CANADA ,* http://bioinfo.genotoul.fr/wp-content/uploads/FROGS_ISME2016_poster.pdf

Depending on the help provided you can cite us in acknowledgements, references or both.
    </help>
</tool>
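
The matching rule described in the "How it works" help section can be illustrated with a small Python sketch. This is not the code of demultiplex.py, which is not shown in this file. The function names (`hamming`, `parse_barcodes`, `assign_read`) are hypothetical, only forward barcodes at the start of a single read are handled, and mismatches are assumed to be counted position by position (Hamming distance). A read is assigned only when exactly one barcode matches within the threshold, and the barcode is then trimmed from the sequence and its quality string.

```python
from typing import Dict, List, Optional, Tuple


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def parse_barcodes(lines: List[str]) -> Dict[str, str]:
    """Read 'SAMPLE<TAB>BARCODE[<TAB>REVERSE_BARCODE]' lines.

    Only the forward barcode (second column) is kept in this sketch.
    """
    barcodes: Dict[str, str] = {}
    for line in lines:
        if not line.strip():
            continue  # skip empty lines
        fields = line.rstrip("\n").split("\t")
        barcodes[fields[0]] = fields[1].upper()
    # The help text requires all barcodes to have the same length.
    if len({len(b) for b in barcodes.values()}) > 1:
        raise ValueError("All barcode sequences must have the same length")
    return barcodes


def assign_read(
    seq: str, qual: str, barcodes: Dict[str, str], max_mismatches: int = 0
) -> Tuple[Optional[str], str, str]:
    """Return (sample, sequence, quality).

    The sample is None, and the read is left untrimmed, when no barcode
    or more than one barcode matches within max_mismatches.
    Assumes barcodes is not empty.
    """
    length = len(next(iter(barcodes.values())))
    prefix = seq[:length].upper()
    hits = [
        sample
        for sample, barcode in barcodes.items()
        if len(prefix) == length and hamming(prefix, barcode) <= max_mismatches
    ]
    if len(hits) != 1:
        return None, seq, qual
    return hits[0], seq[length:], qual[length:]


barcodes = parse_barcodes(["sampleA\tACGT\n", "sampleB\tTGCA\n"])
print(assign_read("ACGTTTGGCC", "IIIIIIIIII", barcodes))
# ('sampleA', 'TTGGCC', 'IIIIII')
```

With two barcodes that differ at a single position, raising the threshold to 1 makes a read matching one of them exactly also match the other, so the read becomes ambiguous. This is one way to see why the help text recommends starting with 0 mismatches for short barcodes.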