annotate demultiplex.xml @ 0:da4101033e10 draft default tip

planemo upload
author oinizan
date Wed, 18 Oct 2017 05:30:40 -0400
<?xml version="1.0"?>
<!--
# Copyright (C) 2015 INRA
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.
-->
<tool id="FROGS_demultiplex" name="FROGS Demultiplex reads" version="2.0.0">
    <description>Attribute reads to samples according to their inner barcodes.</description>
    <requirements>
        <requirement type="package">perl-io-zlib</requirement>
        <requirement type="package" version="0.20">perl-io-gzip</requirement>
    </requirements>
    <command>
        <![CDATA[
export TOOL_DIRECTORY=$__tool_directory__ &&
python "$__tool_directory__"/demultiplex.py
#if str( $fastq_input.fastq_input_selector ) == "paired":
    --input-R1 "${fastq_input.fastq_input1}"
    --input-R2 "${fastq_input.fastq_input2}"
#else:
    --input-R1 "${fastq_input.fastq_input1}"
#end if
    --input-barcode $barcode_file
    --mismatches $mismatches
    --end $end
    --summary $summary
    --output-demultiplexed $demultiplexed_archive
    --output-excluded $undemultiplexed_archive
        ]]>
    </command>
    <inputs>
        <!-- Input file -->
        <param format="tabular" name="barcode_file" type="data" label="Barcode file" help="This file describes barcodes and samples (one line per sample: the sample name followed by its barcode sequence(s), tab-separated). See the Help section." optional="false" />

        <conditional name="fastq_input">
            <param name="fastq_input_selector" type="select" label="Single or Paired-end reads" help="Select between paired and single-end data">
                <option value="single">Single</option>
                <option value="paired">Paired</option>
            </param>
            <when value="paired">
                <param name="fastq_input1" type="data" format="fastq" label="Select first set of reads" help="Specify the dataset of your forward reads"/>
                <param name="fastq_input2" type="data" format="fastq" label="Select second set of reads" help="Specify the dataset of your reverse reads"/>
            </when>
            <when value="single">
                <param name="fastq_input1" type="data" format="fastq" label="Select fastq dataset" help="Specify the dataset of your single-end reads"/>
            </when>
        </conditional>

        <!-- Option -->
        <param name="mismatches" type="integer" label="Barcode mismatches" help="Number of mismatches allowed in each barcode" value="0" optional="false" />
        <param name="end" type="select" label="Barcode on which end?" help="Is the barcode placed at the beginning of the forward read, at the end of the reverse read, or at both ends?">
            <option value="bol" selected="true">Forward</option>
            <option value="eol">Reverse</option>
            <option value="both">Both ends</option>
        </param>
    </inputs>
    <outputs>
        <data format="tar" name="demultiplexed_archive" label="${tool.name}: demultiplexed.tar.gz" from_work_dir="demultiplexed.tar.gz"/>
        <data format="tar" name="undemultiplexed_archive" label="${tool.name}: undemultiplexed.tar.gz" from_work_dir="undemultiplexed.tar.gz"/>
        <data format="tabular" name="summary" label="${tool.name}: report" from_work_dir="report.tsv"/>
    </outputs>
    <tests>
        <test>
            <param name="fastq_input1" value="multiplex.fastq"/>
            <param name="barcode_file" value="barcode.tabular"/>
            <param name="end" value="both"/>
            <!--<output name="demultiplexed_archive" file="FROGS_Demultiplex_reads__demultiplexed.tar.gz"/>
            <output name="undemultiplexed_archive" file="FROGS_Demultiplex_reads__undemultiplexed.tar.gz"/>-->
            <output name="summary" file="FROGS_Demultiplex_reads__report.tabular"/>
        </test>
    </tests>
    <help>
.. class:: infomark page-header h2

What it does

This tool assigns single-end or paired-end reads to samples according to a forward barcode, a reverse barcode, or both, located in the first read or in both reads.

**Command line**::

    demultiplex.py --input-R1 *FQ_INPUT1* [--input-R2 *FQ_INPUT2*] --input-barcode *TXT_BARCODE* --mismatches *MISMATCH* --end *END* --summary *TXT_SUMMARY_OUTPUT* --output-demultiplexed *TARGZ_DEMULT_ARCHIVE_OUTPUT* --output-excluded *TARGZ_UNDEMULT_ARCHIVE_OUTPUT*
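
As a sketch of how these options fit together, the hypothetical helper below assembles the argument list of this command line in Python. The helper and the file names are placeholders, not part of FROGS; the accepted ``end`` values ``bol``, ``eol`` and ``both`` are the ones used by the tool form.

```python
def build_command(r1, barcode, summary, demultiplexed, excluded,
                  r2=None, mismatches=0, end="bol"):
    """Assemble a demultiplex.py argument list (hypothetical helper, not part of FROGS)."""
    if end not in ("bol", "eol", "both"):
        raise ValueError("end must be 'bol', 'eol' or 'both', got %r" % end)
    cmd = ["demultiplex.py", "--input-R1", r1]
    if r2 is not None:
        # Paired-end data: the second read file is only given for pairs.
        cmd += ["--input-R2", r2]
    cmd += [
        "--input-barcode", barcode,
        "--mismatches", str(mismatches),
        "--end", end,
        "--summary", summary,
        "--output-demultiplexed", demultiplexed,
        "--output-excluded", excluded,
    ]
    return cmd


if __name__ == "__main__":
    cmd = build_command("reads_R1.fastq", "barcodes.tsv", "report.tsv",
                        "demultiplexed.tar.gz", "undemultiplexed.tar.gz",
                        r2="reads_R2.fastq", end="both")
    print(" ".join(cmd))
```

Passing such a list to ``subprocess.run(cmd, check=True)`` avoids shell quoting problems with file names. Depending on your installation, you may need to prepend the Python interpreter and the full path of ``demultiplex.py``.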

.. csv-table:: Inputs
    :header: "Input name", "Meaning"
    :widths: 20, 80
    :class: table table-striped

    "FQ_INPUT1", "Fastq input file for the first read (single-end reads, or forward reads of paired-end sequences)"
    "FQ_INPUT2", "Fastq input file for the second read (paired-end sequences only)"
    "TXT_BARCODE", "Tab-separated text file describing the barcode sequences used to multiplex the samples: SAMPLE_NAME BARCODE1 [BARCODE2]"

.. csv-table:: Options
    :header: "Option name", "Meaning"
    :widths: 20, 80
    :class: table table-striped

    "-m/--mismatches MISMATCH", "Number of mismatches allowed in each barcode"
    "-e/--end END", "Where the barcode must be found: forward, ``bol`` (beginning of the read, or of the first read for paired-end data); reverse, ``eol`` (end of the read, or of the second read for paired-end data); or both ends, ``both``"

.. csv-table:: Outputs
    :header: "Output name", "Meaning"
    :widths: 20, 80
    :class: table table-striped

    "TXT_SUMMARY_OUTPUT", "A tab-separated text file summarising the number of sequences (single or paired) attributed to each sample"
    "TARGZ_DEMULT_ARCHIVE_OUTPUT", "A tar.gz archive containing the fastq file(s) of each sample"
    "TARGZ_UNDEMULT_ARCHIVE_OUTPUT", "A tar.gz archive containing the fastq file(s) of the reads that could not be demultiplexed"

.. class:: h3

Format

BARCODE_FILE:
    This file is expected to be tab-separated:

    - the first column contains the sample name;
    - the second column contains the barcode sequence used;
    - the third column (optional) contains the reverse barcode sequence.

.. class:: warningmark

Take care to write each barcode sequence as it appears on the strand of the read; you may therefore need to reverse-complement the reverse barcode sequence.
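
If your reverse barcodes are recorded on the opposite strand, a minimal Python sketch such as the one below can reverse-complement them before you fill in the barcode file. It assumes plain A/C/G/T/N sequences (no other IUPAC codes), and ``reverse_complement`` is an illustrative helper, not a FROGS function.

```python
# Translation table giving the complement of each base (A/C/G/T/N only, either case).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(sequence):
    """Return the reverse complement of a DNA sequence (illustrative helper)."""
    return sequence.translate(COMPLEMENT)[::-1]


print(reverse_complement("ATGCCA"))  # TGGCAT
```

Whether a given reverse barcode needs this conversion depends on how the library was built and sequenced, so it is worth checking a few reads first.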

.. class:: warningmark

All barcode sequences must have the same length.
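
The format rules above (tab-separated columns, an optional third column, equal barcode lengths) can be checked before launching the tool. The sketch below is a hypothetical ``read_barcodes`` helper written for illustration; FROGS performs its own parsing and may accept or reject files differently.

```python
import io


def read_barcodes(handle):
    """Parse a tab-separated barcode file (illustrative sketch).

    Returns {sample_name: (barcode1, barcode2_or_None)} and raises ValueError
    when a sample is duplicated or has no barcode, or when barcode lengths differ.
    """
    samples = {}
    lengths = set()
    for line in handle:
        row = line.rstrip("\r\n").split("\t")
        name = row[0].strip()
        if not name:
            continue  # ignore empty lines
        barcode1 = row[1].strip().upper() if len(row) > 1 else ""
        barcode2 = row[2].strip().upper() if len(row) > 2 else ""
        if not barcode1:
            raise ValueError("sample %s has no barcode" % name)
        if name in samples:
            raise ValueError("duplicate sample name: %s" % name)
        lengths.update(len(b) for b in (barcode1, barcode2) if b)
        samples[name] = (barcode1, barcode2 or None)
    if not samples:
        raise ValueError("no sample found")
    if len(lengths) != 1:
        raise ValueError("all barcode sequences must have the same length")
    return samples


example = io.StringIO("sampleA\tATCACG\tCGATGT\nsampleB\tTTAGGC\tTGACCA\n")
print(read_barcodes(example))
```

Upper-casing the barcodes and skipping blank lines are choices made for this sketch, not documented FROGS behaviour.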

Example of a barcode file; here the sample is multiplexed at both fragment ends:

.. image:: ${static_path}/images/tools/frogs/demultiplex_barcode.png
    :height: 18
    :width: 286

FASTQ:
    Text file describing biological sequences, with four lines per sequence:

    - the first line starts with "@" and contains the sequence identifier, optionally followed by a description;
    - the second line is the sequence itself;
    - the third line is a "+", followed or not by the sequence identifier depending on the format version;
    - the fourth line is the quality string, one character per base; the encoding depends on the format version and the sequencer.

    `Click here for more details on the fastq format &lt;https://en.wikipedia.org/wiki/FASTQ_format&gt;`_
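
The four-line layout can be read with a few lines of Python. The generator below is a simplified sketch: it assumes one sequence line and one quality line per record (no line wrapping) and does not decode quality values, whose encoding depends on the sequencer.

```python
import io
from itertools import islice


def read_fastq(handle):
    """Yield (identifier, sequence, quality) for each 4-line FASTQ record (sketch)."""
    while True:
        record = [line.rstrip("\r\n") for line in islice(handle, 4)]
        if not record:
            return  # end of file
        if len(record) != 4 or not record[0].startswith("@") or not record[2].startswith("+"):
            raise ValueError("malformed or truncated FASTQ record starting with %r" % record[0])
        header, sequence, _, quality = record
        if len(sequence) != len(quality):
            raise ValueError("sequence and quality lengths differ for %s" % header)
        # Drop the leading "@"; the identifier may be followed by a description.
        yield header[1:], sequence, quality


reads = io.StringIO("@read1 sample=demo\nACGTTGCA\n+\nIIIIHHGG\n")
for identifier, sequence, quality in read_fastq(reads):
    print(identifier, sequence, quality)
```

Reading four lines at a time with ``itertools.islice`` keeps memory use constant, which matters for large sequencing files.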

Example of a fastq read corresponding to the barcode file above:

.. image:: ${static_path}/images/tools/frogs/demultiplex_fastq_ex.png
    :height: 57
    :width: 420


.. class:: infomark page-header h2

How it works

For each sequence or sequence pair, the sequence fragment at the beginning of the (first) read (forward multiplexing) or at the end of the (second) read (reverse multiplexing) is compared to all barcodes of the barcode file.

If this fragment matches one and only one barcode (given the mismatch threshold), it is trimmed and the sequence is attributed to the corresponding sample. Otherwise (no matching barcode, or several), the sequence is not attributed to any sample and goes to the undemultiplexed archive.
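
The matching rule above can be sketched for the forward case (barcode at the beginning of the read) as follows. This is an illustration, not the FROGS implementation: it assumes mismatches are counted as substitutions only (Hamming distance, no insertions or deletions) and that all barcodes share one length. Reverse or both-end matching would apply the same rule to the end of the (second) read.

```python
def hamming(a, b):
    """Count the positions at which two equal-length sequences differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def assign_forward(sequence, barcodes, max_mismatches=0):
    """Attribute one read to a sample from the barcode at its start (sketch).

    barcodes maps sample name to barcode; all barcodes are assumed to share
    one length. Returns (status, sample, trimmed_sequence), where status is
    "matched", "unmatched" or "ambiguous".
    """
    size = len(next(iter(barcodes.values())))
    fragment = sequence[:size]
    hits = []
    if len(fragment) == size:
        hits = [name for name, barcode in barcodes.items()
                if max_mismatches >= hamming(fragment, barcode)]
    if len(hits) == 1:
        # Found once and only once: trim the barcode, keep the rest of the read.
        return "matched", hits[0], sequence[size:]
    status = "ambiguous" if hits else "unmatched"
    return status, None, sequence


barcodes = {"s1": "AAAA", "s2": "AATT", "s3": "CCCC"}
print(assign_forward("AAAAGGG", barcodes))  # ('matched', 's1', 'GGG')
```

Counting every barcode within the threshold, rather than stopping at the first hit, is what lets the sketch tell an ambiguous read from a matched one: with one allowed mismatch, ``AAATGGG`` is within reach of both ``AAAA`` and ``AATT`` and is therefore reported as ambiguous.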

Finally, the fastq file (or pair of fastq files) of each sample is added to an archive, and a report giving the number of sequences attributed to each sample is created.
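
A minimal sketch of this last step, using Python's standard ``tarfile`` module, is shown below. The input structure, the member names (the sample name plus ``.fastq``) and the two-column report layout are assumptions made for illustration. They are not necessarily the files or report format that FROGS produces, and paired-end data would need two members per sample.

```python
import io
import tarfile


def write_outputs(assigned, fileobj):
    """Write one FASTQ member per sample into a tar.gz and return a count report.

    assigned: {sample: [(identifier, sequence, quality), ...]} (hypothetical layout).
    fileobj: a binary file object that receives the gzip-compressed archive.
    """
    with tarfile.open(fileobj=fileobj, mode="w:gz") as archive:
        for sample in sorted(assigned):
            text = "".join("@%s\n%s\n+\n%s\n" % record for record in assigned[sample])
            data = text.encode("ascii")
            info = tarfile.TarInfo(name=sample + ".fastq")
            info.size = len(data)
            archive.addfile(info, io.BytesIO(data))
    # Tab-separated report: one line per sample with its number of sequences.
    lines = ["#sample\tcount"]
    lines += ["%s\t%d" % (sample, len(assigned[sample])) for sample in sorted(assigned)]
    return "\n".join(lines) + "\n"


buffer = io.BytesIO()
print(write_outputs({"sampleA": [("read1", "ACGT", "IIII")], "sampleB": []}, buffer), end="")
```

To write the archive to disk instead, pass a file opened in binary mode, for example ``open("demultiplexed.tar.gz", "wb")``.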


.. class:: infomark page-header h2

Advice

Write the barcode sequences exactly as they appear in the fastq file, especially if your data were multiplexed via the reverse strand.

For the mismatch threshold, we advise leaving it at 0 first and trying 1 only if you are not satisfied with the result. The appropriate number of mismatches depends on the barcode length, but barcodes are frequently very short, so a single mismatch already represents a larger proportion of the barcode than the sequencing error rate.
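
To compare the threshold with the sequencing error rate, the sketch below uses a simple binomial model to estimate the fraction of barcodes that carry more sequencing errors than the allowed number of mismatches. The model assumes independent errors at a constant per-base rate, which real sequencers only approximate. The 8-base barcode and 1% error rate are illustrative values, not recommendations. ``math.comb`` requires Python 3.8 or later.

```python
import math


def prob_more_errors(mismatches, length, error_rate):
    """Probability that a barcode of `length` bases carries more than
    `mismatches` sequencing errors (binomial model, independent errors)."""
    within = sum(
        math.comb(length, k) * error_rate ** k * (1 - error_rate) ** (length - k)
        for k in range(mismatches + 1)
    )
    return 1 - within


# Illustrative values only: an 8-base barcode and a 1% per-base error rate.
for allowed in (0, 1):
    share = prob_more_errors(allowed, length=8, error_rate=0.01)
    print("mismatches=%d: %.2f%% of barcodes exceed the threshold" % (allowed, 100 * share))
```

With these values the script reports about 7.73% for 0 mismatches and 0.27% for 1. An 8-base barcode is expected to carry only 8 × 0.01 = 0.08 errors on average. Raising the threshold from 0 to 1 recovers reads whose barcode carries a single error, but it also increases the risk of ambiguous or wrong matches between similar barcodes.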

If your barcodes have different lengths, you must demultiplex your data in several steps, starting with the longest barcode set. Then, to trim the shorter barcodes, run the tool again on the "unmatched" or "ambiguous" sequences with the next shorter barcode set, and so on.
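
The longest-first strategy can be sketched as below. For each round, it reuses a substitution-only matching rule (an assumption, as in a plain Hamming-distance comparison) on the forward end of the read. Processing longer barcodes first prevents a read that carries a long barcode from being captured by a shorter barcode that happens to match its first bases. The helper is hypothetical. With the actual tool, each round is a separate run on the reads excluded by the previous one.

```python
from collections import defaultdict


def hamming(a, b):
    """Count the positions at which two equal-length sequences differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def demultiplex_in_rounds(reads, barcodes, max_mismatches=0):
    """Demultiplex reads whose barcodes differ in length (hypothetical sketch).

    Barcode sets are processed from the longest to the shortest; reads left
    unmatched or ambiguous by one round are passed on to the next round.
    reads: list of sequences with the barcode at the start (forward case).
    barcodes: {sample: barcode}.
    Returns ({sample: [trimmed reads]}, [reads never attributed]).
    """
    by_length = defaultdict(dict)
    for sample, barcode in barcodes.items():
        by_length[len(barcode)][sample] = barcode

    assigned = defaultdict(list)
    remaining = list(reads)
    for size in sorted(by_length, reverse=True):
        leftover = []
        for read in remaining:
            hits = [sample for sample, barcode in by_length[size].items()
                    if len(read) >= size
                    and max_mismatches >= hamming(read[:size], barcode)]
            if len(hits) == 1:
                assigned[hits[0]].append(read[size:])  # trim the barcode
            else:
                leftover.append(read)  # unmatched or ambiguous: try shorter barcodes
        remaining = leftover
    return dict(assigned), remaining


print(demultiplex_in_rounds(["ACGTACGG", "ACGAGG", "TTTTCC", "GGGGGG"],
                            {"long": "ACGTAC", "short": "ACGA", "other": "TTTT"}))
```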

If you have Roche 454 sequences in SFF format, you must first convert them to fastq with a program such as `sff2fastq &lt;https://github.com/indraniel/sff2fastq&gt;`_ or sff_to_fastq (installable in Galaxy).


----

**Contact**

Contacts: frogs@inra.fr

Repository: https://github.com/geraldinepascal/FROGS

Please cite the FROGS Publication: *Escudie F., Auer L., Bernard M., Cauquil L., Vidal K., Maman S., Mariadassou M., Combes S., Hernandez-Raquet G., Pascal G., 2016. FROGS: Find Rapidly OTU with Galaxy Solution. In: ISME-2016 Montreal, CANADA ,* http://bioinfo.genotoul.fr/wp-content/uploads/FROGS_ISME2016_poster.pdf

Depending on the help provided, you can cite us in acknowledgements, references or both.
    </help>
</tool>