annotate snp_caller_caller.xml @ 3:d6ec32ce882b draft default tip

Uploaded
author wolma
date Tue, 28 Mar 2017 04:34:04 -0400
parents
children
<tool id="variant_calling" name="Variant Calling" version="0.1.7.3">
  <description>From a reference and aligned reads generate a BCF file with position-specific variant likelihoods and coverage information</description>
  <macros>
    <import>toolshed_macros.xml</import>
  </macros>
  <expand macro="requirements" />
  <version_command>mimodd version -q</version_command>
  <command>
    mimodd varcall

    "$ref_genome"
    #for $l in $list_input
      "${l.inputfile}"
    #end for
    --ofile "$output_vcf"
    --depth "$depth"
    $group_by_id
    $no_md5_check
    --verbose
    --quiet
  </command>

  <inputs>
    <param format="fasta" label="reference genome" name="ref_genome" type="data" />
    <repeat default="1" min="1" name="list_input" title="Aligned reads input source">
      <param format="bam" label="input file" name="inputfile" type="data" />
    </repeat>
    <param checked="false" falsevalue="" help="If selected, this option ensures that only the read group id (but not the sample name) is considered in grouping reads in the input file(s). If turned off, read groups with identical sample names are automatically pooled and analyzed together even if they come from different NGS runs." label="group reads based on read group id only" name="group_by_id" truevalue="-i" type="boolean" />
    <param checked="false" falsevalue="" help="Leave the verification turned on (i.e., leave this option unselected) to avoid accidental variant calling against a wrong reference genome version (see the tool help below)." label="turn off md5 sum verification" name="no_md5_check" truevalue="-x" type="boolean" />
    <param help="to avoid excessive use of memory" label="maximum per-BAM depth (default: 250)" name="depth" type="integer" value="250" />
  </inputs>

  <outputs>
    <data format="bcf" label="Variant Calls from MiModD Variant Calling on ${on_string}" name="output_vcf" />
  </outputs>

  <help>
.. class:: infomark

**What it does**

The tool transforms the read-centered information of its aligned reads input files into position-centered information.

**It produces a BCF file that serves as the basis for all further variant analyses with MiModD**.

**Notes:**

By default, the tool will check whether the input BAM file(s) provide(s) MD5 checksums for the reference genome sequences used during read alignment (the *SNAP Read Alignment* tool stores these in the BAM file header). If it finds MD5 sums for all sequences, it compares them to the actual checksums of the sequences in the specified reference genome, verifies that every sequence mentioned in any BAM input file has a counterpart with a matching MD5 sum in the reference genome, and aborts with an error message if that is not the case. If it finds sequences with matching checksums but different names in the reference genome, it uses the names from the reference genome file in its output.

This behavior has two benefits:

1) It protects from accidental variant calling against a wrong reference genome (i.e., a different one than that used during the alignment step), which would result in wrong calls. This is the primary reason why we recommend leaving the check activated.

2) It provides an opportunity to change sequence names between aligned reads files and variant call files by providing a reference genome file with altered sequence names (but identical sequence data).

Since there may be rare cases where you *really* want to call variants against a reference genome with different checksums (e.g., you may have edited the reference sequence based on the alignment results), the check can be turned off, but only do this if you know exactly why.

-----------

Internally, the tool uses samtools mpileup combined with bcftools to do all per-nucleotide calculations.

It exposes just a single configuration parameter of these tools - the *maximum per-BAM depth*. This parameter controls the maximum number of reads considered for variant calling at any site. Its default value of 250 is taken from *samtools mpileup* and is usually suitable. Note, however, that this is the maximum number of reads per input file, so if one input file contains a large number of samples, it may be necessary to increase the value so that enough reads are considered per sample.

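As a rough illustration only, assuming a single reference genome and one BAM file that pools many samples (the file names and the depth value of 1000 below are placeholders, not recommendations), the command line assembled by this tool would look similar to::

    mimodd varcall "reference.fa" "pooled_samples.bam" --ofile "calls.bcf" --depth "1000" --verbose --quiet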
  </help>
</tool>