annotate deletion_predictor.xml @ 3:d6ec32ce882b draft default tip

Uploaded
author wolma
date Tue, 28 Mar 2017 04:34:04 -0400
<tool id="deletion_prediction" name="Deletion Prediction for paired-end data" version="0.1.7.3">
    <description>Predicts deletions in one or more aligned read samples based on coverage of the reference genome and on insert sizes</description>
    <macros>
        <import>toolshed_macros.xml</import>
    </macros>
    <expand macro="requirements" />
    <version_command>mimodd version -q</version_command>
    <command>
        mimodd delcall
        #for $l in $list_input
            "${l.bamfile}"
        #end for
        "$covfile" -o "$outputfile"
        --max-cov "$max_cov" --min-size "$min_size" $include_uncovered $group_by_id --verbose
    </command>

    <inputs>
        <repeat default="1" min="1" name="list_input" title="Aligned reads input source">
            <param format="bam" label="input BAM file" name="bamfile" type="data" />
        </repeat>
        <param format="bcf" help="Use the Variant Calling tool to generate this file." label="BCF variant call file to extract coverage from" name="covfile" type="data" />
        <param checked="false" falsevalue="" help="If selected, reads from different read groups will be treated strictly separately. If deselected, read groups with identical sample names are pooled for identifying low-coverage regions, but are still treated separately for the prediction of deletions." label="group reads based on read group id only" name="group_by_id" truevalue="-i" type="boolean" />
        <param checked="false" falsevalue="" help="If selected, regions that fulfill the coverage criteria below, but are not statistically significant deletions, will be included in the output." label="include low-coverage regions" name="include_uncovered" truevalue="-u" type="boolean" />
        <param help="The maximal coverage a site may have to still be considered part of a low-coverage region" label="maximal coverage allowed inside a low-coverage region (default: 0)" name="max_cov" type="integer" value="0" />
        <param help="A low-coverage region must consist of at least this number of consecutive bases with coverage not exceeding the maximal coverage to be considered in further analyses." label="minimal deletion size (default: 100)" name="min_size" type="integer" value="100" />
    </inputs>

    <outputs>
        <data format="gff" name="outputfile" />
    </outputs>

    <help>
.. class:: infomark

**What it does**

The tool predicts deletions from paired-end data in a two-step process:

1) It finds low-coverage regions, i.e., candidate regions for deletions, by scanning a BCF file produced by the *Variant Calling* tool.

The *maximal coverage allowed inside a low-coverage region* and the *minimal deletion size* parameters define what is considered a low-coverage region at this step: a stretch of at least *minimal deletion size* consecutive positions, none of which has a coverage above the *maximal coverage allowed inside a low-coverage region*.
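
As an illustration only (this is not MiModd's own code), the Python sketch below scans per-position coverage values for such regions. The function name ``find_low_coverage_regions`` and its input, a plain list of coverage values for one chromosome, are assumptions made for this example.

```python
def find_low_coverage_regions(coverage, max_cov=0, min_size=100):
    """Return 1-based, inclusive (start, end) pairs of low-coverage regions.

    A region is a run of at least min_size consecutive positions whose
    coverage does not exceed max_cov. Hypothetical input: coverage[i] is
    the read depth at position i + 1 of a single chromosome.
    """
    regions = []
    run_start = None  # 0-based index at which the current run began
    for i, depth in enumerate(coverage):
        if depth > max_cov:
            # Coverage too high: close the current run if it is long enough.
            if run_start is not None and i - run_start >= min_size:
                regions.append((run_start + 1, i))
            run_start = None
        elif run_start is None:
            run_start = i
    # A run may continue up to the last position of the chromosome.
    if run_start is not None and len(coverage) - run_start >= min_size:
        regions.append((run_start + 1, len(coverage)))
    return regions


cov = [5, 5, 0, 0, 0, 1, 0, 0, 5]
print(find_low_coverage_regions(cov, max_cov=0, min_size=3))  # [(3, 5)]
print(find_low_coverage_regions(cov, max_cov=1, min_size=3))  # [(3, 8)]
```

Raising the maximal coverage from 0 to 1 lets the single site with coverage 1 join the neighboring zero-coverage stretches into one longer region.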

.. class:: warningmark

The tool treats genome positions missing from the BCF input as zero coverage, so it is safe to use ONLY with BCF files produced by the *Variant Calling* tool or by other commands that keep a record for every site of the reference genome, not only for variant sites.
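
The consequence of this behavior can be sketched in Python. The hypothetical ``dense_coverage`` function below (not part of MiModd) assumes that per-site read depths have already been read from a BCF file into a dictionary, and it fills every position absent from that dictionary with zero, as described above.

```python
def dense_coverage(site_depths, chrom_length):
    """Expand sparse per-site depths into a dense coverage list.

    Hypothetical input: site_depths maps 1-based positions to read depth.
    Positions absent from the mapping are assigned a coverage of 0.
    """
    return [site_depths.get(pos, 0) for pos in range(1, chrom_length + 1)]


# A BCF that keeps every site: coverage is reproduced faithfully.
all_sites = {pos: 30 for pos in range(1, 11)}
# A BCF restricted to variant sites: most positions are missing.
variants_only = {3: 28, 8: 31}

print(dense_coverage(all_sites, 10))
# [30, 30, 30, 30, 30, 30, 30, 30, 30, 30]
print(dense_coverage(variants_only, 10))
# [0, 0, 28, 0, 0, 0, 0, 31, 0, 0]
```

With the variant-only input, every position except 3 and 8 would look uncovered, so long stretches of a genome could be mistaken for deletion candidates.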

2) It statistically assesses every low-coverage region for evidence that it is a real deletion. **This step requires paired-end data** since it relies on shifts in the distribution of read pair insert sizes around real deletions: pairs whose mates map on either side of a deletion appear, on the reference, to have an insert size larger than usual by roughly the length of the deleted sequence.
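
The exact statistical test is not described here. The Python sketch below only illustrates the idea of detecting such a shift with a simple one-sample z-score, which is an assumption and not necessarily MiModd's method. The function ``insert_size_z_score`` and its inputs are hypothetical.

```python
import math
import statistics


def insert_size_z_score(spanning_sizes, library_mean, library_sd):
    """Return the z-score of the mean insert size of pairs spanning a region.

    Hypothetical helper illustrating a one-sample z-test; the statistic
    MiModd actually applies may differ. spanning_sizes holds the insert
    sizes, measured on the reference, of pairs whose mates map on either
    side of the candidate region; library_mean and library_sd describe the
    insert size distribution of the whole library.
    """
    if not spanning_sizes:
        raise ValueError("no read pairs span this region")
    observed_mean = statistics.mean(spanning_sizes)
    standard_error = library_sd / math.sqrt(len(spanning_sizes))
    return (observed_mean - library_mean) / standard_error


# Library with mean insert size 300 bp and standard deviation 30 bp.
# Pairs flanking a ~200 bp deletion look about 200 bp too long.
print(round(insert_size_z_score([490, 510, 505, 495], 300, 30), 1))  # 13.3
# Pairs over a region without a deletion show no shift.
print(round(insert_size_z_score([295, 310, 300, 295], 300, 30), 1))  # 0.0
```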

By default, the tool only reports deletions, i.e., the subset of low-coverage regions that pass the statistical test.
If *include low-coverage regions* is selected, regions that fail the test are reported as well.
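
The following is a minimal Python sketch of this reporting choice. It assumes a hypothetical representation in which each candidate region carries the boolean outcome of the statistical test. The names ``select_regions`` and ``passed_test`` are illustrative and do not come from MiModd. On the command line, the option is passed to ``mimodd delcall`` as the ``-u`` flag.

```python
def select_regions(regions, include_uncovered=False):
    """Return the candidate regions that would appear in the output.

    Hypothetical data model: each region is a dict whose "passed_test"
    entry records whether the statistical test supported a deletion.
    include_uncovered mirrors the *include low-coverage regions* option.
    """
    if include_uncovered:
        return list(regions)
    return [region for region in regions if region["passed_test"]]


candidates = [
    {"start": 1200, "end": 1450, "passed_test": True},
    {"start": 8000, "end": 8130, "passed_test": False},
]
print(len(select_regions(candidates)))                          # 1
print(len(select_regions(candidates, include_uncovered=True)))  # 2
```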

With *group reads based on read group id only* selected, reads are grouped into samples strictly by their read group IDs.
With the option deselected, as it is by default, grouping is based on sample names in the first step of the analysis, i.e., the reads of all read groups with a shared sample name are pooled to identify low-coverage regions.
In the second step, however, reads are regrouped by their read group IDs, i.e., the statistical assessment for real deletions is always done on a per-read-group basis.
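
The two grouping modes for the first step can be sketched in Python. The input format is an assumption made for illustration: a dictionary mapping read group IDs to sample names, as given by the ``ID`` and ``SM`` fields of ``@RG`` header lines in SAM/BAM files. The function name ``group_read_groups`` is also hypothetical.

```python
from collections import defaultdict


def group_read_groups(read_groups, by_id_only):
    """Group read groups into the units used to find low-coverage regions.

    Hypothetical input: read_groups maps read group IDs to sample names.
    With by_id_only=True every read group forms its own unit; otherwise
    read groups sharing a sample name are pooled.
    """
    units = defaultdict(list)
    for rg_id, sample in read_groups.items():
        key = rg_id if by_id_only else sample
        units[key].append(rg_id)
    return dict(units)


read_groups = {"rg_pe": "mutant1", "rg_se": "mutant1", "rg_wt": "wildtype"}
print(group_read_groups(read_groups, by_id_only=True))
# {'rg_pe': ['rg_pe'], 'rg_se': ['rg_se'], 'rg_wt': ['rg_wt']}
print(group_read_groups(read_groups, by_id_only=False))
# {'mutant1': ['rg_pe', 'rg_se'], 'wildtype': ['rg_wt']}
```

Only the pooling for low-coverage detection changes between the two modes. The deletion test would still run separately for ``rg_pe``, ``rg_se`` and ``rg_wt``.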

**TIP:**
Leaving *group reads based on read group id only* deselected (the default) can be useful, for example, if you have both paired-end and single-end sequencing data for the same sample.

In this case, the two sets of reads will usually share a common sample name, but differ in their read groups.
With grouping based on sample names, the single-end data can be used together with the paired-end data to identify low-coverage regions, thus increasing overall coverage and reliability of this step.
Still, the assessment of deletions will use only the paired-end data, since the tool auto-detects that the single-end reads do not provide insert size information.
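
The Python sketch below shows how such auto-detection might work. It assumes, without confirmation from this help text, that a read group counts as paired-end if any of its reads has bit 0x1 of the SAM FLAG field set. The function ``paired_end_read_groups`` and its input of ``(read_group_id, flag)`` tuples are hypothetical.

```python
def paired_end_read_groups(reads):
    """Return the IDs of read groups that contain paired reads.

    Hypothetical input: an iterable of (read_group_id, flag) tuples, where
    flag is the SAM FLAG value of a read. Bit 0x1 means the template has
    multiple segments, i.e. the read belongs to a pair; read groups without
    such reads provide no insert size information.
    """
    paired = set()
    for rg_id, flag in reads:
        if flag % 2 == 1:  # lowest bit (0x1) is set
            paired.add(rg_id)
    return paired


reads = [("rg_pe", 99), ("rg_pe", 147), ("rg_se", 0), ("rg_se", 16)]
print(sorted(paired_end_read_groups(reads)))  # ['rg_pe']
```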

    </help>

</tool>