comparison vt_decompose.xml @ 0:26babe3a66f1 draft default tip

planemo upload for repository https://github.com/atks/vt commit d4f5de5f229f503deb66a708f864cf380c900ce0
author bgruening
date Sat, 04 Jun 2016 10:41:29 -0400
-1:000000000000 0:26babe3a66f1
1 <tool id="vt_@BINARY@" name="VT @BINARY@" version="@VERSION@.0">
2 <description>decomposes multiallelic variants into biallelic ones</description>
3 <macros>
4 <import>vt_macros.xml</import>
5 <token name="@BINARY@">decompose</token>
6 </macros>
7 <expand macro="requirements" />
8 <expand macro="stdio" />
9 <expand macro="version_command" />
10 <command>
11 <![CDATA[
12
13 ln -s "${ infile }" infile.vcf &&
14
15
16 vt @BINARY@
17 #if str($output_format) == 'bcf':
18 -o decompose.bcf
19 #else:
20 -o decompose.vcf
21 #end if
22 $s
23 infile.vcf
24
25 &&
26 ## For some reason, the file move will randomly produce empty files.
27 ## Wait two seconds to let the system close file handlers and clean up.
28 sleep 2
29 &&
30
31 #if str($output_format) == 'bcf':
32 mv decompose.bcf "${ outfile }";
33 #else:
34 mv decompose.vcf "${ outfile }";
35 #end if
36
37 ]]>
38 </command>
39 <inputs>
40 <param name="infile" type="data" format="vcf" label="VCF file to be decomposed" />
41
42 <param argument="-s" type="boolean" truevalue="-s" falsevalue=""
43 selected="false" label="Smart decomposition"
44 help="Splits up INFO and GENOTYPE fields that have number counts of R and A appropriately."/>
45
46 <param name="output_format" type="select" label="Choose the output format" help="">
47 <option value="bcf">BCF</option>
48 <option value="vcf" selected="true">VCF</option>
49 </param>
50 </inputs>
51 <outputs>
52 <data name="outfile" format="vcf" label="${tool.name} on ${on_string}">
53 <change_format>
54 <when input="output_format" value="bcf" format="bcf" />
55 </change_format>
56 </data>
57 </outputs>
58 <tests>
59 <test>
60 <param name="infile" value="infile01.vcf" />
61 <output name="outfile" file="decompose_result01.vcf" ftype="vcf" />
62 </test>
63 <test>
64 <param name="infile" value="infile02.vcf" />
65 <param name="s" value="True" />
66 <output name="outfile" file="decompose_result02.vcf" ftype="vcf" />
67 </test>
68 </tests>
69 <help>
70 <![CDATA[
71
72 **What it does**
73
74 Decompose multiallelic variants in a VCF file.
75 If the VCF file has genotype fields GT, PL, GL or DP, they are modified to reflect the change in alleles.
76 All other genotype fields are removed. The -s option retains these fields and decomposes fields with counts of R and A accordingly.
77
78 Decomposing and combining variants are complex operations whose correctness depends on:
79
80 * whether the observed variants are seen in the same sample
81 * if same sample, whether they are homozygous or heterozygous
82 * if both heterozygous, whether they are in the same haplotype or not (if known)
83
84 and one should be aware of the issues in handling variants resulting from such operations.
85 The original purpose of this tool is to allow for allelic comparisons between call sets.
86
87 Standard option:
88
89 Before decomposition
90
91 .. code::
92
93 #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT S1 S2
94 1 3759889 . TA TAA,TAAA,T . PASS AF=0.342,0.173,0.037 GT:DP:PL 1/2:81:281,5,9,58,0,115,338,46,116,809 0/0:86:0,30,323,31,365,483,38,291,325,567
95
96 After decomposition
97
98 .. code::
99
100 #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT S1 S2
101 1 3759889 . TA TAA . PASS OLD_MULTIALLELIC=1:3759889:TA/TAA/TAAA/T GT:PL 1/.:281,5,9 0/0:0,30,323
102 1 3759889 . TA TAAA . . OLD_MULTIALLELIC=1:3759889:TA/TAA/TAAA/T GT:PL ./1:281,58,115 0/0:0,31,483
103 1 3759889 . TA T . . OLD_MULTIALLELIC=1:3759889:TA/TAA/TAAA/T GT:PL ./.:281,338,809 0/0:0,38,567
104
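The GT and PL values in the example above can be reproduced with a short sketch. It assumes diploid samples and the genotype ordering defined in the VCF specification, where genotype j/k (j <= k) sits at index k*(k+1)/2 + j. The function names are hypothetical and this is not necessarily how vt implements the step; it only illustrates the projection.

```python
def pl_index(j, k):
    """Index of the unordered diploid genotype j/k (j <= k) in VCF PL/GL ordering."""
    return k * (k + 1) // 2 + j


def decompose_sample(gt, pl, alt_index):
    """Project one diploid sample onto the biallelic record for ALT number alt_index (1-based).

    Assumption: phasing is ignored, so '|' is treated like '/'.
    """
    alleles = []
    for allele in gt.replace("|", "/").split("/"):
        if allele == "0":
            alleles.append("0")          # reference allele stays 0
        elif allele == str(alt_index):
            alleles.append("1")          # the ALT kept in this record becomes 1
        else:
            alleles.append(".")          # any other ALT (or missing) becomes unknown
    # Keep only the likelihoods for 0/0, 0/alt and alt/alt.
    new_pl = [
        pl[pl_index(0, 0)],
        pl[pl_index(0, alt_index)],
        pl[pl_index(alt_index, alt_index)],
    ]
    return "/".join(alleles), new_pl


# Sample S1 from the example above.
s1_pl = [281, 5, 9, 58, 0, 115, 338, 46, 116, 809]
for alt in (1, 2, 3):
    print(decompose_sample("1/2", s1_pl, alt))
```

For S1 this yields 1/. with 281,5,9 for TAA, ./1 with 281,58,115 for TAAA, and ./. with 281,338,809 for T, matching the records shown.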
105
106 One might want to post-process partial genotypes such as 1/. into a best-guess genotype based on the PL values.
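A minimal sketch of such post-processing, assuming a diploid biallelic record whose PL holds three values in the order 0/0, 0/1, 1/1. The helper name is hypothetical and this is not part of vt. Note that the decomposed PL values are not renormalised, so the smallest value is not necessarily 0.

```python
def best_guess_genotype(pl):
    """Return the genotype with the smallest PL (the most likely one).

    Assumption: pl has exactly three entries ordered 0/0, 0/1, 1/1.
    Ties resolve to the first genotype in that order.
    """
    if len(pl) != 3:
        raise ValueError("expected three PL values for a diploid biallelic site")
    genotypes = ["0/0", "0/1", "1/1"]
    best = min(range(3), key=lambda i: pl[i])
    return genotypes[best]


# Decomposed PL values of sample S1 from the example above.
for pl in ([281, 5, 9], [281, 58, 115], [281, 338, 809]):
    print(pl, best_guess_genotype(pl))
```

Taking the minimum PL is the simplest possible rule; a more careful approach would also weigh the original multiallelic genotype call.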
107
108
109 With **-s** option:
110
111 Before decomposition
112
113 .. code::
114
115 #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT S1 S2
116 1 3759889 . TA TAA,TAAA,T . PASS AF=0.342,0.173,0.037 GT:DP:PL 1/2:81:281,5,9,58,0,115,338,46,116,809 0/0:86:0,30,323,31,365,483,38,291,325,567
117
118 After decomposition
119
120 .. code::
121
122 #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT S1 S2
123 1 3759889 . TA TAA . PASS AF=0.342;OLD_MULTIALLELIC=1:3759889:TA/TAA/TAAA/T GT:PL 1/.:281,5,9 0/0:0,30,323
124 1 3759889 . TA TAAA . . AF=0.173;OLD_MULTIALLELIC=1:3759889:TA/TAA/TAAA/T GT:PL ./1:281,58,115 0/0:0,31,483
125 1 3759889 . TA T . . AF=0.037;OLD_MULTIALLELIC=1:3759889:TA/TAA/TAAA/T GT:PL ./.:281,338,809 0/0:0,38,567
126
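The per-allele split of AF in the -s example can be sketched as follows. In the VCF specification, Number=A means one value per ALT allele, and Number=R means one value per allele including the reference. The function names are hypothetical, the Number=R input below is made-up illustration data, and vt's actual implementation may differ.

```python
def split_number_a(value, n_alt):
    """Split a Number=A field into one value per decomposed record."""
    parts = value.split(",")
    if len(parts) != n_alt:
        raise ValueError("Number=A field must have one value per ALT allele")
    return parts


def split_number_r(value, n_alt):
    """Split a Number=R field: each record keeps the REF value plus its own ALT value."""
    parts = value.split(",")
    if len(parts) != n_alt + 1:
        raise ValueError("Number=R field must have one value per allele including REF")
    return [f"{parts[0]},{alt_value}" for alt_value in parts[1:]]


# AF from the example above: three ALT alleles, three decomposed records.
print(split_number_a("0.342,0.173,0.037", 3))
# Hypothetical Number=R values (REF first), not taken from the example.
print(split_number_r("10,20,30,40", 3))
```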
127 In general, you should recompute fields that involve alleles after decomposition. Information is generally lost when a variant is vertically decomposed, so take care when interpreting the resulting values.
128
129 @CITATION@
130 ]]>
131 </help>
132 <expand macro="citations"/>
133 </tool>