<!--
    vt_decompose.xml @ 0:490e605d1f44 draft default tip
    planemo upload for repository https://github.com/atks/vt commit 5f1e53104d11817b9f1f93c4df17b77c80bd7472-dirty
    author: bgruening
    date: Sat, 04 Jun 2016 12:45:04 -0400
-->
<tool id="vt_decompose" name="VT @BINARY@" version="@VERSION@.0">
    <description>decomposes multiallelic variants into biallelic ones</description>
    <macros>
        <import>vt_macros.xml</import>
        <token name="@BINARY@">decompose</token>
    </macros>
    <expand macro="requirements" />
    <expand macro="stdio" />
    <expand macro="version_command" />
    <command>
<![CDATA[
        ln -s "${ infile }" infile.vcf &&
        vt @BINARY@
            #if str($output_format) == 'bcf':
                -o decompose.bcf
            #else:
                -o decompose.vcf
            #end if
            $s
            infile.vcf &&
        ## For some reason, moving the output file sometimes produces an empty file.
        ## Wait two seconds to let the system close file handles and clean up.
        sleep 2 &&
        #if str($output_format) == 'bcf':
            mv decompose.bcf "${ outfile }";
        #else:
            mv decompose.vcf "${ outfile }";
        #end if
]]>
    </command>
    <inputs>
        <param name="infile" type="data" format="vcf" label="VCF file to be decomposed" />
        <param argument="-s" type="boolean" truevalue="-s" falsevalue="" checked="false" label="Smart decomposition"
            help="Splits INFO and genotype (FORMAT) fields whose Number is R or A into the corresponding per-allele values."/>
        <param name="output_format" type="select" label="Choose the output format" help="">
            <option value="bcf">BCF</option>
            <option value="vcf" selected="true">VCF</option>
        </param>
    </inputs>
    <outputs>
        <data name="outfile" format="vcf" label="${tool.name} on ${on_string}">
            <change_format>
                <when input="output_format" value="bcf" format="bcf" />
            </change_format>
        </data>
    </outputs>
    <tests>
        <test>
            <param name="infile" value="infile01.vcf" />
            <output name="outfile" file="decompose_result01.vcf" ftype="vcf" />
        </test>
        <test>
            <param name="infile" value="infile02.vcf" />
            <param name="s" value="True" />
            <output name="outfile" file="decompose_result02.vcf" ftype="vcf" />
        </test>
    </tests>
    <help>
<![CDATA[
**What it does**

Decomposes multiallelic variants in a VCF file into biallelic records. If the VCF file has the genotype fields GT, PL, GL or DP, they are modified to reflect the change in alleles. All other genotype fields are removed.

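The GT and PL recoding described above can be illustrated with a short Python sketch. This is not vt's implementation. It assumes diploid genotypes and the genotype ordering from the VCF specification, where the likelihood of genotype j/k (j <= k) is stored at index ``k*(k+1)/2 + j``. GL uses the same ordering. For each alternate allele, only the 0/0, 0/alt and alt/alt values are kept. In GT, the reference allele stays 0, the chosen allele becomes 1 and any other allele becomes missing (.). The function names are hypothetical. ``split_number_a`` and ``split_number_r`` only approximate how the -s option splits Number=A and Number=R values.

.. code:: python

    from typing import List, Sequence

    def pl_index(j: int, k: int) -> int:
        """Position of diploid genotype j/k in a PL or GL list (VCF ordering)."""
        if j > k:
            j, k = k, j
        return k * (k + 1) // 2 + j

    def decompose_pl(pl: Sequence[int], alt: int) -> List[int]:
        """Keep only the 0/0, 0/alt and alt/alt entries for one ALT allele."""
        wanted = (pl_index(0, 0), pl_index(0, alt), pl_index(alt, alt))
        return [pl[i] for i in wanted]

    def decompose_gt(gt: str, alt: int) -> str:
        """Recode GT for one ALT allele: REF stays 0, alt becomes 1, others become '.'."""
        sep = "|" if "|" in gt else "/"  # assumption: the phasing separator is kept
        recoded = []
        for allele in gt.split(sep):
            if allele == "0":
                recoded.append("0")
            elif allele != "." and int(allele) == alt:
                recoded.append("1")
            else:
                recoded.append(".")  # missing, or a different ALT allele
        return sep.join(recoded)

    def split_number_a(values: Sequence[str], alt: int) -> str:
        """Number=A fields hold one value per ALT allele (hypothetical helper for -s)."""
        return values[alt - 1]

    def split_number_r(values: Sequence[str], alt: int) -> List[str]:
        """Number=R fields hold the REF value followed by one value per ALT allele."""
        return [values[0], values[alt]]

    if __name__ == "__main__":
        # Sample S1 and INFO/AF of the example record at 1:3759889 (REF TA, ALT TAA,TAAA,T).
        s1_pl = [281, 5, 9, 58, 0, 115, 338, 46, 116, 809]
        af = ["0.342", "0.173", "0.037"]
        for alt in (1, 2, 3):
            print(alt, decompose_gt("1/2", alt), decompose_pl(s1_pl, alt), split_number_a(af, alt))

Run as a script, it prints one line per alternate allele for sample S1 (allele index, GT, PL, AF): ``1 1/. [281, 5, 9] 0.342``, ``2 ./1 [281, 58, 115] 0.173`` and ``3 ./. [281, 338, 809] 0.037``. These match the decomposed records shown for this site.
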

The -s option retains the fields and decomposes fields with Number counts of R and A accordingly.

Decomposing and combining variants is a complex operation whose correctness depends on:

* whether the observed variants are seen in the same sample
* if in the same sample, whether they are homozygous or heterozygous
* if both are heterozygous, whether they are in the same haplotype or not (if known)

Be aware of these issues when handling variants that result from such operations. The original purpose of this tool is to allow allelic comparisons between call sets.

Standard option:

Before decomposition

.. code::

    #CHROM  POS      ID  REF  ALT         QUAL  FILTER  INFO                  FORMAT    S1                                      S2
    1       3759889  .   TA   TAA,TAAA,T  .     PASS    AF=0.342,0.173,0.037  GT:DP:PL  1/2:81:281,5,9,58,0,115,338,46,116,809  0/0:86:0,30,323,31,365,483,38,291,325,567

After decomposition

.. code::

    #CHROM  POS      ID  REF  ALT   QUAL  FILTER  INFO                                      FORMAT  S1               S2
    1       3759889  .   TA   TAA   .     PASS    OLD_MULTIALLELIC=1:3759889:TA/TAA/TAAA/T  GT:PL   1/.:281,5,9      0/0:0,30,323
    1       3759889  .   TA   TAAA  .     .       OLD_MULTIALLELIC=1:3759889:TA/TAA/TAAA/T  GT:PL   ./1:281,58,115   0/0:0,31,483
    1       3759889  .   TA   T     .     .       OLD_MULTIALLELIC=1:3759889:TA/TAA/TAAA/T  GT:PL   ./.:281,338,809  0/0:0,38,567

You may want to post-process partial genotypes such as 1/. into a best-guess genotype based on the PL values.

With the **-s** option:

Before decomposition

.. code::

    #CHROM  POS      ID  REF  ALT         QUAL  FILTER  INFO                  FORMAT    S1                                      S2
    1       3759889  .   TA   TAA,TAAA,T  .     PASS    AF=0.342,0.173,0.037  GT:DP:PL  1/2:81:281,5,9,58,0,115,338,46,116,809  0/0:86:0,30,323,31,365,483,38,291,325,567

After decomposition

.. code::

    #CHROM  POS      ID  REF  ALT   QUAL  FILTER  INFO                                               FORMAT  S1               S2
    1       3759889  .   TA   TAA   .     PASS    AF=0.342;OLD_MULTIALLELIC=1:3759889:TA/TAA/TAAA/T  GT:PL   1/.:281,5,9      0/0:0,30,323
    1       3759889  .   TA   TAAA  .     .       AF=0.173;OLD_MULTIALLELIC=1:3759889:TA/TAA/TAAA/T  GT:PL   ./1:281,58,115   0/0:0,31,483
    1       3759889  .   TA   T     .     .       AF=0.037;OLD_MULTIALLELIC=1:3759889:TA/TAA/TAAA/T  GT:PL   ./.:281,338,809  0/0:0,38,567

In general, you should recompute fields that involve alleles after decomposition. Information is generally lost when a variant is vertically decomposed, so take care when interpreting the resulting values.

@CITATION@
]]>
    </help>
    <expand macro="citations"/>
</tool>