annotate defuse_results_to_vcf.py @ 40:ed07bcc39f6e

Provide a matched tabular output
author Jim Johnson <jj@umn.edu>
date Wed, 06 May 2015 14:31:57 -0500
parents 3099cec648e7
children
rev   line source
25
2ecf82136986 Define defuse.results.tsv ext as subclass of tabular, add defuse_results_to_vcf to generate vcf form DeFuse results
Jim Johnson <jj@umn.edu>
parents:
diff changeset
1 #!/usr/bin/env python
2 """
3 #
4 #------------------------------------------------------------------------------
5 # University of Minnesota
6 # Copyright 2012, Regents of the University of Minnesota
7 #------------------------------------------------------------------------------
8 # Author:
9 #
10 # James E Johnson
11 # Jesse Erdmann
12 #
13 #------------------------------------------------------------------------------
14 """
15
16
17 """
18 This tool takes the defuse results.tsv tab-delimited file as input and creates a Variant Call Format file as output.
19 """
20
21 import sys,re,os.path
22 import optparse
23 from optparse import OptionParser
24
25 """
26 http://www.1000genomes.org/wiki/analysis/variant-call-format/vcf-variant-call-format-version-42
27
28 5. INFO keys used for structural variants
29 When the INFO keys reserved for encoding structural variants are used for imprecise variants, the values should be best estimates. When a key reflects a property of a single alt allele (e.g. SVLEN) and there are multiple alt alleles, the key takes multiple values, one for each allele (e.g. SVLEN=-100,-110 for a deletion with two distinct alt alleles).
30 The following INFO keys are reserved for encoding structural variants. In general, when these keys are used by imprecise variants, the values should be best estimates. When a key reflects a property of a single alt allele (e.g. SVLEN) and there are multiple alt alleles, the key takes multiple values, one for each allele (e.g. SVLEN=-100,-110 for a deletion with two distinct alt alleles).
31 ##INFO=<ID=IMPRECISE,Number=0,Type=Flag,Description="Imprecise structural variation">
32 ##INFO=<ID=NOVEL,Number=0,Type=Flag,Description="Indicates a novel structural variation">
33 ##INFO=<ID=END,Number=1,Type=Integer,Description="End position of the variant described in this record">
34 For precise variants, END is POS + length of REF allele - 1, and for imprecise variants the corresponding best estimate.
35 ##INFO=<ID=SVTYPE,Number=1,Type=String,Description="Type of structural variant">
36 Value should be one of DEL, INS, DUP, INV, CNV, BND. This key can be derived from the REF/ALT fields but is useful for filtering.
37 ##INFO=<ID=SVLEN,Number=.,Type=Integer,Description="Difference in length between REF and ALT alleles">
38 One value for each ALT allele. Longer ALT alleles (e.g. insertions) have positive values, shorter ALT alleles (e.g. deletions) have negative values.
39 ##INFO=<ID=CIPOS,Number=2,Type=Integer,Description="Confidence interval around POS for imprecise variants">
40 ##INFO=<ID=CIEND,Number=2,Type=Integer,Description="Confidence interval around END for imprecise variants">
41 ##INFO=<ID=HOMLEN,Number=.,Type=Integer,Description="Length of base pair identical micro-homology at event breakpoints">
42 ##INFO=<ID=HOMSEQ,Number=.,Type=String,Description="Sequence of base pair identical micro-homology at event breakpoints">
43 ##INFO=<ID=BKPTID,Number=.,Type=String,Description="ID of the assembled alternate allele in the assembly file">
44 For precise variants, the consensus sequence of the alternate allele assembly is derivable from the REF and ALT fields. However, the alternate allele assembly file may contain additional information about the characteristics of the alt allele contigs.
45 ##INFO=<ID=MEINFO,Number=4,Type=String,Description="Mobile element info of the form NAME,START,END,POLARITY">
46 ##INFO=<ID=METRANS,Number=4,Type=String,Description="Mobile element transduction info of the form CHR,START,END,POLARITY">
47 ##INFO=<ID=DGVID,Number=1,Type=String,Description="ID of this element in Database of Genomic Variation">
48 ##INFO=<ID=DBVARID,Number=1,Type=String,Description="ID of this element in DBVAR">
49 ##INFO=<ID=DBRIPID,Number=1,Type=String,Description="ID of this element in DBRIP">
50 ##INFO=<ID=MATEID,Number=.,Type=String,Description="ID of mate breakends">
51 ##INFO=<ID=PARID,Number=1,Type=String,Description="ID of partner breakend">
52 ##INFO=<ID=EVENT,Number=1,Type=String,Description="ID of event associated to breakend">
53 ##INFO=<ID=CILEN,Number=2,Type=Integer,Description="Confidence interval around the length of the inserted material between breakends">
54 ##INFO=<ID=DP,Number=1,Type=Integer,Description="Read Depth of segment containing breakend">
55 ##INFO=<ID=DPADJ,Number=.,Type=Integer,Description="Read Depth of adjacency">
56 ##INFO=<ID=CN,Number=1,Type=Integer,Description="Copy number of segment containing breakend">
57 ##INFO=<ID=CNADJ,Number=.,Type=Integer,Description="Copy number of adjacency">
58 ##INFO=<ID=CICN,Number=2,Type=Integer,Description="Confidence interval around copy number for the segment">
59 ##INFO=<ID=CICNADJ,Number=.,Type=Integer,Description="Confidence interval around copy number for the adjacency">
60 6. FORMAT keys used for structural variants
61 ##FORMAT=<ID=CN,Number=1,Type=Integer,Description="Copy number genotype for imprecise events">
62 ##FORMAT=<ID=CNQ,Number=1,Type=Float,Description="Copy number genotype quality for imprecise events">
63 ##FORMAT=<ID=CNL,Number=.,Type=Float,Description="Copy number genotype likelihood for imprecise events">
64 ##FORMAT=<ID=NQ,Number=1,Type=Integer,Description="Phred style probability score that the variant is novel with respect to the genome's ancestor">
65 ##FORMAT=<ID=HAP,Number=1,Type=Integer,Description="Unique haplotype identifier">
66 ##FORMAT=<ID=AHAP,Number=1,Type=Integer,Description="Unique identifier of ancestral haplotype">
67 These keys are analogous to GT/GQ/GL and are provided for genotyping imprecise events by copy number (either because there is an unknown number of alternate alleles or because the haplotypes cannot be determined). CN specifies the integer copy number of the variant in this sample. CNQ is encoded as a phred quality, -10 * log10(p), where p is the probability that the copy number genotype call is wrong. CNL specifies a list of log10 likelihoods for each potential copy number, starting from zero. When possible, GT/GQ/GL should be used instead of (or in addition to) these keys.
68
69 Specifying Complex Rearrangements with Breakends
70 An arbitrary rearrangement event can be summarized as a set of novel adjacencies.
71 Each adjacency ties together 2 breakends. The two breakends at either end of a novel adjacency are called mates.
72 There is one line of VCF (i.e. one record) for each of the two breakends in a novel adjacency. A breakend record is identified with the tag "SVTYPE=BND" in the INFO field. The REF field of a breakend record indicates a base or sequence s of bases beginning at position POS, as in all VCF records. The ALT field of a breakend record indicates a replacement for s. This "breakend replacement" has three parts:
73 The string t that replaces s. The string t may be an extended version of s if some novel bases are inserted during the formation of the novel adjacency.
74 The position p of the mate breakend, indicated by a string of the form "chr:pos". This is the location of the first mapped base in the piece being joined at this novel adjacency.
75 The direction that the joined sequence continues in, starting from p. This is indicated by the orientation of square brackets surrounding p.
76 These 3 elements are combined in 4 possible ways to create the ALT. In each of the 4 cases, the assertion is that s is replaced with t, and then some piece starting at position p is joined to t. The cases are:
77 REF ALT Meaning
78 s t[p[ piece extending to the right of p is joined after t
79 s t]p] reverse comp piece extending left of p is joined after t
80 s ]p]t piece extending to the left of p is joined before t
81 s [p[t reverse comp piece extending right of p is joined before t
82
83 Examples:
84 #CHROM POS ID REF ALT QUAL FILTER INFO
85 2 321681 bnd_W G G]17:198982] 6 PASS SVTYPE=BND;MATEID=bnd_Y
86 2 321682 bnd_V T ]13:123456]T 6 PASS SVTYPE=BND;MATEID=bnd_U
87 13 123456 bnd_U C C[2:321682[ 6 PASS SVTYPE=BND;MATEID=bnd_V
88 13 123457 bnd_X A [17:198983[A 6 PASS SVTYPE=BND;MATEID=bnd_Z
89 17 198982 bnd_Y A A]2:321681] 6 PASS SVTYPE=BND;MATEID=bnd_W
90 17 198983 bnd_Z C [13:123457[C 6 PASS SVTYPE=BND;MATEID=bnd_X
91 """
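The four breakend ALT forms tabulated in the docstring above can be generated mechanically from three inputs: the replacement string t, the mate position p, and two yes/no choices (joined before or after t, piece extending left or right of p). The following is a minimal sketch and is not part of defuse_results_to_vcf.py; the helper names `breakend_alt` and `bnd_record` and their parameters are assumptions introduced here for illustration.

```python
def breakend_alt(t, mate_chrom, mate_pos, joined_after, extends_right):
    """Build the ALT string of a breakend (SVTYPE=BND) record.

    t             -- string that replaces REF
    mate_chrom    -- chromosome of the mate breakend
    mate_pos      -- position of the mate breakend
    joined_after  -- True if the mate piece is joined after t, False if before
    extends_right -- True if the piece extends to the right of p ('['),
                     False if it extends to the left of p (']')
    """
    bracket = "[" if extends_right else "]"
    p = "%s%s:%d%s" % (bracket, mate_chrom, mate_pos, bracket)
    return t + p if joined_after else p + t


def bnd_record(chrom, pos, rec_id, ref, alt, mate_id, qual=6, filt="PASS"):
    """Format one tab-separated breakend line with SVTYPE and MATEID in INFO."""
    info = "SVTYPE=BND;MATEID=%s" % mate_id
    fields = [chrom, str(pos), rec_id, ref, alt, str(qual), filt, info]
    return "\t".join(fields)


# Reproduce the bnd_W / bnd_Y mate pair from the examples in the docstring.
alt_w = breakend_alt("G", "17", 198982, joined_after=True, extends_right=False)
alt_y = breakend_alt("A", "2", 321681, joined_after=True, extends_right=False)
print(bnd_record("2", 321681, "bnd_W", "G", alt_w, "bnd_Y"))
print(bnd_record("17", 198982, "bnd_Y", "A", alt_y, "bnd_W"))
```

Each record names its mate through MATEID, so the two lines of an adjacency always come in pairs that point at each other.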
92
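The END, SVLEN and CNQ definitions quoted in the docstring above are simple arithmetic. As a hedged illustration (these helpers are hypothetical and do not appear in the script), they could be computed for precise variants like this:

```python
import math


def end_position(pos, ref):
    """END for a precise variant: POS + length of REF allele - 1."""
    return pos + len(ref) - 1


def svlen_values(ref, alts):
    """One SVLEN per ALT allele: positive for longer ALT, negative for shorter."""
    return [len(alt) - len(ref) for alt in alts]


def phred_quality(p_wrong):
    """Phred-scaled quality: -10 * log10(probability that the call is wrong)."""
    return -10 * math.log10(p_wrong)


ref = "ACGT"
alts = ["A", "ACGTTT"]  # a 3-base deletion and a 2-base insertion
info = "END=%d;SVLEN=%s" % (
    end_position(100, ref),
    ",".join(str(v) for v in svlen_values(ref, alts)),
)
print(info)
```

The SVLEN list is joined with commas because the key carries one value per ALT allele, matching the `Number=.` declaration in the header line.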
93 vcf_header = """\
94 ##fileformat=VCFv4.1
95 ##source=defuse
96 ##reference=%s
97 ##INFO=<ID=SVLEN,Number=.,Type=Integer,Description="Difference in length between REF and ALT alleles">
98 ##INFO=<ID=SVTYPE,Number=1,Type=String,Description="Type of structural variant">
99 ##INFO=<ID=MATEID,Number=1,Type=String,Description="ID of the BND mate">
34
3099cec648e7 Update tool_dependencies, add help
Jim Johnson <jj@umn.edu>
parents: 32
diff changeset
100 ##INFO=<ID=MATELOC,Number=1,Type=String,Description="The chrom:position of the BND mate">
101 ##INFO=<ID=GENESTRAND,Number=2,Type=String,Description="Strands">
25
102 ##INFO=<ID=DP,Number=1,Type=Integer,Description="Read Depth of segment containing breakend">
103 ##INFO=<ID=SPLITCNT,Number=1,Type=Integer,Description="number of split reads supporting the prediction">
104 ##INFO=<ID=SPANCNT,Number=1,Type=Integer,Description="number of spanning reads supporting the fusion">
105 ##INFO=<ID=HOMLEN,Number=1,Type=Integer,Description="Length of base pair identical micro-homology at event breakpoints">
106 ##INFO=<ID=SPLICESCORE,Number=1,Type=Integer,Description="number of nucleotides similar to GTAG at fusion splice">
107 ##INFO=<ID=GENE,Number=2,Type=String,Description="Gene Names at each breakend">
108 ##INFO=<ID=GENEID,Number=2,Type=String,Description="Gene IDs at each breakend">
27
d57fcac025e2 Add more info fields to defuse_results_to_vcf.py
Jim Johnson <jj@umn.edu>
parents: 25
diff changeset
109 ##INFO=<ID=GENELOC,Number=2,Type=String,Description="location of breakpoint relative to genes">
110 ##INFO=<ID=EXPR,Number=2,Type=Integer,Description="expression of genes as number of concordant pairs aligned to exons">
25
111 ##INFO=<ID=ORF,Number=0,Type=Flag,Description="fusion combines genes in a way that preserves a reading frame">
112 ##INFO=<ID=EXONBND,Number=0,Type=Flag,Description="fusion splice at exon boundaries">
27
113 ##INFO=<ID=INTERCHROM,Number=0,Type=Flag,Description="fusion produced by an interchromosomal translocation">
114 ##INFO=<ID=READTHROUGH,Number=0,Type=Flag,Description="fusion involving adjacent genes potentially resulting from co-transcription rather than genome rearrangement">
115 ##INFO=<ID=ADJACENT,Number=0,Type=Flag,Description="fusion between adjacent genes">
116 ##INFO=<ID=ALTSPLICE,Number=0,Type=Flag,Description="fusion likely the product of alternative splicing between adjacent genes">
117 ##INFO=<ID=DELETION,Number=0,Type=Flag,Description="fusion produced by a genomic deletion">
118 ##INFO=<ID=EVERSION,Number=0,Type=Flag,Description="fusion produced by a genomic eversion">
119 ##INFO=<ID=INVERSION,Number=0,Type=Flag,Description="fusion produced by a genomic inversion">
25
120 #CHROM POS ID REF ALT QUAL FILTER INFO\
121 """
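The `vcf_header` template above contains a single `%s` placeholder for the reference name, and it declares both valued INFO keys (`Number=1`, `Number=2`) and flag keys (`Number=0,Type=Flag`). The sketch below shows one way such a template and an INFO column might be filled in. It uses a shortened stand-in template; the reference name `hg19`, the helper `format_info`, and the sample values are assumptions for illustration, not taken from the script.

```python
# Shortened stand-in for the full vcf_header template (assumption: same layout,
# one %s placeholder for the reference, no other '%' characters).
header_template = (
    "##fileformat=VCFv4.1\n"
    "##source=defuse\n"
    "##reference=%s\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO"
)


def format_info(pairs, flags=()):
    """Join key=value pairs and bare flag keys with ';' for the INFO column."""
    fields = ["%s=%s" % (key, value) for key, value in pairs]
    fields.extend(flags)  # flags (Number=0) are written as the key alone
    return ";".join(fields)


header = header_template % "hg19"  # hypothetical reference name
info = format_info(
    [("SVTYPE", "BND"), ("MATEID", "bnd_Y"), ("SPLITCNT", 12), ("GENE", "A,B")],
    flags=["ORF", "INTERCHROM"],
)
print(header)
print(info)
```

Keys declared with `Number=2`, such as GENE, carry their two values as a comma-separated list within a single `key=value` field.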
122
123 def cmp_alphanumeric(s1,s2):
2ecf82136986 Define defuse.results.tsv ext as subclass of tabular, add defuse_results_to_vcf to generate vcf form DeFuse results
Jim Johnson <jj@umn.edu>
parents:
diff changeset
124 if s1 == s2:
2ecf82136986 Define defuse.results.tsv ext as subclass of tabular, add defuse_results_to_vcf to generate vcf form DeFuse results
Jim Johnson <jj@umn.edu>
parents:
diff changeset
125 return 0
2ecf82136986 Define defuse.results.tsv ext as subclass of tabular, add defuse_results_to_vcf to generate vcf form DeFuse results
Jim Johnson <jj@umn.edu>
parents:
diff changeset
126 a1 = re.findall("\d+|[a-zA-Z]+",s1)
2ecf82136986 Define defuse.results.tsv ext as subclass of tabular, add defuse_results_to_vcf to generate vcf form DeFuse results
Jim Johnson <jj@umn.edu>
parents:
diff changeset
127 a2 = re.findall("\d+|[a-zA-Z]+",s2)
2ecf82136986 Define defuse.results.tsv ext as subclass of tabular, add defuse_results_to_vcf to generate vcf form DeFuse results
Jim Johnson <jj@umn.edu>
parents:
diff changeset
128 for i in range(min(len(a1),len(a2))):
2ecf82136986 Define defuse.results.tsv ext as subclass of tabular, add defuse_results_to_vcf to generate vcf form DeFuse results
Jim Johnson <jj@umn.edu>
parents:
diff changeset
129 if a1[i] == a2[i]:
2ecf82136986 Define defuse.results.tsv ext as subclass of tabular, add defuse_results_to_vcf to generate vcf form DeFuse results
Jim Johnson <jj@umn.edu>
parents:
diff changeset
130 continue
2ecf82136986 Define defuse.results.tsv ext as subclass of tabular, add defuse_results_to_vcf to generate vcf form DeFuse results
Jim Johnson <jj@umn.edu>
parents:
diff changeset
131 if a1[i].isdigit() and a2[i].isdigit():
2ecf82136986 Define defuse.results.tsv ext as subclass of tabular, add defuse_results_to_vcf to generate vcf form DeFuse results
Jim Johnson <jj@umn.edu>
parents:
diff changeset
132 return int(a1[i]) - int(a2[i])
2ecf82136986 Define defuse.results.tsv ext as subclass of tabular, add defuse_results_to_vcf to generate vcf form DeFuse results
Jim Johnson <jj@umn.edu>
parents:
diff changeset
133 return 1 if a1[i] > a2[i] else -1
2ecf82136986 Define defuse.results.tsv ext as subclass of tabular, add defuse_results_to_vcf to generate vcf form DeFuse results
Jim Johnson <jj@umn.edu>
parents:
diff changeset
134 return len(a1) - len(a2)
2ecf82136986 Define defuse.results.tsv ext as subclass of tabular, add defuse_results_to_vcf to generate vcf form DeFuse results
Jim Johnson <jj@umn.edu>
parents:
diff changeset
135
136 def __main__():
137     # VCF functions
138     chr_dict = dict()
139     def add_vcf_line(chr,pos,id,line):
140         if chr not in chr_dict:
141             pos_dict = dict()
142             chr_dict[chr] = pos_dict
143         if pos not in chr_dict[chr]:
144             id_dict = dict()
145             chr_dict[chr][pos] = id_dict
146         chr_dict[chr][pos][id] = line
147
148     def write_vcf():
149         print >> outputFile, vcf_header % (refname)
150         for chr in sorted(chr_dict.keys(),cmp=cmp_alphanumeric):
151             for pos in sorted(chr_dict[chr].keys()):
152                 for id in chr_dict[chr][pos]:
153                     print >> outputFile, chr_dict[chr][pos][id]
154     #Parse Command Line
155     parser = optparse.OptionParser()
156     # files
157     parser.add_option( '-i', '--input', dest='input', help='The input defuse results.tsv file (else read from stdin)' )
158     parser.add_option( '-o', '--output', dest='output', help='The output vcf file (else write to stdout)' )
159     parser.add_option( '-r', '--reference', dest='reference', default=None, help='The genomic reference id' )
160     (options, args) = parser.parse_args()
161
162     # results.tsv input
163     if options.input is not None:
164         try:
165             inputPath = os.path.abspath(options.input)
166             inputFile = open(inputPath, 'r')
167         except Exception, e:
168             print >> sys.stderr, "failed: %s" % e
169             sys.exit(2)
170     else:
171         inputFile = sys.stdin
172     # vcf output
173     if options.output is not None:
174         try:
175             outputPath = os.path.abspath(options.output)
176             outputFile = open(outputPath, 'w')
177         except Exception, e:
178             print >> sys.stderr, "failed: %s" % e
179             sys.exit(3)
180     else:
181         outputFile = sys.stdout
182
183     refname = options.reference if options.reference else 'unknown'
184
185     svtype = 'SVTYPE=BND'
186     filt = 'PASS'
187     columns = []
188     try:
189         for linenum,line in enumerate(inputFile):
190             ## print >> sys.stderr, "%d: %s\n" % (linenum,line)
191             fields = line.strip().split('\t')
192             if line.startswith('cluster_id'):
193                 columns = fields
194                 ## print >> sys.stderr, "columns: %s\n" % columns
195                 continue
196             cluster_id = fields[columns.index('cluster_id')]
197             gene_chromosome1 = fields[columns.index('gene_chromosome1')]
198             gene_chromosome2 = fields[columns.index('gene_chromosome2')]
199             genomic_strand1 = fields[columns.index('genomic_strand1')]
200             genomic_strand2 = fields[columns.index('genomic_strand2')]
201             gene1 = fields[columns.index('gene1')]
202             gene2 = fields[columns.index('gene2')]
203             gene_info = 'GENEID=%s,%s' % (gene1,gene2)
204             gene_name1 = fields[columns.index('gene_name1')]
205             gene_name2 = fields[columns.index('gene_name2')]
206             gene_name_info = 'GENE=%s,%s' % (gene_name1,gene_name2)
27
207             gene_location1 = fields[columns.index('gene_location1')]
208             gene_location2 = fields[columns.index('gene_location2')]
209             gene_loc = 'GENELOC=%s,%s' % (gene_location1,gene_location2)
210             expression1 = int(fields[columns.index('expression1')])
211             expression2 = int(fields[columns.index('expression2')])
212             expr = 'EXPR=%d,%d' % (expression1,expression2)
25
213             genomic_break_pos1 = int(fields[columns.index('genomic_break_pos1')])
214             genomic_break_pos2 = int(fields[columns.index('genomic_break_pos2')])
215             breakpoint_homology = int(fields[columns.index('breakpoint_homology')])
216             homlen = 'HOMLEN=%s' % breakpoint_homology
217             orf = fields[columns.index('orf')] == 'Y'
218             exonboundaries = fields[columns.index('exonboundaries')] == 'Y'
219             read_through = fields[columns.index('read_through')] == 'Y'
27
220             interchromosomal = fields[columns.index('interchromosomal')] == 'Y'
221             adjacent = fields[columns.index('adjacent')] == 'Y'
222             altsplice = fields[columns.index('altsplice')] == 'Y'
223             deletion = fields[columns.index('deletion')] == 'Y'
224             eversion = fields[columns.index('eversion')] == 'Y'
225             inversion = fields[columns.index('inversion')] == 'Y'
25
226             span_count = int(fields[columns.index('span_count')])
227             splitr_count = int(fields[columns.index('splitr_count')])
228             splice_score = int(fields[columns.index('splice_score')])
229             probability = fields[columns.index('probability')] if 'probability' in columns else '.'
230             splitr_sequence = fields[columns.index('splitr_sequence')]
231             split_seqs = splitr_sequence.split('|')
232             mate_id1 = "bnd_%s_1" % cluster_id
233             mate_id2 = "bnd_%s_2" % cluster_id
234             ref1 = split_seqs[0][-1]
235             ref2 = split_seqs[1][0]
236             b1 = '[' if genomic_strand1 == '+' else ']'
237             b2 = '[' if genomic_strand2 == '+' else ']'
238             alt1 = "%s%s%s:%d%s" % (ref1,b2,gene_chromosome2,genomic_break_pos2,b2)
239             alt2 = "%s%s:%d%s%s" % (b1,gene_chromosome1,genomic_break_pos1,b1,ref2)
240             #TODO evaluate what should be included in the INFO field
27
241             info = ['DP=%d' % (span_count + splitr_count),'SPLITCNT=%d' % splitr_count,'SPANCNT=%d' % span_count,gene_name_info,gene_info,gene_loc,expr,homlen,'SPLICESCORE=%d' % splice_score]
25
242             if orf:
243                 info.append('ORF')
244             if exonboundaries:
245                 info.append('EXONBND')
27
246             if interchromosomal:
247                 info.append('INTERCHROM')
248             if read_through:
249                 info.append('READTHROUGH')
250             if adjacent:
251                 info.append('ADJACENT')
252             if altsplice:
253                 info.append('ALTSPLICE')
254             if deletion:
255                 info.append('DELETION')
256             if eversion:
257                 info.append('EVERSION')
258             if inversion:
259                 info.append('INVERSION')
34
3099cec648e7 Update tool_dependencies, add help
Jim Johnson <jj@umn.edu>
parents: 32
diff changeset
260             info1 = [svtype,'MATEID=%s;MATELOC=%s:%d' % (mate_id2,gene_chromosome2,genomic_break_pos2)] + info
261             info2 = [svtype,'MATEID=%s;MATELOC=%s:%d' % (mate_id1,gene_chromosome1,genomic_break_pos1)] + info
25
262             qual = int(float(fields[columns.index('probability')]) * 255) if 'probability' in columns else '.'
32
8027fc53f3f9 Fix formatting error in defuse_results_to_vcf.py
Jim Johnson <jj@umn.edu>
parents: 28
diff changeset
263             vcf1 = '%s\t%d\t%s\t%s\t%s\t%s\t%s\t%s'% (gene_chromosome1,genomic_break_pos1, mate_id1, ref1, alt1, qual, filt, ';'.join(info1) )
264             vcf2 = '%s\t%d\t%s\t%s\t%s\t%s\t%s\t%s'% (gene_chromosome2,genomic_break_pos2, mate_id2, ref2, alt2, qual, filt, ';'.join(info2) )
25
265             add_vcf_line(gene_chromosome1,genomic_break_pos1,mate_id1,vcf1)
266             add_vcf_line(gene_chromosome2,genomic_break_pos2,mate_id2,vcf2)
267         write_vcf()
268     except Exception, e:
269         print >> sys.stderr, "failed: %s" % e
34
270         sys.exit(1)
25
271
272 if __name__ == "__main__" : __main__()
273
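
The script above targets Python 2: `write_vcf` relies on `sorted(..., cmp=cmp_alphanumeric)`, and the `cmp` argument no longer exists in Python 3. The following is a minimal sketch, not part of the annotated revision, of how the same chromosome ordering could be obtained in Python 3 by wrapping the comparator with `functools.cmp_to_key`. The helper name `sort_chromosomes` is hypothetical and is introduced only for illustration.

```python
import re
from functools import cmp_to_key


def cmp_alphanumeric(s1, s2):
    """Three-way comparison that orders embedded digit runs numerically."""
    if s1 == s2:
        return 0
    # Split each name into runs of digits and runs of letters.
    a1 = re.findall(r"\d+|[a-zA-Z]+", s1)
    a2 = re.findall(r"\d+|[a-zA-Z]+", s2)
    for t1, t2 in zip(a1, a2):
        if t1 == t2:
            continue
        if t1.isdigit() and t2.isdigit():
            # Both tokens are numeric: compare by integer value.
            return int(t1) - int(t2)
        # Otherwise fall back to plain string comparison.
        return 1 if t1 > t2 else -1
    # All shared tokens are equal: the name with fewer tokens sorts first.
    return len(a1) - len(a2)


def sort_chromosomes(names):
    """Hypothetical helper: Python 3 stand-in for sorted(names, cmp=...)."""
    return sorted(names, key=cmp_to_key(cmp_alphanumeric))
```

With this comparator, purely numeric names sort by value and before alphabetic names, so `sort_chromosomes(["10", "2", "X", "1", "MT"])` yields the numeric chromosomes in ascending order followed by `MT` and `X`.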