annotate rlGAT/gat-compare.py @ 11:53487f21c0d5 draft

Uploaded
author fubar
date Thu, 29 Aug 2013 01:57:54 -0400
parents
children
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
rev   line source
11
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
1 ################################################################################
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
2 #
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
3 # MRC FGU Computational Genomics Group
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
4 #
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
5 # $Id: script_template.py 2871 2010-03-03 10:20:44Z andreas $
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
6 #
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
7 # Copyright (C) 2009 Andreas Heger
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
8 #
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
9 # This program is free software; you can redistribute it and/or
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
10 # modify it under the terms of the GNU General Public License
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
11 # as published by the Free Software Foundation; either version 2
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
12 # of the License, or (at your option) any later version.
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
13 #
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
14 # This program is distributed in the hope that it will be useful,
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
15 # but WITHOUT ANY WARRANTY; without even the implied warranty of
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
16 # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
17 # GNU General Public License for more details.
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
18 #
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
19 # You should have received a copy of the GNU General Public License
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
20 # along with this program; if not, write to the Free Software
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
21 # Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
22 #################################################################################
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
23 '''
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
24 gat-compare - compare two gat runs
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
25 ==================================
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
26
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
27 :Author: Andreas Heger
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
28 :Release: $Id$
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
29 :Date: |today|
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
30 :Tags: Python
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
31
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
32 Purpose
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
33 -------
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
34
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
35 This script compares gat runs and compares statistical significance of fold change differences between two or more
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
36 regions of interest.
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
37
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
38 This script requires counts files created by gat (option ``--output-counts-file``).
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
39
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
40 The reported observed value between sets 1 and 2 is the difference of fc2 and fc1:
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
41
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
42 * observed = fold2 - fold1
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
43
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
44 The script accepts one or more output files from a gat run.
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
45
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
46 If only a single file is given, the significance of fold change differences are computed
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
47 for each pairwise comparison of annotations. Thus, the script might answer the question
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
48 if a transcription factor is more enriched in promotors than in enhancers.
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
49
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
50 If multiple files are given, the script computes the fold change differences for the same
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
51 annotation for all pairwise combinations of :term:`segments of interest` across the different
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
52 files. Thus, the script might answer the question if transcription factor A is more enriched
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
53 in promotors than transcription factor B is enriched in promotors.
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
54
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
55 Usage
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
56 -----
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
57
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
58 Example::
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
59
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
60 python gat-compare.py tfA.counts.tsv.gz tfB.counts.tsv.gz
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
61
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
62 Type::
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
63
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
64 python gat-compare.py --help
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
65
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
66 for command line help.
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
67
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
68 Documentation
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
69 -------------
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
70
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
71 Code
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
72 ----
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
73
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
74 '''
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
75
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
76 import os, sys, re, optparse, collections, types, glob, time, math
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
77 import itertools
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
78 import numpy
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
79
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
80 import gat
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
81 import gat.Experiment as E
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
82 import gat.IOTools as IOTools
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
83 import gat.IO as IO
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
84
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
85 def main( argv = None ):
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
86 """script main.
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
87
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
88 parses command line options in sys.argv, unless *argv* is given.
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
89 """
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
90
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
91 if not argv: argv = sys.argv
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
92
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
93 # setup command line parser
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
94 parser = optparse.OptionParser( version = "%prog version: $Id: script_template.py 2871 2010-03-03 10:20:44Z andreas $",
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
95 usage = globals()["__doc__"] )
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
96
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
97 parser.add_option("-o", "--order", dest="output_order", type="choice",
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
98 choices = ( "track", "annotation", "fold", "pvalue", "qvalue", "observed" ),
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
99 help="order results in output by fold, track, etc. [default=%default]." )
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
100
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
101 parser.add_option("-p", "--pvalue-method", dest="pvalue_method", type="choice",
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
102 choices = ( "empirical", "norm", ),
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
103 help="type of pvalue reported [default=%default]." )
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
104
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
105 parser.add_option("-q", "--qvalue-method", dest="qvalue_method", type="choice",
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
106 choices = ( "storey", "BH", "bonferroni", "holm", "hommel", "hochberg", "BY", "none" ),
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
107 help="method to perform multiple testing correction by controlling the fdr [default=%default]." )
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
108
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
109 parser.add_option( "--qvalue-lambda", dest="qvalue_lambda", type="float",
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
110 help="fdr computation: lambda [default=%default]." )
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
111
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
112 parser.add_option( "--qvalue-pi0-method", dest="qvalue_pi0_method", type="choice",
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
113 choices = ("smoother", "bootstrap" ),
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
114 help="fdr computation: method for estimating pi0 [default=%default]." )
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
115
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
116 parser.add_option( "--descriptions", dest="input_filename_descriptions", type="string",
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
117 help="filename mapping annotation terms to descriptions. "
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
118 " if given, the output table will contain additional columns "
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
119 " [default=%default]" )
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
120
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
121 parser.add_option( "--pseudo-count", dest="pseudo_count", type="float",
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
122 help="pseudo count. The pseudo count is added to both the observed and expected overlap. "
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
123 " Using a pseudo-count avoids gat reporting fold changes of 0 [default=%default]." )
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
124
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
125 parser.add_option( "--output-plots-pattern", dest="output_plots_pattern", type="string",
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
126 help="output pattern for plots [default=%default]" )
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
127
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
128 parser.set_defaults(
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
129 pvalue_method = "empirical",
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
130 qvalue_method = "BH",
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
131 qvalue_lambda = None,
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
132 qvalue_pi0_method = "smoother",
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
133 # pseudo count for fold change computation to avoid 0 fc
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
134 pseudo_count = 1.0,
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
135 output_order = "observed",
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
136 )
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
137
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
138 ## add common options (-h/--help, ...) and parse command line
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
139 (options, args) = E.Start( parser, argv = argv, add_output_options = True )
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
140
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
141 input_filenames_counts = args
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
142
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
143 ##################################################
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
144 E.info( "received %i filenames with counts" % len(input_filenames_counts))
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
145
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
146 ##################################################
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
147 description_header, descriptions, description_width = IO.readDescriptions( options )
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
148
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
149 all_annotator_results = []
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
150
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
151 for input_filename_counts in input_filenames_counts:
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
152
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
153 E.info("processing %s" % input_filename_counts )
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
154
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
155 annotator_results = gat.fromCounts( input_filename_counts )
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
156
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
157 ##################################################
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
158 if options.pvalue_method != "empirical":
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
159 E.info("updating pvalues to %s" % options.pvalue_method )
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
160 gat.updatePValues( annotator_results, options.pvalue_method )
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
161
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
162 ##################################################
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
163 ##################################################
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
164 ##################################################
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
165 ## compute global fdr
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
166 ##################################################
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
167 E.info( "computing FDR statistics" )
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
168 gat.updateQValues( annotator_results,
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
169 method = options.qvalue_method,
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
170 vlambda = options.qvalue_lambda,
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
171 pi0_method = options.qvalue_pi0_method )
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
172
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
173 all_annotator_results.append( annotator_results )
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
174
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
175 pseudo_count = options.pseudo_count
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
176 results = []
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
177
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
178 if len(all_annotator_results) == 1:
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
179 E.info("performing pairwise comparison within a file")
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
180
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
181 # collect all annotations
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
182 annotations, segments = list(), set()
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
183 for x in all_annotator_results[0]:
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
184 segments.add( x.track)
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
185 annotations.append( x )
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
186
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
187 if len(segments) != 1:
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
188 raise NotImplementedError( "multiple segments of interest")
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
189
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
190 for data1, data2 in itertools.combinations(annotations, 2):
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
191
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
192 # note that fold changes can be very large if there are 0 samples
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
193 # this is fine for getting the distributional params (mean, stddev)
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
194 fold_changes1 = data1.observed / (data1.samples + pseudo_count )
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
195 fold_changes2 = data2.observed / (data2.samples + pseudo_count )
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
196
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
197 # add a separate fc pseudo-count to avoid 0 values
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
198 fold_changes1 += 0.0001
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
199 fold_changes2 += 0.0001
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
200
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
201 # Test is if relative fold change rfc is different from 1
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
202 # note: rfc = fc1 / fc2 = obs1 / exp1 * obs2 / exp2
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
203 # = obs1 / obs2 * exp2 / exp1
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
204 # Thus, it is equivalent to test rfc = obs1/obs2 versus exp2 / exp1
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
205 #
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
206 # Convert to log space for easier plotting
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
207 # Move the observed fold ratio in order to get an idea of the magnitude
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
208 # of the underlying fold change
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
209 delta_fold = data2.fold - data1.fold
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
210 sampled_delta_fold = numpy.log( fold_changes1 / fold_changes2) + delta_fold
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
211 observed_delta_fold = 0.0 + delta_fold
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
212
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
213 result = gat.AnnotatorResult( data1.annotation, data2.annotation,
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
214 "na",
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
215 observed_delta_fold,
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
216 sampled_delta_fold,
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
217 reference = None,
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
218 pseudo_count = 0 )
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
219
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
220
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
221 results.append( result )
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
222
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
223
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
224 else:
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
225 E.info("performing pairwise comparison between files")
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
226
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
227 ##################################################
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
228 # perform pairwise comparison
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
229 for index1, index2 in itertools.combinations( range(len(input_filenames_counts)), 2):
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
230 E.info( "comparing %i and %i" % (index1,index2))
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
231 a,b = all_annotator_results[index1], all_annotator_results[index2]
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
232
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
233 # index results in a and b
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
234 aa = collections.defaultdict( dict )
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
235 for x in a: aa[x.track][x.annotation] = x
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
236
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
237 bb = collections.defaultdict( dict )
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
238 for x in b: bb[x.track][x.annotation] = x
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
239
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
240 if len(aa.keys()) != 1 or len(bb.keys()) != 1:
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
241 raise NotImplementedError( "multiple segments of interest")
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
242
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
243 track = "merged"
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
244 # get shared annotations
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
245 annotations1 = aa[track].keys()
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
246 annotations2 = bb[track].keys()
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
247 shared_annotations = list( set(annotations1).intersection( set(annotations2) ) )
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
248 E.info( "%i shared annotations" % len(shared_annotations))
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
249
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
250 for annotation in shared_annotations:
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
251
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
252 # if not annotation.startswith("Ram:"): continue
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
253
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
254 data1 = aa[track][annotation]
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
255 data2 = bb[track][annotation]
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
256
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
257 # note that fold changes can be very large if there are 0 samples
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
258 # this is fine for getting the distributional params (mean, stddev)
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
259 fold_changes1 = data1.observed / (data1.samples + pseudo_count )
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
260 fold_changes2 = data2.observed / (data2.samples + pseudo_count )
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
261
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
262 # add a separate fc pseudo-count to avoid 0 values
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
263 fold_changes1 += 0.0001
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
264 fold_changes2 += 0.0001
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
265
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
266 # Test is if relative fold change rfc is different from 1
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
267 # note: rfc = fc1 / fc2 = obs1 / exp1 * obs2 / exp2
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
268 # = obs1 / obs2 * exp2 / exp1
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
269 # Thus, it is equivalent to test rfc = obs1/obs2 versus exp2 / exp1
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
270 #
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
271 # Convert to log space for easier plotting
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
272 # Move the observed fold ratio in order to get an idea of the magnitude
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
273 # of the underlying fold change
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
274 delta_fold = data2.fold - data1.fold
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
275 sampled_delta_fold = numpy.log( fold_changes1 / fold_changes2) + delta_fold
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
276 observed_delta_fold = 0.0 + delta_fold
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
277
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
278 result = gat.AnnotatorResult( track, annotation,
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
279 "na",
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
280 observed_delta_fold,
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
281 sampled_delta_fold,
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
282 reference = None,
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
283 pseudo_count = 0 )
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
284
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
285
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
286 results.append( result )
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
287
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
288
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
289 if len(results) == 0:
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
290 E.critical( "no results found" )
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
291 E.Stop()
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
292 return
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
293
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
294 IO.outputResults( results,
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
295 options,
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
296 gat.AnnotatorResult.headers,
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
297 description_header,
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
298 description_width,
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
299 descriptions,
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
300 format_observed = "%6.4f" )
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
301
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
302 IO.plotResults( results, options )
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
303
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
304 ## write footer and output benchmark information.
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
305 E.Stop()
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
306
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
307 if __name__ == "__main__":
53487f21c0d5 Uploaded
fubar
parents:
diff changeset
308 sys.exit( main( sys.argv) )