annotate variant_effect_predictor/Bio/EnsEMBL/IdMapping/SyntenyFramework.pm @ 3:d30fa12e4cc5 default tip

Merge heads 2:a5976b2dce6f and 1:09613ce8151e which were created as a result of a recently fixed bug.
author devteam <devteam@galaxyproject.org>
date Mon, 13 Jan 2014 10:38:30 -0500
parents 1f6dce3d34e0
=head1 LICENSE

  Copyright (c) 1999-2012 The European Bioinformatics Institute and
  Genome Research Limited. All rights reserved.

  This software is distributed under a modified Apache license.
  For license details, please see

    http://www.ensembl.org/info/about/code_licence.html

=head1 CONTACT

  Please email comments or questions to the public Ensembl
  developers list at <dev@ensembl.org>.

  Questions may also be sent to the Ensembl help desk at
  <helpdesk@ensembl.org>.

=cut

=head1 NAME

Bio::EnsEMBL::IdMapping::SyntenyFramework - framework representing syntenic
regions across the genome

=head1 SYNOPSIS

  # build the SyntenyFramework from unambiguous gene mappings
  my $sf = Bio::EnsEMBL::IdMapping::SyntenyFramework->new(
    -DUMP_PATH  => $dump_path,
    -CACHE_FILE => 'synteny_framework.ser',
    -LOGGER     => $self->logger,
    -CONF       => $self->conf,
    -CACHE      => $self->cache,
  );
  $sf->build_synteny($gene_mappings);

  # use it to rescore the genes
  $gene_scores = $sf->rescore_gene_matrix_lsf($gene_scores);

=head1 DESCRIPTION

The SyntenyFramework is a set of SyntenyRegions. These are pairs of
locations, closely analogous to the information in the assembly table
(although the two locations don't have to be the same length). They are
built from genes that map uniquely between source and target.

Once built, the SyntenyFramework is used to score source and target gene
pairs to determine whether they are similar. This process is slow (it
involves testing all gene pairs against all SyntenyRegions), so this module
has built-in support for running it in parallel via LSF.

=head1 METHODS

  new
  build_synteny
  _by_overlap
  add_SyntenyRegion
  get_all_SyntenyRegions
  rescore_gene_matrix_lsf
  rescore_gene_matrix
  logger
  conf
  cache

=cut

package Bio::EnsEMBL::IdMapping::SyntenyFramework;

use strict;
use warnings;
no warnings 'uninitialized';

use Bio::EnsEMBL::IdMapping::Serialisable;
our @ISA = qw(Bio::EnsEMBL::IdMapping::Serialisable);

use Bio::EnsEMBL::Utils::Argument qw(rearrange);
use Bio::EnsEMBL::Utils::Exception qw(throw warning);
use Bio::EnsEMBL::Utils::ScriptUtils qw(path_append);
use Bio::EnsEMBL::IdMapping::SyntenyRegion;
use Bio::EnsEMBL::IdMapping::ScoredMappingMatrix;

use FindBin qw($Bin);
FindBin->again;


=head2 new

  Arg [LOGGER]: Bio::EnsEMBL::Utils::Logger $logger - a logger object
  Arg [CONF]  : Bio::EnsEMBL::Utils::ConfParser $conf - a configuration object
  Arg [CACHE] : Bio::EnsEMBL::IdMapping::Cache $cache - a cache object
  Arg [DUMP_PATH] : String - path for object serialisation
  Arg [CACHE_FILE] : String - filename of serialised object
  Example     : my $sf = Bio::EnsEMBL::IdMapping::SyntenyFramework->new(
                  -DUMP_PATH    => $dump_path,
                  -CACHE_FILE   => 'synteny_framework.ser',
                  -LOGGER       => $self->logger,
                  -CONF         => $self->conf,
                  -CACHE        => $self->cache,
                );
  Description : Constructor.
  Return type : Bio::EnsEMBL::IdMapping::SyntenyFramework
  Exceptions  : thrown on wrong or missing arguments
  Caller      : InternalIdMapper plugins
  Status      : At Risk
              : under development

=cut

sub new {
  my $caller = shift;
  my $class = ref($caller) || $caller;
  my $self = $class->SUPER::new(@_);

  my ($logger, $conf, $cache) = rearrange(['LOGGER', 'CONF', 'CACHE'], @_);

  unless ($logger and ref($logger) and
          $logger->isa('Bio::EnsEMBL::Utils::Logger')) {
    throw("You must provide a Bio::EnsEMBL::Utils::Logger for logging.");
  }

  unless ($conf and ref($conf) and
          $conf->isa('Bio::EnsEMBL::Utils::ConfParser')) {
    throw("You must provide configuration as a Bio::EnsEMBL::Utils::ConfParser object.");
  }

  unless ($cache and ref($cache) and
          $cache->isa('Bio::EnsEMBL::IdMapping::Cache')) {
    throw("You must provide a cache as a Bio::EnsEMBL::IdMapping::Cache object.");
  }

  # initialise
  $self->logger($logger);
  $self->conf($conf);
  $self->cache($cache);

  # list of SyntenyRegions, filled by add_SyntenyRegion()
  $self->{'cache'} = [];

  return $self;
}


=head2 build_synteny

  Arg[1]      : Bio::EnsEMBL::IdMapping::MappingList $mappings - gene mappings
                to build the SyntenyFramework from
  Example     : $synteny_framework->build_synteny($gene_mappings);
  Description : Builds the SyntenyFramework from unambiguous gene mappings.
                SyntenyRegions are allowed to overlap. At most two overlapping
                SyntenyRegions are merged (otherwise we'd get too large
                SyntenyRegions with little information content).
  Return type : none
  Exceptions  : thrown on wrong or missing argument
  Caller      : InternalIdMapper plugins
  Status      : At Risk
              : under development
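
The pairwise merging described above can be sketched as follows. This is a
minimal illustration only: it assumes plain [start, end] intervals instead of
SyntenyRegion objects, and merge_pair() and pairwise_merge() are hypothetical
helpers that are not part of this module. The real SyntenyRegion->merge()
works on paired source and target locations and may apply further checks, and
build_synteny() additionally reverses the sort order and stretches each
region before storing it.

  use strict;
  use warnings;

  # Hypothetical stand-in for SyntenyRegion->merge(): joins two
  # [start, end] intervals if they overlap, otherwise returns nothing.
  sub merge_pair {
    my ($x, $y) = @_;
    return if $x->[1] < $y->[0] or $y->[1] < $x->[0];
    my $start = $x->[0] < $y->[0] ? $x->[0] : $y->[0];
    my $end   = $x->[1] > $y->[1] ? $x->[1] : $y->[1];
    return [ $start, $end ];
  }

  # Walk a sorted list and merge each interval only with its direct
  # predecessor, so a merged region never spans more than two inputs.
  sub pairwise_merge {
    my @sorted = @_;
    my @out;
    my $last_merged = 0;
    my $last = shift @sorted;

    while (my $cur = shift @sorted) {
      if (my $merged = merge_pair($last, $cur)) {
        push @out, $merged;
        $last_merged = 1;
      } else {
        # keep the previous interval only if it was not part of a merge
        push @out, $last unless $last_merged;
        $last_merged = 0;
      }
      $last = $cur;
    }

    # deal with the final interval
    push @out, $last unless $last_merged;
    return @out;
  }

  my @regions = pairwise_merge([1, 10], [5, 20], [15, 30], [50, 60]);
  print join(' ', map { "$_->[0]-$_->[1]" } @regions), "\n";
  # prints "1-20 5-30 50-60"

Note that the three chained intervals yield two overlapping merged regions
(1-20 and 5-30) rather than one large block (1-30), which is the behaviour
the description motivates.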

=cut

sub build_synteny {
  my $self = shift;
  my $mappings = shift;

  unless ($mappings and
          $mappings->isa('Bio::EnsEMBL::IdMapping::MappingList')) {
    throw('Need a gene Bio::EnsEMBL::IdMapping::MappingList.');
  }

  # create a synteny region for each mapping
  my @synteny_regions = ();

  foreach my $entry (@{ $mappings->get_all_Entries }) {

    my $source_gene = $self->cache->get_by_key('genes_by_id', 'source',
      $entry->source);
    my $target_gene = $self->cache->get_by_key('genes_by_id', 'target',
      $entry->target);

    my $sr = Bio::EnsEMBL::IdMapping::SyntenyRegion->new_fast([
      $source_gene->start,
      $source_gene->end,
      $source_gene->strand,
      $source_gene->seq_region_name,
      $target_gene->start,
      $target_gene->end,
      $target_gene->strand,
      $target_gene->seq_region_name,
      $entry->score,
    ]);

    push @synteny_regions, $sr;
  }

  unless (@synteny_regions) {
    $self->logger->warning("No synteny regions could be identified.\n");
    return;
  }

  # sort synteny regions
  #my @sorted = sort _by_overlap @synteny_regions;
  my @sorted = reverse sort {
    $a->source_seq_region_name cmp $b->source_seq_region_name ||
    $a->source_start <=> $b->source_start ||
    $a->source_end <=> $b->source_end } @synteny_regions;

  $self->logger->info("SyntenyRegions before merging: ".scalar(@sorted)."\n");

  # now create merged regions from overlapping syntenies, but only merge a
  # maximum of 2 regions (otherwise you end up with large synteny blocks which
  # won't contain much information in this context)
  my $last_merged = 0;
  my $last_sr = shift(@sorted);

  while (my $sr = shift(@sorted)) {
    #$self->logger->debug("this ".$sr->to_string."\n");

    my $merged_sr = $last_sr->merge($sr);

    if (! $merged_sr) {
      unless ($last_merged) {
        $self->add_SyntenyRegion($last_sr->stretch(2));
        #$self->logger->debug("nnn ".$last_sr->to_string."\n");
      }
      $last_merged = 0;
    } else {
      $self->add_SyntenyRegion($merged_sr->stretch(2));
      #$self->logger->debug("mmm ".$merged_sr->to_string."\n");
      $last_merged = 1;
    }

    $last_sr = $sr;
  }

  # deal with last synteny region in @sorted
  unless ($last_merged) {
    $self->add_SyntenyRegion($last_sr->stretch(2));
    $last_merged = 0;
  }

  #foreach my $sr (@{ $self->get_all_SyntenyRegions }) {
  #  $self->logger->debug("SRs ".$sr->to_string."\n");
  #}

  $self->logger->info("SyntenyRegions after merging: ".
    scalar(@{ $self->get_all_SyntenyRegions })."\n");

}


#
# sort SyntenyRegions by overlap
# (currently unused; build_synteny sorts with an inline comparator instead)
#
sub _by_overlap {
  # first sort by seq_region
  my $retval = ($b->source_seq_region_name cmp $a->source_seq_region_name);
  return $retval if ($retval);

  # then sort by overlap:
  # return -1 if $a is downstream, 1 if it's upstream, 0 if they overlap
  if ($a->source_end < $b->source_start) { return 1; }
  if ($a->source_start > $b->source_end) { return -1; }
  return 0;
}


=head2 add_SyntenyRegion

  Arg[1]      : Bio::EnsEMBL::IdMapping::SyntenyRegion - SyntenyRegion to add
  Example     : $synteny_framework->add_SyntenyRegion($synteny_region);
  Description : Adds a SyntenyRegion to the framework. For speed reasons (and
                since this is an internal method), no argument check is done.
  Return type : none
  Exceptions  : none
  Caller      : internal
  Status      : At Risk
              : under development

=cut

sub add_SyntenyRegion {
  push @{ $_[0]->{'cache'} }, $_[1];
}


=head2 get_all_SyntenyRegions

  Example     : foreach my $sr (@{ $sf->get_all_SyntenyRegions }) {
                  # do something with the SyntenyRegion
                }
  Description : Get a list of all SyntenyRegions in the framework.
  Return type : Arrayref of Bio::EnsEMBL::IdMapping::SyntenyRegion
  Exceptions  : none
  Caller      : general
  Status      : At Risk
              : under development

=cut

sub get_all_SyntenyRegions {
  return $_[0]->{'cache'};
}


=head2 rescore_gene_matrix_lsf

  Arg[1]      : Bio::EnsEMBL::IdMapping::ScoredMappingMatrix $matrix - gene
                scores to rescore
  Example     : my $new_scores = $sf->rescore_gene_matrix_lsf($gene_scores);
  Description : This method runs rescore_gene_matrix() (via the
                synteny_rescore.pl script) in parallel with LSF, then combines
                the results to return a single rescored scoring matrix.
                Parallelisation is done by chunking the scoring matrix into
                several pieces (determined by the --synteny_rescore_jobs
                configuration option).
  Return type : Bio::EnsEMBL::IdMapping::ScoredMappingMatrix
  Exceptions  : thrown on wrong or missing argument
                thrown on filesystem I/O error
                thrown on failure of one or more LSF jobs
  Caller      : InternalIdMapper plugins
  Status      : At Risk
              : under development
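
The chunking mentioned in the description can be illustrated with a small
stand-alone sketch. It is not part of this module: chunk_ranges() is a
hypothetical helper and the numbers are made up. It assumes the same
arithmetic as the method body, where the configured job count is incremented
by one and the chunk size is the integer quotient of the matrix size and the
configured count, so that the extra job picks up any remainder.

  use strict;
  use warnings;

  # Hypothetical helper: returns one [start, end] pair per LSF job.
  sub chunk_ranges {
    my ($matrix_size, $requested_jobs) = @_;

    my $num_jobs = $requested_jobs + 1;   # one extra job for the remainder
    my $chunk = int($matrix_size / ($num_jobs - 1));

    return map { [ $chunk * ($_ - 1) + 1, $chunk * $_ ] } 1 .. $num_jobs;
  }

  my @ranges = chunk_ranges(105, 20);
  printf "%d jobs, first %d-%d, last %d-%d\n",
    scalar(@ranges), @{ $ranges[0] }, @{ $ranges[-1] };
  # prints "21 jobs, first 1-5, last 101-105"

When the matrix size is an exact multiple of the configured job count, the
range of the final job lies entirely beyond the end of the matrix.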

=cut

sub rescore_gene_matrix_lsf {
  my $self = shift;
  my $matrix = shift;

  unless ($matrix and
          $matrix->isa('Bio::EnsEMBL::IdMapping::ScoredMappingMatrix')) {
    throw('Need a Bio::EnsEMBL::IdMapping::ScoredMappingMatrix.');
  }

  # serialise SyntenyFramework to disk
  $self->logger->debug("Serialising SyntenyFramework...\n", 0, 'stamped');
  $self->write_to_file;
  $self->logger->debug("Done.\n", 0, 'stamped');

  # split the ScoredMappingMatrix into chunks and write to disk
  my $matrix_size = $matrix->size;
  $self->logger->debug("Scores before rescoring: $matrix_size.\n");

  # one extra job picks up the remainder left by the integer division below
  my $num_jobs = $self->conf->param('synteny_rescore_jobs') || 20;
  $num_jobs++;

  my $dump_path = path_append($self->conf->param('basedir'),
    'matrix/synteny_rescore');

  $self->logger->debug("Creating sub-matrices...\n", 0, 'stamped');
  foreach my $i (1..$num_jobs) {
    my $start = (int($matrix_size/($num_jobs-1)) * ($i - 1)) + 1;
    my $end = int($matrix_size/($num_jobs-1)) * $i;
    $self->logger->debug("$start-$end\n", 1);
    my $sub_matrix = $matrix->sub_matrix($start, $end);

    $sub_matrix->cache_file_name("gene_matrix_synteny$i.ser");
    $sub_matrix->dump_path($dump_path);

    $sub_matrix->write_to_file;
  }
  $self->logger->debug("Done.\n", 0, 'stamped');
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
360
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
361 # create an empty lsf log directory
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
362 my $logpath = path_append($self->logger->logpath, 'synteny_rescore');
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
363 system("rm -rf $logpath") == 0 or
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
364 $self->logger->error("Unable to delete lsf log dir $logpath: $!\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
365 system("mkdir -p $logpath") == 0 or
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
366 $self->logger->error("Can't create lsf log dir $logpath: $!\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
367
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
  # build lsf command
  my $lsf_name = 'idmapping_synteny_rescore_'.time;

  my $options = $self->conf->create_commandline_options(
      logauto       => 1,
      logautobase   => "synteny_rescore",
      logpath       => $logpath,
      interactive   => 0,
      is_component  => 1,
  );

  my $cmd = qq{$Bin/synteny_rescore.pl $options --index \$LSB_JOBINDEX};

  my $bsub_cmd =
    sprintf( "|bsub -J%s[1-%d] "
               . "-o %s/synteny_rescore.%%I.out "
               . "-e %s/synteny_rescore.%%I.err %s",
             $lsf_name, $num_jobs, $logpath, $logpath,
             $self->conf()->param('lsf_opt_synteny_rescore') );

  # run lsf job array
  $self->logger->info("Submitting $num_jobs jobs to lsf.\n");
  $self->logger->debug("$cmd\n\n");

  local *BSUB;
  open( BSUB, $bsub_cmd )
    or $self->logger->error("Could not open pipe to bsub: $!\n");

  print BSUB $cmd;
  # $? is only set once the pipe has been closed, so close before checking it
  close BSUB;
  $self->logger->error("Error submitting synteny rescoring jobs (exit status $?): $!\n")
    unless ($? == 0);

  # submit dependent job to monitor finishing of jobs
  $self->logger->info("Waiting for jobs to finish...\n", 0, 'stamped');

  my $dependent_job = qq{bsub -K -w "ended($lsf_name)" -q small } .
    qq{-o $logpath/synteny_rescore_depend.out /bin/true};

  system($dependent_job) == 0 or
    $self->logger->error("Error submitting dependent job: $!\n");

  $self->logger->info("All jobs finished.\n", 0, 'stamped');

  # check for lsf errors
  sleep(5);
  my $err;
  foreach my $i (1..$num_jobs) {
    $err++ unless (-e "$logpath/synteny_rescore.$i.success");
  }

  if ($err) {
    $self->logger->error("At least one of your jobs failed.\nPlease check the logfiles at $logpath for errors.\n");
  }

  # merge and return matrix
  $self->logger->debug("Merging rescored matrices...\n");
  $matrix->flush;

  foreach my $i (1..$num_jobs) {
    # read partial matrix created by lsf job from file
    my $sub_matrix = Bio::EnsEMBL::IdMapping::ScoredMappingMatrix->new(
      -DUMP_PATH   => $dump_path,
      -CACHE_FILE  => "gene_matrix_synteny$i.ser",
    );
    $sub_matrix->read_from_file;

    # merge with main matrix
    $matrix->merge($sub_matrix);
  }

  $self->logger->debug("Done.\n");
  $self->logger->debug("Scores after rescoring: ".$matrix->size.".\n");

  return $matrix;
}


#
#

=head2 rescore_gene_matrix

  Arg[1]      : Bio::EnsEMBL::IdMapping::ScoredMappingMatrix $matrix - gene
                scores to rescore
  Example     : my $new_scores = $sf->rescore_gene_matrix($gene_scores);
  Description : Rescores a gene matrix. Retains 70% of the old score and
                builds the other 30% from the best synteny match.
  Return type : Bio::EnsEMBL::IdMapping::ScoredMappingMatrix
  Exceptions  : thrown on wrong or missing argument
  Caller      : InternalIdMapper plugins
  Status      : At Risk
              : under development

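The 70/30 weighting in the description can be checked by hand. What follows is a minimal standalone sketch, not part of this module: the helper name blend_score and the sample scores are hypothetical, and it assumes only the split stated in the description.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of SyntenyFramework): blend an existing
# score with the best synteny score, keeping $retain of the old score.
sub blend_score {
  my ($old_score, $synteny_score, $retain) = @_;
  $retain = 0.7 unless defined $retain;
  return $old_score * $retain + $synteny_score * (1 - $retain);
}

# Illustrative values only: old score 0.80, best synteny score 0.50.
my $new_score = blend_score(0.80, 0.50);
printf "%.2f\n", $new_score;   # prints 0.71
```

A pair with a synteny score of 0 ends up with 70% of its old score, because the synteny term contributes nothing.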
=cut

sub rescore_gene_matrix {
  my $self = shift;
  my $matrix = shift;

  unless ($matrix and
          $matrix->isa('Bio::EnsEMBL::IdMapping::ScoredMappingMatrix')) {
    throw('Need a Bio::EnsEMBL::IdMapping::ScoredMappingMatrix.');
  }

  # fraction of the old score to keep; the remainder comes from synteny
  my $retain_factor = 0.7;

  foreach my $entry (@{ $matrix->get_all_Entries }) {
    my $source_gene = $self->cache->get_by_key('genes_by_id', 'source',
      $entry->source);

    my $target_gene = $self->cache->get_by_key('genes_by_id', 'target',
      $entry->target);

    # best score over all synteny regions for this source/target pair
    my $highest_score = 0;

    foreach my $sr (@{ $self->get_all_SyntenyRegions }) {
      my $score = $sr->score_location_relationship($source_gene, $target_gene);
      $highest_score = $score if ($score > $highest_score);
    }

    #$self->logger->debug("highscore ".$entry->to_string." ".
    #  sprintf("%.6f\n", $highest_score));

    $matrix->set_score($entry->source, $entry->target,
      ($entry->score * $retain_factor +
       $highest_score * (1 - $retain_factor)));
  }

  return $matrix;
}


=head2 logger

  Arg[1]      : (optional) Bio::EnsEMBL::Utils::Logger - the logger to set
  Example     : $object->logger->info("Starting ID mapping.\n");
  Description : Getter/setter for logger object
  Return type : Bio::EnsEMBL::Utils::Logger
  Exceptions  : none
  Caller      : constructor
  Status      : At Risk
              : under development

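The combined getter/setter idiom behind this accessor (conf and cache follow the same pattern) can be sketched in isolation. The package name Demo::Holder and the stored string are hypothetical; a real caller would pass a Bio::EnsEMBL::Utils::Logger object instead.

```perl
use strict;
use warnings;

package Demo::Holder;   # hypothetical class, for illustration only

sub new { return bless {}, shift }

# With an argument the accessor stores it; it always returns the stored value.
sub logger {
  my $self = shift;
  $self->{'_logger'} = shift if (@_);
  return $self->{'_logger'};
}

package main;

my $holder = Demo::Holder->new;
$holder->logger('stand-in logger');   # setter call
print $holder->logger, "\n";          # getter call
```

Because the test is on @_ rather than on the value itself, passing undef explicitly clears the stored object.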
=cut

sub logger {
  my $self = shift;
  $self->{'_logger'} = shift if (@_);
  return $self->{'_logger'};
}


=head2 conf

  Arg[1]      : (optional) Bio::EnsEMBL::Utils::ConfParser - the configuration
                to set
  Example     : my $basedir = $object->conf->param('basedir');
  Description : Getter/setter for configuration object
  Return type : Bio::EnsEMBL::Utils::ConfParser
  Exceptions  : none
  Caller      : constructor
  Status      : At Risk
              : under development

=cut

sub conf {
  my $self = shift;
  $self->{'_conf'} = shift if (@_);
  return $self->{'_conf'};
}


=head2 cache

  Arg[1]      : (optional) Bio::EnsEMBL::IdMapping::Cache - the cache to set
  Example     : $object->cache->read_from_file('source');
  Description : Getter/setter for cache object
  Return type : Bio::EnsEMBL::IdMapping::Cache
  Exceptions  : none
  Caller      : constructor
  Status      : At Risk
              : under development

=cut

sub cache {
  my $self = shift;
  $self->{'_cache'} = shift if (@_);
  return $self->{'_cache'};
}


1;
