comparison variant_effect_predictor/Bio/EnsEMBL/Compara/AlignSlice.pm @ 0:21066c0abaf5 draft

Uploaded
author willmclaren
date Fri, 03 Aug 2012 10:04:48 -0400
parents
children
comparison
equal deleted inserted replaced
-1:000000000000 0:21066c0abaf5
1 =head1 LICENSE
2
3 Copyright (c) 1999-2012 The European Bioinformatics Institute and
4 Genome Research Limited. All rights reserved.
5
6 This software is distributed under a modified Apache license.
7 For license details, please see
8
9 http://www.ensembl.org/info/about/code_licence.html
10
11 =head1 CONTACT
12
13 Please email comments or questions to the public Ensembl
14 developers list at <dev@ensembl.org>.
15
16 Questions may also be sent to the Ensembl help desk at
17 <helpdesk@ensembl.org>.
18
19 =head1 NAME
20
21 Bio::EnsEMBL::Compara::AlignSlice - An AlignSlice can be used to map genes and features from one species onto another one
22
23 =head1 DESCRIPTION
24
25 INTRODUCTION
26
27 An AlignSlice is an object built with a reference Slice and the corresponding set of genomic alignments.
28 The genomic alignments are used to map features from one species onto another and viceversa.
29
30 STRUCTURE
31
32 Every Bio::EnsEMBL::Compara::AlignSlice contains a set of Bio::EnsEMBL::Compara::AlignSlice::Slice
33 objects, at least one by species involved in the alignments. For instance, if the reference Slice is a
34 human slice and the set of alignments corresponds to human-mouse BLASTZ_NET alignments, there will be at
35 least one Bio::EnsEMBL::Compara::AlignSlice::Slice for human and at least another one for mouse. The main
36 Bio::EnsEMBL::Compara::AlignSlice::Slice for the reference species will contain a single genomic sequence
37 whilst the other Bio::EnsEMBL::Compara::AlignSlice::Slice objects might be made of several pieces of
38 genomic sequences, depending on the set of alignments. Here is a graphical representation:
39
40 ref.Slice **************************************************************
41
42 alignments 11111111111
43 2222222222222222
44 33333333333333333
45
46 resulting Bio::EnsEMBL::Compara::AlignSlice:
47
48 AS::Slice 1 **************************************************************
49 AS::Slice 2 .11111111111....2222222222222222......33333333333333333.......
50
51 MODES
52
53 Two modes are currently available: condensed and expanded mode. The default mode is "condensed". In
54 condensed mode, no gaps are allowed in the reference Slice which means that information about deletions
55 in the reference species (i.e. insertions in the other species) are lost. On the other hand, the
56 first Bio::EnsEMBL::Compara::AlignSlice::Slice object corresponds to the original Bio::EnsEMBL::Slice.
57
58 In the expanded mode, the first Bio::EnsEMBL::Compara::AlignSlice::Slice is expanded in order to
59 accomodate the gaps corresponding to the deletions (insertions). Bear in mind that in expanded mode, the
60 length of the resulting AlignSlice will be most probably larger than the length of the reference
61 Bio::EnsEMBL::Slice.
62
63
64 OVERLAPPING ALIGNMENTS
65
66 No overlapping alignments are allowed by default. This means that if an alignment overlaps another one, the
67 second alignment is ignored. This is due to lack of information needed to reconciliate both alignment.
68 Here is a graphical example showing this problem:
69
70 ALN 1: Human (ref) CTGTGAAAA----CCCCATTAGG
71 Mouse (1) CTGAAAATTTTCCCC
72
73 ALN 2: Human (ref) CTGTGAAAA---CCCCATTAGG
74 Mouse (1) AAAGGGCCCCATTA
75
76 Possible solution 1:
77 Human (ref) CTGTGAAAA----CCCCATTAGG
78 Mouse (1) CTGAAAATTTTCCCC----
79 Mouse (1) ----AAA-GGGCCCCATTA
80
81 Possible solution 2:
82 Human (ref) CTGTGAAAA----CCCCATTAGG
83 Mouse (1) CTGAAAATTTTCCCC----
84 Mouse (1) ----AAAGGG-CCCCATTA
85
86 Possible solution 3:
87 Human (ref) CTGTGAAAA-------CCCCATTAGG
88 Mouse (1) CTGAAAATTTT---CCCC
89 Mouse (1) AAA----GGGCCCCATTA
90
91 There is no easy way to find which of these possible solution is the best without trying to realign the
92 three sequences together and this is far beyond the aim of this module. The best solution is to start with
93 multiple alignments instead of overlapping pairwise ones.
94
95 The third possibility is probably not the best alignment you can get although its implementation is
96 systematic (insert as many gaps as needed in order to accommodate the insertions and never ever overlap
97 them) and computationally cheap as no realignment is needed. You may ask this module to solve overlapping
98 alignments in this way using the "solve_overlapping" option.
99
100 RESULT
101
102 The AlignSlice results in a set of Bio::EnsEMBL::Compara::AlignSlice::Slice which are objects really
103 similar to the Bio::EnsEMBL::Slice. There are intended to be used just like the genuine
104 Bio::EnsEMBL::Slice but some methods do not work though. Some examples of non-ported methods are: expand()
105 and invert(). Some other methods work as expected (seq, subseq, get_all_Attributes,
106 get_all_VariationFeatures, get_all_RepeatFeatures...). All these Bio::EnsEMBL::Compara::AlignSlice::Slice
107 share the same fake coordinate system defined by the Bio::EnsEMBL::Compara::AlignSlice. This allows to
108 map features from one species onto the others.
109
110 =head1 SYNOPSIS
111
112 use Bio::EnsEMBL::Compara::AlignSlice;
113
114 ## You may create your own AlignSlice objects but if you are interested in
115 ## getting AlignSlice built with data from an EnsEMBL Compara database you
116 ## should consider using the Bio::EnsEMBL::Compara::DBSQL::AlignSliceAdaptor
117 ## instead
118 my $align_slice = new Bio::EnsEMBL::Compara::AlignSlice(
119 -adaptor => $align_slice_adaptor,
120 -reference_Slice => $reference_Slice,
121 -method_link_species_set => $all_genomic_align_blocks,
122 -genomicAlignBlocks => $all_genomic_align_blocks,
123 -expanded => 1
124 -solve_overlapping => 1
125 );
126
127 my $all_slices = $aling_slice->get_all_Slices();
128 foreach my $this_slice (@$all_slices) {
129 ## See also Bio::EnsEMBL::Compara::AlignSlice::Slice
130 my $species_name = $this_slice->genome_db->name()
131 my $all_mapped_genes = $this_slice->get_all_Genes();
132 }
133
134 my $simple_align = $align_slice->get_projected_SimpleAlign();
135
136 SET VALUES
137 $align_slice->adaptor($align_slice_adaptor);
138 $align_slice->reference_Slice($reference_slice);
139 $align_slice->add_GenomicAlignBlock($genomic_align_block_1);
140 $align_slice->add_GenomicAlignBlock($genomic_align_block_2);
141 $align_slice->add_GenomicAlignBlock($genomic_align_block_3);
142
143 GET VALUES
144 my $align_slice_adaptor = $align_slice->adaptor();
145 my $reference_slice = $align_slice->reference_Slice();
146 my $all_genomic_align_blocks = $align_slice->get_all_GenomicAlignBlock();
147 my $mapped_genes = $align_slice->get_all_Genes_by_genome_db_id(3);
148 my $simple_align = $align_slice->get_projected_SimpleAlign();
149
150 =head1 OBJECT ATTRIBUTES
151
152 =over
153
154 =item adaptor
155
156 Bio::EnsEMBL::Compara::DBSQL::AlignSliceAdaptor object to access DB
157
158 =item reference_slice
159
160 Bio::EnsEMBL::Slice object used to create this Bio::EnsEBML::Compara::AlignSlice object
161
162 =item all_genomic_align_blocks
163
164 a listref of Bio::EnsEMBL::Compara::GenomicAlignBlock objects found using the reference_Slice
165
166 =back
167
168 =head1 APPENDIX
169
170 The rest of the documentation details each of the object methods. Internal methods are usually preceded with a _
171
172 =cut
173
174
175 # Let the code begin...
176
177
178 package Bio::EnsEMBL::Compara::AlignSlice;
179
180 use strict;
181 use Bio::EnsEMBL::Utils::Argument qw(rearrange);
182 use Bio::EnsEMBL::Utils::Exception qw(throw warning info verbose);
183 use Bio::EnsEMBL::Compara::AlignSlice::Exon;
184 use Bio::EnsEMBL::Compara::AlignSlice::Slice;
185 use Bio::EnsEMBL::Compara::GenomicAlignBlock;
186 use Bio::EnsEMBL::Compara::GenomicAlign;
187 use Bio::SimpleAlign;
188
189 ## Creates a new coordinate system for creating empty Slices.
190
191 my $aligngap_coord_system = new Bio::EnsEMBL::CoordSystem(
192 -NAME => 'alignment',
193 -VERSION => "none",
194 -TOP_LEVEL => 0,
195 -SEQUENCE_LEVEL => 1,
196 -RANK => 1,
197 );
198
199 =head2 new (CONSTRUCTOR)
200
201 Arg[1] : a reference to a hash where keys can be:
202 -adaptor
203 -reference_slice
204 -genomic_align_blocks
205 -method_link_species_set
206 -expanded
207 -solve_overlapping
208 -preserve_blocks
209 -adaptor: the Bio::EnsEMBL::Compara::DBSQL::AlignSliceAdaptor
210 -reference_slice:
211 Bio::EnsEMBL::Slice, the guide slice for this align_slice
212 -genomic_align_blocks:
213 listref of Bio::EnsEMBL::Compara::GenomicAlignBlock
214 objects containing to the alignments to be used for this
215 align_slice
216 -method_link_species_set;
217 Bio::EnsEMBL::Compara::MethodLinkSpeciesSet object for
218 all the previous genomic_align_blocks. At the moment all
219 the blocks should correspond to the same MethodLinkSpeciesSet
220 -expanded: boolean flag. If set to true, the AlignSlice will insert all
221 the gaps requiered by the alignments in the reference_slice
222 (see MODES elsewhere in this document)
223 -solve_overlapping:
224 boolean flag. If set to true, the AlignSlice will allow
225 overlapping alginments and solve indeterminations according
226 to the method described in OVERLAPPING ALIGNMENTS elsewhere
227 in this document
228 -preserve_blocks:
229 boolean flag. By default the AlignSlice trim the alignments
230 in order to fit the reference_slice. This flags tell the
231 AlignSlice to use the alignment block as they are (usually
232 this is only used by the AlignSliceAdaptor, use with care)
233 Example : my $align_slice =
234 new Bio::EnsEMBL::Compara::AlignSlice(
235 -adaptor => $align_slice_adaptor,
236 -reference_slice => $reference_slice,
237 -method_link_species_set => $method_link_species_set,
238 -genomic_align_blocks => [$gab1, $gab2],
239 -expanded => 1
240 );
241 Description: Creates a new Bio::EnsEMBL::Compara::AlignSlice object
242 Returntype : Bio::EnsEMBL::Compara::AlignSlice object
243 Exceptions : none
244 Caller : general
245
246 =cut
247
248 sub new {
249 my ($class, @args) = @_;
250
251 my $self = {};
252 bless $self,$class;
253
254 my ($adaptor, $reference_slice, $genomic_align_blocks, $genomic_align_trees,
255 $method_link_species_set, $expanded, $solve_overlapping, $preserve_blocks,
256 $species_order) =
257 rearrange([qw(
258 ADAPTOR REFERENCE_SLICE GENOMIC_ALIGN_BLOCKS GENOMIC_ALIGN_TREES
259 METHOD_LINK_SPECIES_SET EXPANDED SOLVE_OVERLAPPING PRESERVE_BLOCKS
260 SPECIES_ORDER
261 )], @args);
262
263 $self->adaptor($adaptor) if (defined ($adaptor));
264 $self->reference_Slice($reference_slice) if (defined ($reference_slice));
265 if (defined($genomic_align_blocks)) {
266 foreach my $this_genomic_align_block (@$genomic_align_blocks) {
267 $self->add_GenomicAlignBlock($this_genomic_align_block);
268 }
269 }
270 $self->{_method_link_species_set} = $method_link_species_set if (defined($method_link_species_set));
271
272 $self->{expanded} = 0;
273 if ($expanded) {
274 $self->{expanded} = 1;
275 }
276
277 $self->{solve_overlapping} = 0;
278 if ($solve_overlapping) {
279 $self->{solve_overlapping} = $solve_overlapping;
280 }
281
282 if ($genomic_align_trees) {
283 $self->_create_underlying_Slices($genomic_align_trees, $self->{expanded},
284 $self->{solve_overlapping}, $preserve_blocks, $species_order);
285
286 #Awful hack to store the _alignslice_from and _alignslice_to on the
287 #GenomicAlignBlock for use in get_all_ConservationScores which uses
288 #GenomicAlignBlock and not GenomicAlignTree
289 foreach my $tree (@$genomic_align_trees) {
290 foreach my $block (@$genomic_align_blocks) {
291
292 #my $gab_id = $tree->get_all_leaves->[0]->get_all_genomic_aligns_for_node->[0]->genomic_align_block_id;
293 #Hope always have a reference_genomic_align. The above doesn't work because only the reference species
294 #has the genomic_align_block_id set
295 my $gab_id = $tree->reference_genomic_align->genomic_align_block_id;
296 my $block_id = $block->dbID;
297
298 #if the block has been restricted, need to look at original_dbID
299 if (!defined $block_id) {
300 $block_id = $block->{original_dbID};
301 }
302 my $tree_ref_ga = $tree->{reference_genomic_align};
303 my $block_ref_ga = $block->{reference_genomic_align};
304
305 #Need to check the ref_ga details not just the block id because
306 #the original_dbID is not unique for 2x genome blocks
307 if ($gab_id == $block_id &&
308 $tree_ref_ga->dnafrag_start == $block_ref_ga->dnafrag_start &&
309 $tree_ref_ga->dnafrag_end == $block_ref_ga->dnafrag_end &&
310 $tree_ref_ga->dnafrag_strand == $block_ref_ga->dnafrag_strand) {
311
312 $block->{_alignslice_from} = $tree->{_alignslice_from};
313 $block->{_alignslice_to} = $tree->{_alignslice_to};
314 }
315 }
316 }
317 } else {
318 $self->_create_underlying_Slices($genomic_align_blocks, $self->{expanded},
319 $self->{solve_overlapping}, $preserve_blocks, $species_order);
320 }
321 return $self;
322 }
323
324
325 =head2 sub_AlignSlice
326
327 Arg 1 : int $start
328 Arg 2 : int $end
329 Example : my $sub_align_slice = $align_slice->sub_AlignSlice(10, 50);
330 Description: Creates a new Bio::EnsEMBL::Compara::AlignSlice object
331 corresponding to a sub region of this one
332 Returntype : Bio::EnsEMBL::Compara::AlignSlice object
333 Exceptions : return undef if no internal slices can be created (see
334 Bio::EnsEMBL::Compara::AlignSlice::Slice->sub_Slice)
335 Caller : $align_slice
336 Status : Testing
337
338 =cut
339
340 sub sub_AlignSlice {
341 my ($self, $start, $end) = @_;
342 my $sub_align_slice = {};
343
344 throw("Must provide START argument") if (!defined($start));
345 throw("Must provide END argument") if (!defined($end));
346
347 bless $sub_align_slice, ref($self);
348
349 $sub_align_slice->adaptor($self->adaptor);
350 $sub_align_slice->reference_Slice($self->reference_Slice);
351 foreach my $this_genomic_align_block (@{$self->get_all_GenomicAlignBlocks()}) {
352 $sub_align_slice->add_GenomicAlignBlock($this_genomic_align_block);
353 }
354 $sub_align_slice->{_method_link_species_set} = $self->{_method_link_species_set};
355
356 $sub_align_slice->{expanded} = $self->{expanded};
357
358 $sub_align_slice->{solve_overlapping} = $self->{solve_overlapping};
359
360 foreach my $this_slice (@{$self->get_all_Slices}) {
361 my $new_slice = $this_slice->sub_Slice($start, $end);
362 push(@{$sub_align_slice->{_slices}}, $new_slice) if ($new_slice);
363 }
364 return undef if (!$sub_align_slice->{_slices});
365
366 return $sub_align_slice;
367 }
368
369
370 =head2 adaptor
371
372 Arg[1] : (optional) Bio::EnsEMBL::Compara::DBSQL::AlignSliceAdaptor $align_slice_adaptor
373 Example : my $align_slice_adaptor = $align_slice->adaptor
374 Example : $align_slice->adaptor($align_slice_adaptor)
375 Description: getter/setter for the adaptor attribute
376 Returntype : Bio::EnsEMBL::Compara::DBSQL::AlignSliceAdaptor object
377 Exceptions : throw if arg is not a Bio::EnsEMBL::Compara::DBSQL::AlignSliceAdaptor
378 Caller : $object->methodname
379
380 =cut
381
382 sub adaptor {
383 my ($self, $adaptor) = @_;
384
385 if (defined($adaptor)) {
386 throw "[$adaptor] must be a Bio::EnsEMBL::Compara::DBSQL::AlignSliceAdaptor object"
387 unless ($adaptor and $adaptor->isa("Bio::EnsEMBL::Compara::DBSQL::AlignSliceAdaptor"));
388 $self->{'adaptor'} = $adaptor;
389 }
390
391 return $self->{'adaptor'};
392 }
393
394
395 =head2 get_all_Slices (experimental)
396
397 Arg[1] : [optional] string $species_name1
398 Arg[2] : [optional] string $species_name2
399 Arg[...] : [optional] string $species_nameN
400 Example : my $slices = $align_slice->get_all_Slices
401 Description: getter for all the Slices in this AlignSlice. If a list
402 of species is specified, returns only the slices for
403 these species. The slices are returned in a "smart"
404 order, i.e. the slice corresponding to the reference
405 species is returned first and then the remaining slices
406 depending on their phylogenetic distance to the first
407 one.
408 NB: You can use underscores instead of whitespaces for
409 the name of the species, i.e. Homo_sapiens will be
410 understood as "Homo sapiens". However if the GenomeDB is found
411 to already have _ defined in the name then this behaviour is
412 disabled.
413 Returntype : listref of Bio::EnsEMBL::Compara::AlignSlice::Slice
414 objects.
415 Exceptions :
416 Caller : $object->methodname
417
418 =cut
419
420 sub get_all_Slices {
421 my ( $self, @species_names ) = @_;
422 my $slices = [];
423
424 if (@species_names) {
425 foreach my $slice ( @{ $self->{_slices} } ) {
426 #Substitute _ for spaces & check if the current GenomeDB matches with
427 #or without them
428 foreach my $this_species_name (@species_names) {
429 ( my $space_species_name = $this_species_name ) =~ s/_/ /g;
430 push( @$slices, $slice )
431 if ( ( $this_species_name eq $slice->genome_db->name )
432 || ( $space_species_name eq $slice->genome_db->name ) );
433 }
434 }
435 }
436 else {
437 $slices = $self->{_slices};
438 }
439
440 return $slices;
441 }
442
443
444 =head2 reference_Slice
445
446 Arg[1] : (optional) Bio::EnsEMBL::Slice $slice
447 Example : my $reference_slice = $align_slice->reference_slice
448 Example : $align_slice->reference_Slice($reference_slice)
449 Description: getter/setter for the attribute reference_slice
450 Returntype : Bio::EnsEMBL::Slice object
451 Exceptions : throw if arg is not a Bio::EnsEMBL::Slice object
452 Caller : $object->methodname
453
454 =cut
455
456 sub reference_Slice {
457 my ($self, $reference_slice) = @_;
458
459 if (defined($reference_slice)) {
460 throw "[$reference_slice] must be a Bio::EnsEMBL::Slice object"
461 unless ($reference_slice and $reference_slice->isa("Bio::EnsEMBL::Slice"));
462 $self->{'reference_slice'} = $reference_slice;
463 }
464
465 return $self->{'reference_slice'};
466 }
467
468
469 =head2 add_GenomicAlignBlock
470
471 Arg[1] : Bio::EnsEMBL::Compara::GenomicAlignBlock $genomicAlignBlock
472 Example : $align_slice->add_GenomicAlignBlock($genomicAlignBlock)
473 Description: add a Bio::EnsEMBL::Compara::GenomicAlignBlock object to the array
474 stored in the attribute all_genomic_align_blocks
475 Returntype : none
476 Exceptions : throw if arg is not a Bio::EnsEMBL::Compara::GenomicAlignBlock
477 Caller : $object->methodname
478
479 =cut
480
481 sub add_GenomicAlignBlock {
482 my ($self, $genomic_align_block) = @_;
483
484 if (!defined($genomic_align_block)) {
485 throw "Too few arguments for Bio::EnsEMBL::Compara::AlignSlice->add_GenomicAlignBlock()";
486 }
487 if (!$genomic_align_block or !$genomic_align_block->isa("Bio::EnsEMBL::Compara::GenomicAlignBlock")) {
488 throw "[$genomic_align_block] must be a Bio::EnsEMBL::Compara::GenomicAlignBlock object";
489 }
490
491 push(@{$self->{'all_genomic_align_blocks'}}, $genomic_align_block);
492 }
493
494
495 =head2 get_all_GenomicAlignBlocks
496
497 Arg[1] : none
498 Example : my $all_genomic_align_blocks = $align_slice->get_all_GenomicAlignBlocks
499 Description: getter for the attribute all_genomic_align_blocks
500 Returntype : listref of Bio::EnsEMBL::Compara::GenomicAlignBlock objects
501 Exceptions : none
502 Caller : $object->methodname
503
504 =cut
505
506 sub get_all_GenomicAlignBlocks {
507 my ($self) = @_;
508
509 return ($self->{'all_genomic_align_blocks'} || []);
510 }
511
512
513 =head2 get_MethodLinkSpeciesSet
514
515 Arg[1] : none
516 Example : my $method_link_species_set = $align_slice->get_MethodLinkSpeciesSet
517 Description: getter for the Bio::EnsEMBL::Compara::MethodLinkSpeciesSet
518 used to create this object
519 Returntype : Bio::EnsEMBL::Compara::MethodLinkSpeciesSet
520 Exceptions : none
521 Caller : $object->methodname
522
523 =cut
524
525 sub get_MethodLinkSpeciesSet {
526 my ($self) = @_;
527
528 return $self->{'_method_link_species_set'};
529 }
530
531
532 =head2 get_SimpleAlign
533
534 Arg[1] : none
535 Example : use Bio::AlignIO;
536 my $out = Bio::AlignIO->newFh(-fh=>\*STDOUT, -format=> "clustalw");
537 print $out $align_slice->get_SimpleAlign();
538 Description : This method creates a Bio::SimpleAlign object using the
539 Bio::EnsEMBL::Compara::AlignSlice::Slices underlying this
540 Bio::EnsEMBL::Compara::AlignSlice object. The SimpleAlign
541 describes the alignment where the first sequence
542 corresponds to the reference Slice and the remaining
543 correspond to the other species.
544 Returntype : Bio::SimpleAlign object
545 Exceptions :
546 Caller : $object->methodname
547
548 =cut
549
550 sub get_SimpleAlign {
551 my ($self, @species) = @_;
552 my $simple_align;
553
554 ## Create a single Bio::SimpleAlign for the projection
555 $simple_align = Bio::SimpleAlign->new();
556 $simple_align->id("ProjectedMultiAlign");
557
558 my $genome_db_name_counter;
559 foreach my $slice (@{$self->get_all_Slices(@species)}) {
560 my $seq = Bio::LocatableSeq->new(
561 -SEQ => $slice->seq,
562 -START => $slice->start,
563 -END => $slice->end,
564 -ID => $slice->genome_db->name.($genome_db_name_counter->{$slice->genome_db->name} or ""),
565 -STRAND => $slice->strand
566 );
567 ## This allows to have several sequences for the same species. Bio::SimpleAlign complains
568 ## about having the same ID, START and END for two sequences...
569 if (!defined($genome_db_name_counter->{$slice->genome_db->name})) {
570 $genome_db_name_counter->{$slice->genome_db->name} = 2;
571 } else {
572 $genome_db_name_counter->{$slice->genome_db->name}++;
573 }
574
575 $simple_align->add_seq($seq);
576 }
577
578 return $simple_align;
579 }
580
581
582 =head2 get_all_ConservationScores
583
584 Arg 1 : (opt) integer $display_size (default 700)
585 Arg 2 : (opt) string $display_type (one of "AVERAGE" or "MAX") (default "MAX")
586 Arg 3 : (opt) integer $window_size
587 Example : my $conservation_scores =
588 $align_slice->get_all_ConservationScores(1000, "AVERAGE", 10);
589 Description: Retrieve the corresponding
590 Bio::EnsEMBL::Compara::ConservationScore objects for the
591 Bio::EnsEMBL::Compara::AlignSlice object. It calls either
592 _get_expanded_conservation_scores if the AlignSlice has
593 "expanded" set or _get_condensed_conservation_scores for
594 condensed mode.
595 It sets up the align_start, align_end and slice_length and map
596 the resulting objects onto the AlignSlice.
597 Please refer to the documentation in
598 Bio::EnsEMBL::Compara::DBSQL::ConservationScoreAdaptor
599 for more details.
600 Returntype : ref. to an array of Bio::EnsEMBL::Compara::ConservationScore
601 objects.
602 Caller : object::methodname
603 Status : At risk
604
605 =cut
606
607 sub get_all_ConservationScores {
608 my ($self, $display_size, $display_type, $window_size) = @_;
609 my $all_conservation_scores = [];
610 my $y_axis_min;
611 my $y_axis_max;
612
613 my $conservation_score_adaptor = $self->adaptor->db->get_ConservationScoreAdaptor();
614
615 #Get scores in either expanded or condensed mode
616 if ($self->{expanded}) {
617 $all_conservation_scores = $self->_get_expanded_conservation_scores($conservation_score_adaptor, $display_size, $display_type, $window_size);
618 } else {
619 $all_conservation_scores = $self->_get_condensed_conservation_scores($conservation_score_adaptor, $display_size, $display_type, $window_size);
620 }
621
622 return $all_conservation_scores;
623 }
624
625 =head2 _get_expanded_conservation_scores
626
627 Arg 1 : Bio::EnsEMBL::Compara::DBSQL::ConservationScoreAdaptor
628 Arg 2 : (opt) integer $display_size (default 700)
629 Arg 3 : (opt) string $display_type (one of "AVERAGE" or "MAX") (default "MAX")
630 Arg 4 : (opt) integer $window_size
631 Example : my $conservation_scores =
632 $self->_get_expanded_conservation_scores($cs_adaptor, 1000, "AVERAGE", 10);
633 Description: Retrieve the corresponding
634 Bio::EnsEMBL::Compara::ConservationScore objects for the
635 Bio::EnsEMBL::Compara::GenomicAlignBlock objects underlying
636 this Bio::EnsEMBL::Compara::AlignSlice object. This method
637 calls the Bio::EnsEMBL::Compara::DBSQL::ConservationScoreAdaptor->
638 fetch_all_by_GenomicAlignBlock() method. It sets up the align_start,
639 align_end and slice_length and map the resulting objects onto
640 the AlignSlice. $diaplay_slize, $display_type and $window_size
641 are passed to the fetch_all_by_GenomicAlignBlock() method.
642 Please refer to the documentation in
643 Bio::EnsEMBL::Compara::DBSQL::ConservationScoreAdaptor
644 for more details.
645 Returntype : ref. to an array of Bio::EnsEMBL::Compara::ConservationScore
646 objects.
647 Caller : object::methodname
648 Status : At risk
649
650 =cut
651 sub _get_expanded_conservation_scores {
652 my ($self, $conservation_score_adaptor, $display_size, $display_type, $window_size) = @_;
653 my $y_axis_min;
654 my $y_axis_max;
655
656 my $all_conservation_scores = [];
657 my $offset = 0; #the start of each gab in alignment coords
658 my $prev_gab_end;
659 foreach my $this_genomic_align_block (@{$self->get_all_GenomicAlignBlocks()}) {
660 $this_genomic_align_block->{restricted_aln_start} = $this_genomic_align_block->{_alignslice_from};
661 $this_genomic_align_block->{restricted_aln_end} = $this_genomic_align_block->{_alignslice_to};
662
663 #Need to map the coords I get back from the conservation_score_adaptor onto the slice using the start position of the genomic_align_block
664 my $all_these_conservation_scores = $conservation_score_adaptor->fetch_all_by_GenomicAlignBlock(
665 $this_genomic_align_block, 1,$this_genomic_align_block->length, $self->get_all_Slices()->[0]->length,
666 $display_size, $display_type, $window_size);
667
668 #Need to account for gaps between gabs which will be slice end - start on the reference slice.
669 if ($prev_gab_end) {
670 my $gap = $this_genomic_align_block->reference_slice_start - $prev_gab_end - 1;
671 if ($gap) {
672 $offset += $gap;
673 }
674 }
675 $prev_gab_end = $this_genomic_align_block->reference_slice_end;
676
677 foreach my $score (@$all_these_conservation_scores) {
678 $score->position($score->position + $offset);
679 }
680
681 #offset is the sum of preceding gab lengths
682 $offset += ($this_genomic_align_block->{restricted_aln_end} - $this_genomic_align_block->{restricted_aln_start} + 1);
683
684 #initialise y axis min and max
685 if (!defined $y_axis_max) {
686 $y_axis_max = $all_these_conservation_scores->[0]->y_axis_max;
687 $y_axis_min = $all_these_conservation_scores->[0]->y_axis_min;
688 }
689 #find overall min and max
690 if ($y_axis_min > $all_these_conservation_scores->[0]->y_axis_min) {
691 $y_axis_min = $all_these_conservation_scores->[0]->y_axis_min;
692 }
693 if ($y_axis_max < $all_these_conservation_scores->[0]->y_axis_max) {
694 $y_axis_max = $all_these_conservation_scores->[0]->y_axis_max;
695 }
696 push (@$all_conservation_scores, @$all_these_conservation_scores);
697 }
698 #set overall min and max
699 $all_conservation_scores->[0]->y_axis_min($y_axis_min);
700 $all_conservation_scores->[0]->y_axis_max($y_axis_max);
701
702 return $all_conservation_scores;
703 }
704
705 =head2 _get_condensed_conservation_scores
706
707 Arg 1 : Bio::EnsEMBL::Compara::DBSQL::ConservationScoreAdaptor
708 Arg 2 : (opt) integer $display_size (default 700)
709 Arg 3 : (opt) string $display_type (one of "AVERAGE" or "MAX") (default "MAX")
710 Arg 4 : (opt) integer $window_size
711 Example : my $conservation_scores =
712 $self->_get_expanded_conservation_scores($cs_adaptor, 1000, "AVERAGE", 10);
713 Description: Retrieve the corresponding
714 Bio::EnsEMBL::Compara::ConservationScore objects for the
715 reference Bio::EnsEMBL::Slice object of
716 this Bio::EnsEMBL::Compara::AlignSlice object. This method
717 calls the Bio::EnsEMBL::Compara::DBSQL::ConservationScoreAdaptor->
718 fetch_all_by_MethodLinkSpeciesSet_Slice() method. It sets up
719 the align_start, align_end and slice_length and map the
720 resulting objects onto the AlignSlice. $display_slize,
721 $display_type and $window_size are passed to the
722 fetch_all_by_MethodLinkSpeciesSet_Slice() method.
723 Please refer to the documentation in
724 Bio::EnsEMBL::Compara::DBSQL::ConservationScoreAdaptor
725 for more details.
726 Returntype : ref. to an array of Bio::EnsEMBL::Compara::ConservationScore
727 objects.
728 Caller : object::methodname
729 Status : At risk
730
731 =cut
732 sub _get_condensed_conservation_scores {
733 my ($self, $conservation_score_adaptor, $display_size, $display_type, $window_size) = @_;
734
735 my $all_conservation_scores = [];
736
737 throw ("Must have method_link_species_set defined to retrieve conservation scores for a condensed AlignSlice") if (!defined $self->{_method_link_species_set});
738 throw ("Must have reference slice defined to retrieve conservation scores for a condensed AlignSlice") if (!defined $self->{'reference_slice'});
739
740 # select the conservation score mlss_id linked to the msa
741 my $sql = 'SELECT method_link_species_set_id FROM method_link_species_set_tag JOIN method_link_species_set USING (method_link_species_set_id) JOIN method_link USING (method_link_id) WHERE type LIKE "%CONSERVATION\_SCORE" AND tag = "msa_mlss_id" AND value = ?';
742 my $sth = $conservation_score_adaptor->prepare($sql);
743 $sth->execute($self->{_method_link_species_set}->dbID);
744
745 my ($cs_mlss_id) = @{$sth->fetchrow_arrayref};
746 $sth->finish;
747
748 throw ("Unable to find conservation score method_link_species_set for this multiple alignment " . $self->{_method_link_species_set}->dbID) if (!defined $cs_mlss_id);
749 my $mlss_adaptor = $self->adaptor->db->get_MethodLinkSpeciesSetAdaptor();
750
751 my $cs_mlss = $mlss_adaptor->fetch_by_dbID($cs_mlss_id);
752
753 $all_conservation_scores = $conservation_score_adaptor->fetch_all_by_MethodLinkSpeciesSet_Slice($cs_mlss, $self->{'reference_slice'}, $display_size, $display_type, $window_size);
754
755 return $all_conservation_scores;
756 }
757
758
759 =head2 get_all_ConstrainedElements
760
761 Arg 1 : (opt) string $method_link_type (default = GERP_CONSTRAINED_ELEMENT)
762 Arg 2 : (opt) listref Bio::EnsEMBL::Compara::GenomeDB $species_set
763 (default, the set of species from the MethodLinkSpeciesSet used
764 to build this AlignSlice)
765 Example : my $constrained_elements =
766 $align_slice->get_all_ConstrainedElements();
767 Description: Retrieve the corresponding constrained elements for these alignments.
768 Objects will be located on this AlignSlice, i.e. the
769 reference_slice, reference_slice_start, reference_slice_end
770 and reference_slice_strand will refer to this AlignSlice
771 object
772 Returntype : ref. to an array of Bio::EnsEMBL::Compara::ConstrainedElement
773 objects.
774 Caller : object::methodname
775 Status : At risk
776
777 =cut
778
779 sub get_all_ConstrainedElements {
780 my ($self, $method_link_type, $species_set) = @_;
781 my $all_constrained_elements = [];
782
783 $method_link_type ||= "GERP_CONSTRAINED_ELEMENT";
784 my $key_cache = "_constrained_elements_".$method_link_type;
785 if ($species_set) {
786 $key_cache .= "::" . join("-", sort map {s/\W/_/g} map {$_->name} @$species_set);
787 } else {
788 $species_set = $self->{_method_link_species_set}->species_set;
789 }
790
791 if (!defined($self->{$key_cache})) {
792 my $method_link_species_set_adaptor = $self->adaptor->db->get_MethodLinkSpeciesSetAdaptor();
793 my $method_link_species_set = $method_link_species_set_adaptor->fetch_by_method_link_type_GenomeDBs(
794 $method_link_type, $self->{_method_link_species_set}->species_set);
795
796 if ($method_link_species_set) {
797 my $constrained_element_adaptor = $self->adaptor->db->get_ConstrainedElementAdaptor();
798 $all_constrained_elements = $constrained_element_adaptor->fetch_all_by_MethodLinkSpeciesSet_Slice(
799 $method_link_species_set, $self->reference_Slice);
800 my $big_mapper = $self->{_reference_Mapper};
801 foreach my $this_constrained_element (@{$all_constrained_elements}) {
802 my $reference_slice_start;
803 my $reference_slice_end;
804 my $reference_slice_strand;
805
806 my @alignment_coords = $big_mapper->map_coordinates(
807 "sequence", # $self->genomic_align->dbID,
808 $this_constrained_element->start + $this_constrained_element->slice->start - 1,
809 $this_constrained_element->end + $this_constrained_element->slice->start - 1,
810 $this_constrained_element->strand,
811 "sequence" # $from_mapper->from
812 );
813 foreach my $alignment_coord (@alignment_coords) {
814 next if (!$alignment_coord->isa("Bio::EnsEMBL::Mapper::Coordinate"));
815 if (!defined($reference_slice_strand)) {
816 $reference_slice_start = $alignment_coord->start;
817 $reference_slice_end = $alignment_coord->end;
818 $reference_slice_strand = $alignment_coord->strand;
819 } else {
820 if ($alignment_coord->start < $reference_slice_start) {
821 $reference_slice_start = $alignment_coord->start;
822 }
823 if ($alignment_coord->end > $reference_slice_end) {
824 $reference_slice_end = $alignment_coord->end;
825 }
826 }
827 }
828 $this_constrained_element->slice($self);
829 $this_constrained_element->start($reference_slice_start);
830 $this_constrained_element->end($reference_slice_end);
831 $this_constrained_element->strand($reference_slice_strand);
832 }
833 }
834 $self->{$key_cache} = $all_constrained_elements;
835 }
836
837 return $self->{$key_cache};
838 }
839
840
841 =head2 _create_underlying_Slices (experimental)
842
843 Arg[1] : listref of Bio::EnsEMBL::Compara::GenomicAlignBlocks
844 $genomic_align_blocks
845 Arg[2] : [optional] boolean $expanded (default = FALSE)
846 Arg[3] : [optional] boolean $solve_overlapping (default = FALSE)
847 Arg[4] : [optional] boolean $preserve_blocks (default = FALSE)
848 Example :
849 Description: Creates a set of Bio::EnsEMBL::Compara::AlignSlice::Slices
850 and attach it to this object.
851 Returntype :
852 Exceptions : warns about overlapping GenomicAlignBlocks
853 Caller :
854
855 =cut
856
857 sub _create_underlying_Slices {
858 my ($self, $genomic_align_blocks, $expanded, $solve_overlapping, $preserve_blocks, $species_order) = @_;
859 my $strand = $self->reference_Slice->strand;
860
861 my $align_slice_length = 0;
862 my $last_ref_pos;
863 if ($strand == 1) {
864 $last_ref_pos = $self->reference_Slice->start;
865 } else {
866 $last_ref_pos = $self->reference_Slice->end;
867 }
868
869 my $ref_genome_db = $self->adaptor->db->get_GenomeDBAdaptor->fetch_by_Slice($self->reference_Slice);
870 my $big_mapper = Bio::EnsEMBL::Mapper->new("sequence", "alignment");
871
872 my $sorted_genomic_align_blocks;
873
874 if ($solve_overlapping eq "restrict") {
875 $sorted_genomic_align_blocks = _sort_and_restrict_GenomicAlignBlocks($genomic_align_blocks);
876 } elsif ($solve_overlapping) {
877 $sorted_genomic_align_blocks = _sort_and_compile_GenomicAlignBlocks($genomic_align_blocks);
878 } else {
879 $sorted_genomic_align_blocks = _sort_GenomicAlignBlocks($genomic_align_blocks);
880 }
881 @$sorted_genomic_align_blocks = reverse(@$sorted_genomic_align_blocks) if ($strand == -1);
882 foreach my $this_genomic_align_block (@$sorted_genomic_align_blocks) {
883 my $original_genomic_align_block = $this_genomic_align_block;
884 my ($from, $to);
885
886 if ($preserve_blocks) {
887 ## Don't restrict the block. Set from and to to 1 and length respectively
888 $from = 1;
889 $to = $this_genomic_align_block->length;
890 } else {
891 #need to check that the block is still overlapping the slice - it may
892 #have already been restricted by the options above.
893 if ($this_genomic_align_block->reference_genomic_align->dnafrag_start > $self->reference_Slice->end || $this_genomic_align_block->reference_genomic_align->dnafrag_end < $self->reference_Slice->start) {
894 next;
895 }
896 ($this_genomic_align_block, $from, $to) = $this_genomic_align_block->restrict_between_reference_positions(
897 $self->reference_Slice->start, $self->reference_Slice->end);
898 }
899
900 $original_genomic_align_block->{_alignslice_from} = $from;
901 $original_genomic_align_block->{_alignslice_to} = $to;
902
903 my $reference_genomic_align = $this_genomic_align_block->reference_genomic_align;
904
905 #If I haven't needed to restrict, I don't gain this link so add it here
906 if (!defined $reference_genomic_align->genomic_align_block->reference_genomic_align) {
907 $reference_genomic_align->genomic_align_block($this_genomic_align_block);
908 }
909
910 my ($this_pos, $this_gap_between_genomic_align_blocks);
911 if ($strand == 1) {
912 $this_pos = $reference_genomic_align->dnafrag_start;
913 $this_gap_between_genomic_align_blocks = $this_pos - $last_ref_pos;
914 } else {
915 $this_pos = $reference_genomic_align->dnafrag_end;
916 $this_gap_between_genomic_align_blocks = $last_ref_pos - $this_pos;
917 }
918
919 if ($this_gap_between_genomic_align_blocks > 0) {
920 ## Add mapper info for inter-genomic_align_block space
921 if ($strand == 1) {
922 $big_mapper->add_map_coordinates(
923 'sequence',
924 $last_ref_pos,
925 $this_pos - 1,
926 $strand,
927 'alignment',
928 $align_slice_length + 1,
929 $align_slice_length + $this_gap_between_genomic_align_blocks,
930 );
931 } else {
932 $big_mapper->add_map_coordinates(
933 'sequence',
934 $this_pos + 1,
935 $last_ref_pos,
936 $strand,
937 'alignment',
938 $align_slice_length + 1,
939 $align_slice_length + $this_gap_between_genomic_align_blocks,
940 );
941 }
942 $align_slice_length += $this_gap_between_genomic_align_blocks;
943 }
944 $reference_genomic_align->genomic_align_block->start($align_slice_length + 1);
945 $original_genomic_align_block->{_alignslice_start} = $align_slice_length;
946 if ($expanded) {
947 $align_slice_length += CORE::length($reference_genomic_align->aligned_sequence("+FAKE_SEQ"));
948 $reference_genomic_align->genomic_align_block->end($align_slice_length);
949 $big_mapper->add_Mapper($reference_genomic_align->get_Mapper(0));
950 } else {
951 $align_slice_length += $reference_genomic_align->dnafrag_end - $reference_genomic_align->dnafrag_start + 1;
952 $reference_genomic_align->genomic_align_block->end($align_slice_length);
953 $big_mapper->add_Mapper($reference_genomic_align->get_Mapper(0,1));
954 }
955 $reference_genomic_align->genomic_align_block->slice($self);
956
957 if ($strand == 1) {
958 $last_ref_pos = $reference_genomic_align->dnafrag_end + 1;
959 } else {
960 $last_ref_pos = $reference_genomic_align->dnafrag_start - 1;
961 }
962 }
963 my ($this_pos, $this_gap_between_genomic_align_blocks);
964 if ($strand == 1) {
965 $this_pos = $self->reference_Slice->end;
966 $this_gap_between_genomic_align_blocks = $this_pos - ($last_ref_pos - 1);
967 } else {
968 $this_pos = $self->reference_Slice->start;
969 $this_gap_between_genomic_align_blocks = $last_ref_pos + 1 - $this_pos;
970 }
971
972 ## $last_ref_pos is the next nucleotide position after the last mapped one.
973 if ($this_gap_between_genomic_align_blocks > 0) {
974 if ($strand == 1) {
975 $big_mapper->add_map_coordinates(
976 'sequence',
977 $last_ref_pos,
978 $this_pos,
979 $strand,
980 'alignment',
981 $align_slice_length + 1,
982 $align_slice_length + $this_gap_between_genomic_align_blocks,
983 );
984 } else {
985 $big_mapper->add_map_coordinates(
986 'sequence',
987 $this_pos,
988 $last_ref_pos,
989 $strand,
990 'alignment',
991 $align_slice_length + 1,
992 $align_slice_length + $this_gap_between_genomic_align_blocks,
993 );
994 }
995 $align_slice_length += $this_gap_between_genomic_align_blocks;
996 }
997
998 if ($species_order) {
999 foreach my $species_def (@$species_order) {
1000 my $genome_db_name = $species_def->{genome_db}->name;
1001 # print STDERR "SPECIES:: ", $genome_db_name, "\n";
1002 my $new_slice = new Bio::EnsEMBL::Compara::AlignSlice::Slice(
1003 -length => $align_slice_length,
1004 -requesting_slice => $self->reference_Slice,
1005 -align_slice => $self,
1006 -method_link_species_set => $self->{_method_link_species_set},
1007 -genome_db => $species_def->{genome_db},
1008 -expanded => $expanded,
1009 );
1010 $new_slice->{genomic_align_ids} = $species_def->{genomic_align_ids};
1011 push(@{$self->{slices}->{lc($genome_db_name)}}, $new_slice);
1012 push(@{$self->{_slices}}, $new_slice);
1013 }
1014 } else {
1015 # print STDERR "SPECIES:: ", $ref_genome_db->name, "\n";
1016 $self->{slices}->{lc($ref_genome_db->name)} = [new Bio::EnsEMBL::Compara::AlignSlice::Slice(
1017 -length => $align_slice_length,
1018 -requesting_slice => $self->reference_Slice,
1019 -align_slice => $self,
1020 -method_link_species_set => $self->{_method_link_species_set},
1021 -genome_db => $ref_genome_db,
1022 -expanded => $expanded,
1023 )];
1024 $self->{_slices} = [$self->{slices}->{lc($ref_genome_db->name)}->[0]];
1025 }
1026
1027 $self->{slices}->{lc($ref_genome_db->name)}->[0]->add_Slice_Mapper_pair(
1028 $self->reference_Slice,
1029 $big_mapper,
1030 1,
1031 $align_slice_length,
1032 $self->reference_Slice->strand
1033 );
1034 $self->{_reference_Mapper} = $big_mapper;
1035
1036 foreach my $this_genomic_align_block (@$sorted_genomic_align_blocks) {
1037 if (UNIVERSAL::isa($this_genomic_align_block, "Bio::EnsEMBL::Compara::GenomicAlignTree")) {
1038 ## For trees, loop through all nodes (internal and leaves) to add the GenomicAligns
1039 foreach my $this_genomic_align_node (@{$this_genomic_align_block->get_all_nodes}) {
1040 # but we have to skip the reference node as this has already been added to the guide Slice
1041 next if ($this_genomic_align_node eq $this_genomic_align_block->reference_genomic_align_node);
1042
1043 # For composite segments (2X genomes), the node will link to several GenomicAligns.
1044 # Add each of them to one of the AS:Slice objects
1045 foreach my $this_genomic_align (@{$this_genomic_align_node->get_all_genomic_aligns_for_node}) {
1046 # Link to genomic_align_block may have been lost during tree minimization
1047 $this_genomic_align->genomic_align_block_id(0);
1048 $this_genomic_align->genomic_align_block($this_genomic_align_block);
1049 $self->_add_GenomicAlign_to_a_Slice($this_genomic_align, $this_genomic_align_block,
1050 $species_order, $align_slice_length);
1051 }
1052 }
1053 } else {
1054 ## For plain alignments, just use all non-reference GenomicAlign objects
1055 foreach my $this_genomic_align
1056 (@{$this_genomic_align_block->get_all_non_reference_genomic_aligns}) {
1057 $self->_add_GenomicAlign_to_a_Slice($this_genomic_align, $this_genomic_align_block,
1058 $species_order, $align_slice_length);
1059 }
1060 }
1061 }
1062 #It is possible for the same region in the ref species (eg Gibbon) to align to 2 different blocks (pairwise to human in the case of the EPO low coverage alignments). In this case, although the incoming $genomic_align_blocks will have 2 blocks, the $sorted_genomic_align_blocks will only contain 1 of the blocks. It may happen that one species occurs in one block (eg gorilla) but not in the other. However, the $species_order will contain gorilla but the $sorted_genomic_align_blocks may not. This results in a slice being created for gorilla but it has no slice_mapper_pairs. Must check the slices and remove any that have {'slice_mapper_pairs'} as undef (no alignment comes out as a GAP).
1063
1064 if ($species_order) {
1065 my $slices = $self->{_slices};
1066 for (my $i = (@$slices-1); $i >= 0; --$i) {
1067 if (@{$slices->[$i]->{'slice_mapper_pairs'}} == 0) {
1068 #remove from {slices}
1069 delete $self->{slices}->{$slices->[$i]->genome_db->name};
1070 #remove from {_slices}
1071 splice @$slices, $i, 1;
1072 }
1073 }
1074 }
1075
1076
1077
1078 return $self;
1079 }
1080
1081 =head2 _add_GenomicAlign_to_a_Slice
1082
1083 =cut
1084
1085 sub _add_GenomicAlign_to_a_Slice {
1086 my ($self, $this_genomic_align, $this_genomic_align_block, $species_order, $align_slice_length) = @_;
1087
1088 my $expanded = $self->{expanded};
1089 my $species = $this_genomic_align->dnafrag->genome_db->name;
1090
1091 if (!defined($self->{slices}->{lc($species)})) {
1092 $self->{slices}->{lc($species)} = [new Bio::EnsEMBL::Compara::AlignSlice::Slice(
1093 -length => $align_slice_length,
1094 -requesting_slice => $self->reference_Slice,
1095 -align_slice => $self,
1096 -method_link_species_set => $self->{_method_link_species_set},
1097 -genome_db => $this_genomic_align->dnafrag->genome_db,
1098 -expanded => $expanded,
1099 )];
1100 push(@{$self->{_slices}}, $self->{slices}->{lc($species)}->[0]);
1101 }
1102
1103 my $this_block_start = $this_genomic_align_block->start;
1104 my $this_block_end = $this_genomic_align_block->end;
1105 my $this_core_slice = $this_genomic_align->get_Slice();
1106 if (!$this_core_slice) {
1107 $this_core_slice = new Bio::EnsEMBL::Slice(
1108 -coord_system => $aligngap_coord_system,
1109 -seq_region_name => "GAP",
1110 -start => $this_block_start,
1111 -end => $this_block_end,
1112 -strand => 0
1113 );
1114 $this_core_slice->{seq} = "." x ($this_block_end - $this_block_start + 1);
1115 }
1116 return if (!$this_core_slice); ## The restriction of the GenomicAlignBlock may return a void GenomicAlign
1117
1118 ## This creates a link between the slice and the tree node. This is required to display
1119 ## the tree on the web interface.
1120 if ($this_genomic_align->genome_db->name eq "ancestral_sequences") {
1121 foreach my $genomic_align_node (@{$this_genomic_align_block->get_all_sorted_genomic_align_nodes}) {
1122 my $genomic_align_group = $genomic_align_node->genomic_align_group;
1123 next if (!$genomic_align_group);
1124
1125 foreach my $genomic_align (@{$genomic_align_group->get_all_GenomicAligns}) {
1126 if ($this_genomic_align == $genomic_align) {
1127 my $simple_tree = $genomic_align_node->newick_simple_format();
1128 $simple_tree =~ s/\_[^\_]+\_\d+\_\d+\[[\+\-]\]//g;
1129 $simple_tree =~ s/\:[\d\.]+//g;
1130 $this_core_slice->{_tree} = $simple_tree;
1131 last;
1132 }
1133 }
1134 }
1135 }
1136
1137 my $this_mapper = $this_genomic_align->get_Mapper(0, !$expanded);
1138 # Fix block start and block end for composite segments (2X genomes)
1139 if ($this_genomic_align->cigar_line =~ /^(\d*)X/ or $this_genomic_align->cigar_line =~ /(\d*)X$/) {
1140 $this_block_start = undef;
1141 $this_block_end = undef;
1142 my @blocks = $this_mapper->map_coordinates("sequence", $this_genomic_align->dnafrag_start,
1143 $this_genomic_align->dnafrag_end, $this_genomic_align->dnafrag_strand, "sequence");
1144 foreach my $this_block (@blocks) {
1145 next if ($this_block->isa("Bio::EnsEMBL::Mapper::Gap"));
1146 $this_block_start = $this_block->start if (!defined($this_block_start) or $this_block->start < $this_block_start);
1147 $this_block_end = $this_block->end if (!defined($this_block_end) or $this_block->end > $this_block_end);
1148 }
1149 }
1150
1151 # Choose the appropriate AS::Slice for adding this bit of the alignment
1152 my $this_underlying_slice = $self->_choose_underlying_Slice($this_genomic_align, $this_block_start,
1153 $this_block_end, $align_slice_length, $species_order);
1154
1155 # Add a Slice, Mapper, and start-end-strand coordinates to an underlying AS::Slice
1156 $this_underlying_slice->add_Slice_Mapper_pair(
1157 $this_core_slice,
1158 $this_mapper,
1159 $this_block_start,
1160 $this_block_end,
1161 $this_genomic_align->dnafrag_strand
1162 );
1163 return;
1164 }
1165
1166
1167 sub _choose_underlying_Slice {
1168 my ($self, $this_genomic_align, $this_block_start, $this_block_end, $align_slice_length, $species_order) = @_;
1169 my $underlying_slice = undef;
1170
1171 my $expanded = $self->{expanded};
1172 my $species = $this_genomic_align->dnafrag->genome_db->name;
1173
1174 if (defined($this_genomic_align->{_temporary_AS_underlying_Slice})) {
1175 my $preset_underlying_slice = $this_genomic_align->{_temporary_AS_underlying_Slice};
1176 delete($this_genomic_align->{_temporary_AS_underlying_Slice});
1177 return $preset_underlying_slice;
1178 }
1179
1180 if (!defined($self->{slices}->{lc($species)})) {
1181 ## No slice for this species yet. Create, store and return it
1182 $underlying_slice = new Bio::EnsEMBL::Compara::AlignSlice::Slice(
1183 -length => $align_slice_length,
1184 -requesting_slice => $self->reference_Slice,
1185 -align_slice => $self,
1186 -method_link_species_set => $self->{_method_link_species_set},
1187 -genome_db => $this_genomic_align->dnafrag->genome_db,
1188 -expanded => $expanded,
1189 );
1190 push(@{$self->{_slices}}, $underlying_slice);
1191 push(@{$self->{slices}->{lc($species)}}, $underlying_slice);
1192 return $underlying_slice;
1193 }
1194
1195 if ($species_order) {
1196 my $preset_underlying_slice = undef;
1197 foreach my $this_underlying_slice (@{$self->{_slices}}) {
1198 if (!$this_genomic_align->{original_dbID} and $this_genomic_align->dbID) {
1199 $this_genomic_align->{original_dbID} = $this_genomic_align->dbID;
1200 }
1201 if (grep {$_ == $this_genomic_align->{original_dbID}}
1202 @{$this_underlying_slice->{genomic_align_ids}}) {
1203 $preset_underlying_slice = $this_underlying_slice;
1204 }
1205 }
1206 if ($preset_underlying_slice) {
1207 my $overlap = 0;
1208 my $slice_mapper_pairs = $preset_underlying_slice->get_all_Slice_Mapper_pairs();
1209 foreach my $slice_mapper_pair (@$slice_mapper_pairs) {
1210 my $block_start = $slice_mapper_pair->{start};
1211 my $block_end = $slice_mapper_pair->{end};
1212 #a block may not have a start and end if there is no sequence
1213 #eg the cigar_line looks like 139D17186X
1214 next if (!defined $this_block_start || !defined $this_block_end);
1215 if ($this_block_start <= $block_end and $this_block_end >= $block_start) {
1216 $overlap = 1;
1217 last;
1218 }
1219 }
1220 if (!$overlap) {
1221 ## This block does not overlap any previous block: add it!
1222 $underlying_slice = $preset_underlying_slice;
1223 }
1224 }
1225 }
1226
1227 if (!$underlying_slice) {
1228 ## Try to add this alignment to an existing underlying Bio::EnsEMBL::Compara::AlignSlice::Slice
1229 SLICE: foreach my $this_underlying_slice (@{$self->{slices}->{lc($species)}}) {
1230 my $slice_mapper_pairs = $this_underlying_slice->get_all_Slice_Mapper_pairs();
1231 PAIRS: foreach my $slice_mapper_pair (@$slice_mapper_pairs) {
1232 my $block_start = $slice_mapper_pair->{start};
1233 my $block_end = $slice_mapper_pair->{end};
1234 if ($this_block_start <= $block_end and $this_block_end >= $block_start) {
1235 next SLICE; ## This block overlaps a previous block
1236 }
1237 }
1238 ## This block does not overlap any previous block: add it!
1239 $underlying_slice = $this_underlying_slice;
1240 }
1241 }
1242
1243 if (!$underlying_slice) {
1244 ## This block overlaps at least one block in every available underlying
1245 ## Bio::EnsEMBL::Compara::AlignSlice::Slice. Create a new one!
1246 $underlying_slice = new Bio::EnsEMBL::Compara::AlignSlice::Slice(
1247 -length => $align_slice_length,
1248 -requesting_slice => $self->reference_Slice,
1249 -align_slice => $self,
1250 -method_link_species_set => $self->{_method_link_species_set},
1251 -genome_db => $this_genomic_align->dnafrag->genome_db,
1252 -expanded => $expanded,
1253 );
1254 push(@{$self->{_slices}}, $underlying_slice);
1255 push(@{$self->{slices}->{lc($species)}}, $underlying_slice);
1256 }
1257
1258 # if ($this_genomic_align->cigar_line =~ /X/) {
1259 # ## This GenomicAlign is part of a composite alignment
1260 # my $genomic_align_group = $this_genomic_align->genomic_align_group_by_type("composite");
1261 # foreach my $this_genomic_align (@{$genomic_align_group->genomic_align_array}) {
1262 # # next if ($this_genomic_align
1263 # }
1264 # }
1265
1266 return $underlying_slice;
1267 }
1268
1269
1270 =head2 _sort_and_restrict_GenomicAlignBlocks
1271
1272 Arg[1] : listref of Bio::EnsEMBL::Compara::GenomicAlignBlocks $gabs
1273 Example : $sorted_gabs = _sort_GenomicAlignBlocks($gabs);
1274 Description : This method returns the original list of
1275 Bio::EnsEMBL::Compara::GenomicAlignBlock objects in order
1276 Returntype : listref of Bio::EnsEMBL::Compara::GenomicAlignBlock objects
1277 Exceptions :
1278 Caller : methodname()
1279
1280 =cut
1281
1282 sub _sort_and_restrict_GenomicAlignBlocks {
1283 my ($genomic_align_blocks) = @_;
1284 my $sorted_genomic_align_blocks = [];
1285 return $sorted_genomic_align_blocks if (!$genomic_align_blocks);
1286
1287 my $last_end;
1288 foreach my $this_genomic_align_block (sort _sort_gabs @{$genomic_align_blocks}) {
1289 if (defined($last_end) and
1290 $this_genomic_align_block->reference_genomic_align->dnafrag_start <= $last_end) {
1291 if ($this_genomic_align_block->reference_genomic_align->dnafrag_end > $last_end) {
1292 $this_genomic_align_block = $this_genomic_align_block->restrict_between_reference_positions($last_end + 1, undef);
1293 } else {
1294 warning("Ignoring GenomicAlignBlock because it overlaps".
1295 " previous GenomicAlignBlock");
1296 next;
1297 }
1298 }
1299 $last_end = $this_genomic_align_block->reference_genomic_align->dnafrag_end;
1300 push(@$sorted_genomic_align_blocks, $this_genomic_align_block);
1301 }
1302
1303 return $sorted_genomic_align_blocks;
1304 }
1305
1306 =head2 _sort_GenomicAlignBlocks
1307
1308 Arg[1] : listref of Bio::EnsEMBL::Compara::GenomicAlignBlocks $gabs
1309 Example : $sorted_gabs = _sort_GenomicAlignBlocks($gabs);
1310 Description : This method returns the original list of
1311 Bio::EnsEMBL::Compara::GenomicAlignBlock objects in order
1312 Returntype : listref of Bio::EnsEMBL::Compara::GenomicAlignBlock objects
1313 Exceptions :
1314 Caller : methodname()
1315
1316 =cut
1317
1318 sub _sort_GenomicAlignBlocks {
1319 my ($genomic_align_blocks) = @_;
1320 my $sorted_genomic_align_blocks = [];
1321 return $sorted_genomic_align_blocks if (!$genomic_align_blocks);
1322
1323 my $last_end;
1324 foreach my $this_genomic_align_block (sort _sort_gabs @{$genomic_align_blocks}) {
1325 if (!defined($last_end) or
1326 $this_genomic_align_block->reference_genomic_align->dnafrag_start > $last_end) {
1327 push(@$sorted_genomic_align_blocks, $this_genomic_align_block);
1328 $last_end = $this_genomic_align_block->reference_genomic_align->dnafrag_end;
1329 } else {
1330 my $block_id;
1331 if (UNIVERSAL::isa($a, "Bio::EnsEMBL::Compara::GenomicAlignBlock")) {
1332 warning("Ignoring Bio::EnsEMBL::Compara::GenomicAlignBlock #".
1333 ($this_genomic_align_block->dbID or "-unknown")." because it overlaps".
1334 " previous Bio::EnsEMBL::Compara::GenomicAlignBlock");
1335 } else {
1336 warning("Ignoring Bio::EnsEMBL::Compara::GenomicAlignTree #".
1337 ($this_genomic_align_block->node_id or "-unknown")." because it overlaps".
1338 " previous Bio::EnsEMBL::Compara::GenomicAlignTree");
1339 }
1340
1341 }
1342 }
1343
1344 return $sorted_genomic_align_blocks;
1345 }
1346
1347 sub _sort_gabs {
1348
1349 if (UNIVERSAL::isa($a, "Bio::EnsEMBL::Compara::GenomicAlignBlock")) {
1350 _sort_genomic_align_block();
1351 } else {
1352 _sort_genomic_align_tree();
1353 }
1354 }
1355
1356 sub _sort_genomic_align_block {
1357
1358 if ($a->reference_genomic_align->dnafrag_start == $b->reference_genomic_align->dnafrag_start) {
1359 ## This may happen when a block has been split into small pieces and some of them contain
1360 ## gaps only for the reference species. In this case, use another species for sorting these
1361 ## genomic_align_blocks
1362 for (my $i = 0; $i<@{$a->get_all_non_reference_genomic_aligns()}; $i++) {
1363 for (my $j = 0; $j<@{$b->get_all_non_reference_genomic_aligns()}; $j++) {
1364 next if ($a->get_all_non_reference_genomic_aligns->[$i]->dnafrag_id !=
1365 $b->get_all_non_reference_genomic_aligns->[$j]->dnafrag_id);
1366 if (($a->get_all_non_reference_genomic_aligns->[$i]->dnafrag_start !=
1367 $b->get_all_non_reference_genomic_aligns->[$j]->dnafrag_start) and
1368 ($a->get_all_non_reference_genomic_aligns->[$i]->dnafrag_strand ==
1369 $b->get_all_non_reference_genomic_aligns->[$j]->dnafrag_strand)) {
1370 ## This other genomic_align is not a full gap and ca be used to sort these blocks
1371 if ($a->get_all_non_reference_genomic_aligns->[$i]->dnafrag_strand == 1) {
1372 return $a->get_all_non_reference_genomic_aligns->[$i]->dnafrag_start <=>
1373 $b->get_all_non_reference_genomic_aligns->[$j]->dnafrag_start;
1374 } else {
1375 return $b->get_all_non_reference_genomic_aligns->[$j]->dnafrag_start <=>
1376 $a->get_all_non_reference_genomic_aligns->[$i]->dnafrag_start;
1377 }
1378 }
1379 }
1380 }
1381 } else {
1382 return $a->reference_genomic_align->dnafrag_start <=> $b->reference_genomic_align->dnafrag_start
1383 }
1384 }
1385
1386 sub _sort_genomic_align_tree {
1387
1388 if ($a->reference_genomic_align->dnafrag_start == $b->reference_genomic_align->dnafrag_start) {
1389 ## This may happen when a block has been split into small pieces and some of them contain
1390 ## gaps only for the reference species. In this case, use another species for sorting these
1391 ## genomic_align_blocks
1392 my $a_leaves = $a->get_all_leaves;
1393 my $b_leaves = $b->get_all_leaves;
1394
1395 for (my $i = 0; $i < @$a_leaves; $i++) {
1396 for (my $j = 0; $j < @$b_leaves; $j++) {
1397 #look at high coverage sequences only
1398 my $a_gas = $a_leaves->[$i]->get_all_genomic_aligns_for_node;
1399 next if (@$a_gas > 1);
1400 my $a_ga = $a_gas->[0];
1401
1402 my $b_gas = $b_leaves->[$j]->get_all_genomic_aligns_for_node;
1403 next if (@$b_gas > 1);
1404 my $b_ga = $b_gas->[0];
1405
1406 next if ($a_ga->dnafrag_id != $b_ga->dnafrag_id);
1407 if (($a_ga->dnafrag_start != $b_ga->dnafrag_start) and ($a_ga->dnafrag_strand == $b_ga->dnafrag_strand)) {
1408 ## This other genomic_align is not a full gap and ca be used to sort these blocks
1409 if ($a_ga->dnafrag_strand == 1) {
1410 return $a_ga->dnafrag_start <=> $b_ga->dnafrag_start;
1411 } else {
1412 return $b_ga->dnafrag_start <=> $a_ga->dnafrag_start;
1413 }
1414 }
1415 }
1416 }
1417 } else {
1418 return $a->reference_genomic_align->dnafrag_start <=> $b->reference_genomic_align->dnafrag_start
1419 }
1420 }
1421
1422 =head2 _sort_and_compile_GenomicAlignBlocks
1423
1424 Arg[1] : listref of Bio::EnsEMBL::Compara::GenomicAlignBlocks $gabs
1425 Example : $sorted_fake_gabs = _sort_and_compile_GenomicAlignBlocks($gabs);
1426 Description : This method returns a list of
1427 Bio::EnsEMBL::Compara::GenomicAlignBlock objects sorted by
1428 position on the reference Bio::EnsEMBL::Compara::DnaFrag. If two
1429 or more Bio::EnsEMBL::Compara::GenomicAlignBlock objects
1430 overlap, it compile them, using the _compile_GenomicAlignBlocks
1431 method.
1432 Returntype : listref of Bio::EnsEMBL::Compara::GenomicAlignBlock objects
1433 Exceptions :
1434 Caller : methodname()
1435
1436 =cut
1437
1438 sub _sort_and_compile_GenomicAlignBlocks {
1439 my ($genomic_align_blocks) = @_;
1440 my $sorted_genomic_align_blocks = [];
1441 return $sorted_genomic_align_blocks if (!$genomic_align_blocks);
1442
1443 ##############################################################################################
1444 ##
1445 ## Compile GenomicAlignBlocks in group of GenomicAlignBlocks based on reference coordinates
1446 ##
1447 my $sets_of_genomic_align_blocks = [];
1448 my $start_pos;
1449 my $end_pos;
1450 my $this_set_of_genomic_align_blocks = [];
1451 foreach my $this_genomic_align_block (sort _sort_gabs @$genomic_align_blocks) {
1452 my $this_start_pos = $this_genomic_align_block->reference_genomic_align->dnafrag_start;
1453 my $this_end_pos = $this_genomic_align_block->reference_genomic_align->dnafrag_end;
1454 if (defined($end_pos) and ($this_start_pos <= $end_pos)) {
1455 # this genomic_align_block overlaps previous one. Extend this set_of_coordinates
1456 $end_pos = $this_end_pos if ($this_end_pos > $end_pos);
1457 } else {
1458 # there is a gap between this genomic_align_block and the previous one. Close and save
1459 # this set_of_genomic_align_blocks (if it exists) and start a new one.
1460 push(@{$sets_of_genomic_align_blocks}, [$start_pos, $end_pos, $this_set_of_genomic_align_blocks])
1461 if (defined(@$this_set_of_genomic_align_blocks));
1462 $start_pos = $this_start_pos;
1463 $end_pos = $this_end_pos;
1464 $this_set_of_genomic_align_blocks = [];
1465 }
1466 push(@$this_set_of_genomic_align_blocks, $this_genomic_align_block);
1467 }
1468 push(@{$sets_of_genomic_align_blocks}, [$start_pos, $end_pos, $this_set_of_genomic_align_blocks])
1469 if (defined(@$this_set_of_genomic_align_blocks));
1470 ##
1471 ##############################################################################################
1472
1473 foreach my $this_set_of_genomic_align_blocks (@$sets_of_genomic_align_blocks) {
1474 my $this_compiled_genomic_align_block;
1475 if (@$this_set_of_genomic_align_blocks == 1) {
1476 $this_compiled_genomic_align_block = $this_set_of_genomic_align_blocks->[0];
1477 } else {
1478 $this_compiled_genomic_align_block =
1479 _compile_GenomicAlignBlocks(@$this_set_of_genomic_align_blocks);
1480 }
1481 push(@{$sorted_genomic_align_blocks}, $this_compiled_genomic_align_block);
1482 }
1483
1484 return $sorted_genomic_align_blocks;
1485 }
1486
1487 =head2 _compile_GenomicAlignBlocks
1488
1489 Arg [1] : integer $start_pos (the start of the fake genomic_align)
1490 Arg [2] : integer $end_pos (the end of the fake genomic_align)
1491 Arg [3] : listref of Bio::EnsEMBL::Compara::GenomicAlignBlocks $set_of_genomic_align_blocks
1492 $all_genomic_align_blocks (the pairwise genomic_align_blocks used for
1493 this fake multiple genomic_aling_block)
1494 Example :
1495 Description :
1496 Returntype : Bio::EnsEMBL::Compara::GenomicAlignBlock object
1497 Exceptions :
1498 Caller : methodname
1499
1500 =cut
1501
1502 sub _compile_GenomicAlignBlocks {
1503 my ($start_pos, $end_pos, $all_genomic_align_blocks) = @_;
1504
1505 ############################################################################################
1506 ##
1507 ## Change strands in order to have all reference genomic aligns on the forward strand
1508 ##
1509 my $strand;
1510 foreach my $this_genomic_align_block (@$all_genomic_align_blocks) {
1511 my $this_genomic_align = $this_genomic_align_block->reference_genomic_align;
1512 if (!defined($strand)) {
1513 $strand = $this_genomic_align->dnafrag_strand;
1514 } elsif ($strand != $this_genomic_align->dnafrag_strand) {
1515 $strand = 0;
1516 }
1517 if ($this_genomic_align->dnafrag_strand == -1) {
1518
1519 if (UNIVERSAL::isa($this_genomic_align_block, "Bio::EnsEMBL::Compara::GenomicAlignTree")) {
1520 foreach my $this_node (@{$this_genomic_align_block->get_all_nodes}) {
1521 my $genomic_align_group = $this_node->genomic_align_group;
1522 next if (!$genomic_align_group);
1523 foreach my $genomic_align (@{$genomic_align_group->get_all_GenomicAligns}) {
1524 $genomic_align->reverse_complement;
1525 }
1526 }
1527 } else {
1528 foreach my $genomic_align (@{$this_genomic_align_block->genomic_align_array}) {
1529 $genomic_align->reverse_complement;
1530 }
1531 }
1532 }
1533 }
1534 ##
1535 ############################################################################################
1536
1537 ## Nothing has to be compiled if there is one single GenomicAlignBlock!
1538 $all_genomic_align_blocks->[0]->reverse_complement;
1539 return $all_genomic_align_blocks->[0] if (scalar(@$all_genomic_align_blocks) == 1);
1540
1541 ############################################################################################
1542 ##
1543 ## Fix all sequences
1544 ##
1545 foreach my $this_genomic_align_block (@$all_genomic_align_blocks) {
1546 my $this_genomic_align = $this_genomic_align_block->reference_genomic_align;
1547 my $this_start_pos = $this_genomic_align->dnafrag_start;
1548 my $this_end_pos = $this_genomic_align->dnafrag_end;
1549 my $starting_gap = $this_start_pos - $start_pos;
1550 my $ending_gap = $end_pos - $this_end_pos;
1551
1552 my $this_cigar_line = $this_genomic_align->cigar_line;
1553 my $this_original_sequence = $this_genomic_align->original_sequence;
1554 $this_genomic_align->aligned_sequence("");
1555 if ($starting_gap) {
1556 $this_cigar_line = $starting_gap."M".$this_cigar_line;
1557 $this_original_sequence = ("N" x $starting_gap).$this_original_sequence;
1558 }
1559 if ($ending_gap) {
1560 $this_cigar_line .= $ending_gap."M";
1561 $this_original_sequence .= ("N" x $ending_gap);
1562 }
1563 $this_genomic_align->cigar_line($this_cigar_line);
1564 $this_genomic_align->original_sequence($this_original_sequence);
1565
1566 foreach my $this_genomic_align (@{$this_genomic_align_block->get_all_non_reference_genomic_aligns}) {
1567 $this_genomic_align->aligned_sequence("");
1568 my $this_cigar_line = $this_genomic_align->cigar_line;
1569 $this_cigar_line = $starting_gap."D".$this_cigar_line if ($starting_gap);
1570 $this_cigar_line .= $ending_gap."D" if ($ending_gap);
1571 $this_genomic_align->cigar_line($this_cigar_line);
1572 $this_genomic_align->aligned_sequence(); # compute aligned_sequence using cigar_line
1573 }
1574 }
1575 ##
1576 ############################################################################################
1577
1578 ############################################################################################
1579 ##
1580 ## Distribute gaps
1581 ##
1582 my $aln_pos = 0;
1583 my $gap;
1584 do {
1585 my $gap_pos;
1586 my $genomic_align_block;
1587 $gap = undef;
1588
1589 ## Get the (next) first gap from all the alignments (sets: $gap_pos, $gap and $genomic_align_block_id)
1590 foreach my $this_genomic_align_block (@$all_genomic_align_blocks) {
1591 my $this_gap_pos = index($this_genomic_align_block->reference_genomic_align->aligned_sequence,
1592 "-", $aln_pos);
1593 if ($this_gap_pos > 0 and (!defined($gap_pos) or $this_gap_pos < $gap_pos)) {
1594 $gap_pos = $this_gap_pos;
1595 my $gap_string = substr($this_genomic_align_block->reference_genomic_align->aligned_sequence,
1596 $gap_pos);
1597 ($gap) = $gap_string =~ /^(\-+)/;
1598 $genomic_align_block = $this_genomic_align_block;
1599 }
1600 }
1601
1602 ## If a gap has been found, apply it to the other GAB
1603 if ($gap) {
1604 $aln_pos = $gap_pos + length($gap);
1605 foreach my $this_genomic_align_block (@$all_genomic_align_blocks) {
1606 next if ($genomic_align_block eq $this_genomic_align_block); # Do not add gap to itself!!
1607 if (UNIVERSAL::isa($this_genomic_align_block, "Bio::EnsEMBL::Compara::GenomicAlignTree")) {
1608 foreach my $this_node (@{$this_genomic_align_block->get_all_nodes}) {
1609 my $genomic_align_group = $this_node->genomic_align_group;
1610 next if (!$genomic_align_group);
1611 foreach my $this_genomic_align (@{$genomic_align_group->get_all_GenomicAligns}) {
1612 # insert gap in the aligned_sequence
1613 my $aligned_sequence = $this_genomic_align->aligned_sequence;
1614 substr($aligned_sequence, $gap_pos, 0, $gap);
1615 $this_genomic_align->aligned_sequence($aligned_sequence);
1616 }
1617 }
1618 } else {
1619 foreach my $this_genomic_align (@{$this_genomic_align_block->genomic_align_array}) {
1620 # insert gap in the aligned_sequence
1621 my $aligned_sequence = $this_genomic_align->aligned_sequence;
1622 substr($aligned_sequence, $gap_pos, 0, $gap);
1623 $this_genomic_align->aligned_sequence($aligned_sequence);
1624 }
1625 }
1626 }
1627 }
1628
1629 } while ($gap); # exit loop if no gap has been found
1630
1631 ## Fix all cigar_lines in order to match new aligned_sequences
1632 foreach my $this_genomic_align_block (@$all_genomic_align_blocks) {
1633 foreach my $this_genomic_align (@{$this_genomic_align_block->genomic_align_array}) {
1634 $this_genomic_align->cigar_line(""); # undef old cigar_line
1635 $this_genomic_align->cigar_line(); # compute cigar_line from aligned_sequence
1636 }
1637 }
1638 ##
1639 ############################################################################################
1640
1641 ############################################################################################
1642 ##
1643 ## Create the reference_genomic_align for this fake genomic_align_block
1644 ##
1645 ## All the blocks have been edited and all the reference genomic_aling
1646 ## should be equivalent. Here, we create a new one with no fixed sequence.
1647 ## This permits to retrieve the real sequence when needed
1648 ##
1649 my $reference_genomic_align;
1650 if (@$all_genomic_align_blocks) {
1651 my $this_genomic_align = $all_genomic_align_blocks->[0]->reference_genomic_align;
1652 $reference_genomic_align = new Bio::EnsEMBL::Compara::GenomicAlign(
1653 -dbID => -1,
1654 -dnafrag => $this_genomic_align->dnafrag,
1655 -dnafrag_start => $start_pos,
1656 -dnafrag_end => $end_pos,
1657 -dnafrag_strand => 1,
1658 -cigar_line => $this_genomic_align->cigar_line,
1659 -method_link_species_set => $this_genomic_align->method_link_species_set,
1660 -visible => 1
1661 );
1662 }
1663 ##
1664 ############################################################################################
1665
1666 ## Create the genomic_align_array (the list of genomic_aling for this fake gab
1667 my $genomic_align_array = [$reference_genomic_align];
1668 foreach my $this_genomic_align_block (@$all_genomic_align_blocks) {
1669 foreach my $this_genomic_align (@{$this_genomic_align_block->get_all_non_reference_genomic_aligns}) {
1670 $this_genomic_align->genomic_align_block_id(0); # undef old genomic_align_block_id
1671 push(@$genomic_align_array, $this_genomic_align);
1672 }
1673 }
1674
1675 ## Create the fake multiple Bio::EnsEMBL::Compara::GenomicAlignBlock
1676 my $fake_genomic_align_block = new Bio::EnsEMBL::Compara::GenomicAlignBlock(
1677 -length => ($end_pos - $start_pos + 1),
1678 -genomic_align_array => $genomic_align_array,
1679 -reference_genomic_align => $reference_genomic_align,
1680 -level_id => 0
1681 );
1682
1683 if ($strand == -1) {
1684 $fake_genomic_align_block->reverse_complement;
1685 }
1686
1687 return $fake_genomic_align_block;
1688 }
1689
1690
1691 sub DESTROY {
1692 my $self = shift;
1693 ## Remove circular reference in order to allow Perl to clear the object
1694 $self->{all_genomic_align_blocks} = undef;
1695 }
1696
1697 1;