annotate variant_effect_predictor/Bio/EnsEMBL/BaseAlignFeature.pm @ 1:d6778b5d8382 draft default tip

Deleted selected files
author willmclaren
date Fri, 03 Aug 2012 10:05:43 -0400
parents 21066c0abaf5
children
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
rev   line source
0
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
1 =head1 LICENSE
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
2
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
3 Copyright (c) 1999-2012 The European Bioinformatics Institute and
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
4 Genome Research Limited. All rights reserved.
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
5
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
6 This software is distributed under a modified Apache license.
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
7 For license details, please see
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
8
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
9 http://www.ensembl.org/info/about/code_licence.html
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
10
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
11 =head1 CONTACT
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
12
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
13 Please email comments or questions to the public Ensembl
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
14 developers list at <dev@ensembl.org>.
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
15
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
16 Questions may also be sent to the Ensembl help desk at
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
17 <helpdesk@ensembl.org>.
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
18
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
19 =cut
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
20
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
21 =head1 NAME
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
22
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
23 Bio::EnsEMBL::BaseAlignFeature - Baseclass providing a common abstract
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
24 implmentation for alignment features
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
25
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
26 =head1 SYNOPSIS
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
27
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
28 my $feat = new Bio::EnsEMBL::DnaPepAlignFeature(
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
29 -slice => $slice,
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
30 -start => 100,
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
31 -end => 120,
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
32 -strand => 1,
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
33 -hseqname => 'SP:RF1231',
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
34 -hstart => 200,
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
35 -hend => 220,
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
36 -analysis => $analysis,
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
37 -cigar_string => '10M3D5M2I'
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
38 );
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
39
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
40 Alternatively if you have an array of ungapped features
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
41
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
42 my $feat =
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
43 new Bio::EnsEMBL::DnaPepAlignFeature( -features => \@features );
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
44
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
45 Where @features is an array of Bio::EnsEMBL::FeaturePair
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
46
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
47 There is a method to manipulate the cigar_string into ungapped features
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
48
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
49 my @ungapped_features = $feat->ungapped_features();
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
50
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
51 This converts the cigar string into an array of Bio::EnsEMBL::FeaturePair
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
52
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
53 $analysis is a Bio::EnsEMBL::Analysis object
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
54
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
55 Bio::EnsEMBL::Feature methods can be used
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
56 Bio::EnsEMBL::FeaturePair methods can be used
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
57
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
58 The cigar_string contains the ungapped pieces that make up the gapped
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
59 alignment
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
60
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
61 It looks like: n Matches [ x Deletes or Inserts m Matches ]*
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
62 but a bit more condensed like "23M4I12M2D1M"
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
63 and evenmore condensed as you can ommit 1s "23M4I12M2DM"
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
64
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
65
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
66 To make things clearer this is how a blast HSP would be parsed
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
67
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
68 >AK014066
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
69 Length = 146
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
70
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
71 Minus Strand HSPs:
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
72
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
73 Score = 76 (26.8 bits), Expect = 1.4, P = 0.74
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
74 Identities = 20/71 (28%), Positives = 29/71 (40%), Frame = -1
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
75
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
76 Query: 479 GLQAPPPTPQGCRLIPPPPLGLQAPLPTLRAVGSSHHHP*GRQGSSLSSFRSSLASKASA 300
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
77 G APPP PQG R P P G + P L + + ++ R +A +
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
78 Sbjct: 7 GALAPPPAPQG-RWAFPRPTG-KRPATPLHGTARQDRQVRRSEAAKVTGCRGRVAPHVAP 64
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
79
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
80 Query: 299 SSPHNPSPLPS 267
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
81 H P+P P+
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
82 Sbjct: 65 PLTHTPTPTPT 75
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
83
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
84 The alignment goes from 267 to 479 in sequence 1 and 7 to 75 in sequence 2
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
85 and the strand is -1.
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
86
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
87 The alignment is made up of the following ungapped pieces :
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
88
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
89 sequence 1 start 447 , sequence 2 start 7 , match length 33 , strand -1
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
90 sequence 1 start 417 , sequence 2 start 18 , match length 27 , strand -1
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
91 sequence 1 start 267 , sequence 2 start 27 , match length 137 , strand -1
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
92
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
93 These ungapped pieces are made up into the following string (called a cigar
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
94 string) "33M3I27M3I137M" with start 267 end 479 strand -1 hstart 7 hend 75
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
95 hstrand 1 and feature type would be DnaPepAlignFeature
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
96
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
97 =cut
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
98
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
99
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
100 package Bio::EnsEMBL::BaseAlignFeature;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
101
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
102 use Bio::EnsEMBL::FeaturePair;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
103 use Bio::EnsEMBL::Utils::Argument qw(rearrange);
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
104 use Bio::EnsEMBL::Utils::Exception qw(throw warning);
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
105 use Scalar::Util qw(weaken isweak);
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
106
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
107 use vars qw(@ISA);
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
108 use strict;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
109
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
110 @ISA = qw(Bio::EnsEMBL::FeaturePair);
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
111
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
112
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
113 =head2 new
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
114
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
115 Arg [..] : List of named arguments. (-cigar_string , -features) defined
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
116 in this constructor, others defined in FeaturePair and
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
117 SeqFeature superclasses. Either cigar_string or a list
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
118 of ungapped features should be provided - not both.
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
119 Example : $baf = new BaseAlignFeatureSubclass(-cigar_string => '3M3I12M');
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
120 Description: Creates a new BaseAlignFeature using either a cigarstring or
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
121 a list of ungapped features. BaseAlignFeature is an abstract
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
122 baseclass and should not actually be instantiated - rather its
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
123 subclasses should be.
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
124 Returntype : Bio::EnsEMBL::BaseAlignFeature
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
125 Exceptions : thrown if both feature and cigar string args are provided
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
126 thrown if neither feature nor cigar string args are provided
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
127 Caller : general
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
128 Status : Stable
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
129
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
130 =cut
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
131
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
132 sub new {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
133
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
134 my $caller = shift;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
135
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
136 my $class = ref($caller) || $caller;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
137
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
138 my $self = $class->SUPER::new(@_);
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
139
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
140 my ($cigar_string,$features) = rearrange([qw(CIGAR_STRING FEATURES)], @_);
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
141
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
142 if (defined($cigar_string) && defined($features)) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
143 throw("CIGAR_STRING or FEATURES argument is required - not both.");
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
144 } elsif (defined($features)) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
145 $self->_parse_features($features);
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
146
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
147 } elsif (defined($cigar_string)) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
148 $self->{'cigar_string'} = $cigar_string;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
149 } else {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
150 throw("CIGAR_STRING or FEATURES argument is required");
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
151 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
152
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
153 return $self;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
154 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
155
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
156
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
157 =head2 new_fast
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
158
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
159 Arg [1] : hashref $hashref
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
160 A hashref which will be blessed into a PepDnaAlignFeature.
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
161 Example : none
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
162 Description: This allows for very fast object creation when a large number
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
163 of PepDnaAlignFeatures needs to be created. This is a bit of
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
164 a hack but necessary when thousands of features need to be
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
165 generated within a couple of seconds for web display. It is
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
166 not recommended that this method be called unless you know what
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
167 you are doing. It requires knowledge of the internals of this
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
168 class and its superclasses.
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
169 Returntype : Bio::EnsEMBL::BaseAlignFeature
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
170 Exceptions : none
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
171 Caller : none currently
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
172 Status : Stable
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
173
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
174 =cut
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
175
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
176 sub new_fast {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
177 my ($class, $hashref) = @_;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
178 my $self = bless $hashref, $class;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
179 weaken($self->{adaptor}) if ( ! isweak($self->{adaptor}) );
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
180 return $self;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
181 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
182
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
183
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
184 =head2 cigar_string
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
185
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
186 Arg [1] : string $cigar_string
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
187 Example : $feature->cigar_string( "12MI3M" );
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
188 Description: get/set for attribute cigar_string
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
189 cigar_string describes the alignment. "xM" stands for
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
190 x matches (mismatches), "xI" for inserts into query sequence
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
191 (thats the ensembl sequence), "xD" for deletions
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
192 (inserts in the subject). an "x" that is 1 can be omitted.
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
193 Returntype : string
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
194 Exceptions : none
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
195 Caller : general
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
196 Status : Stable
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
197
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
198 =cut
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
199
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
200 sub cigar_string {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
201 my $self = shift;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
202 $self->{'cigar_string'} = shift if(@_);
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
203 return $self->{'cigar_string'};
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
204 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
205
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
206
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
207 =head2 alignment_length
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
208
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
209 Arg [1] : None
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
210 Description: return the alignment length (including indels) based on the cigar_string
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
211 Returntype : int
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
212 Exceptions :
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
213 Caller :
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
214 Status : Stable
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
215
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
216 =cut
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
217
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
218 sub alignment_length {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
219 my $self = shift;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
220
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
221 if (! defined $self->{'_alignment_length'} && defined $self->cigar_string) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
222
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
223 my @pieces = ( $self->cigar_string =~ /(\d*[MDI])/g );
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
224 unless (@pieces) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
225 print STDERR "Error parsing cigar_string\n";
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
226 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
227 my $alignment_length = 0;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
228 foreach my $piece (@pieces) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
229 my ($length) = ( $piece =~ /^(\d*)/ );
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
230 if (! defined $length || $length eq "") {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
231 $length = 1;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
232 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
233 $alignment_length += $length;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
234 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
235 $self->{'_alignment_length'} = $alignment_length;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
236 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
237 return $self->{'_alignment_length'};
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
238 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
239
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
240 =head2 ungapped_features
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
241
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
242 Args : none
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
243 Example : @ungapped_features = $align_feature->get_feature
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
244 Description: converts the internal cigar_string into an array of
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
245 ungapped feature pairs
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
246 Returntype : list of Bio::EnsEMBL::FeaturePair
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
247 Exceptions : cigar_string not set internally
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
248 Caller : general
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
249 Status : Stable
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
250
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
251 =cut
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
252
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
253 sub ungapped_features {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
254 my ($self) = @_;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
255
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
256 if (!defined($self->{'cigar_string'})) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
257 throw("No cigar_string defined. Can't return ungapped features");
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
258 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
259
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
260 return @{$self->_parse_cigar()};
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
261 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
262
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
263 =head2 strands_reversed
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
264
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
265 Arg [1] : int $strands_reversed
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
266 Description: get/set for attribute strands_reversed
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
267 0 means that strand and hstrand are the original strands obtained
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
268 from the alignment program used
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
269 1 means that strand and hstrand have been flipped as compared to
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
270 the original result provided by the alignment program used.
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
271 You may want to use the reverse_complement method to restore the
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
272 original strandness.
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
273 Returntype : int
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
274 Exceptions : none
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
275 Caller : general
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
276 Status : Stable
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
277
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
278 =cut
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
279
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
280 sub strands_reversed {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
281 my ($self, $arg) = @_;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
282
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
283 if ( defined $arg ) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
284 $self->{'strands_reversed'} = $arg ;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
285 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
286
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
287 $self->{'strands_reversed'} = 0 unless (defined $self->{'strands_reversed'});
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
288
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
289 return $self->{'strands_reversed'};
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
290 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
291
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
292 =head2 reverse_complement
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
293
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
294 Args : none
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
295 Description: reverse complement the FeaturePair,
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
296 modifing strand, hstrand and cigar_string in consequence
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
297 Returntype : none
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
298 Exceptions : none
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
299 Caller : general
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
300 Status : Stable
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
301
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
302 =cut
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
303
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
304 sub reverse_complement {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
305 my ($self) = @_;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
306
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
307 # reverse strand in both sequences
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
308 $self->strand($self->strand * -1);
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
309 $self->hstrand($self->hstrand * -1);
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
310
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
311 # reverse cigar_string as consequence
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
312 my $cigar_string = $self->cigar_string;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
313 $cigar_string =~ s/(D|I|M)/$1 /g;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
314 my @cigar_pieces = split / /,$cigar_string;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
315 $cigar_string = "";
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
316 while (my $piece = pop @cigar_pieces) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
317 $cigar_string .= $piece;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
318 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
319
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
320 $self->{'strands_reversed'} = 0 unless (defined $self->{'strands_reversed'});
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
321
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
322 if ($self->strands_reversed) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
323 $self->strands_reversed(0)
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
324 } else {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
325 $self->strands_reversed(1);
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
326 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
327
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
328 $self->cigar_string($cigar_string);
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
329 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
330
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
331
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
332
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
333 =head2 transform
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
334
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
335 Arg 1 : String $coordinate_system_name
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
336 Arg [2] : String $coordinate_system_version
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
337 Example : $feature = $feature->transform('contig');
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
338 $feature = $feature->transform('chromosome', 'NCBI33');
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
339 Description: Moves this AlignFeature to the given coordinate system.
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
340 If the feature cannot be transformed to the destination
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
341 coordinate system undef is returned instead.
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
342 Returntype : Bio::EnsEMBL::BaseAlignFeature;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
343 Exceptions : wrong parameters
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
344 Caller : general
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
345 Status : Medium Risk
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
346 : deprecation needs to be removed at some time
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
347
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
348 =cut
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
349
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
350 sub transform {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
351 my $self = shift;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
352
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
353 # catch for old style transform calls
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
354 if( ref $_[0] eq 'HASH') {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
355 deprecate("Calling transform with a hashref is deprecate.\n" .
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
356 'Use $feat->transfer($slice) or ' .
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
357 '$feat->transform("coordsysname") instead.');
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
358 my (undef, $new_feat) = each(%{$_[0]});
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
359 return $self->transfer($new_feat->slice);
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
360 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
361
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
362 my $new_feature = $self->SUPER::transform(@_);
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
363 if ( !defined($new_feature)
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
364 || $new_feature->length() != $self->length() )
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
365 {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
366 my @segments = @{ $self->project(@_) };
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
367
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
368 if ( !@segments ) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
369 return undef;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
370 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
371
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
372 my @ungapped;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
373 foreach my $f ( $self->ungapped_features() ) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
374 $f = $f->transform(@_);
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
375 if ( defined($f) ) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
376 push( @ungapped, $f );
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
377 } else {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
378 warning( "Failed to transform alignment feature; "
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
379 . "ungapped component could not be transformed" );
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
380 return undef;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
381 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
382 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
383
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
384 eval { $new_feature = $self->new( -features => \@ungapped ); };
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
385
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
386 if ($@) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
387 warning($@);
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
388 return undef;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
389 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
390 } ## end if ( !defined($new_feature...))
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
391
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
392 return $new_feature;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
393 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
394
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
395
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
396 =head2 _parse_cigar
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
397
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
398 Args : none
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
399 Description: PRIVATE (internal) method - creates ungapped features from
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
400 internally stored cigar line
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
401 Returntype : list of Bio::EnsEMBL::FeaturePair
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
402 Exceptions : none
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
403 Caller : ungapped_features
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
404 Status : Stable
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
405
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
406 =cut
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
407
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
408 sub _parse_cigar {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
409 my ( $self ) = @_;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
410
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
411 my $query_unit = $self->_query_unit();
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
412 my $hit_unit = $self->_hit_unit();
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
413
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
414 my $string = $self->{'cigar_string'};
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
415
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
416 throw("No cigar string defined in object") if(!defined($string));
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
417
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
418 my @pieces = ( $string =~ /(\d*[MDI])/g );
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
419 #print "cigar: ",join ( ",", @pieces ),"\n";
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
420
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
421 my @features;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
422 my $strand1 = $self->{'strand'} || 1;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
423 my $strand2 = $self->{'hstrand'}|| 1;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
424
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
425 my ( $start1, $start2 );
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
426
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
427 if( $strand1 == 1 ) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
428 $start1 = $self->{'start'};
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
429 } else {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
430 $start1 = $self->{'end'};
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
431 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
432
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
433 if( $strand2 == 1 ) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
434 $start2 = $self->{'hstart'};
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
435 } else {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
436 $start2 = $self->{'hend'};
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
437 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
438
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
439 #
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
440 # Construct ungapped blocks as FeaturePairs objects for each MATCH
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
441 #
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
442 foreach my $piece (@pieces) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
443
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
444 my ($length) = ( $piece =~ /^(\d*)/ );
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
445 if( $length eq "" ) { $length = 1 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
446 my $mapped_length;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
447
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
448 # explicit if statements to avoid rounding problems
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
449 # and make sure we have sane coordinate systems
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
450 if( $query_unit == 1 && $hit_unit == 3 ) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
451 $mapped_length = $length*3;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
452 } elsif( $query_unit == 3 && $hit_unit == 1 ) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
453 $mapped_length = $length / 3;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
454 } elsif ( $query_unit == 1 && $hit_unit == 1 ) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
455 $mapped_length = $length;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
456 } else {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
457 throw("Internal error $query_unit $hit_unit, currently only " .
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
458 "allowing 1 or 3 ");
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
459 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
460
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
461 if( int($mapped_length) != $mapped_length and
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
462 ($piece =~ /M$/ or $piece =~ /D$/)) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
463 throw("Internal error with mismapped length of hit, query " .
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
464 "$query_unit, hit $hit_unit, length $length");
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
465 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
466
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
467 if( $piece =~ /M$/ ) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
468 #
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
469 # MATCH
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
470 #
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
471 my ( $qstart, $qend);
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
472 if( $strand1 == 1 ) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
473 $qstart = $start1;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
474 $qend = $start1 + $length - 1;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
475 $start1 = $qend + 1;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
476 } else {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
477 $qend = $start1;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
478 $qstart = $start1 - $length + 1;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
479 $start1 = $qstart - 1;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
480 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
481
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
482 my ($hstart, $hend);
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
483 if( $strand2 == 1 ) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
484 $hstart = $start2;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
485 $hend = $start2 + $mapped_length - 1;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
486 $start2 = $hend + 1;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
487 } else {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
488 $hend = $start2;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
489 $hstart = $start2 - $mapped_length + 1;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
490 $start2 = $hstart - 1;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
491 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
492
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
493
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
494 push @features, Bio::EnsEMBL::FeaturePair->new
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
495 (-SLICE => $self->{'slice'},
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
496 -SEQNAME => $self->{'seqname'},
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
497 -START => $qstart,
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
498 -END => $qend,
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
499 -STRAND => $strand1,
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
500 -HSLICE => $self->{'hslice'},
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
501 -HSEQNAME => $self->{'hseqname'},
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
502 -HSTART => $hstart,
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
503 -HEND => $hend,
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
504 -HSTRAND => $strand2,
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
505 -SCORE => $self->{'score'},
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
506 -PERCENT_ID => $self->{'percent_id'},
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
507 -ANALYSIS => $self->{'analysis'},
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
508 -P_VALUE => $self->{'p_value'},
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
509 -EXTERNAL_DB_ID => $self->{'external_db_id'},
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
510 -HCOVERAGE => $self->{'hcoverage'},
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
511 -GROUP_ID => $self->{'group_id'},
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
512 -LEVEL_ID => $self->{'level_id'});
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
513
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
514
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
515 # end M cigar bits
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
516 } elsif( $piece =~ /I$/ ) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
517 #
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
518 # INSERT
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
519 #
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
520 if( $strand1 == 1 ) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
521 $start1 += $length;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
522 } else {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
523 $start1 -= $length;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
524 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
525 } elsif( $piece =~ /D$/ ) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
526 #
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
527 # DELETION
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
528 #
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
529 if( $strand2 == 1 ) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
530 $start2 += $mapped_length;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
531 } else {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
532 $start2 -= $mapped_length;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
533 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
534 } else {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
535 throw( "Illegal cigar line $string!" );
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
536 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
537 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
538
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
539 return \@features;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
540 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
541
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
542
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
543
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
544
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
545 =head2 _parse_features
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
546
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
547 Arg [1] : listref Bio::EnsEMBL::FeaturePair $ungapped_features
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
548 Description: creates internal cigarstring and start,end hstart,hend
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
549 entries.
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
550 Returntype : none, fills in values of self
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
551 Exceptions : argument list undergoes many sanity checks - throws under many
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
552 invalid conditions
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
553 Caller : new
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
554 Status : Stable
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
555
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
556 =cut
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
557
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
558 my $message_only_once = 1;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
559
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
560 sub _parse_features {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
561 my ($self,$features ) = @_;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
562
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
563 my $query_unit = $self->_query_unit();
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
564 my $hit_unit = $self->_hit_unit();
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
565
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
566 if (ref($features) ne "ARRAY") {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
567 throw("features must be an array reference not a [".ref($features)."]");
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
568 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
569
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
570 my $strand = $features->[0]->strand;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
571
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
572 throw ('FeaturePair needs to have strand == 1 or strand == -1') if(!$strand);
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
573
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
574 my @f;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
575
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
576 #
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
577 # Sort the features on their start position
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
578 # Ascending order on positive strand, descending on negative strand
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
579 #
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
580 if( $strand == 1 ) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
581 @f = sort {$a->start <=> $b->start} @$features;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
582 } else {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
583 @f = sort { $b->start <=> $a->start} @$features;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
584 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
585
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
586 my $hstrand = $f[0]->hstrand;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
587 my $slice = $f[0]->slice();
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
588 my $hslice = $f[0]->hslice();
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
589 my $name = $slice->name() if($slice);
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
590 my $hname = $f[0]->hseqname;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
591 my $score = $f[0]->score;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
592 my $percent = $f[0]->percent_id;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
593 my $analysis = $f[0]->analysis;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
594 my $pvalue = $f[0]->p_value();
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
595 my $external_db_id = $f[0]->external_db_id;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
596 my $hcoverage = $f[0]->hcoverage;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
597 my $group_id = $f[0]->group_id;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
598 my $level_id = $f[0]->level_id;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
599
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
600 my $seqname = $f[0]->seqname;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
601 # implicit strand 1 for peptide sequences
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
602 $strand ||= 1;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
603 $hstrand ||= 1;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
604 my $ori = $strand * $hstrand;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
605
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
606 throw("No features in the array to parse") if(scalar(@f) == 0);
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
607
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
608 my $prev1; # where last feature q part ended
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
609 my $prev2; # where last feature s part ended
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
610
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
611 my $string;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
612
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
613 # Use strandedness info of query and hit to make sure both sets of
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
614 # start and end coordinates are oriented the right way around.
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
615 my $f1start;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
616 my $f1end;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
617 my $f2end;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
618 my $f2start;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
619
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
620 if ($strand == 1) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
621 $f1start = $f[0]->start;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
622 $f1end = $f[-1]->end;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
623 } else {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
624 $f1start = $f[-1]->start;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
625 $f1end = $f[0]->end;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
626 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
627
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
628 if ($hstrand == 1) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
629 $f2start = $f[0]->hstart;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
630 $f2end = $f[-1]->hend;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
631 } else {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
632 $f2start = $f[-1]->hstart;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
633 $f2end = $f[0]->hend;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
634 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
635
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
636 #
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
637 # Loop through each portion of alignment and construct cigar string
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
638 #
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
639
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
640 foreach my $f (@f) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
641 #
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
642 # Sanity checks
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
643 #
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
644
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
645 if (!$f->isa("Bio::EnsEMBL::FeaturePair")) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
646 throw("Array element [$f] is not a Bio::EnsEMBL::FeaturePair");
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
647 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
648 if( defined($f->hstrand()) && $f->hstrand() != $hstrand ) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
649 throw("Inconsistent hstrands in feature array");
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
650 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
651 if( defined($f->strand()) && ($f->strand != $strand)) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
652 throw("Inconsistent strands in feature array");
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
653 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
654 if ( defined($name) && $name ne $f->slice->name()) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
655 throw("Inconsistent names in feature array [$name - ".
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
656 $f->slice->name()."]");
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
657 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
658 if ( defined($hname) && $hname ne $f->hseqname) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
659 throw("Inconsistent hit names in feature array [$hname - ".
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
660 $f->hseqname . "]");
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
661 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
662 if ( defined($score) && $score ne $f->score) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
663 throw("Inconsisent scores in feature array [$score - " .
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
664 $f->score . "]");
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
665 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
666 if (defined($f->percent_id) && $percent ne $f->percent_id) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
667 throw("Inconsistent pids in feature array [$percent - " .
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
668 $f->percent_id . "]");
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
669 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
670 if(defined($pvalue) && $pvalue != $f->p_value()) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
671 throw("Inconsistant p_values in feature arraw [$pvalue " .
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
672 $f->p_value() . "]");
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
673 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
674 if($seqname && $seqname ne $f->seqname){
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
675 throw("Inconsistent seqname in feature array [$seqname - ".
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
676 $f->seqname . "]");
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
677 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
678 my $start1 = $f->start; #source sequence alignment start
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
679 my $start2 = $f->hstart(); #hit sequence alignment start
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
680
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
681 #
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
682 # More sanity checking
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
683 #
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
684 if (defined($prev1)) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
685 if ( $strand == 1 ) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
686 if ($f->start < $prev1) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
687 throw("Inconsistent coords in feature array (forward strand).\n" .
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
688 "Start [".$f->start()."] in current feature should be greater " .
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
689 "than previous feature end [$prev1].");
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
690 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
691 } else {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
692 if ($f->end > $prev1) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
693 throw("Inconsistent coords in feature array (reverse strand).\n" .
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
694 "End [".$f->end() ."] should be less than previous feature " .
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
695 "start [$prev1].");
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
696 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
697 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
698 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
699
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
700 my $length = ($f->end - $f->start + 1); #length of source seq alignment
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
701 my $hlength = ($f->hend - $f->hstart + 1); #length of hit seq alignment
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
702
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
703 # using multiplication to avoid rounding errors, hence the
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
704 # switch from query to hit for the ratios
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
705
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
706 #
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
707 # Yet more sanity checking
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
708 #
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
709 if($query_unit > $hit_unit){
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
710 # I am going to make the assumption here that this situation will
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
711 # only occur with DnaPepAlignFeatures, this may not be true
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
712 my $query_p_length = sprintf "%.0f", ($length/$query_unit);
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
713 my $hit_p_length = sprintf "%.0f", ($hlength * $hit_unit);
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
714 if( $query_p_length != $hit_p_length) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
715 throw( "Feature lengths not comparable Lengths:" .$length .
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
716 " " . $hlength . " Ratios:" . $query_unit . " " .
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
717 $hit_unit );
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
718 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
719 } else{
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
720 my $query_d_length = sprintf "%.0f", ($length*$hit_unit);
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
721 my $hit_d_length = sprintf "%.0f", ($hlength * $query_unit);
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
722 if( $length * $hit_unit != $hlength * $query_unit ) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
723 throw( "Feature lengths not comparable Lengths:" . $length .
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
724 " " . $hlength . " Ratios:" . $query_unit . " " .
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
725 $hit_unit );
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
726 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
727 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
728
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
729 my $hlengthfactor = ($query_unit/$hit_unit);
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
730
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
731 #
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
732 # Check to see if there is an I type (insertion) gap:
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
733 # If there is a space between the end of the last source sequence
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
734 # alignment and the start of this one, then this is an insertion
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
735 #
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
736
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
737 my $insertion_flag = 0;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
738 if( $strand == 1 ) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
739 if( ( defined $prev1 ) && ( $f->start > $prev1 + 1 )) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
740
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
741 #there is an insertion
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
742 $insertion_flag = 1;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
743 my $gap = $f->start - $prev1 - 1;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
744 if( $gap == 1 ) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
745 $gap = ""; # no need for a number if gap length is 1
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
746 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
747 $string .= "$gap"."I";
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
748
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
749 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
750
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
751 #shift our position in the source seq alignment
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
752 $prev1 = $f->end();
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
753 } else {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
754
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
755 if(( defined $prev1 ) && ($f->end + 1 < $prev1 )) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
756
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
757 #there is an insertion
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
758 $insertion_flag = 1;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
759 my $gap = $prev1 - $f->end() - 1;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
760 if( $gap == 1 ) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
761 $gap = ""; # no need for a number if gap length is 1
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
762 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
763 $string .= "$gap"."I";
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
764 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
765
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
766 #shift our position in the source seq alignment
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
767 $prev1 = $f->start();
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
768 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
769
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
770 #
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
771 # Check to see if there is a D type (deletion) gap
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
772 # There is a deletion gap if there is a space between the end of the
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
773 # last portion of the hit sequence alignment and this one
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
774 #
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
775 if( $hstrand == 1 ) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
776 if(( defined $prev2 ) && ( $f->hstart() > $prev2 + 1 )) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
777
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
778 #there is a deletion
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
779 my $gap = $f->hstart - $prev2 - 1;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
780 my $gap2 = int( $gap * $hlengthfactor + 0.5 );
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
781
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
782 if( $gap2 == 1 ) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
783 $gap2 = ""; # no need for a number if gap length is 1
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
784 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
785 $string .= "$gap2"."D";
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
786
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
787 #sanity check, Should not be an insertion and deletion
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
788 if($insertion_flag) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
789 if ($message_only_once) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
790 warning("Should not be an deletion and insertion on the " .
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
791 "same alignment region. cigar_line=$string\n");
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
792 $message_only_once = 0;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
793 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
794 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
795 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
796 #shift our position in the hit seq alignment
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
797 $prev2 = $f->hend();
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
798
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
799 } else {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
800 if( ( defined $prev2 ) && ( $f->hend() + 1 < $prev2 )) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
801
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
802 #there is a deletion
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
803 my $gap = $prev2 - $f->hend - 1;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
804 my $gap2 = int( $gap * $hlengthfactor + 0.5 );
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
805
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
806 if( $gap2 == 1 ) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
807 $gap2 = ""; # no need for a number if gap length is 1
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
808 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
809 $string .= "$gap2"."D";
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
810
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
811 #sanity check, Should not be an insertion and deletion
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
812 if($insertion_flag) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
813 if ($message_only_once) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
814 warning("Should not be an deletion and insertion on the " .
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
815 "same alignment region. prev2 = $prev2; f->hend() = " .
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
816 $f->hend() . "; cigar_line = $string;\n");
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
817 $message_only_once = 0;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
818 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
819 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
820 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
821 #shift our position in the hit seq alignment
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
822 $prev2 = $f->hstart();
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
823 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
824
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
825 my $matchlength = $f->end() - $f->start() + 1;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
826 if( $matchlength == 1 ) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
827 $matchlength = "";
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
828 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
829 $string .= $matchlength."M";
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
830 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
831
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
832 $self->{'start'} = $f1start;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
833 $self->{'end'} = $f1end;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
834 $self->{'seqname'} = $seqname;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
835 $self->{'strand'} = $strand;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
836 $self->{'score'} = $score;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
837 $self->{'percent_id'} = $percent;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
838 $self->{'analysis'} = $analysis;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
839 $self->{'slice'} = $slice;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
840 $self->{'hslice'} = $hslice;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
841 $self->{'hstart'} = $f2start;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
842 $self->{'hend'} = $f2end;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
843 $self->{'hstrand'} = $hstrand;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
844 $self->{'hseqname'} = $hname;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
845 $self->{'cigar_string'} = $string;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
846 $self->{'p_value'} = $pvalue;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
847 $self->{'external_db_id'} = $external_db_id;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
848 $self->{'hcoverage'} = $hcoverage;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
849 $self->{'group_id'} = $group_id;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
850 $self->{'level_id'} = $level_id;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
851 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
852
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
853
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
854
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
855
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
856
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
857
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
858 =head2 _hit_unit
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
859
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
860 Args : none
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
861 Description: abstract method, overwrite with something that returns
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
862 one or three
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
863 Returntype : int 1,3
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
864 Exceptions : none
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
865 Caller : internal
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
866 Status : Stable
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
867
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
868 =cut
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
869
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
870 sub _hit_unit {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
871 my $self = shift;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
872 throw( "Abstract method call!" );
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
873 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
874
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
875
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
876
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
877 =head2 _query_unit
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
878
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
879 Args : none
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
880 Description: abstract method, overwrite with something that returns
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
881 one or three
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
882 Returntype : int 1,3
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
883 Exceptions : none
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
884 Caller : internal
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
885 Status : Stable
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
886
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
887 =cut
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
888
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
889 sub _query_unit {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
890 my $self = shift;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
891 throw( "Abstract method call!" );
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
892 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
893
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
894
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
895
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
896
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
897 1;