annotate variant_effect_predictor/Bio/LocatableSeq.pm @ 0:2bc9b66ada89 draft default tip

Uploaded
author mahtabm
date Thu, 11 Apr 2013 06:29:17 -0400
# $Id: LocatableSeq.pm,v 1.22.2.1 2003/03/31 11:49:51 heikki Exp $
#
# BioPerl module for Bio::LocatableSeq
#
# Cared for by Ewan Birney <birney@sanger.ac.uk>
#
# Copyright Ewan Birney
#
# You may distribute this module under the same terms as perl itself

# POD documentation - main docs before the code

=head1 NAME

Bio::LocatableSeq - A Sequence object with start/end points on it
that can be projected into an MSA or have coordinates relative to another seq.

=head1 SYNOPSIS

    use Bio::LocatableSeq;
    my $seq = Bio::LocatableSeq->new(-seq   => "CAGT-GGT",
                                     -id    => "seq1",
                                     -start => 1,
                                     -end   => 7);

=head1 DESCRIPTION

    # a normal sequence object
    $locseq->seq();
    $locseq->id();

    # has start,end points
    $locseq->start();
    $locseq->end();

    # inherits from RangeI, so range operations possible

=head1 FEEDBACK

=head2 Mailing Lists

User feedback is an integral part of the evolution of this and other
Bioperl modules. Send your comments and suggestions preferably to one
of the Bioperl mailing lists. Your participation is much appreciated.

  bioperl-l@bioperl.org             - General discussion
  http://bio.perl.org/MailList.html - About the mailing lists

The locatable sequence object was developed mainly because the
SimpleAlign object requires this functionality, and in the rewrite
of the Sequence object we had to decide what to do with this.

It is, to be honest, not well integrated with the rest of bioperl. For
example, the trunc() function does not return a LocatableSeq object,
as some might have thought. There are all sorts of nasty gotchas
about interactions between coordinate systems when these sorts of
objects are used.

=head2 Reporting Bugs

Report bugs to the Bioperl bug tracking system to help us keep track
of the bugs and their resolution. Bug reports can be submitted via email
or the web:

  bioperl-bugs@bio.perl.org
  http://bugzilla.bioperl.org/

=head1 APPENDIX

The rest of the documentation details each of the object
methods. Internal methods are usually preceded with a _

=cut

#'
# Let the code begin...

package Bio::LocatableSeq;
use vars qw(@ISA);
use strict;

use Bio::PrimarySeq;
use Bio::RangeI;
use Bio::Location::Simple;
use Bio::Location::Fuzzy;


@ISA = qw(Bio::PrimarySeq Bio::RangeI);

sub new {
    my ($class, @args) = @_;
    my $self = $class->SUPER::new(@args);

    my ($start,$end,$strand) =
        $self->_rearrange( [qw(START END STRAND)],
                           @args);

    defined $start  && $self->start($start);
    defined $end    && $self->end($end);
    defined $strand && $self->strand($strand);

    return $self; # success - we hope!
}

=head2 start

 Title   : start
 Usage   : $obj->start($newval)
 Function: Gets or sets the start coordinate of the sequence
 Returns : value of start
 Args    : newvalue (optional)

=cut

sub start {
    my $self = shift;
    if( @_ ) {
        my $value = shift;
        $self->{'start'} = $value;
    }
    return $self->{'start'};

}

=head2 end

 Title   : end
 Usage   : $obj->end($newval)
 Function: Gets or sets the end coordinate of the sequence. When a
           new value is given and both the sequence string and the
           start are set, the value is compared with start plus the
           number of non-gap residues minus one. If the two differ
           and verbose is greater than 0, a warning is issued; the
           computed value replaces the given one only if warn()
           returns a true value.
 Returns : value of end
 Args    : newvalue (optional)

=cut

sub end {
    my $self = shift;
    if( @_ ) {
        my $value = shift;
        my $string = $self->seq;
        if ($string and $self->start) {
            my $s2 = $string;
            # count residues only: strip '.' and '-' gap characters
            $string =~ s/[.-]+//g;
            my $len = CORE::length $string;
            my $new_end = $self->start + $len - 1 ;
            my $id = $self->id;
            $self->warn("In sequence $id residue count gives value $len.
Overriding value [$value] with value $new_end for Bio::LocatableSeq::end().")
                and $value = $new_end if $new_end != $value and $self->verbose > 0;
        }

        $self->{'end'} = $value;
    }
    return $self->{'end'};

}

=head2 strand

 Title   : strand
 Usage   : $obj->strand($newval)
 Function: Gets or sets the strand of the sequence
 Returns : value of strand
 Args    : newvalue (optional)

=cut

sub strand {
    my $self = shift;
    if( @_ ) {
        my $value = shift;
        $self->{'strand'} = $value;
    }
    return $self->{'strand'};
}

=head2 get_nse

 Title   : get_nse
 Usage   : $name = $obj->get_nse()
 Function: read-only name of form id/start-end
 Example : $name = $obj->get_nse('|', '..')   # id|start..end
 Returns : a string of the form id/start-end
 Args    : two optional separator strings that replace
           '/' and '-' respectively
 Throws  : If id, start or end is not set

=cut

sub get_nse {
    my ($self,$char1,$char2) = @_;

    $char1 ||= "/";
    $char2 ||= "-";

    $self->throw("Attribute id not set") unless $self->id();
    $self->throw("Attribute start not set") unless $self->start();
    $self->throw("Attribute end not set") unless $self->end();

    return $self->id() . $char1 . $self->start . $char2 . $self->end ;

}


=head2 no_gaps

 Title   : no_gaps
 Usage   : $self->no_gaps('.')
 Function:

           Gets number of gaps in the sequence. A gap is a run of
           one or more consecutive gap characters. The count
           excludes leading or trailing gap characters.

           Valid bioperl sequence characters are [A-Za-z\-\.\*]. Of
           these, '.' and '-' are counted as gap characters unless an
           optional argument specifies one of them.

 Returns : number of internal gaps in the sequence.
 Args    : a gap character (optional)

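The counting rule described above can be sketched in plain Perl without
BioPerl. This is only an illustration under stated assumptions: the helper
name count_internal_gaps is hypothetical and is not part of
Bio::LocatableSeq, and it escapes the gap characters with quotemeta, which
the module itself does not do.

```perl
use strict;
use warnings;

# Hypothetical stand-alone helper (not a BioPerl method): counts runs of
# gap characters that lie strictly inside the sequence string.
sub count_internal_gaps {
    my ($seq, $chars) = @_;
    $chars = '-.' unless defined $chars && length $chars;   # default gap characters
    return 0 unless defined $seq && length $seq;            # empty sequence has no gaps

    my $class = quotemeta $chars;    # make the characters safe inside [...]
    $seq =~ s/^[$class]+//;          # drop leading gap characters
    $seq =~ s/[$class]+$//;          # drop trailing gap characters

    my $count = 0;
    $count++ while $seq =~ /[$class]+/g;   # one match per run of gap characters
    return $count;
}

print count_internal_gaps('--AC..DEF-G.'), "\n";   # prints 2
```

Because each run of gap characters is matched once, 'AC..DEF' contains one
gap, not two.
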

=cut

sub no_gaps {
    my ($self,$char) = @_;
    my ($seq, $count) = (undef, 0);

    # default gap characters
    $char ||= '-.';

    $self->warn("I hope you know what you are doing setting gap to [$char]")
        unless $char =~ /[-.]/;

    $seq = $self->seq;
    return 0 unless $seq; # empty sequence does not have gaps

    $seq =~ s/^([$char]+)//;
    $seq =~ s/([$char]+)$//;
    $count++ while $seq =~ /[$char]+/g;

    return $count;

}


=head2 column_from_residue_number

 Title   : column_from_residue_number
 Usage   : $col = $seq->column_from_residue_number($resnumber)
 Function:

           This function gives the position in the alignment
           (i.e. column number) of the given residue number in the
           sequence. For example, for the sequence

             Seq1/91-97 AC..DEF.GH

           column_from_residue_number(94) returns 6.

           An exception is thrown if the residue number would lie
           outside the start-end range of the sequence
           (e.g. column_from_residue_number(22) for the sequence
           above).

 Returns : A column number for the position of the
           given residue in the given sequence (1 = first column)
 Args    : A residue number in the whole sequence (not just that
           segment of it in the alignment)

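The residue-to-column mapping can be sketched in plain Perl as follows.
This is a minimal illustration, not the module's code: the helper name
column_for_residue is hypothetical, and it takes the gapped string and the
start coordinate as explicit arguments instead of reading them from an
object.

```perl
use strict;
use warnings;

# Hypothetical stand-alone helper (not a BioPerl method): walks the gapped
# string and returns the 1-based column holding the requested residue.
sub column_for_residue {
    my ($gapped, $start, $resnumber) = @_;
    my $count = $start;                  # number of the next residue to be seen
    my @chars = split //, $gapped;
    for my $i (0 .. $#chars) {
        next if $chars[$i] eq '.' or $chars[$i] eq '-';   # gaps carry no residue number
        return $i + 1 if $count == $resnumber;            # columns are 1-based
        $count++;
    }
    die "Could not find residue number $resnumber\n";
}

print column_for_residue('MK-LV..T', 10, 14), "\n";   # prints 8
```

Only non-gap characters advance the residue counter, so the three gap
columns in the sample string shift residue 14 to column 8.
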

=cut

sub column_from_residue_number {
    my ($self, $resnumber) = @_;

    $self->throw("Residue number has to be a positive integer, not [$resnumber]")
        unless $resnumber =~ /^\d+$/ and $resnumber > 0;

    if ($resnumber >= $self->start() and $resnumber <= $self->end()) {
        my @residues = split //, $self->seq;
        my $count = $self->start();
        my $i;
        for ($i=0; $i < @residues; $i++) {
            if ($residues[$i] ne '.' and $residues[$i] ne '-') {
                $count == $resnumber and last;
                $count++;
            }
        }
        # $i now holds the index of the column.
        # The actual column number is this index + 1

        return $i+1;
    }

    $self->throw("Could not find residue number $resnumber");

}


=head2 location_from_column

 Title   : location_from_column
 Usage   : $loc = $seq->location_from_column($column_number)
 Function:

           This function gives the residue number in this sequence
           for a given position in the alignment (i.e. column
           number). Gaps complicate this process and force the
           output to be a L<Bio::Range> where values can be
           undefined. For example, for the alignment

             Seq1/91-96 AC..DEF.G.
             Seq2/1-9   .CYHDEFKGK

           location_from_column( Seq1/91-96, 2 ) position 92
           location_from_column( Seq1/91-96, 3 ) position 92^93
           location_from_column( Seq1/91-96, 10) position 96^97
           location_from_column( Seq2/1-9,   1 ) position undef

           An exact position returns a Bio::Location::Simple object
           where location_type() returns 'EXACT'; if a position
           is between bases, location_type() returns 'IN-BETWEEN'.
           A column before the first residue of a sequence that
           starts at 1 returns undef. Note that if the position is
           after the last residue in the alignment, there is no
           guarantee that the original sequence has residues after
           that position.

           An exception is thrown if the column number is not within
           the sequence.

 Returns : Bio::Location::Simple or undef
 Args    : A column number
 Throws  : If column is not within the sequence

See L<Bio::Location::Simple> for more.

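The column-to-position rule can be sketched in plain Perl. This is an
illustration under stated assumptions, not the module's code: the helper
name position_for_column is hypothetical, it returns plain strings instead
of Bio::Location::Simple objects, and it counts only letters as residues.

```perl
use strict;
use warnings;

# Hypothetical stand-alone helper (not a BioPerl method). Returns "N" for
# an exact position, "N^M" for a position between residues N and M, or
# undef for a gap column before the first residue of a sequence starting at 1.
sub position_for_column {
    my ($gapped, $start, $column) = @_;
    die "Column $column is outside the sequence\n"
        if $column < 1 or $column > length $gapped;

    my $prefix = substr $gapped, 0, $column;     # columns 1 .. $column
    my $pos    = () = $prefix =~ /[A-Za-z]/g;    # residues seen so far
    my $last   = $start + $pos - 1;              # number of the latest residue

    return "$last" if substr($gapped, $column - 1, 1) =~ /[A-Za-z]/;   # exact
    return undef   if $pos == 0 and $start == 1;                       # nothing before it
    return $last . '^' . ($last + 1);                                  # in between
}

print position_for_column('MK-LV..T', 10, 2), "\n";   # prints 11
print position_for_column('MK-LV..T', 10, 3), "\n";   # prints 11^12
```

A gap column is reported as lying between the last residue seen and the
next one, which is why column 3 of the sample string maps to 11^12.
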

=cut

sub location_from_column {
    my ($self, $column) = @_;

    $self->throw("Column number has to be a positive integer, not [$column]")
        unless $column =~ /^\d+$/ and $column > 0;
    $self->throw("Column number [$column] is larger than".
                 " sequence length [". $self->length. "]")
        unless $column <= $self->length;

    my ($loc);
    my $s = $self->subseq(1,$column);
    $s =~ s/\W//g;
    my $pos = CORE::length $s;

    my $start = $self->start || 0 ;
    if ($self->subseq($column, $column) =~ /[a-zA-Z]/ ) {
        $loc = Bio::Location::Simple->new(
             -start  => $pos + $start - 1,
             -end    => $pos + $start - 1,
             -strand => 1
             );
    }
    elsif ($pos == 0 and $self->start == 1) {
        # gap column before the first residue: $loc stays undef
    } else {
        $loc = Bio::Location::Simple->new(
             -start         => $pos + $start - 1,
             -end           => $pos +1 + $start - 1,
             -strand        => 1,
             -location_type => 'IN-BETWEEN'
             );
    }
    return $loc;
}

=head2 revcom

 Title   : revcom
 Usage   : $rev = $seq->revcom()
 Function: Produces a new Bio::LocatableSeq object which
           has the reversed complement of the sequence. For protein
           sequences this throws an exception of "Sequence is a
           protein. Cannot revcom"

 Returns : A new Bio::LocatableSeq object
 Args    : none

=cut

sub revcom {
    my ($self) = @_;

    my $new = $self->SUPER::revcom;
    $new->strand($self->strand * -1);
    $new->start($self->start) if $self->start;
    $new->end($self->end) if $self->end;
    return $new;
}


=head2 trunc

 Title   : trunc
 Usage   : $subseq = $myseq->trunc(10,100);
 Function: Provides a truncation of a sequence.

 Example :
 Returns : a fresh Bio::PrimarySeqI implementing object
 Args    : Two integers denoting first and last columns of the
           sequence to be included in the sub-sequence.


=cut

sub trunc {

    my ($self, $start, $end) = @_;
    my $new = $self->SUPER::trunc($start, $end);

    # convert the column numbers into residue coordinates
    $start = $self->location_from_column($start);
    $start ? ($start = $start->start) : ($start = 1);

    $end = $self->location_from_column($end);
    $end = $end->start if $end;

    $new->strand($self->strand);
    $new->start($start) if $start;
    $new->end($end) if $end;

    return $new;
}

1;