annotate variant_effect_predictor/Bio/LocatableSeq.pm @ 1:d6778b5d8382 draft default tip

Deleted selected files
author willmclaren
date Fri, 03 Aug 2012 10:05:43 -0400
parents 21066c0abaf5
children
rev   line source
0
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
# $Id: LocatableSeq.pm,v 1.22.2.1 2003/03/31 11:49:51 heikki Exp $
#
# BioPerl module for Bio::LocatableSeq
#
# Cared for by Ewan Birney <birney@sanger.ac.uk>
#
# Copyright Ewan Birney
#
# You may distribute this module under the same terms as perl itself

# POD documentation - main docs before the code

=head1 NAME

Bio::LocatableSeq - A Sequence object with start/end points on it
that can be projected into a MSA or have coordinates relative to another seq.

=head1 SYNOPSIS


    use Bio::LocatableSeq;
    my $seq = Bio::LocatableSeq->new(-seq   => "CAGT-GGT",
                                     -id    => "seq1",
                                     -start => 1,
                                     -end   => 7);


=head1 DESCRIPTION


    # a normal sequence object
    $locseq->seq();
    $locseq->id();

    # has start,end points
    $locseq->start();
    $locseq->end();

    # inherits from RangeI, so range operations are possible

=head1 FEEDBACK


=head2 Mailing Lists


User feedback is an integral part of the evolution of this and other
Bioperl modules. Send your comments and suggestions preferably to one
of the Bioperl mailing lists. Your participation is much appreciated.

  bioperl-l@bioperl.org             - General discussion
  http://bio.perl.org/MailList.html - About the mailing lists


The locatable sequence object was developed mainly because the
SimpleAlign object requires this functionality, and in the rewrite
of the Sequence object we had to decide what to do with this.

It is, to be honest, not well integrated with the rest of bioperl. For
example, the trunc() function does not return a LocatableSeq object,
as some might have thought. There are all sorts of nasty gotchas
about interactions between coordinate systems when these sorts of
objects are used.


=head2 Reporting Bugs

Report bugs to the Bioperl bug tracking system to help us keep track
of the bugs and their resolution. Bug reports can be submitted via email
or the web:

  bioperl-bugs@bio.perl.org
  http://bugzilla.bioperl.org/


=head1 APPENDIX

The rest of the documentation details each of the object
methods. Internal methods are usually preceded with a _

=cut

#'
# Let the code begin...

package Bio::LocatableSeq;
use vars qw(@ISA);
use strict;

use Bio::PrimarySeq;
use Bio::RangeI;
use Bio::Location::Simple;
use Bio::Location::Fuzzy;


@ISA = qw(Bio::PrimarySeq Bio::RangeI);

sub new {
    my ($class, @args) = @_;
    my $self = $class->SUPER::new(@args);

    my ($start,$end,$strand) =
        $self->_rearrange( [qw(START END STRAND)],
                           @args);

    defined $start  && $self->start($start);
    defined $end    && $self->end($end);
    defined $strand && $self->strand($strand);

    return $self; # success - we hope!
}

=head2 start

 Title   : start
 Usage   : $obj->start($newval)
 Function: Get or set the start coordinate
 Returns : value of start
 Args    : newvalue (optional)

=cut

sub start{
    my $self = shift;
    if( @_ ) {
        my $value = shift;
        $self->{'start'} = $value;
    }
    return $self->{'start'};

}

=head2 end

 Title   : end
 Usage   : $obj->end($newval)
 Function: Get or set the end coordinate
 Returns : value of end
 Args    : newvalue (optional)

=cut

sub end {
    my $self = shift;
    if( @_ ) {
        my $value = shift;
        my $string = $self->seq;
        if ($string and $self->start) {
            my $s2 = $string;
            # residue count of the sequence with gap characters removed
            $string =~ s/[.-]+//g;
            my $len = CORE::length $string;
            my $new_end = $self->start + $len - 1 ;
            my $id = $self->id;
            $self->warn("In sequence $id residue count gives value $len.
Overriding value [$value] with value $new_end for Bio::LocatableSeq::end().")
                and $value = $new_end if $new_end != $value and $self->verbose > 0;
        }

        $self->{'end'} = $value;
    }
    return $self->{'end'};

}

=head2 strand

 Title   : strand
 Usage   : $obj->strand($newval)
 Function: Get or set the strand
 Returns : value of strand
 Args    : newvalue (optional)

=cut

sub strand{
    my $self = shift;
    if( @_ ) {
        my $value = shift;
        $self->{'strand'} = $value;
    }
    return $self->{'strand'};
}

=head2 get_nse

 Title   : get_nse
 Usage   : $nse = $obj->get_nse()
 Function: read-only name of form id/start-end
 Example :
 Returns : a string; throws if id, start or end is not set
 Args    : optional replacements for the '/' and '-' separators

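The id/start-end form described above can be sketched in plain Perl without BioPerl. The helper name C<format_nse> and its argument order are hypothetical and exist only for illustration; the default separators and the checks mirror what the get_nse code below appears to do.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::LocatableSeq): join id, start and end
# in the id/start-end form, with optional replacement separator characters.
sub format_nse {
    my ($id, $start, $end, $char1, $char2) = @_;

    $char1 ||= '/';
    $char2 ||= '-';

    die "Attribute id not set\n"    unless $id;
    die "Attribute start not set\n" unless $start;
    die "Attribute end not set\n"   unless $end;

    return $id . $char1 . $start . $char2 . $end;
}

my $default = format_nse('seq1', 1, 7);
my $custom  = format_nse('seq1', 1, 7, ':', '..');
print "$default\n";   # seq1/1-7
print "$custom\n";    # seq1:1..7
```

Because the checks use truth rather than definedness, a start or end of 0 is treated as unset in this sketch.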
=cut

sub get_nse{
    my ($self,$char1,$char2) = @_;

    $char1 ||= "/";
    $char2 ||= "-";

    $self->throw("Attribute id not set") unless $self->id();
    $self->throw("Attribute start not set") unless $self->start();
    $self->throw("Attribute end not set") unless $self->end();

    return $self->id() . $char1 . $self->start . $char2 . $self->end ;

}


=head2 no_gaps

 Title   : no_gaps
 Usage   : $self->no_gaps('.')
 Function:

           Gets the number of gaps in the sequence. Each run of
           consecutive gap characters counts as one gap, and leading
           or trailing gap characters are excluded from the count.

           Valid bioperl sequence characters are [A-Za-z\-\.\*]. Of
           these, '.' and '-' are counted as gap characters unless an
           optional argument specifies one of them.

 Returns : number of internal gaps in the sequence.
 Args    : a gap character (optional)

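A minimal stand-alone sketch of the counting rule described above, assuming only core Perl. The function name C<count_internal_gaps> is hypothetical; it takes the gapped string directly instead of reading it from a sequence object, and it escapes the gap characters with quotemeta, which the module code does not do.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::LocatableSeq): count runs of gap
# characters that have residues on both sides.
# $chars lists the characters treated as gaps (default: '-' and '.').
sub count_internal_gaps {
    my ($seq, $chars) = @_;
    $chars = '-.' unless defined $chars && length $chars;
    return 0 unless defined $seq && length $seq;

    my $class = quotemeta $chars;   # safe to interpolate in a character class

    $seq =~ s/^[$class]+//;         # leading gaps are not counted
    $seq =~ s/[$class]+$//;         # trailing gaps are not counted

    my $count = 0;
    $count++ while $seq =~ /[$class]+/g;   # one match per run of gaps
    return $count;
}

my $both_chars = count_internal_gaps('..AC..DEF.GH--');
my $dots_only  = count_internal_gaps('AC--DE.F', '.');
print "$both_chars\n";   # 2
print "$dots_only\n";    # 1
```

Note that the result is the number of gap runs, not the number of gap characters: 'AC..DEF.GH' has three gap characters but two gaps.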
=cut

sub no_gaps {
    my ($self,$char) = @_;
    my ($seq, $count) = (undef, 0);

    # default gap characters
    $char ||= '-.';

    $self->warn("I hope you know what you are doing setting gap to [$char]")
        unless $char =~ /[-.]/;

    $seq = $self->seq;
    return 0 unless $seq; # empty sequence does not have gaps

    $seq =~ s/^([$char]+)//;
    $seq =~ s/([$char]+)$//;
    $count++ while $seq =~ /[$char]+/g;

    return $count;

}


=head2 column_from_residue_number

 Title   : column_from_residue_number
 Usage   : $col = $seq->column_from_residue_number($resnumber)
 Function:

           This function gives the position in the alignment
           (i.e. column number) of the given residue number in the
           sequence. For example, for the sequence

             Seq1/91-97 AC..DEF.GH

           column_from_residue_number(94) returns 6.

           An exception is thrown if the residue number would lie
           outside the length of the alignment
           (e.g. column_from_residue_number(22) for the sequence above).

 Returns : A column number for the position of the
           given residue in the given sequence (1 = first column)
 Args    : A residue number in the whole sequence (not just that
           segment of it in the alignment)

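The residue-to-column mapping above can be traced with a small stand-alone sketch in core Perl. The function name C<column_of_residue> and its arguments (gapped string, residue number of the first residue, residue number to look up) are hypothetical; unlike the method below, it does not check the residue number against start and end before scanning.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::LocatableSeq): walk the gapped string
# and return the 1-based column that holds residue number $resnumber.
sub column_of_residue {
    my ($aligned, $start, $resnumber) = @_;

    my $current = $start;   # residue number of the next residue seen
    my $column  = 0;
    for my $char (split //, $aligned) {
        $column++;
        next if $char eq '.' or $char eq '-';   # a gap uses a column but no residue number
        return $column if $current == $resnumber;
        $current++;
    }
    die "Could not find residue number $resnumber\n";
}

my $col = column_of_residue('AC..DEF.GH', 91, 94);
print "$col\n";   # 6
```

In 'AC..DEF.GH' starting at 91, D is residue 93 in column 5 and E is residue 94 in column 6, because the two gap columns shift every later residue to the right.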
=cut

sub column_from_residue_number {
    my ($self, $resnumber) = @_;

    $self->throw("Residue number has to be a positive integer, not [$resnumber]")
        unless $resnumber =~ /^\d+$/ and $resnumber > 0;

    if ($resnumber >= $self->start() and $resnumber <= $self->end()) {
        my @residues = split //, $self->seq;
        my $count = $self->start();
        my $i;
        for ($i=0; $i < @residues; $i++) {
            if ($residues[$i] ne '.' and $residues[$i] ne '-') {
                $count == $resnumber and last;
                $count++;
            }
        }
        # $i now holds the index of the column.
        # The actual column number is this index + 1

        return $i+1;
    }

    $self->throw("Could not find residue number $resnumber");

}


=head2 location_from_column

 Title   : location_from_column
 Usage   : $loc = $seq->location_from_column($column_number)
 Function:

           This function gives the residue number in this sequence
           for a given position in the alignment (i.e. column
           number). Gaps complicate this process and force the
           output to be a L<Bio::Location::Simple> object, or undef
           where no position can be given. For example, for the
           alignment

             Seq1/91-96 AC..DEF.G.
             Seq2/1-9   .CYHDEFKGK

             location_from_column( Seq1/91-96, 2 ) position 92
             location_from_column( Seq1/91-96, 3 ) position 92^93
             location_from_column( Seq1/91-96, 10) position 96^97
             location_from_column( Seq2/1-9,   1 ) position undef

           An exact position returns a Bio::Location::Simple object
           where location_type() returns 'EXACT'; if a position
           is between bases, location_type() returns 'IN-BETWEEN'.
           A column before the first residue returns undef. Note that
           if the position is after the last residue in the alignment,
           there is no guarantee that the original sequence has
           residues after that position.

           An exception is thrown if the column number is not within
           the sequence.

 Returns : Bio::Location::Simple or undef
 Args    : A column number
 Throws  : If column is not within the sequence

See L<Bio::Location::Simple> for more.

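The column-to-position rules above can be checked with a stand-alone sketch in core Perl. The function name C<position_at_column> is hypothetical, and it returns a plain string ("N" for an exact residue, "N^M" for a position between two residues, undef before the first residue) rather than a Bio::Location::Simple object.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::LocatableSeq).
# $aligned is the gapped string, $start the residue number of its first
# residue, $column a 1-based alignment column.
sub position_at_column {
    my ($aligned, $start, $column) = @_;

    die "Column number [$column] is larger than sequence length\n"
        if $column > length $aligned;

    # count the residues up to and including this column
    (my $residues = substr $aligned, 0, $column) =~ s/\W//g;
    my $pos = length $residues;

    if (substr($aligned, $column - 1, 1) =~ /[a-zA-Z]/) {
        return $pos + $start - 1;                  # the column holds a residue
    }
    return undef if $pos == 0 and $start == 1;     # gap before the first residue
    return ($pos + $start - 1) . '^' . ($pos + $start);   # gap between residues
}

for my $case (['AC..DEF.G.', 91, 2], ['AC..DEF.G.', 91, 3],
              ['AC..DEF.G.', 91, 10], ['.CYHDEFKGK', 1, 1]) {
    my $where = position_at_column(@$case);
    print defined $where ? $where : 'undef', "\n";
}
# prints 92, 92^93, 96^97 and undef, one per line
```

A trailing gap column still yields an in-between position (96^97 here), which is why the text warns that the original sequence may have no residue after it.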
=cut

sub location_from_column {
    my ($self, $column) = @_;

    $self->throw("Column number has to be a positive integer, not [$column]")
        unless $column =~ /^\d+$/ and $column > 0;
    $self->throw("Column number [$column] is larger than".
                 " sequence length [". $self->length. "]")
        unless $column <= $self->length;

    my ($loc);
    # count the residues up to and including this column
    my $s = $self->subseq(1,$column);
    $s =~ s/\W//g;
    my $pos = CORE::length $s;

    my $start = $self->start || 0 ;
    if ($self->subseq($column, $column) =~ /[a-zA-Z]/ ) {
        # the column holds a residue: exact location
        $loc = Bio::Location::Simple->new(
            -start  => $pos + $start - 1,
            -end    => $pos + $start - 1,
            -strand => 1
        );
    }
    elsif ($pos == 0 and $self->start == 1) {
        # gap before the first residue: $loc stays undefined
    } else {
        # gap after at least one residue: location between two residues
        $loc = Bio::Location::Simple->new(
            -start         => $pos + $start - 1,
            -end           => $pos + 1 + $start - 1,
            -strand        => 1,
            -location_type => 'IN-BETWEEN'
        );
    }
    return $loc;
}

21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
375 =head2 revcom
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
376
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
377 Title : revcom
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
378 Usage : $rev = $seq->revcom()
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
379 Function: Produces a new Bio::LocatableSeq object which
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
380 has the reversed complement of the sequence. For protein
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
381 sequences this throws an exception of "Sequence is a
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
382 protein. Cannot revcom"
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
383
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
384 Returns : A new Bio::LocatableSeq object
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
385 Args : none
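
A minimal usage sketch, assuming BioPerl is installed. The sequence
string, the id and the coordinates are made up for illustration, and
the alphabet is set explicitly so that it does not have to be guessed
from a gapped string:

    use strict;
    use warnings;
    use Bio::LocatableSeq;

    my $seq = Bio::LocatableSeq->new(-seq      => 'ATGC--GG',
                                     -id       => 'example',  # hypothetical id
                                     -alphabet => 'dna',
                                     -start    => 1,
                                     -end      => 6,
                                     -strand   => 1);
    my $rev = $seq->revcom;

    # the strand is inverted; start and end are copied from $seq
    printf "%s %d-%d strand %d\n",
        $rev->seq, $rev->start, $rev->end, $rev->strand;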

=cut

sub revcom {
    my ($self) = @_;

    my $new = $self->SUPER::revcom;
    # flip the strand; the coordinates are carried over unchanged
    $new->strand($self->strand * -1);
    $new->start($self->start) if $self->start;
    $new->end($self->end) if $self->end;
    return $new;
}


=head2 trunc

 Title   : trunc
 Usage   : $subseq = $myseq->trunc(10,100);
 Function: Provides a truncation of a sequence. The start and
           end of the new object are the residue positions
           that correspond to the given columns.

 Example :
 Returns : A new object implementing Bio::PrimarySeqI
 Args    : Two integers denoting the first and last columns of
           the sequence to be included in the subsequence.
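
A minimal usage sketch, assuming BioPerl is installed. The gapped
sequence, the id and the coordinates are made up for illustration.
It shows that the arguments are alignment columns, while the start
and end reported by the result are residue positions:

    use strict;
    use warnings;
    use Bio::LocatableSeq;

    # columns:                                      1234567890
    my $aln_seq = Bio::LocatableSeq->new(-seq      => 'ACG--TACGT',
                                         -id       => 'example',  # hypothetical id
                                         -alphabet => 'dna',
                                         -start    => 1,
                                         -end      => 8,
                                         -strand   => 1);

    # columns 3 to 7 cover the string 'G--TA'
    my $sub = $aln_seq->trunc(3, 7);

    # start and end are residue coordinates, not column numbers
    printf "%s %d-%d\n", $sub->seq, $sub->start, $sub->end;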


=cut

sub trunc {
    my ($self, $start, $end) = @_;
    my $new = $self->SUPER::trunc($start, $end);

    # map the first column to a residue position; fall back to 1
    # when the column has no location
    $start = $self->location_from_column($start);
    $start = $start ? $start->start : 1;

    # map the last column to a residue position, if it has one
    $end = $self->location_from_column($end);
    $end = $end->start if $end;

    $new->strand($self->strand);
    $new->start($start) if $start;
    $new->end($end) if $end;

    return $new;
}

1;