Mercurial > repos > willmclaren > ensembl_vep
comparison variant_effect_predictor/Bio/Tools/Blast/HSP.pm @ 0:21066c0abaf5 draft
Uploaded
author | willmclaren |
---|---|
date | Fri, 03 Aug 2012 10:04:48 -0400 |
parents | |
children |
comparison
equal
deleted
inserted
replaced
-1:000000000000 | 0:21066c0abaf5 |
---|---|
1 #---------------------------------------------------------------------------- | |
2 # PACKAGE : Bio::Tools::Blast::HSP | |
3 # AUTHOR : Steve Chervitz (sac@bioperl.org) | |
4 # CREATED : March 1996 | |
5 # STATUS : Alpha | |
6 # REVISION: $Id: HSP.pm,v 1.18 2002/10/22 07:38:48 lapp Exp $ | |
7 # | |
8 # For the latest version and documentation, visit the distribution site: | |
9 # http://genome-www.stanford.edu/perlOOP/bioperl/blast/ | |
10 # | |
11 # To generate documentation, run this module through pod2html | |
12 # (preferably from Perl v5.004 or better). | |
13 # | |
14 # Copyright (c) 1996-2000 Steve Chervitz. All Rights Reserved. | |
15 # This module is free software; you can redistribute it and/or | |
16 # modify it under the same terms as Perl itself. | |
17 #---------------------------------------------------------------------------- | |
18 | |
19 package Bio::Tools::Blast::HSP; | |
20 | |
21 use Bio::Root::Global qw(:devel); | |
22 use Bio::Root::Object (); | |
23 #use Bio::Root::Err qw(:std); | |
24 | |
25 @ISA = qw( Bio::Root::Object); | |
26 | |
27 use strict; | |
28 use vars qw($ID $GAP_SYMBOL @SCORE_CUTOFFS $Revision %STRAND_SYMBOL); | |
29 $ID = 'Bio::Tools::Blast::HSP'; | |
30 $Revision = '$Id: HSP.pm,v 1.18 2002/10/22 07:38:48 lapp Exp $'; #' | |
31 | |
32 $GAP_SYMBOL = '-'; # Need a more general way to handle gap symbols. | |
33 @SCORE_CUTOFFS = ( 100, 30 ); # Bit score cutoffs (see homol_score()). | |
34 %STRAND_SYMBOL = ('Plus' => 1, 'Minus' => -1); | |
35 | |
36 ## POD Documentation: | |
37 | |
38 =head1 NAME | |
39 | |
40 Bio::Tools::Blast::HSP - Bioperl BLAST High-Scoring Segment Pair object | |
41 | |
42 =head1 SYNOPSIS | |
43 | |
44 =head2 Object Creation | |
45 | |
46 The construction of HSP objects is handled by Bio::Tools::Blast:: Sbjct.pm. | |
47 You should not need to use this package directly. See L<_initialize()|_initialize> | |
48 for a description of constructor parameters. | |
49 | |
50 require Bio::Tools::Blast::HSP; | |
51 | |
52 $hspObj = eval{ new Bio::Tools::Blast::HSP(-DATA =>\@hspData, | |
53 -PARENT =>$sbjct_object, | |
54 -NAME =>$hspCount, | |
55 -PROGRAM =>'TBLASTN', | |
56 ); | |
57 }; | |
58 | |
59 @hspData includes the raw BLAST report data for a specific HSP, | |
60 and is prepared by Bio::Tools::Blast::Sbjct.pm. | |
61 | |
62 =head1 INSTALLATION | |
63 | |
64 This module is included with the central Bioperl distribution: | |
65 | |
66 http://bio.perl.org/Core/Latest | |
67 ftp://bio.perl.org/pub/DIST | |
68 | |
69 Follow the installation instructions included in the README file. | |
70 | |
71 =head1 DESCRIPTION | |
72 | |
73 The Bio::Tools::Blast::HSP.pm module encapsulates data and methods for | |
74 manipulating, parsing, and analyzing HSPs ("High-scoring Segment Pairs") | |
75 derived from BLAST sequence analysis. | |
76 | |
77 This module is a utility module used by the B<Bio::Tools::Blast::Sbjct.pm> | |
78 and is not intended for separate use. Please see documentation for | |
79 B<Bio::Tools::Blast.pm> for some basic information about using | |
80 HSP objects (L<Links:>). | |
81 | |
82 =over 0 | |
83 | |
84 =item * Supports BLAST versions 1.x and 2.x, gapped and ungapped. | |
85 | |
86 =back | |
87 | |
88 Bio::Tools::Blast::HSP.pm has the ability to extract a list of all | |
89 residue indices for identical and conservative matches along both | |
90 query and sbjct sequences. Since this degree of detail is not always | |
91 needed, this behavior does not occur during construction of the HSP | |
92 object. These data will automatically be collected as necessary as | |
93 the HSP.pm object is used. | |
94 | |
95 =head1 DEPENDENCIES | |
96 | |
97 Bio::Tools::Blast::HSP.pm is a concrete class that inherits from | |
98 B<Bio::Root::Object.pm> and relies on B<Bio::Tools::Sbjct.pm> as a | |
99 container for HSP.pm objects. B<Bio::Seq.pm> and B<Bio::UnivAln.pm> | |
100 are employed for creating sequence and alignment objects, | |
101 respectively. | |
102 | |
103 | |
104 =head2 Relationship to UnivAln.pm & Seq.pm | |
105 | |
106 HSP.pm can provide the query or sbjct sequence as a B<Bio::Seq.pm> | |
107 object via the L<seq()|seq> method. The HSP.pm object can also create a | |
108 two-sequence B<Bio::UnivAln.pm> alignment object using the the query | |
109 and sbjct sequences via the L<get_aln()|get_aln> method. Creation of alignment | |
110 objects is not automatic when constructing the HSP.pm object since | |
111 this level of functionality is not always required and would generate | |
112 a lot of extra overhead when crunching many reports. | |
113 | |
114 | |
115 =head1 FEEDBACK | |
116 | |
117 =head2 Mailing Lists | |
118 | |
119 User feedback is an integral part of the evolution of this and other | |
120 Bioperl modules. Send your comments and suggestions preferably to one | |
121 of the Bioperl mailing lists. Your participation is much appreciated. | |
122 | |
123 bioperl-l@bioperl.org - General discussion | |
124 http://bio.perl.org/MailList.html - About the mailing lists | |
125 | |
126 =head2 Reporting Bugs | |
127 | |
128 Report bugs to the Bioperl bug tracking system to help us keep track | |
129 the bugs and their resolution. Bug reports can be submitted via email | |
130 or the web: | |
131 | |
132 bioperl-bugs@bio.perl.org | |
133 http://bugzilla.bioperl.org/ | |
134 | |
135 =head1 AUTHOR | |
136 | |
137 Steve Chervitz, E<lt>sac@bioperl.orgE<gt> | |
138 | |
139 =head1 SEE ALSO | |
140 | |
141 Bio::Tools::Blast::Sbjct.pm - Blast hit object. | |
142 Bio::Tools::Blast.pm - Blast object. | |
143 Bio::Seq.pm - Biosequence object | |
144 Bio::UnivAln.pm - Biosequence alignment object. | |
145 Bio::Root::Object.pm - Proposed base class for all Bioperl objects. | |
146 | |
147 =head2 Links: | |
148 | |
149 http://bio.perl.org/Core/POD/Tools/Blast/Sbjct.pm.html | |
150 | |
151 http://bio.perl.org/Projects/modules.html - Online module documentation | |
152 http://bio.perl.org/Projects/Blast/ - Bioperl Blast Project | |
153 http://bio.perl.org/ - Bioperl Project Homepage | |
154 | |
155 =head1 COPYRIGHT | |
156 | |
157 Copyright (c) 1996-98 Steve Chervitz. All Rights Reserved. | |
158 This module is free software; you can redistribute it and/or | |
159 modify it under the same terms as Perl itself. | |
160 | |
161 =cut | |
162 | |
163 | |
164 | |
165 # | |
166 ## | |
167 ### | |
168 #### END of main POD documentation. | |
169 ### | |
170 ## | |
171 # | |
172 | |
173 =head1 APPENDIX | |
174 | |
175 Methods beginning with a leading underscore are considered private | |
176 and are intended for internal use by this module. They are | |
177 B<not> considered part of the public interface and are described here | |
178 for documentation purposes only. | |
179 | |
180 =cut | |
181 | |
182 ##################################################################################### | |
183 ## CONSTRUCTOR ## | |
184 ##################################################################################### | |
185 | |
186 =head2 _initialize | |
187 | |
188 Usage : n/a; automatically called by Bio::Root::Object::new() | |
189 : Bio::Tools::Blast::HSP.pm objects are constructed | |
190 : automatically by Bio::Tools::Sbjct.pm, so there is no need | |
191 : for direct consumption. | |
192 Purpose : Initializes HSP data and calls private methods to extract | |
193 : the data for a given HSP. | |
194 : Calls superclass constructor first (Bio::Root::Object.pm). | |
195 Returns : n/a | |
196 Argument : Named parameters passed from new(): | |
197 : All tags must be uppercase (does not call _rearrange()). | |
198 : -DATA => array ref containing raw data for one HSP. | |
199 : -PARENT => Sbjct.pm object ref. | |
200 : -NAME => integer (1..n). | |
201 : -PROGRAM => string ('TBLASTN', 'BLASTP', etc.). | |
202 | |
203 See Also : L<_set_data()|_set_data>, B<Bio::Root::Object::new()>, B<Bio::Tools::Blast::Sbjct::_set_hsps()> | |
204 | |
205 =cut | |
206 | |
207 #---------------- | |
208 sub _initialize { | |
209 #---------------- | |
210 my( $self, %param ) = @_; | |
211 | |
212 $self->SUPER::_initialize( %param ); | |
213 | |
214 # The gapped and program booleans may be needed after the HSP object | |
215 # is built. | |
216 # $self->{'_gapped'} = $param{-GAPPED} || 0; | |
217 $self->{'_prog'} = $param{-PROGRAM} || 0; | |
218 $self->_set_data( @{$param{-DATA}} ); | |
219 } | |
220 | |
221 ##################################################################################### | |
222 ## ACCESSORS ## | |
223 ##################################################################################### | |
224 | |
225 | |
226 =head2 _set_data | |
227 | |
228 Usage : n/a; called automatically during object construction. | |
229 Purpose : Sets the query sequence, sbjct sequence, and the "match" data | |
230 : which consists of the symbols between the query and sbjct lines | |
231 : in the alignment. | |
232 Argument : Array (all lines from a single, complete HSP, one line per element) | |
233 Throws : Propagates any exceptions from the methods called ("See Also") | |
234 | |
235 See Also : L<_set_seq()|_set_seq>, L<_set_residues()|_set_residues>, L<_set_score_stats()|_set_score_stats>, L<_set_match_stats()|_set_match_stats>, L<_initialize()|_initialize> | |
236 | |
237 =cut | |
238 | |
239 #-------------- | |
240 sub _set_data { | |
241 #-------------- | |
242 my $self = shift; | |
243 my @data = @_; | |
244 my @queryList = (); # 'Query' = SEQUENCE USED TO QUERY THE DATABASE. | |
245 my @sbjctList = (); # 'Sbjct' = HOMOLOGOUS SEQUENCE FOUND IN THE DATABASE. | |
246 my @matchList = (); | |
247 my $matchLine = 0; # Alternating boolean: when true, load 'match' data. | |
248 my @linedat = (); | |
249 | |
250 $DEBUG and print STDERR "$ID: set_data()\n"; | |
251 | |
252 my($line, $aln_row_len, $length_diff); | |
253 $length_diff = 0; | |
254 | |
255 # Collecting data for all lines in the alignment | |
256 # and then storing the collections for possible processing later. | |
257 # | |
258 # Note that "match" lines may not be properly padded with spaces. | |
259 # This loop now properly handles such cases: | |
260 # Query: 1141 PSLVELTIRDCPRLEVGPMIRSLPKFPMLKKLDLAVANIIEEDLDVIGSLEELVIXXXXX 1200 | |
261 # PSLVELTIRDCPRLEVGPMIRSLPKFPMLKKLDLAVANIIEEDLDVIGSLEELVI | |
262 # Sbjct: 1141 PSLVELTIRDCPRLEVGPMIRSLPKFPMLKKLDLAVANIIEEDLDVIGSLEELVILSLKL 1200 | |
263 | |
264 foreach $line( @data ) { | |
265 next if $line =~ /^\s*$/; | |
266 | |
267 if( $line =~ /^ ?Score/ ) { | |
268 $self->_set_score_stats( $line ); | |
269 } elsif( $line =~ /^ ?(Identities|Positives|Strand)/ ) { | |
270 $self->_set_match_stats( $line ); | |
271 } elsif( $line =~ /^ ?Frame = ([\d+-]+)/ ) { | |
272 # Version 2.0.8 has Frame information on a separate line. | |
273 $self->{'_frame'} = $1; | |
274 } elsif( $line =~ /^(Query:?[\s\d]+)([^\s\d]+)/ ) { | |
275 push @queryList, $line; | |
276 $self->{'_match_indent'} = CORE::length $1; | |
277 $aln_row_len = (CORE::length $1) + (CORE::length $2); | |
278 $matchLine = 1; | |
279 } elsif( $matchLine ) { | |
280 # Pad the match line with spaces if necessary. | |
281 $length_diff = $aln_row_len - CORE::length $line; | |
282 $length_diff and $line .= ' 'x $length_diff; | |
283 push @matchList, $line; | |
284 $matchLine = 0; | |
285 } elsif( $line =~ /^Sbjct/ ) { | |
286 push @sbjctList, $line; | |
287 } | |
288 } | |
289 | |
290 # Storing the query and sbjct lists in case they are needed later. | |
291 # We could make this conditional to save memory. | |
292 $self->{'_queryList'} = \@queryList; | |
293 $self->{'_sbjctList'} = \@sbjctList; | |
294 | |
295 # Storing the match list in case it is needed later. | |
296 $self->{'_matchList'} = \@matchList; | |
297 | |
298 if(not defined ($self->{'_numIdentical'})) { | |
299 $self->throw("Can't parse match statistics.", | |
300 "Possibly a new or unrecognized Blast format."); | |
301 } | |
302 | |
303 if(!scalar @queryList or !scalar @sbjctList) { | |
304 $self->throw("Can't find query or sbjct alignment lines.", | |
305 "Possibly unrecognized Blast format."); | |
306 } | |
307 } | |
308 | |
309 | |
310 | |
311 =head2 _set_score_stats | |
312 | |
313 Usage : n/a; called automatically by _set_data() | |
314 Purpose : Sets various score statistics obtained from the HSP listing. | |
315 Argument : String with any of the following formats: | |
316 : blast2: Score = 30.1 bits (66), Expect = 9.2 | |
317 : blast2: Score = 158.2 bits (544), Expect(2) = e-110 | |
318 : blast1: Score = 410 (144.3 bits), Expect = 1.7e-40, P = 1.7e-40 | |
319 : blast1: Score = 55 (19.4 bits), Expect = 5.3, Sum P(3) = 0.99 | |
320 Throws : Exception if the stats cannot be parsed, probably due to a change | |
321 : in the Blast report format. | |
322 | |
323 See Also : L<_set_data()|_set_data> | |
324 | |
325 =cut | |
326 | |
327 #-------------------- | |
328 sub _set_score_stats { | |
329 #-------------------- | |
330 my ($self, $data) = @_; | |
331 | |
332 my ($expect, $p); | |
333 | |
334 if($data =~ /Score = +([\d.e+-]+) bits \(([\d.e+-]+)\), +Expect = +([\d.e+-]+)/) { | |
335 # blast2 format n = 1 | |
336 $self->{'_bits'} = $1; | |
337 $self->{'_score'} = $2; | |
338 $expect = $3; | |
339 } elsif($data =~ /Score = +([\d.e+-]+) bits \(([\d.e+-]+)\), +Expect\((\d+)\) = +([\d.e+-]+)/) { | |
340 # blast2 format n > 1 | |
341 $self->{'_bits'} = $1; | |
342 $self->{'_score'} = $2; | |
343 $self->{'_n'} = $3; | |
344 $expect = $4; | |
345 | |
346 } elsif($data =~ /Score = +([\d.e+-]+) \(([\d.e+-]+) bits\), +Expect = +([\d.e+-]+), P = +([\d.e-]+)/) { | |
347 # blast1 format, n = 1 | |
348 $self->{'_score'} = $1; | |
349 $self->{'_bits'} = $2; | |
350 $expect = $3; | |
351 $p = $4; | |
352 | |
353 } elsif($data =~ /Score = +([\d.e+-]+) \(([\d.e+-]+) bits\), +Expect = +([\d.e+-]+), +Sum P\((\d+)\) = +([\d.e-]+)/) { | |
354 # blast1 format, n > 1 | |
355 $self->{'_score'} = $1; | |
356 $self->{'_bits'} = $2; | |
357 $expect = $3; | |
358 $self->{'_n'} = $4; | |
359 $p = $5; | |
360 | |
361 } else { | |
362 $self->throw("Can't parse score statistics: unrecognized format.", "$data"); | |
363 } | |
364 | |
365 $expect = "1$expect" if $expect =~ /^e/i; | |
366 $p = "1$p" if defined $p and $p=~ /^e/i; | |
367 | |
368 $self->{'_expect'} = $expect; | |
369 $self->{'_p'} = $p || undef; | |
370 | |
371 } | |
372 | |
373 | |
374 | |
375 =head2 _set_match_stats | |
376 | |
377 Usage : n/a; called automatically by _set_data() | |
378 Purpose : Sets various matching statistics obtained from the HSP listing. | |
379 Argument : blast2: Identities = 23/74 (31%), Positives = 29/74 (39%), Gaps = 17/74 (22%) | |
380 : blast2: Identities = 57/98 (58%), Positives = 74/98 (75%) | |
381 : blast1: Identities = 87/204 (42%), Positives = 126/204 (61%) | |
382 : blast1: Identities = 87/204 (42%), Positives = 126/204 (61%), Frame = -3 | |
383 : WU-blast: Identities = 310/553 (56%), Positives = 310/553 (56%), Strand = Minus / Plus | |
384 Throws : Exception if the stats cannot be parsed, probably due to a change | |
385 : in the Blast report format. | |
386 Comments : The "Gaps = " data in the HSP header has a different meaning depending | |
387 : on the type of Blast: for BLASTP, this number is the total number of | |
388 : gaps in query+sbjct; for TBLASTN, it is the number of gaps in the | |
389 : query sequence only. Thus, it is safer to collect the data | |
390 : separately by examining the actual sequence strings as is done | |
391 : in _set_seq(). | |
392 | |
393 See Also : L<_set_data()|_set_data>, L<_set_seq()|_set_seq> | |
394 | |
395 =cut | |
396 | |
397 #-------------------- | |
398 sub _set_match_stats { | |
399 #-------------------- | |
400 my ($self, $data) = @_; | |
401 | |
402 if($data =~ m!Identities = (\d+)/(\d+)!) { | |
403 # blast1 or 2 format | |
404 $self->{'_numIdentical'} = $1; | |
405 $self->{'_totalLength'} = $2; | |
406 } | |
407 | |
408 if($data =~ m!Positives = (\d+)/(\d+)!) { | |
409 # blast1 or 2 format | |
410 $self->{'_numConserved'} = $1; | |
411 $self->{'_totalLength'} = $2; | |
412 } | |
413 | |
414 if($data =~ m!Frame = ([\d+-]+)!) { | |
415 $self->{'_frame'} = $1; | |
416 } | |
417 | |
418 # Strand data is not always present in this line. | |
419 # _set_seq() will also set strand information. | |
420 if($data =~ m!Strand = (\w+) / (\w+)!) { | |
421 $self->{'_queryStrand'} = $1; | |
422 $self->{'_sbjctStrand'} = $2; | |
423 } | |
424 | |
425 # if($data =~ m!Gaps = (\d+)/(\d+)!) { | |
426 # $self->{'_totalGaps'} = $1; | |
427 # } else { | |
428 # $self->{'_totalGaps'} = 0; | |
429 # } | |
430 } | |
431 | |
432 | |
433 | |
434 =head2 _set_seq_data | |
435 | |
436 Usage : n/a; called automatically when sequence data is requested. | |
437 Purpose : Sets the HSP sequence data for both query and sbjct sequences. | |
438 : Includes: start, stop, length, gaps, and raw sequence. | |
439 Argument : n/a | |
440 Throws : Propagates any exception thrown by _set_match_seq() | |
441 Comments : Uses raw data stored by _set_data() during object construction. | |
442 : These data are not always needed, so it is conditionally | |
443 : executed only upon demand by methods such as gaps(), _set_residues(), | |
444 : etc. _set_seq() does the dirty work. | |
445 | |
446 See Also : L<_set_seq()|_set_seq> | |
447 | |
448 =cut | |
449 | |
450 sub _set_seq_data { | |
451 my $self = shift; | |
452 | |
453 $self->_set_seq('query', @{$self->{'_queryList'}}); | |
454 $self->_set_seq('sbjct', @{$self->{'_sbjctList'}}); | |
455 | |
456 # Liberate some memory. | |
457 @{$self->{'_queryList'}} = @{$self->{'_sbjctList'}} = (); | |
458 undef $self->{'_queryList'}; | |
459 undef $self->{'_sbjctList'}; | |
460 | |
461 $self->{'_set_seq_data'} = 1; | |
462 } | |
463 | |
464 | |
465 | |
466 =head2 _set_seq | |
467 | |
468 Usage : n/a; called automatically by _set_seq_data() | |
469 : $hsp_obj->($seq_type, @data); | |
470 Purpose : Sets sequence information for both the query and sbjct sequences. | |
471 : Directly counts the number of gaps in each sequence (if gapped Blast). | |
472 Argument : $seq_type = 'query' or 'sbjct' | |
473 : @data = all seq lines with the form: | |
474 : Query: 61 SPHNVKDRKEQNGSINNAISPTATANTSGSQQINIDSALRDRSSNVAAQPSLSDASSGSN 120 | |
475 Throws : Exception if data strings cannot be parsed, probably due to a change | |
476 : in the Blast report format. | |
477 Comments : Uses first argument to determine which data members to set | |
478 : making this method sensitive data member name changes. | |
479 : Behavior is dependent on the type of BLAST analysis (TBLASTN, BLASTP, etc). | |
480 Warning : Sequence endpoints are normalized so that start < end. This affects HSPs | |
481 : for TBLASTN/X hits on the minus strand. Normalization facilitates use | |
482 : of range information by methods such as match(). | |
483 | |
484 See Also : L<_set_seq_data()|_set_seq_data>, L<matches()|matches>, L<range()|range>, L<start()|start>, L<end()|end> | |
485 | |
486 =cut | |
487 | |
488 #------------- | |
489 sub _set_seq { | |
490 #------------- | |
491 my $self = shift; | |
492 my $seqType = shift; | |
493 my @data = @_; | |
494 my @ranges = (); | |
495 my @sequence = (); | |
496 my $numGaps = 0; | |
497 | |
498 foreach( @data ) { | |
499 if( m/(\d+) *(\D+) *(\d+)/) { | |
500 push @ranges, ( $1, $3 ) ; | |
501 push @sequence, $2; | |
502 } else { | |
503 $self->warn("Bad sequence data: $_"); | |
504 } | |
505 } | |
506 | |
507 (scalar(@sequence) and scalar(@ranges)) || $self->throw("Can't set sequence: missing data", | |
508 "Possibly unrecognized Blast format."); | |
509 | |
510 # Sensitive to member name changes. | |
511 $seqType = "_\L$seqType\E"; | |
512 $self->{$seqType.'Start'} = $ranges[0]; | |
513 $self->{$seqType.'Stop'} = $ranges[ $#ranges ]; | |
514 $self->{$seqType.'Seq'} = \@sequence; | |
515 | |
516 $self->{$seqType.'Length'} = abs($ranges[ $#ranges ] - $ranges[0]) + 1; | |
517 | |
518 # Adjust lengths for BLASTX, TBLASTN, TBLASTX sequences | |
519 # Converting nucl coords to amino acid coords. | |
520 | |
521 my $prog = $self->{'_prog'}; | |
522 if($prog eq 'TBLASTN' and $seqType eq '_sbjct') { | |
523 $self->{$seqType.'Length'} /= 3; | |
524 } elsif($prog eq 'BLASTX' and $seqType eq '_query') { | |
525 $self->{$seqType.'Length'} /= 3; | |
526 } elsif($prog eq 'TBLASTX') { | |
527 $self->{$seqType.'Length'} /= 3; | |
528 } | |
529 | |
530 $self->{$seqType.'Strand'} = 'Plus' if $prog =~ /BLAST[NX]/; | |
531 | |
532 # Normalize sequence endpoints so that start < end. | |
533 # Reverse complement or 'minus strand' HSPs get flipped here. | |
534 if($self->{$seqType.'Start'} > $self->{$seqType.'Stop'}) { | |
535 ($self->{$seqType.'Start'}, $self->{$seqType.'Stop'}) = | |
536 ($self->{$seqType.'Stop'}, $self->{$seqType.'Start'}); | |
537 $self->{$seqType.'Strand'} = 'Minus'; | |
538 } | |
539 | |
540 ## Count number of gaps in each seq. Only need to do this for gapped Blasts. | |
541 # if($self->{'_gapped'}) { | |
542 my $seqstr = join('', @sequence); | |
543 $seqstr =~ s/\s//g; | |
544 my $num_gaps = CORE::length($seqstr) - $self->{$seqType.'Length'}; | |
545 $self->{$seqType.'Gaps'} = $num_gaps if $num_gaps > 0; | |
546 # } | |
547 } | |
548 | |
549 | |
550 =head2 _set_residues | |
551 | |
552 Usage : n/a; called automatically when residue data is requested. | |
553 Purpose : Sets the residue numbers representing the identical and | |
554 : conserved positions. These data are obtained by analyzing the | |
555 : symbols between query and sbjct lines of the alignments. | |
556 Argument : n/a | |
557 Throws : Propagates any exception thrown by _set_seq_data() and _set_match_seq(). | |
558 Comments : These data are not always needed, so it is conditionally | |
559 : executed only upon demand by methods such as seq_inds(). | |
560 : Behavior is dependent on the type of BLAST analysis (TBLASTN, BLASTP, etc). | |
561 | |
562 See Also : L<_set_seq_data()|_set_seq_data>, L<_set_match_seq()|_set_match_seq>, L<seq_inds()|seq_inds> | |
563 | |
564 =cut | |
565 | |
566 #------------------ | |
567 sub _set_residues { | |
568 #------------------ | |
569 my $self = shift; | |
570 my @sequence = (); | |
571 | |
572 $self->_set_seq_data() unless $self->{'_set_seq_data'}; | |
573 | |
574 # Using hashes to avoid saving duplicate residue numbers. | |
575 my %identicalList_query = (); | |
576 my %identicalList_sbjct = (); | |
577 my %conservedList_query = (); | |
578 my %conservedList_sbjct = (); | |
579 | |
580 my $aref = $self->_set_match_seq() if not ref $self->{'_matchSeq'}; | |
581 $aref ||= $self->{'_matchSeq'}; | |
582 my $seqString = join('', @$aref ); | |
583 | |
584 my $qseq = join('',@{$self->{'_querySeq'}}); | |
585 my $sseq = join('',@{$self->{'_sbjctSeq'}}); | |
586 my $resCount_query = $self->{'_queryStop'} || 0; | |
587 my $resCount_sbjct = $self->{'_sbjctStop'} || 0; | |
588 | |
589 my $prog = $self->{'_prog'}; | |
590 if($prog !~ /^BLASTP|^BLASTN/) { | |
591 if($prog eq 'TBLASTN') { | |
592 $resCount_sbjct /= 3; | |
593 } elsif($prog eq 'BLASTX') { | |
594 $resCount_query /= 3; | |
595 } elsif($prog eq 'TBLASTX') { | |
596 $resCount_query /= 3; | |
597 $resCount_sbjct /= 3; | |
598 } | |
599 } | |
600 | |
601 my ($mchar, $schar, $qchar); | |
602 while( $mchar = chop($seqString) ) { | |
603 ($qchar, $schar) = (chop($qseq), chop($sseq)); | |
604 if( $mchar eq '+' ) { | |
605 $conservedList_query{ $resCount_query } = 1; | |
606 $conservedList_sbjct{ $resCount_sbjct } = 1; | |
607 } elsif( $mchar ne ' ' ) { | |
608 $identicalList_query{ $resCount_query } = 1; | |
609 $identicalList_sbjct{ $resCount_sbjct } = 1; | |
610 } | |
611 $resCount_query-- if $qchar ne $GAP_SYMBOL; | |
612 $resCount_sbjct-- if $schar ne $GAP_SYMBOL; | |
613 } | |
614 $self->{'_identicalRes_query'} = \%identicalList_query; | |
615 $self->{'_conservedRes_query'} = \%conservedList_query; | |
616 $self->{'_identicalRes_sbjct'} = \%identicalList_sbjct; | |
617 $self->{'_conservedRes_sbjct'} = \%conservedList_sbjct; | |
618 | |
619 } | |
620 | |
621 | |
622 | |
623 | |
624 =head2 _set_match_seq | |
625 | |
626 Usage : n/a. Internal method. | |
627 : $hsp_obj->_set_match_seq() | |
628 Purpose : Set the 'match' sequence for the current HSP (symbols in between | |
629 : the query and sbjct lines.) | |
630 Returns : Array reference holding the match sequences lines. | |
631 Argument : n/a | |
632 Throws : Exception if the _matchList field is not set. | |
633 Comments : The match information is not always necessary. This method | |
634 : allows it to be conditionally prepared. | |
635 : Called by _set_residues>() and seq_str(). | |
636 | |
637 See Also : L<_set_residues()|_set_residues>, L<seq_str()|seq_str> | |
638 | |
639 =cut | |
640 | |
641 #------------------- | |
642 sub _set_match_seq { | |
643 #------------------- | |
644 my $self = shift; | |
645 | |
646 ## DEBUGGING CODE: | |
647 # if($self->parent->name eq '1AK5_' and $self->parent->parent->name eq 'YAR073W') { | |
648 # print "\n_set_match_seq() called for HSP ", $self->name, " of hit ${\$self->parent->name} in query ${\$self->parent->parent->name}"; <STDIN>; | |
649 # } | |
650 | |
651 ref($self->{'_matchList'}) || $self->throw("Can't set HSP match sequence: No data"); | |
652 | |
653 my @data = @{$self->{'_matchList'}}; | |
654 | |
655 my(@sequence); | |
656 foreach( @data ) { | |
657 chomp($_); | |
658 ## Remove leading spaces; (note: aln may begin with a space | |
659 ## which is why we can't use s/^ +//). | |
660 s/^ {$self->{'_match_indent'}}//; | |
661 push @sequence, $_; | |
662 } | |
663 # Liberate some memory. | |
664 @{$self->{'_matchList'}} = undef; | |
665 $self->{'_matchList'} = undef; | |
666 | |
667 $self->{'_matchSeq'} = \@sequence; | |
668 | |
669 ## DEBUGGING CODE: | |
670 # if($self->parent->name eq '1AK5_' and $self->parent->parent->name eq 'YAR073W') { | |
671 # print "RETURNING: $self->{'_matchSeq'}:\n @{$self->{'_matchSeq'}}";<STDIN>; | |
672 # } | |
673 | |
674 $self->{'_matchSeq'}; | |
675 } | |
676 | |
677 | |
678 | |
679 =head2 score | |
680 | |
681 Usage : $hsp_obj->score() | |
682 Purpose : Get the Blast score for the HSP. | |
683 Returns : Integer | |
684 Argument : n/a | |
685 Throws : n/a | |
686 | |
687 See Also : L<bits()|bits> | |
688 | |
689 =cut | |
690 | |
691 #--------- | |
692 sub score { my $self = shift; $self->{'_score'}; } | |
693 #--------- | |
694 | |
695 | |
696 | |
697 =head2 bits | |
698 | |
699 Usage : $hsp_obj->bits() | |
700 Purpose : Get the Blast score in bits for the HSP. | |
701 Returns : Float | |
702 Argument : n/a | |
703 Throws : n/a | |
704 | |
705 | |
706 See Also : L<score()|score> | |
707 | |
708 =cut | |
709 | |
710 #-------- | |
711 sub bits { my $self = shift; $self->{'_bits'}; } | |
712 #-------- | |
713 | |
714 | |
715 | |
716 =head2 n | |
717 | |
718 Usage : $hsp_obj->n() | |
719 Purpose : Get the N value (num HSPs on which P/Expect is based). | |
720 : This value is not defined with NCBI Blast2 with gapping. | |
721 Returns : Integer or null string if not defined. | |
722 Argument : n/a | |
723 Throws : n/a | |
724 Comments : The 'N' value is listed in parenthesis with P/Expect value: | |
725 : e.g., P(3) = 1.2e-30 ---> (N = 3). | |
726 : Not defined in NCBI Blast2 with gaps. | |
727 : This typically is equal to the number of HSPs but not always. | |
728 : To obtain the number of HSPs, use Bio::Tools::Blast::Sbjct::num_hsps(). | |
729 | |
730 See Also : L<score()|score> | |
731 | |
732 =cut | |
733 | |
734 #----- | |
735 sub n { my $self = shift; $self->{'_n'} || ''; } | |
736 #----- | |
737 | |
738 | |
739 | |
740 =head2 frame | |
741 | |
742 Usage : $hsp_obj->frame() | |
743 Purpose : Get the reading frame number (-/+ 1, 2, 3) (TBLASTN/X only). | |
744 Returns : Integer or null string if not defined. | |
745 Argument : n/a | |
746 Throws : n/a | |
747 | |
748 =cut | |
749 | |
750 #--------- | |
751 sub frame { my $self = shift; $self->{'_frame'} || ''; } | |
752 #--------- | |
753 | |
754 | |
755 | |
756 =head2 signif() | |
757 | |
758 Usage : $hsp_obj->signif() | |
759 Purpose : Get the P-value or Expect value for the HSP. | |
760 Returns : Float (0.001 or 1.3e-43) | |
761 : Returns P-value if it is defined, otherwise, Expect value. | |
762 Argument : n/a | |
763 Throws : n/a | |
764 Comments : Provided for consistency with Sbjct::signif() | |
765 : Support for returning the significance data in different | |
766 : formats (e.g., exponent only), is not provided for HSP objects. | |
767 : This is only available for the Sbjct or Blast object. | |
768 | |
769 See Also : L<p()|p>, L<expect()|expect>, B<Bio::Tools::Blast::Sbjct::signif()> | |
770 | |
771 =cut | |
772 | |
773 #----------- | |
774 sub signif { | |
775 #----------- | |
776 my $self = shift; | |
777 my $val ||= defined($self->{'_p'}) ? $self->{'_p'} : $self->{'_expect'}; | |
778 $val; | |
779 } | |
780 | |
781 | |
782 | |
783 =head2 expect | |
784 | |
785 Usage : $hsp_obj->expect() | |
786 Purpose : Get the Expect value for the HSP. | |
787 Returns : Float (0.001 or 1.3e-43) | |
788 Argument : n/a | |
789 Throws : n/a | |
790 Comments : Support for returning the expectation data in different | |
791 : formats (e.g., exponent only), is not provided for HSP objects. | |
792 : This is only available for the Sbjct or Blast object. | |
793 | |
794 See Also : L<p()|p> | |
795 | |
796 =cut | |
797 | |
798 #---------- | |
799 sub expect { my $self = shift; $self->{'_expect'}; } | |
800 #---------- | |
801 | |
802 | |
803 | |
804 =head2 p | |
805 | |
806 Usage : $hsp_obj->p() | |
807 Purpose : Get the P-value for the HSP. | |
808 Returns : Float (0.001 or 1.3e-43) or undef if not defined. | |
809 Argument : n/a | |
810 Throws : n/a | |
811 Comments : P-value is not defined with NCBI Blast2 reports. | |
812 : Support for returning the expectation data in different | |
813 : formats (e.g., exponent only) is not provided for HSP objects. | |
814 : This is only available for the Sbjct or Blast object. | |
815 | |
816 See Also : L<expect()|expect> | |
817 | |
818 =cut | |
819 | |
820 #----- | |
821 sub p { my $self = shift; $self->{'_p'}; } | |
822 #----- | |
823 | |
824 | |
825 =head2 length | |
826 | |
827 Usage : $hsp->length( [seq_type] ) | |
828 Purpose : Get the length of the aligned portion of the query or sbjct. | |
829 Example : $hsp->length('query') | |
830 Returns : integer | |
831 Argument : seq_type: 'query' | 'sbjct' | 'total' (default = 'total') | |
832 Throws : n/a | |
833 Comments : 'total' length is the full length of the alignment | |
834 : as reported in the denominators in the alignment section: | |
835 : "Identical = 34/120 Positives = 67/120". | |
836 : Developer note: when using the built-in length function within | |
837 : this module, call it as CORE::length(). | |
838 | |
839 See Also : L<gaps()|gaps> | |
840 | |
841 =cut | |
842 | |
843 #----------- | |
844 sub length { | |
845 #----------- | |
846 my( $self, $type ) = @_; | |
847 $type ||= 'total'; | |
848 | |
849 $type ne 'total' and $self->_set_seq_data() unless $self->{'_set_seq_data'}; | |
850 | |
851 ## Sensitive to member name format. | |
852 $type = "_\L$type\E"; | |
853 $self->{$type.'Length'}; | |
854 } | |
855 | |
856 | |
857 | |
858 =head2 gaps | |
859 | |
860 Usage : $hsp->gaps( [seq_type] ) | |
861 Purpose : Get the number of gaps in the query, sbjct, or total alignment. | |
862 : Also can return query gaps and sbjct gaps as a two-element list | |
863 : when in array context. | |
864 Example : $total_gaps = $hsp->gaps(); | |
865 : ($qgaps, $sgaps) = $hsp->gaps(); | |
866 : $qgaps = $hsp->gaps('query'); | |
867 Returns : scalar context: integer | |
868 : array context without args: (int, int) = ('queryGaps', 'sbjctGaps') | |
869 Argument : seq_type: 'query' | 'sbjct' | 'total' | |
870 : (default = 'total', scalar context) | |
871 : Array context can be "induced" by providing an argument of 'list' or 'array'. | |
872 Throws : n/a | |
873 | |
874 See Also : L<length()|length>, L<matches()|matches> | |
875 | |
876 =cut | |
877 | |
878 #--------- | |
879 sub gaps { | |
880 #--------- | |
881 my( $self, $seqType ) = @_; | |
882 | |
883 $self->_set_seq_data() unless $self->{'_set_seq_data'}; | |
884 | |
885 $seqType ||= (wantarray ? 'list' : 'total'); | |
886 | |
887 if($seqType =~ /list|array/i) { | |
888 return (($self->{'_queryGaps'} || 0), ($self->{'_sbjctGaps'} || 0)); | |
889 } | |
890 | |
891 if($seqType eq 'total') { | |
892 return ($self->{'_queryGaps'} + $self->{'_sbjctGaps'}) || 0; | |
893 } else { | |
894 ## Sensitive to member name format. | |
895 $seqType = "_\L$seqType\E"; | |
896 return $self->{$seqType.'Gaps'} || 0; | |
897 } | |
898 } | |
899 | |
900 | |
901 | |
902 =head2 matches | |
903 | |
904 Usage : $hsp->matches([seq_type], [start], [stop]); | |
905 Purpose : Get the total number of identical and conservative matches | |
906 : in the query or sbjct sequence for the given HSP. Optionally can | |
907 : report data within a defined interval along the seq. | |
908 : (Note: 'conservative' matches are called 'positives' in the | |
909 : Blast report.) | |
910 Example : ($id,$cons) = $hsp_object->matches('sbjct'); | |
911 : ($id,$cons) = $hsp_object->matches('query',300,400); | |
912 Returns : 2-element array of integers | |
913 Argument : (1) seq_type = 'query' | 'sbjct' (default = query) | |
914 : (2) start = Starting coordinate (optional) | |
915 : (3) stop = Ending coordinate (optional) | |
916 Throws : Exception if the supplied coordinates are out of range. | |
917 Comments : Relies on seq_str('match') to get the string of alignment symbols | |
918 : between the query and sbjct lines which are used for determining | |
919 : the number of identical and conservative matches. | |
920 | |
921 See Also : L<length()|length>, L<gaps()|gaps>, L<seq_str()|seq_str>, B<Bio::Tools::Blast::Sbjct::_adjust_contigs()> | |
922 | |
923 =cut | |
924 | |
925 #----------- | |
926 sub matches { | |
927 #----------- | |
928 my( $self, %param ) = @_; | |
929 my(@data); | |
930 my($seqType, $beg, $end) = ($param{-SEQ}, $param{-START}, $param{-STOP}); | |
931 $seqType ||= 'query'; | |
932 | |
933 if(!defined $beg && !defined $end) { | |
934 ## Get data for the whole alignment. | |
935 push @data, ($self->{'_numIdentical'}, $self->{'_numConserved'}); | |
936 } else { | |
937 ## Get the substring representing the desired sub-section of aln. | |
938 $beg ||= 0; | |
939 $end ||= 0; | |
940 my($start,$stop) = $self->range($seqType); | |
941 if($beg == 0) { $beg = $start; $end = $beg+$end; } | |
942 elsif($end == 0) { $end = $stop; $beg = $end-$beg; } | |
943 | |
944 if($end >= $stop) { $end = $stop; } ##ML changed from if (end >stop) | |
945 else { $end += 1;} ##ML moved from commented position below, makes | |
946 ##more sense here | |
947 # if($end > $stop) { $end = $stop; } | |
948 if($beg < $start) { $beg = $start; } | |
949 # else { $end += 1;} | |
950 | |
951 # my $seq = substr($self->seq_str('match'), $beg-$start, ($end-$beg)); | |
952 | |
953 ## ML: START fix for substr out of range error ------------------ | |
954 my $seq = ""; | |
955 if (($self->{'_prog'} eq 'TBLASTN') and ($seqType eq 'sbjct')) | |
956 { | |
957 $seq = substr($self->seq_str('match'), | |
958 int(($beg-$start)/3), int(($end-$beg+1)/3)); | |
959 | |
960 } elsif (($self->{'_prog'} eq 'BLASTX') and ($seqType eq 'query')) | |
961 { | |
962 $seq = substr($self->seq_str('match'), | |
963 int(($beg-$start)/3), int(($end-$beg+1)/3)); | |
964 } else { | |
965 $seq = substr($self->seq_str('match'), | |
966 $beg-$start, ($end-$beg)); | |
967 } | |
968 ## ML: End of fix for substr out of range error ----------------- | |
969 | |
970 | |
971 ## ML: debugging code | |
972 ## This is where we get our exception. Try printing out the values going | |
973 ## into this: | |
974 ## | |
975 # print STDERR | |
976 # qq(*------------MY EXCEPTION --------------------\nSeq: ") , | |
977 # $self->seq_str("$seqType"), qq("\n),$self->name,",( index:"; | |
978 # print STDERR $beg-$start, ", len: ", $end-$beg," ), (HSPRealLen:", | |
979 # CORE::length $self->seq_str("$seqType"); | |
980 # print STDERR ", HSPCalcLen: ", $stop - $start +1 ," ), | |
981 # ( beg: $beg, end: $end ), ( start: $start, stop: stop )\n"; | |
982 ## ML: END DEBUGGING CODE---------- | |
983 | |
984 if(!CORE::length $seq) { | |
985 $self->throw("Undefined sub-sequence ($beg,$end). Valid range = $start - $stop"); | |
986 } | |
987 ## Get data for a substring. | |
988 # printf "Collecting HSP subsection data: beg,end = %d,%d; start,stop = %d,%d\n%s<---\n", $beg, $end, $start, $stop, $seq; | |
989 # printf "Original match seq:\n%s\n",$self->seq_str('match'); | |
990 $seq =~ s/ //g; # remove space (no info). | |
991 my $len_cons = CORE::length $seq; | |
992 $seq =~ s/\+//g; # remove '+' characters (conservative substitutions) | |
993 my $len_id = CORE::length $seq; | |
994 push @data, ($len_id, $len_cons); | |
995 # printf " HSP = %s\n id = %d; cons = %d\n", $self->name, $len_id, $len_cons; <STDIN>; | |
996 } | |
997 @data; | |
998 } | |
999 | |
1000 | |
1001 | |
1002 =head2 frac_identical | |
1003 | |
1004 Usage : $hsp_object->frac_identical( [seq_type] ); | |
1005 Purpose : Get the fraction of identical positions within the given HSP. | |
1006 Example : $frac_iden = $hsp_object->frac_identical('query'); | |
1007 Returns : Float (2-decimal precision, e.g., 0.75). | |
1008 Argument : seq_type: 'query' | 'sbjct' | 'total' | |
1009 : default = 'total' (but see comments below). | |
1010 Throws : n/a | |
1011 Comments : Different versions of Blast report different values for the total | |
1012 : length of the alignment. This is the number reported in the | |
1013 : denominators in the stats section: | |
1014 : "Identical = 34/120 Positives = 67/120". | |
1015 : BLAST-GP uses the total length of the alignment (with gaps) | |
1016 : WU-BLAST uses the length of the query sequence (without gaps). | |
1017 : Therefore, when called without an argument or an argument of 'total', | |
1018 : this method will report different values depending on the | |
1019 : version of BLAST used. | |
1020 : | |
1021 : To get the fraction identical among only the aligned residues, | |
1022 : ignoring the gaps, call this method with an argument of 'query' | |
1023 : or 'sbjct'. | |
1024 | |
1025 See Also : L<frac_conserved()|frac_conserved>, L<num_identical()|num_identical>, L<matches()|matches> | |
1026 | |
1027 =cut | |
1028 | |
1029 #------------------- | |
1030 sub frac_identical { | |
1031 #------------------- | |
1032 # The value is calculated as opposed to storing it from the parsed results. | |
1033 # This saves storage and also permits flexibility in determining for which | |
1034 # sequence (query or sbjct) the figure is to be calculated. | |
1035 | |
1036 my( $self, $seqType ) = @_; | |
1037 $seqType ||= 'total'; | |
1038 | |
1039 if($seqType ne 'total') { | |
1040 $self->_set_seq_data() unless $self->{'_set_seq_data'}; | |
1041 } | |
1042 ## Sensitive to member name format. | |
1043 $seqType = "_\L$seqType\E"; | |
1044 | |
1045 sprintf( "%.2f", $self->{'_numIdentical'}/$self->{$seqType.'Length'}); | |
1046 } | |
1047 | |
1048 | |
1049 =head2 frac_conserved | |
1050 | |
1051 Usage : $hsp_object->frac_conserved( [seq_type] ); | |
1052 Purpose : Get the fraction of conserved positions within the given HSP. | |
1053 : (Note: 'conservative' positions are called 'positives' in the | |
1054 : Blast report.) | |
1055 Example : $frac_cons = $hsp_object->frac_conserved('query'); | |
1056 Returns : Float (2-decimal precision, e.g., 0.75). | |
1057 Argument : seq_type: 'query' | 'sbjct' | |
1058 : default = 'total' (but see comments below). | |
1059 Throws : n/a | |
1060 Comments : Different versions of Blast report different values for the total | |
1061 : length of the alignment. This is the number reported in the | |
1062 : denominators in the stats section: | |
1063 : "Identical = 34/120 Positives = 67/120". | |
1064 : BLAST-GP uses the total length of the alignment (with gaps) | |
1065 : WU-BLAST uses the length of the query sequence (without gaps). | |
1066 : Therefore, when called without an argument or an argument of 'total', | |
1067 : this method will report different values depending on the | |
1068 : version of BLAST used. | |
1069 : | |
1070 : To get the fraction conserved among only the aligned residues, | |
1071 : ignoring the gaps, call this method with an argument of 'query' | |
1072 : or 'sbjct'. | |
1073 | |
1074 See Also : L<frac_conserved()|frac_conserved>, L<num_conserved()|num_conserved>, L<matches()|matches> | |
1075 | |
1076 =cut | |
1077 | |
1078 #-------------------- | |
1079 sub frac_conserved { | |
1080 #-------------------- | |
1081 # The value is calculated as opposed to storing it from the parsed results. | |
1082 # This saves storage and also permits flexibility in determining for which | |
1083 # sequence (query or sbjct) the figure is to be calculated. | |
1084 | |
1085 my( $self, $seqType ) = @_; | |
1086 $seqType ||= 'total'; | |
1087 | |
1088 if($seqType ne 'total') { | |
1089 $self->_set_seq_data() unless $self->{'_set_seq_data'}; | |
1090 } | |
1091 | |
1092 ## Sensitive to member name format. | |
1093 $seqType = "_\L$seqType\E"; | |
1094 | |
1095 sprintf( "%.2f", $self->{'_numConserved'}/$self->{$seqType.'Length'}); | |
1096 } | |
1097 | |
1098 | |
1099 =head2 num_identical | |
1100 | |
1101 Usage : $hsp_object->num_identical(); | |
1102 Purpose : Get the number of identical positions within the given HSP. | |
1103 Example : $num_iden = $hsp_object->num_identical(); | |
1104 Returns : integer | |
1105 Argument : n/a | |
1106 Throws : n/a | |
1107 | |
1108 See Also : L<num_conserved()|num_conserved>, L<frac_identical()|frac_identical> | |
1109 | |
1110 =cut | |
1111 | |
1112 #------------------- | |
1113 sub num_identical { | |
1114 #------------------- | |
1115 my( $self) = shift; | |
1116 | |
1117 $self->{'_numIdentical'}; | |
1118 } | |
1119 | |
1120 | |
1121 =head2 num_conserved | |
1122 | |
1123 Usage : $hsp_object->num_conserved(); | |
1124 Purpose : Get the number of conserved positions within the given HSP. | |
1125 Example : $num_iden = $hsp_object->num_conserved(); | |
1126 Returns : integer | |
1127 Argument : n/a | |
1128 Throws : n/a | |
1129 | |
1130 See Also : L<num_identical()|num_identical>, L<frac_conserved()|frac_conserved> | |
1131 | |
1132 =cut | |
1133 | |
1134 #------------------- | |
1135 sub num_conserved { | |
1136 #------------------- | |
1137 my( $self) = shift; | |
1138 | |
1139 $self->{'_numConserved'}; | |
1140 } | |
1141 | |
1142 | |
1143 | |
1144 =head2 range | |
1145 | |
1146 Usage : $hsp->range( [seq_type] ); | |
1147 Purpose : Gets the (start, end) coordinates for the query or sbjct sequence | |
1148 : in the HSP alignment. | |
1149 Example : ($qbeg, $qend) = $hsp->range('query'); | |
1150 : ($sbeg, $send) = $hsp->range('sbjct'); | |
1151 Returns : Two-element array of integers | |
1152 Argument : seq_type = string, 'query' or 'sbjct' (default = 'query') | |
1153 : (case insensitive). | |
1154 Throws : n/a | |
1155 | |
1156 See Also : L<start()|start>, L<end()|end> | |
1157 | |
1158 =cut | |
1159 | |
1160 #---------- | |
1161 sub range { | |
1162 #---------- | |
1163 my ($self, $seqType) = @_; | |
1164 | |
1165 $self->_set_seq_data() unless $self->{'_set_seq_data'}; | |
1166 | |
1167 $seqType ||= 'query'; | |
1168 ## Sensitive to member name changes. | |
1169 $seqType = "_\L$seqType\E"; | |
1170 | |
1171 return ($self->{$seqType.'Start'},$self->{$seqType.'Stop'}); | |
1172 } | |
1173 | |
1174 =head2 start | |
1175 | |
1176 Usage : $hsp->start( [seq_type] ); | |
1177 Purpose : Gets the start coordinate for the query, sbjct, or both sequences | |
1178 : in the HSP alignment. | |
1179 Example : $qbeg = $hsp->start('query'); | |
1180 : $sbeg = $hsp->start('sbjct'); | |
1181 : ($qbeg, $sbeg) = $hsp->start(); | |
1182 Returns : scalar context: integer | |
1183 : array context without args: list of two integers | |
1184 Argument : In scalar context: seq_type = 'query' or 'sbjct' | |
1185 : (case insensitive). If not supplied, 'query' is used. | |
1186 : Array context can be "induced" by providing an argument of 'list' or 'array'. | |
1187 Throws : n/a | |
1188 | |
1189 See Also : L<end()|end>, L<range()|range> | |
1190 | |
1191 =cut | |
1192 | |
1193 #---------- | |
1194 sub start { | |
1195 #---------- | |
1196 my ($self, $seqType) = @_; | |
1197 | |
1198 $seqType ||= (wantarray ? 'list' : 'query'); | |
1199 | |
1200 $self->_set_seq_data() unless $self->{'_set_seq_data'}; | |
1201 | |
1202 if($seqType =~ /list|array/i) { | |
1203 return ($self->{'_queryStart'}, $self->{'_sbjctStart'}); | |
1204 } else { | |
1205 ## Sensitive to member name changes. | |
1206 $seqType = "_\L$seqType\E"; | |
1207 return $self->{$seqType.'Start'}; | |
1208 } | |
1209 } | |
1210 | |
1211 =head2 end | |
1212 | |
1213 Usage : $hsp->end( [seq_type] ); | |
1214 Purpose : Gets the end coordinate for the query, sbjct, or both sequences | |
1215 : in the HSP alignment. | |
1216 Example : $qbeg = $hsp->end('query'); | |
1217 : $sbeg = $hsp->end('sbjct'); | |
1218 : ($qbeg, $sbeg) = $hsp->end(); | |
1219 Returns : scalar context: integer | |
1220 : array context without args: list of two integers | |
1221 Argument : In scalar context: seq_type = 'query' or 'sbjct' | |
1222 : (case insensitive). If not supplied, 'query' is used. | |
1223 : Array context can be "induced" by providing an argument of 'list' or 'array'. | |
1224 Throws : n/a | |
1225 | |
1226 See Also : L<start()|start>, L<range()|range> | |
1227 | |
1228 =cut | |
1229 | |
1230 #---------- | |
1231 sub end { | |
1232 #---------- | |
1233 my ($self, $seqType) = @_; | |
1234 | |
1235 $seqType ||= (wantarray ? 'list' : 'query'); | |
1236 | |
1237 $self->_set_seq_data() unless $self->{'_set_seq_data'}; | |
1238 | |
1239 if($seqType =~ /list|array/i) { | |
1240 return ($self->{'_queryStop'}, $self->{'_sbjctStop'}); | |
1241 } else { | |
1242 ## Sensitive to member name changes. | |
1243 $seqType = "_\L$seqType\E"; | |
1244 return $self->{$seqType.'Stop'}; | |
1245 } | |
1246 } | |
1247 | |
1248 | |
1249 | |
1250 =head2 strand | |
1251 | |
1252 Usage : $hsp_object->strand( [seq_type] ) | |
1253 Purpose : Get the strand of the query or sbjct sequence. | |
1254 Example : print $hsp->strand('query'); | |
1255 : ($qstrand, $sstrand) = $hsp->strand(); | |
1256 Returns : -1, 0, or 1 | |
1257 : -1 = Minus strand, +1 = Plus strand | |
1258 : Returns 0 if strand is not defined, which occurs | |
1259 : for non-TBLASTN/X reports. | |
1260 : In scalar context without arguments, returns queryStrand value. | |
1261 : In array context without arguments, returns a two-element list | |
1262 : of strings (queryStrand, sbjctStrand). | |
1263 : Array context can be "induced" by providing an argument of 'list' or 'array'. | |
1264 Argument : seq_type: 'query' | 'sbjct' or undef | |
1265 Throws : n/a | |
1266 | |
1267 See Also : L<_set_seq()|_set_seq>, L<_set_match_stats()|_set_match_stats> | |
1268 | |
1269 =cut | |
1270 | |
1271 #----------- | |
1272 sub strand { | |
1273 #----------- | |
1274 my( $self, $seqType ) = @_; | |
1275 $seqType ||= (wantarray ? 'list' : 'query'); | |
1276 | |
1277 return '' if $seqType eq 'query' and $self->{'_prog'} eq 'TBLASTN'; | |
1278 | |
1279 ## Sensitive to member name format. | |
1280 $seqType = "_\L$seqType\E"; | |
1281 | |
1282 # $seqType could be '_list'. | |
1283 $self->{'_queryStrand'} or $self->_set_seq_data() unless $self->{'_set_seq_data'}; | |
1284 | |
1285 if($seqType =~ /list|array/i) { | |
1286 return ('','') unless defined $self->{'_queryStrand'}; | |
1287 return ($self->{'_queryStrand'}, $self->{'_sbjctStrand'}); | |
1288 } | |
1289 local $^W = 0; | |
1290 $STRAND_SYMBOL{$self->{$seqType.'Strand'}} || 0; | |
1291 } | |
1292 | |
1293 | |
1294 ##################################################################################### | |
1295 ## INSTANCE METHODS ## | |
1296 ##################################################################################### | |
1297 | |
1298 | |
1299 =head2 seq | |
1300 | |
1301 Usage : $hsp->seq( [seq_type] ); | |
1302 Purpose : Get the query or sbjct sequence as a Bio::Seq.pm object. | |
1303 Example : $seqObj = $hsp->seq('query'); | |
1304 Returns : Object reference for a Bio::Seq.pm object. | |
1305 Argument : seq_type = 'query' or 'sbjct' (default = 'query'). | |
1306 Throws : Propagates any exception that occurs during construction | |
1307 : of the Bio::Seq.pm object. | |
1308 Comments : The sequence is returned in an array of strings corresponding | |
1309 : to the strings in the original format of the Blast alignment. | |
1310 : (i.e., same spacing). | |
1311 | |
1312 See Also : L<seq_str()|seq_str>, L<seq_inds()|seq_inds>, B<Bio::Seq.pm> | |
1313 | |
1314 =cut | |
1315 | |
1316 #------- | |
1317 sub seq { | |
1318 #------- | |
1319 my($self,$seqType) = @_; | |
1320 $seqType ||= 'query'; | |
1321 my $str = $self->seq_str($seqType); | |
1322 my $num = $self->name; | |
1323 my $name = $seqType =~ /query/i | |
1324 ? $self->parent->parent->name | |
1325 : $self->parent->name; | |
1326 | |
1327 require Bio::Seq; | |
1328 | |
1329 new Bio::Seq (-ID => $name, | |
1330 -SEQ => $str, | |
1331 -DESC => "Blast HSP #$num, $seqType sequence", | |
1332 ); | |
1333 } | |
1334 | |
1335 | |
1336 | |
1337 =head2 seq_str | |
1338 | |
1339 Usage : $hsp->seq_str( seq_type ); | |
1340 Purpose : Get the full query, sbjct, or 'match' sequence as a string. | |
1341 : The 'match' sequence is the string of symbols in between the | |
1342 : query and sbjct sequences. | |
1343 Example : $str = $hsp->seq_str('query'); | |
1344 Returns : String | |
1345 Argument : seq_Type = 'query' or 'sbjct' or 'match' | |
1346 Throws : Exception if the argument does not match an accepted seq_type. | |
1347 Comments : Calls _set_residues() to set the 'match' sequence if it has | |
1348 : not been set already. | |
1349 | |
1350 See Also : L<seq()|seq>, L<seq_inds()|seq_inds>, L<_set_match_seq()|_set_match_seq> | |
1351 | |
1352 =cut | |
1353 | |
1354 #------------ | |
1355 sub seq_str { | |
1356 #------------ | |
1357 my($self,$seqType) = @_; | |
1358 | |
1359 ## Sensitive to member name changes. | |
1360 $seqType = "_\L$seqType\E"; | |
1361 | |
1362 $self->_set_seq_data() unless $self->{'_set_seq_data'}; | |
1363 | |
1364 if($seqType =~ /sbjct|query/) { | |
1365 my $seq = join('',@{$self->{$seqType.'Seq'}}); | |
1366 $seq =~ s/\s+//g; | |
1367 return $seq; | |
1368 | |
1369 } elsif( $seqType =~ /match/i) { | |
1370 # Only need to call _set_match_seq() if the match seq is requested. | |
1371 my $aref = $self->_set_match_seq() unless ref $self->{'_matchSeq'}; | |
1372 $aref = $self->{'_matchSeq'}; | |
1373 | |
1374 ## DEBUGGING CODE: | |
1375 # if($self->parent->name eq '1AK5_' and $self->parent->parent->name eq 'YAR073W') { | |
1376 # print "seq_str():\n @$aref";<STDIN>; | |
1377 # } | |
1378 | |
1379 return join('',@$aref); | |
1380 | |
1381 } else { | |
1382 $self->throw("Invalid or undefined sequence type: $seqType", | |
1383 "Valid types: query, sbjct, match"); | |
1384 } | |
1385 } | |
1386 | |
1387 | |
1388 | |
1389 | |
1390 =head2 seq_inds | |
1391 | |
1392 Usage : $hsp->seq_inds( seq_type, class, collapse ); | |
1393 Purpose : Get a list of residue positions (indices) for all identical | |
1394 : or conserved residues in the query or sbjct sequence. | |
1395 Example : @ind = $hsp->seq_inds('query', 'identical'); | |
1396 : @ind = $hsp->seq_inds('sbjct', 'conserved'); | |
1397 : @ind = $hsp->seq_inds('sbjct', 'conserved', 1); | |
1398 Returns : List of integers | |
1399 : May include ranges if collapse is true. | |
1400 Argument : seq_type = 'query' or 'sbjct' (default = query) | |
1401 : class = 'identical' or 'conserved' (default = identical) | |
1402 : (can be shortened to 'id' or 'cons') | |
1403 : (actually, anything not 'id' will evaluate to 'conserved'). | |
1404 : collapse = boolean, if true, consecutive positions are merged | |
1405 : using a range notation, e.g., "1 2 3 4 5 7 9 10 11" | |
1406 : collapses to "1-5 7 9-11". This is useful for | |
1407 : consolidating long lists. Default = no collapse. | |
1408 Throws : n/a. | |
1409 Comments : Calls _set_residues() to set the 'match' sequence if it has | |
1410 : not been set already. | |
1411 | |
1412 See Also : L<seq()|seq>, L<_set_residues()|_set_residues>, L<collapse_nums()|collapse_nums>, B<Bio::Tools::Blast::Sbjct::seq_inds()> | |
1413 | |
1414 =cut | |
1415 | |
1416 #--------------- | |
1417 sub seq_inds { | |
1418 #--------------- | |
1419 my ($self, $seq, $class, $collapse) = @_; | |
1420 | |
1421 $seq ||= 'query'; | |
1422 $class ||= 'identical'; | |
1423 $collapse ||= 0; | |
1424 | |
1425 $self->_set_residues() unless defined $self->{'_identicalRes_query'}; | |
1426 | |
1427 $seq = ($seq !~ /^q/i ? 'sbjct' : 'query'); | |
1428 $class = ($class !~ /^id/i ? 'conserved' : 'identical'); | |
1429 | |
1430 ## Sensitive to member name changes. | |
1431 $seq = "_\L$seq\E"; | |
1432 $class = "_\L$class\E"; | |
1433 | |
1434 my @ary = sort { $a <=> $b } keys %{ $self->{"${class}Res$seq"}}; | |
1435 | |
1436 return $collapse ? &collapse_nums(@ary) : @ary; | |
1437 } | |
1438 | |
1439 | |
1440 | |
1441 | |
1442 =head2 get_aln | |
1443 | |
1444 Usage : $hsp->get_aln() | |
1445 Purpose : Get a Bio::UnivAln.pm object constructed from the query + sbjct | |
1446 : sequences of the present HSP object. | |
1447 Example : $aln_obj = $hsp->get_aln(); | |
1448 Returns : Object reference for a Bio::UnivAln.pm object. | |
1449 Argument : n/a. | |
1450 Throws : Propagates any exception ocurring during the construction of | |
1451 : the Bio::UnivAln object. | |
1452 Comments : Requires Bio::UnivAln.pm. | |
1453 : The Bio::UnivAln.pm object is constructed from the query + sbjct | |
1454 : sequence objects obtained by calling seq(). | |
1455 : Gap residues are included (see $GAP_SYMBOL). It is important that | |
1456 : Bio::UnivAln.pm recognizes the gaps correctly. A strategy for doing | |
1457 : this is being considered. Currently it is hard-wired. | |
1458 | |
1459 See Also : L<seq()|seq>, B<Bio::UnivAln.pm> | |
1460 | |
1461 =cut | |
1462 | |
1463 #------------ | |
1464 sub get_aln { | |
1465 #------------ | |
1466 my $self = shift; | |
1467 | |
1468 require Bio::UnivAln; | |
1469 | |
1470 my $qseq = $self->seq('query'); | |
1471 my $sseq = $self->seq('sbjct'); | |
1472 | |
1473 my $desc = sprintf "HSP #%s of query %s vs. sbjct %s", | |
1474 $self->name, $self->parent->parent->name, $self->parent->name; | |
1475 | |
1476 my $type = $self->{'_prog'} =~ /P$|^T/ ? 'amino' : 'dna'; | |
1477 | |
1478 Bio::UnivAln->new( -seqs => [$qseq, $sseq], | |
1479 -desc => $desc, | |
1480 -type => $type, | |
1481 ); | |
1482 } | |
1483 | |
1484 | |
1485 =head2 display | |
1486 | |
1487 Usage : $sbjct_object->display( %named_parameters ); | |
1488 Purpose : Display information about Bio::Tools::Blast::Sbjct.pm data members | |
1489 : including: length, gaps, score, significance value, | |
1490 : sequences and sequence indices. | |
1491 Example : $object->display(-SHOW=>'stats'); | |
1492 Argument : Named parameters: (TAGS CAN BE UPPER OR LOWER CASE) | |
1493 : -SHOW => 'hsp', | |
1494 : -WHERE => filehandle (default = STDOUT) | |
1495 Returns : n/a | |
1496 Status : Experimental | |
1497 Comments : For more control over the display of sequence data, | |
1498 : use seq(), seq_str(), seq_inds(). | |
1499 | |
1500 See Also : L<_display_seq()|_display_seq>, L<seq()|seq>, L<seq_str()|seq_str>, L<seq_inds()|seq_inds>, L<_display_matches()|_display_matches>, B<Bio::Root::Object::display()> | |
1501 | |
1502 =cut | |
1503 | |
1504 #----------- | |
1505 sub display { | |
1506 #----------- | |
1507 my( $self, %param ) = @_; | |
1508 | |
1509 my $sbjctName = $self->parent->name(); | |
1510 my $queryName = $self->parent->parent->name(); | |
1511 my $layout = $self->parent->parent->_layout(); | |
1512 | |
1513 my $OUT = $self->set_display(%param); | |
1514 | |
1515 printf( $OUT "%-15s: %d\n", "LENGTH TOTAL", $self->length('total') ); | |
1516 printf( $OUT "%-15s: %d\n", "LENGTH QUERY", $self->length('query') ); | |
1517 printf( $OUT "%-15s: %d\n", "LENGTH SBJCT", $self->length('sbjct') ); | |
1518 printf( $OUT "%-15s: %d\n", "GAPS QUERY", $self->gaps('query') ); | |
1519 printf( $OUT "%-15s: %d\n", "GAPS SBJCT", $self->gaps('sbjct') ); | |
1520 printf( $OUT "%-15s: %d\n", "SCORE", $self->{'_score'} ); | |
1521 printf( $OUT "%-15s: %0.1f\n", "BITS", $self->{'_bits'} ); | |
1522 if($layout == 1) { | |
1523 printf( $OUT "%-15s: %.1e\n", "P-VAL", $self->{'_p'} ); | |
1524 printf( $OUT "%-15s: %.1e\n", "EXPECT", $self->{'_expect'} ); | |
1525 } else { | |
1526 printf( $OUT "%-15s: %.1e\n", "EXPECT", $self->{'_expect'} ); | |
1527 } | |
1528 | |
1529 my $queryLength = $self->length('query'); | |
1530 | |
1531 printf( $OUT "%-15s: %d (%0.0f%%)\n", "IDENTICAL", $self->{'_numIdentical'}, | |
1532 $self->{'_numIdentical'}/$queryLength * 100 ); | |
1533 printf( $OUT "%-15s: %d (%0.0f%%) %s \n", "CONSERVED", $self->{'_numConserved'}, | |
1534 $self->{'_numConserved'}/$queryLength * 100, | |
1535 "includes identical" ); | |
1536 | |
1537 $self->_display_seq('query', $queryName, $OUT); | |
1538 $self->_display_seq('sbjct', $sbjctName, $OUT); | |
1539 $self->_display_matches($queryName, $sbjctName, $OUT); | |
1540 } | |
1541 | |
1542 | |
1543 | |
1544 | |
1545 =head2 _display_seq | |
1546 | |
1547 Usage : n/a; called automatically by display() | |
1548 Purpose : Display information about query and sbjct HSP sequences. | |
1549 : Prints the start, stop coordinates and the actual sequence. | |
1550 Example : n/a | |
1551 Argument : | |
1552 Returns : printf call. | |
1553 Status : Experimental | |
1554 Comments : For more control, use seq(), seq_str(), or seq_inds(). | |
1555 | |
1556 See Also : L<display()|display>, L<seq()|seq>, L<seq_str()|seq_str>, L<seq_inds()|seq_inds>, L<_display_matches()|_display_matches> | |
1557 | |
1558 =cut | |
1559 | |
1560 #------------------ | |
1561 sub _display_seq { | |
1562 #------------------ | |
1563 my( $self, $seqType, $name, $OUT ) = @_; | |
1564 | |
1565 $self->_set_seq_data() unless $self->{'_set_seq_data'}; | |
1566 | |
1567 # Sensitive to member name changes. | |
1568 my $mem = "_\L$seqType\E"; | |
1569 printf( $OUT "\n%10s: %s\n%10s %s\n", "\U$seqType\E", "$name", "-----", | |
1570 ('-'x ((CORE::length $name) + 2)) ); | |
1571 printf( $OUT "%13s: %d\n", "START", $self->{$mem.'Start'} ); | |
1572 printf( $OUT "%13s: %d\n", "STOP", $self->{$mem.'Stop'} ); | |
1573 printf( $OUT "%13s: \n", "SEQ" ); | |
1574 foreach( @{ $self->{$mem.'Seq'}} ) { | |
1575 printf( $OUT "%15s%s\n", "", $_); | |
1576 } | |
1577 } | |
1578 | |
1579 | |
1580 =head2 _display_matches | |
1581 | |
1582 Usage : n/a; called automatically by display() | |
1583 Purpose : Display information about identical and conserved positions | |
1584 : within both the query and sbjct sequences. | |
1585 Example : n/a | |
1586 Argument : | |
1587 Returns : printf call. | |
1588 Status : Experimental | |
1589 Comments : For more control, use seq_inds(). | |
1590 | |
1591 See Also : L<display()|display>, L<seq_inds()|seq_inds>, L<_display_seq()|_display_seq>, | |
1592 | |
1593 =cut | |
1594 | |
1595 #-------------------- | |
1596 sub _display_matches { | |
1597 #-------------------- | |
1598 my( $self, $queryName, $sbjctName, $OUT) = @_; | |
1599 my($resNum, $count); | |
1600 | |
1601 $self->_set_residues() unless defined $self->{'_identicalRes_query'}; | |
1602 | |
1603 printf( $OUT "\n%10s: \n%10s\n", "HITS", "-----" ); | |
1604 foreach( @{ $self->{'_matchSeq'}} ) { | |
1605 printf( $OUT "%15s%s\n", "", $_ ); | |
1606 } | |
1607 | |
1608 print $OUT "\n\U$queryName\E\n------------\n"; | |
1609 printf( $OUT "\n%5s%s:\n%5s%s\n\t", "", "IDENTICAL RESIDUES IN $queryName (n=$self->{'_numIdentical'})", | |
1610 "", "--------------------------------------------" ); | |
1611 $count = 0; | |
1612 foreach $resNum ( sort keys %{ $self->{'_identicalRes_query' }} ) { | |
1613 $count++; | |
1614 print $OUT "$resNum"; | |
1615 $count > 0 and print $OUT +( $count % 15 ? ", " : "\n\t"); | |
1616 } | |
1617 | |
1618 print $OUT "\n"; | |
1619 | |
1620 my $justConserved = ($self->{'_numConserved'})-($self->{'_numIdentical'}); | |
1621 printf( $OUT "\n%5s%s:\n%5s%s\n\t", "","CONSERVED RESIDUES IN $queryName (n=$justConserved)", | |
1622 "", "--------------------------------------------" ); | |
1623 $count = 0; | |
1624 foreach $resNum ( sort keys %{ $self->{'_conservedRes_query' }} ) { | |
1625 $count++; | |
1626 print $OUT "$resNum"; | |
1627 $count > 0 and print $OUT +( $count % 15 ? ", " : "\n\t"); | |
1628 } | |
1629 | |
1630 | |
1631 print $OUT "\n\n\U$sbjctName\E\n------------\n"; | |
1632 printf( $OUT "\n%5s%s:\n%5s%s\n\t", "", "IDENTICAL RESIDUES IN $sbjctName (n=$self->{'_numIdentical'})", | |
1633 "", "--------------------------------------------" ); | |
1634 $count = 0; | |
1635 foreach $resNum ( sort keys %{ $self->{'_identicalRes_sbjct' }} ) { | |
1636 $count++; | |
1637 print $OUT "$resNum"; | |
1638 $count > 0 and print $OUT +( $count % 15 ? ", " : "\n\t"); | |
1639 } | |
1640 | |
1641 print $OUT "\n"; | |
1642 $justConserved = ($self->{'_numConserved'})-($self->{'_numIdentical'}); | |
1643 printf( $OUT "\n%5s%s:\n%5s%s\n\t", "","CONSERVED RESIDUES IN $sbjctName (n=$justConserved)", | |
1644 "", "--------------------------------------------" ); | |
1645 $count = 0; | |
1646 foreach $resNum ( sort keys %{ $self->{'_conservedRes_sbjct' }} ) { | |
1647 $count++; | |
1648 print $OUT "$resNum"; | |
1649 $count > 0 and print $OUT +( $count % 15 ? ", " : "\n\t"); | |
1650 } | |
1651 } | |
1652 | |
1653 | |
1654 | |
1655 | |
1656 =head2 homol_data | |
1657 | |
1658 Usage : $data = $hsp_object->homo_data( %named_params ); | |
1659 Purpose : Gets similarity data for a single HSP. | |
1660 Returns : String: | |
1661 : "Homology data" for each HSP is in the format: | |
1662 : "<integer> <start> <stop>" | |
1663 : where integer is the value returned by homol_score(). | |
1664 Argument : Named params: (UPPER OR LOWERCASE TAGS) | |
1665 : currently just one param is used: | |
1666 : -SEQ =>'query' or 'sbjct' | |
1667 Throws : n/a | |
1668 Status : Experimental | |
1669 Comments : This is a very experimental method used for obtaining a | |
1670 : coarse indication of: | |
1671 : 1) how strong the similarity is between the sequences in the HSP, | |
1672 : 3) the endpoints of the alignment (sequence monomer numbers) | |
1673 | |
1674 See Also : L<homol_score()|homol_score>, B<Bio::Tools::Blast.::homol_data()>, B<Bio::Tools::Blast::Sbjct::homol_data()> | |
1675 | |
1676 =cut | |
1677 | |
1678 #--------------- | |
1679 sub homol_data { | |
1680 #--------------- | |
1681 my ($self, %param) = @_; | |
1682 my $seq = $param{-SEQ} || $param{'-seq'} || 'sbjct'; # 'query' or 'sbjct' | |
1683 my $homolScore = $self->homol_score(); | |
1684 # Sensitive to member name changes. | |
1685 $seq = "_\L$seq\E"; | |
1686 | |
1687 $self->_set_seq_data() unless $self->{'_set_seq_data'}; | |
1688 return ( $homolScore.' '.$self->{$seq.'Start'}.' '.$self->{$seq.'Stop'}); | |
1689 } | |
1690 | |
1691 | |
1692 =head2 homol_score | |
1693 | |
1694 Usage : $self->homol_score(); | |
1695 Purpose : Get a homology score (integer 1 - 3) as a coarse representation of | |
1696 : the strength of the similarity independent of sequence composition. | |
1697 : Based on the Blast bit score. | |
1698 Example : $hscore = $hsp->homol_score(); | |
1699 Returns : Integer | |
1700 Argument : n/a | |
1701 Throws : n/a | |
1702 Status : Experimental | |
1703 Comments : See @Bio::Tools::Blast::HSP::SCORE_CUTOFFS for the specific values. | |
1704 : Currently, BIT_SCORE HOMOL_SCORE | |
1705 : --------- ----------- | |
1706 : >=100 --> 3 | |
1707 : 30-100 --> 2 | |
1708 : < 30 --> 1 | |
1709 | |
1710 See Also : L<homol_data()|homol_data> | |
1711 | |
1712 =cut | |
1713 | |
1714 #---------------- | |
1715 sub homol_score { | |
1716 #---------------- | |
1717 my $self = shift; | |
1718 | |
1719 if( $self->{'_bits'} >= $SCORE_CUTOFFS[0] ) { 1 } | |
1720 elsif($self->{'_bits'} < $SCORE_CUTOFFS[0] and | |
1721 $self->{'_bits'} >= $SCORE_CUTOFFS[1] ) { 2 } | |
1722 else { 3 } | |
1723 } | |
1724 | |
1725 | |
1726 ##################################################################################### | |
1727 ## CLASS METHODS ## | |
1728 ##################################################################################### | |
1729 | |
1730 =head1 CLASS METHODS | |
1731 | |
1732 =head2 collapse_nums | |
1733 | |
1734 Usage : @cnums = collapse_nums( @numbers ); | |
1735 Purpose : Collapses a list of numbers into a set of ranges of consecutive terms: | |
1736 : Useful for condensing long lists of consecutive numbers. | |
1737 : EXPANDED: | |
1738 : 1 2 3 4 5 6 10 12 13 14 15 17 18 20 21 22 24 26 30 31 32 | |
1739 : COLLAPSED: | |
1740 : 1-6 10 12-15 17 18 20-22 24 26 30-32 | |
1741 Argument : List of numbers and sorted numerically. | |
1742 Returns : List of numbers mixed with ranges of numbers (see above). | |
1743 Throws : n/a | |
1744 Comments : Probably belongs in a more general utility class. | |
1745 | |
1746 See Also : L<seq_inds()|seq_inds> | |
1747 | |
1748 =cut | |
1749 | |
1750 #------------------ | |
1751 sub collapse_nums { | |
1752 #------------------ | |
1753 # This is not the slickest connectivity algorithm, but will do for now. | |
1754 my @a = @_; | |
1755 my ($from, $to, $i, @ca, $consec); | |
1756 | |
1757 $consec = 0; | |
1758 for($i=0; $i < @a; $i++) { | |
1759 not $from and do{ $from = $a[$i]; next; }; | |
1760 if($a[$i] == $a[$i-1]+1) { | |
1761 $to = $a[$i]; | |
1762 $consec++; | |
1763 } else { | |
1764 if($consec == 1) { $from .= ",$to"; } | |
1765 else { $from .= $consec>1 ? "\-$to" : ""; } | |
1766 push @ca, split(',', $from); | |
1767 $from = $a[$i]; | |
1768 $consec = 0; | |
1769 $to = undef; | |
1770 } | |
1771 } | |
1772 if(defined $to) { | |
1773 if($consec == 1) { $from .= ",$to"; } | |
1774 else { $from .= $consec>1 ? "\-$to" : ""; } | |
1775 } | |
1776 push @ca, split(',', $from) if $from; | |
1777 | |
1778 @ca; | |
1779 } | |
1780 | |
1781 | |
1782 1; | |
1783 __END__ | |
1784 | |
1785 ##################################################################################### | |
1786 # END OF CLASS | |
1787 ##################################################################################### | |
1788 | |
1789 =head1 FOR DEVELOPERS ONLY | |
1790 | |
1791 =head2 Data Members | |
1792 | |
1793 Information about the various data members of this module is provided for those | |
1794 wishing to modify or understand the code. Two things to bear in mind: | |
1795 | |
1796 =over 4 | |
1797 | |
1798 =item 1 Do NOT rely on these in any code outside of this module. | |
1799 | |
1800 All data members are prefixed with an underscore to signify that they are private. | |
1801 Always use accessor methods. If the accessor doesn't exist or is inadequate, | |
1802 create or modify an accessor (and let me know, too!). | |
1803 | |
1804 =item 2 This documentation may be incomplete and out of date. | |
1805 | |
1806 It is easy for these data member descriptions to become obsolete as | |
1807 this module is still evolving. Always double check this info and search | |
1808 for members not described here. | |
1809 | |
1810 =back | |
1811 | |
1812 An instance of Bio::Tools::Blast::HSP.pm is a blessed reference to a hash containing | |
1813 all or some of the following fields: | |
1814 | |
1815 FIELD VALUE | |
1816 -------------------------------------------------------------- | |
1817 (member names are mostly self-explanatory) | |
1818 | |
1819 _score : | |
1820 _bits : | |
1821 _p : | |
1822 _n : Integer. The 'N' value listed in parenthesis with P/Expect value: | |
1823 : e.g., P(3) = 1.2e-30 ---> (N = 3). | |
1824 : Not defined in NCBI Blast2 with gaps. | |
1825 : To obtain the number of HSPs, use Bio::Tools::Blast::Sbjct::num_hsps(). | |
1826 _expect : | |
1827 _queryLength : | |
1828 _queryGaps : | |
1829 _queryStart : | |
1830 _queryStop : | |
1831 _querySeq : | |
1832 _sbjctLength : | |
1833 _sbjctGaps : | |
1834 _sbjctStart : | |
1835 _sbjctStop : | |
1836 _sbjctSeq : | |
1837 _matchSeq : String. Contains the symbols between the query and sbjct lines | |
1838 which indicate identical (letter) and conserved ('+') matches | |
1839 or a mismatch (' '). | |
1840 _numIdentical : | |
1841 _numConserved : | |
1842 _identicalRes_query : | |
1843 _identicalRes_sbjct : | |
1844 _conservedRes_query : | |
1845 _conservedRes_sbjct : | |
1846 _match_indent : The number of leading space characters on each line containing | |
1847 the match symbols. _match_indent is 13 in this example: | |
1848 Query: 285 QNSAPWGLARISHRERLNLGSFNKYLYDDDAG | |
1849 Q +APWGLARIS G+ + Y YD+ AG | |
1850 ^^^^^^^^^^^^^ | |
1851 | |
1852 INHERITED DATA MEMBERS | |
1853 | |
1854 _name : From Bio::Root::Object.pm. | |
1855 : | |
1856 _parent : From Bio::Root::Object.pm. This member contains a reference to the | |
1857 : Bio::Tools::Blast::Sbjct.pm object to which this hit belongs. | |
1858 | |
1859 | |
1860 =cut | |
1861 | |
1862 1; | |
1863 |