comparison variant_effect_predictor/Bio/Search/Hit/BlastHit.pm @ 0:1f6dce3d34e0

Uploaded
author mahtabm
date Thu, 11 Apr 2013 02:01:53 -0400
parents
children
comparison
equal deleted inserted replaced
-1:000000000000 0:1f6dce3d34e0
1 #-----------------------------------------------------------------
2 # $Id: BlastHit.pm,v 1.13 2002/10/22 09:36:19 sac Exp $
3 #
4 # BioPerl module Bio::Search::Hit::BlastHit
5 #
6 # (This module was originally called Bio::Tools::Blast::Sbjct)
7 #
8 # Cared for by Steve Chervitz <sac@bioperl.org>
9 #
10 # You may distribute this module under the same terms as perl itself
11 #-----------------------------------------------------------------
12
13 ## POD Documentation:
14
15 =head1 NAME
16
17 Bio::Search::Hit::BlastHit - Bioperl BLAST Hit object
18
19 =head1 SYNOPSIS
20
21 The construction of BlastHit objects is performed by
22 Bio::SearchIO::blast::BlastHitFactory in a process that is
23 orchestrated by the Blast parser (B<Bio::SearchIO::blast::blast>).
24 The resulting BlastHits are then accessed via
25 B<Bio::Search::Result::BlastResult>). Therefore, you do not need to
26 use B<Bio::Search::Hit::BlastHit>) directly. If you need to construct
27 BlastHits directly, see the new() function for details.
28
29 For B<Bio::SearchIO> BLAST parsing usage examples, see the
30 B<examples/search-blast> directory of the Bioperl distribution.
31
32
33 =head1 DESCRIPTION
34
35 The Bio::Search::Hit::BlastHit.pm module encapsulates data and methods
36 for manipulating "hits" from a BLAST report. A BLAST hit is a
37 collection of HSPs along with other metadata such as sequence name
38 and score information. Hit objects are accessed via
39 B<Bio::Search::Result::BlastResult> objects after parsing a BLAST report using
40 the B<Bio::SearchIO> system.
41
42 In Blast lingo, the "sbjct" sequences are all the sequences
43 in a target database which were compared against a "query" sequence.
44 The terms "sbjct" and "hit" will be used interchangeably in this module.
45 All methods that take 'sbjct' as an argument also support 'hit' as a
46 synonym.
47
48 This module supports BLAST versions 1.x and 2.x, gapped and ungapped,
49 and PSI-BLAST.
50
51
52 =head2 HSP Tiling and Ambiguous Alignments
53
54 If a Blast hit has more than one HSP, the Bio::Search::Hit::BlastHit.pm
55 object has the ability to merge overlapping HSPs into contiguous
56 blocks. This permits the BlastHit object to sum data across all HSPs
57 without counting data in the overlapping regions multiple times, which
58 would happen if data from each overlapping HSP are simply summed. HSP
59 tiling is performed automatically when methods of the BlastHit object
60 that rely on tiled data are invoked. These include
61 L<frac_identical()|frac_identical>, L<frac_conserved()|frac_conserved>, L<gaps()|gaps>,
62 L<frac_aligned_query()|frac_aligned_query>, L<frac_aligned_hit()|frac_aligned_hit>,
63 L<num_unaligned_query()|num_unaligned_query>, L<num_unaligned_hit()|num_unaligned_hit>.
64
65 It also permits the assessment of an "ambiguous alignment" if the
66 query (or sbjct) sequences from different HSPs overlap
67 (see L<ambiguous_aln()|ambiguous_aln>). The existence
68 of an overlap could indicate a biologically interesting region in the
69 sequence, such as a repeated domain. The BlastHit object uses the
70 C<-OVERLAP> parameter to determine when two sequences overlap; if this is
71 set to 2 -- the default -- then any two sbjct or query HSP sequences
72 must overlap by more than two residues to get merged into the same
73 contig and counted as an overlap. See the L<BUGS | BUGS> section below for
74 "issues" with HSP tiling.
75
76
77 The results of the HSP tiling is reported with the following ambiguity codes:
78
79 'q' = Query sequence contains multiple sub-sequences matching
80 a single region in the sbjct sequence.
81
82 's' = Subject (BlastHit) sequence contains multiple sub-sequences matching
83 a single region in the query sequence.
84
85 'qs' = Both query and sbjct sequences contain more than one
86 sub-sequence with similarity to the other sequence.
87
88
89 For addition information about ambiguous BLAST alignments, see
90 L<Bio::Search::BlastUtils::tile_hsps()|Bio::Search::BlastUtils> and
91
92 http://www-genome.stanford.edu/Sacch3D/help/ambig_aln.html
93
94 =head1 DEPENDENCIES
95
96 Bio::Search::Hit::BlastHit.pm is a concrete class that inherits from
97 B<Bio::Root::Root> and B<Bio::Search::Hit::HitI>. and relies on
98 B<Bio::Search::HSP::BlastHSP>.
99
100
101 =head1 BUGS
102
103 One consequence of the HSP tiling is that methods that rely on HSP
104 tiling such as L<frac_identical()|frac_identical>, L<frac_conserved()|frac_conserved>, L<gaps()|gaps>
105 etc. may report misleading numbers when C<-OVERLAP> is set to a large
106 number. For example, say we have two HSPs and the query sequence tile
107 as follows:
108
109 1 8 22 30 40 60
110 Full seq: ------------------------------------------------------------
111 * ** * **
112 HSP1: --------------- (6 identical matches)
113 ** ** **
114 HSP2: ------------- (6 identical matches)
115
116
117 If C<-OVERLAP> is set to some number over 4, HSP1 and HSP2 will not be
118 tiled into a single contig and their numbers of identical matches will
119 be added, giving a total of 12, not 10 if they had be combined into
120 one contig. This can lead to number greater than 1.0 for methods
121 L<frac_identical()|frac_identical> and L<frac_conserved()|frac_conserved>. This is less of an issue
122 with gapped Blast since it tends to combine HSPs that would be listed
123 separately without gapping. (Fractions E<gt>1.0 can be viewed as a
124 signal for an interesting alignment that warrants further inspection,
125 thus turning this bug into a feature :-).
126
127 Using large values for C<-OVERLAP> can lead to incorrect numbers
128 reported by methods that rely on HSP tiling but can be useful if you
129 care more about detecting ambiguous alignments. Setting C<-OVERLAP>
130 to zero will lead to the most accurate numbers for the
131 tiling-dependent methods but will be useless for detecting overlapping
132 HSPs since all HSPs will appear to overlap.
133
134
135 =head1 SEE ALSO
136
137 Bio::Search::HSP::BlastHSP.pm - Blast HSP object.
138 Bio::Search::Result::BlastResult.pm - Blast Result object.
139 Bio::Search::Hit::HitI.pm - Interface implemented by BlastHit.pm
140 Bio::Root::Root.pm - Base class for BlastHit.pm
141
142 Links:
143
144 http://bio.perl.org/Core/POD/Search/Hit/Blast/BlastHSP.pm.html
145
146 http://bio.perl.org/Projects/modules.html - Online module documentation
147 http://bio.perl.org/Projects/Blast/ - Bioperl Blast Project
148 http://bio.perl.org/ - Bioperl Project Homepage
149
150
151 =head1 FEEDBACK
152
153 =head2 Mailing Lists
154
155 User feedback is an integral part of the evolution of this and other
156 Bioperl modules. Send your comments and suggestions preferably to one
157 of the Bioperl mailing lists. Your participation is much appreciated.
158
159 bioperl-l@bioperl.org - General discussion
160 http://bio.perl.org/MailList.html - About the mailing lists
161
162 =head2 Reporting Bugs
163
164 Report bugs to the Bioperl bug tracking system to help us keep track
165 the bugs and their resolution. Bug reports can be submitted via email
166 or the web:
167
168 bioperl-bugs@bio.perl.org
169 http://bio.perl.org/bioperl-bugs/
170
171 =head1 AUTHOR
172
173 Steve Chervitz E<lt>sac@bioperl.orgE<gt>
174
175 See L<the FEEDBACK section | FEEDBACK> for where to send bug reports and comments.
176
177 =head1 ACKNOWLEDGEMENTS
178
179 This software was originally developed in the Department of Genetics
180 at Stanford University. I would also like to acknowledge my
181 colleagues at Affymetrix for useful feedback.
182
183 =head1 COPYRIGHT
184
185 Copyright (c) 1996-2001 Steve Chervitz. All Rights Reserved.
186
187 =head1 DISCLAIMER
188
189 This software is provided "as is" without warranty of any kind.
190
191 =head1 APPENDIX
192
193 The rest of the documentation details each of the object methods.
194 Internal methods are usually preceded with a _
195
196 =cut
197
198
199 # Let the code begin...
200
201 package Bio::Search::Hit::BlastHit;
202
203 use strict;
204 use Bio::Search::Hit::HitI;
205 use Bio::Root::Root;
206 require Bio::Search::BlastUtils;
207 use vars qw( @ISA %SUMMARY_OFFSET $Revision);
208
209 use overload
210 '""' => \&to_string;
211
212 @ISA = qw( Bio::Root::Root Bio::Search::Hit::HitI );
213
214 $Revision = '$Id: BlastHit.pm,v 1.13 2002/10/22 09:36:19 sac Exp $'; #'
215
216
217 =head2 new
218
219 Usage : $hit = Bio::Search::Hit::BlastHit->new( %named_params );
220 : Bio::Search::Hit::BlastHit.pm objects are constructed
221 : automatically by Bio::SearchIO::BlastHitFactory.pm,
222 : so there is no need for direct instantiation.
223 Purpose : Constructs a new BlastHit object and Initializes key variables
224 : for the hit.
225 Returns : A Bio::Search::Hit::BlastHit object
226 Argument : Named Parameters:
227 : Parameter keys are case-insensitive.
228 : -RAW_DATA => array reference holding raw BLAST report data
229 : for a single hit. This includes all lines
230 : within the HSP alignment listing section of a
231 : traditional BLAST or PSI-BLAST (non-XML) report,
232 : starting at (or just after) the leading '>'.
233 : -HOLD_RAW_DATA => boolean, should -RAW_DATA be saved within the object.
234 : -QUERY_LEN => Length of the query sequence
235 : -ITERATION => integer (PSI-BLAST iteration number in which hit was found)
236 : -OVERLAP => integer (maximum overlap between adjacent
237 : HSPs when tiling)
238 : -PROGRAM => string (type of Blast: BLASTP, BLASTN, etc)
239 : -SIGNIF => significance
240 : -IS_PVAL => boolean, true if -SIGNIF contains a P-value
241 : -SCORE => raw BLAST score
242 : -FOUND_AGAIN => boolean, true if this was a hit from the
243 : section of a PSI-BLAST with iteration > 1
244 : containing sequences that were also found
245 : in iteration 1.
246 Comments : This object accepts raw Blast report data not because it
247 : is required for parsing, but in order to retrieve it
248 : (only available if -HOLD_RAW_DATA is set to true).
249
250 See Also : L<Bio::Search::BlastUtils::tile_hsps()|Bio::Search::BlastUtils>, L<Bio::Root::Root::new()|Bio::Root::Root>
251
252 =cut
253
254 #-------------------
255 sub new {
256 #-------------------
257 my ($class, @args ) = @_;
258 my $self = $class->SUPER::new( @args );
259
260 my ($raw_data, $signif, $is_pval, $hold_raw);
261
262 ($self->{'_blast_program'}, $self->{'_query_length'}, $raw_data, $hold_raw,
263 $self->{'_overlap'}, $self->{'_iteration'}, $signif, $is_pval,
264 $self->{'_score'}, $self->{'_found_again'} ) =
265 $self->_rearrange( [qw(PROGRAM
266 QUERY_LEN
267 RAW_DATA
268 HOLD_RAW_DATA
269 OVERLAP
270 ITERATION
271 SIGNIF
272 IS_PVAL
273 SCORE
274 FOUND_AGAIN )], @args );
275
276 # TODO: Handle this in parser. Just pass in name parameter.
277 $self->_set_id( $raw_data->[0] );
278
279 if($is_pval) {
280 $self->{'_p'} = $signif;
281 } else {
282 $self->{'_expect'} = $signif;
283 }
284
285 if( $hold_raw ) {
286 $self->{'_hit_data'} = $raw_data;
287 }
288
289 return $self;
290 }
291
292 sub DESTROY {
293 my $self=shift;
294 #print STDERR "-->DESTROYING $self\n";
295 }
296
297
298 #=================================================
299 # Begin Bio::Search::Hit::HitI implementation
300 #=================================================
301
302 =head2 algorithm
303
304 Title : algorithm
305 Usage : $alg = $hit->algorithm();
306 Function: Gets the algorithm specification that was used to obtain the hit
307 For BLAST, the algorithm denotes what type of sequence was aligned
308 against what (BLASTN: dna-dna, BLASTP prt-prt, BLASTX translated
309 dna-prt, TBLASTN prt-translated dna, TBLASTX translated
310 dna-translated dna).
311 Returns : a scalar string
312 Args : none
313
314 =cut
315
316 #----------------
317 sub algorithm {
318 #----------------
319 my ($self,@args) = @_;
320 return $self->{'_blast_program'};
321 }
322
323 =head2 name
324
325 Usage : $hit->name([string]);
326 Purpose : Set/Get a string to identify the hit.
327 Example : $name = $hit->name;
328 : $hit->name('M81707');
329 Returns : String consisting of the hit's name or undef if not set.
330 Comments : The name is parsed out of the "Query=" line as the first chunk of
331 non-whitespace text. If you want the rest of the line, use
332 $hit->description().
333
334 See Also: L<accession()|accession>
335
336 =cut
337
338 #'
339
340 #----------------
341 sub name {
342 #----------------
343 my $self = shift;
344 if (@_) {
345 my $name = shift;
346 $name =~ s/^\s+|(\s+|,)$//g;
347 $self->{'_name'} = $name;
348 }
349 return $self->{'_name'};
350 }
351
352 =head2 description
353
354 Usage : $hit_object->description( [integer] );
355 Purpose : Set/Get a description string for the hit.
356 This is parsed out of the "Query=" line as everything after
357 the first chunk of non-whitespace text. Use $hit->name()
358 to get the first chunk (the ID of the sequence).
359 Example : $description = $hit->description;
360 : $desc_60char = $hit->description(60);
361 Argument : Integer (optional) indicating the desired length of the
362 : description string to be returned.
363 Returns : String consisting of the hit's description or undef if not set.
364
365 =cut
366
367 #'
368
369 #----------------
370 sub description {
371 #----------------
372 my( $self, $len ) = @_;
373 $len = (defined $len) ? $len : (CORE::length $self->{'_description'});
374 return substr( $self->{'_description'}, 0 ,$len );
375 }
376
377 =head2 accession
378
379 Title : accession
380 Usage : $acc = $hit->accession();
381 Function: Retrieve the accession (if available) for the hit
382 Returns : a scalar string (empty string if not set)
383 Args : none
384 Comments: Accession numbers are extracted based on the assumption that they
385 are delimited by | characters (NCBI-style). If this is not the case,
386 use the name() method and parse it as necessary.
387
388 See Also: L<name()|name>
389
390 =cut
391
392 #--------------------
393 sub accession {
394 #--------------------
395 my $self = shift;
396 if(@_) { $self->{'_accession'} = shift; }
397 $self->{'_accession'} || '';
398 }
399
400 =head2 raw_score
401
402 Usage : $hit_object->raw_score();
403 Purpose : Gets the BLAST score of the best HSP for the current Blast hit.
404 Example : $score = $hit_object->raw_score();
405 Returns : Integer
406 Argument : n/a
407 Throws : n/a
408
409 See Also : L<bits()|bits>
410
411 =cut
412
413 #----------
414 sub raw_score {
415 #----------
416 my $self = shift;
417
418 # The check for $self->{'_score'} is a remnant from the 'query' mode days
419 # in which the sbjct object would collect data from the description line only.
420
421 my ($score);
422 if(not defined($self->{'_score'})) {
423 $score = $self->hsp->score;
424 } else {
425 $score = $self->{'_score'};
426 }
427 return $score;
428 }
429
430
431 =head2 length
432
433 Usage : $hit_object->length();
434 Purpose : Get the total length of the hit sequence.
435 Example : $len = $hit_object->length();
436 Returns : Integer
437 Argument : n/a
438 Throws : n/a
439 Comments : Developer note: when using the built-in length function within
440 : this module, call it as CORE::length().
441
442 See Also : L<logical_length()|logical_length>, L<length_aln()|length_aln>
443
444 =cut
445
446 #-----------
447 sub length {
448 #-----------
449 my $self = shift;
450 return $self->{'_length'};
451 }
452
453 =head2 significance
454
455 Equivalent to L<signif()|signif>
456
457 =cut
458
459 #----------------
460 sub significance { shift->signif( @_ ); }
461 #----------------
462
463
464 =head2 next_hsp
465
466 Title : next_hsp
467 Usage : $hsp = $obj->next_hsp();
468 Function : returns the next available High Scoring Pair object
469 Example :
470 Returns : Bio::Search::HSP::BlastHSP or undef if finished
471 Args : none
472
473 =cut
474
475 #----------------
476 sub next_hsp {
477 #----------------
478 my $self = shift;
479
480 unless($self->{'_hsp_queue_started'}) {
481 $self->{'_hsp_queue'} = [$self->hsps()];
482 $self->{'_hsp_queue_started'} = 1;
483 }
484 pop @{$self->{'_hsp_queue'}};
485 }
486
487 #=================================================
488 # End Bio::Search::Hit::HitI implementation
489 #=================================================
490
491
492 # Providing a more explicit method for getting name of hit
493 # (corresponds with column name in HitTableWriter)
494 #----------------
495 sub hit_name {
496 #----------------
497 my $self = shift;
498 $self->name( @_ );
499 }
500
501 # Older method Delegates to description()
502 #----------------
503 sub desc {
504 #----------------
505 my $self = shift;
506 return $self->description( @_ );
507 }
508
509 # Providing a more explicit method for getting description of hit
510 # (corresponds with column name in HitTableWriter)
511 #----------------
512 sub hit_description {
513 #----------------
514 my $self = shift;
515 return $self->description( @_ );
516 }
517
518 =head2 score
519
520 Equivalent to L<raw_score()|raw_score>
521
522 =cut
523
524 #----------------
525 sub score { shift->raw_score( @_ ); }
526 #----------------
527
528
529 =head2 hit_length
530
531 Equivalent to L<length()|length>
532
533 =cut
534
535 # Providing a more explicit method for getting length of hit
536 #----------------
537 sub hit_length { shift->length( @_ ); }
538 #----------------
539
540
541 =head2 signif
542
543 Usage : $hit_object->signif( [format] );
544 Purpose : Get the P or Expect value for the best HSP of the given BLAST hit.
545 : The value returned is the one which is reported in the description
546 : section of the Blast report. For Blast1 and WU-Blast2, this
547 : is a P-value, for Blast2, it is an Expect value.
548 Example : $obj->signif() # returns 1.3e-34
549 : $obj->signif('exp') # returns -34
550 : $obj->signif('parts') # returns (1.3, -34)
551 Returns : Float or scientific notation number (the raw P/Expect value, DEFAULT).
552 : Integer if format == 'exp' (the magnitude of the base 10 exponent).
553 : 2-element list (float, int) if format == 'parts' and P/Expect value
554 : is in scientific notation (see Comments).
555 Argument : format: string of 'raw' | 'exp' | 'parts'
556 : 'raw' returns value given in report. Default. (1.2e-34)
557 : 'exp' returns exponent value only (34)
558 : 'parts' returns the decimal and exponent as a
559 : 2-element list (1.2, -34) (see Comments).
560 Throws : n/a
561 Comments : The signif() method provides a way to deal with the fact that
562 : Blast1 and Blast2 formats (and WU- vs. NCBI-BLAST) differ in
563 : what is reported in the description lines of each hit in the
564 : Blast report. The signif() method frees any client code from
565 : having to know if this is a P-value or an Expect value,
566 : making it easier to write code that can process both
567 : Blast1 and Blast2 reports. This is not necessarily a good thing,
568 : since one should always know when one is working with P-values or
569 : Expect values (hence the deprecated status).
570 : Use of expect() is recommended since all hits will have an Expect value.
571 :
572 : Using the 'parts' argument is not recommended since it will not
573 : work as expected if the expect value is not in scientific notation.
574 : That is, floats are not converted into sci notation before
575 : splitting into parts.
576
577 See Also : L<p()|p>, L<expect()|expect>, L<Bio::Search::BlastUtils::get_exponent()|Bio::Search::BlastUtils>
578
579 =cut
580
581 #-------------
582 sub signif {
583 #-------------
584 # Some duplication of logic for p(), expect() and signif() for the sake of performance.
585 my ($self, $fmt) = @_;
586
587 my $val = defined($self->{'_p'}) ? $self->{'_p'} : $self->{'_expect'};
588
589 # $val can be zero.
590 defined($val) or $self->throw("Can't get P- or Expect value: HSPs may not have been set.");
591
592 return $val if not $fmt or $fmt =~ /^raw/i;
593 ## Special formats: exponent-only or as list.
594 return &Bio::Search::BlastUtils::get_exponent($val) if $fmt =~ /^exp/i;
595 return (split (/eE/, $val)) if $fmt =~ /^parts/i;
596
597 ## Default: return the raw P/Expect-value.
598 return $val;
599 }
600
601 #----------------
602 sub raw_hit_data {
603 #----------------
604 my $self = shift;
605 my $data = '>';
606 # Need to add blank lines where we've removed them.
607 foreach( @{$self->{'_hit_data'}} ) {
608 if( $_ eq 'end') {
609 $data .= "\n";
610 }
611 else {
612 $data .= /^\s*(Score|Query)/ ? "\n$_" : $_;
613 }
614 }
615 return $data;
616 }
617
618
619 #=head2 _set_length
620 #
621 # Usage : $hit_object->_set_length( "233" );
622 # Purpose : Set the total length of the hit sequence.
623 # Example : $hit_object->_set_length( $len );
624 # Returns : n/a
625 # Argument : Integer (only when setting). Any commas will be stripped out.
626 # Throws : n/a
627 #
628 #=cut
629
630 #-----------
631 sub _set_length {
632 #-----------
633 my ($self, $len) = @_;
634 $len =~ s/,//g; # get rid of commas
635 $self->{'_length'} = $len;
636 }
637
638 #=head2 _set_description
639 #
640 # Usage : Private method; called automatically during construction
641 # Purpose : Sets the description of the hit sequence.
642 # : For sequence without descriptions, does not set any description.
643 # Argument : Array containing description (multiple lines).
644 # Comments : Processes the supplied description:
645 # 1. Join all lines into one string.
646 # 2. Remove sequence id at the beginning of description.
647 # 3. Removes junk charactes at begin and end of description.
648 #
649 #=cut
650
651 #--------------
652 sub _set_description {
653 #--------------
654 my( $self, @desc ) = @_;
655 my( $desc);
656
657 # print STDERR "BlastHit: RAW DESC:\n@desc\n";
658
659 $desc = join(" ", @desc);
660
661 my $name = $self->name;
662
663 if($desc) {
664 $desc =~ s/^\s*\S+\s+//; # remove the sequence ID(s)
665 # This won't work if there's no description.
666 $desc =~ s/^\s*$name//; # ...but this should.
667 $desc =~ s/^[\s!]+//;
668 $desc =~ s/ \d+$//;
669 $desc =~ s/\.+$//;
670 $self->{'_description'} = $desc;
671 }
672
673 # print STDERR "BlastHit: _set_description = $desc\n";
674 }
675
676 =head2 to_string
677
678 Title : to_string
679 Usage : print $hit->to_string;
680 Function: Returns a string representation for the Blast Hit.
681 Primarily intended for debugging purposes.
682 Example : see usage
683 Returns : A string of the form:
684 [BlastHit] <name> <description>
685 e.g.:
686 [BlastHit] emb|Z46660|SC9725 S.cerevisiae chromosome XIII cosmid
687 Args : None
688
689 =cut
690
691 #----------------
692 sub to_string {
693 #----------------
694 my $self = shift;
695 return "[BlastHit] " . $self->name . " " . $self->description;
696 }
697
698
699 #=head2 _set_id
700 #
701 # Usage : Private method; automatically called by new()
702 # Purpose : Sets the name of the BlastHit sequence from the BLAST summary line.
703 # : The identifier is assumed to be the first
704 # : chunk of non-whitespace characters in the description line
705 # : Does not assume any semantics in the structure of the identifier
706 # : (Formerly, this method attempted to extract database name from
707 # : the seq identifiers, but this was prone to break).
708 # Returns : n/a
709 # Argument : String containing description line of the hit from Blast report
710 # : or first line of an alignment section (with or without the leading '>').
711 # Throws : Warning if cannot locate sequence ID.
712 #
713 #See Also : L<new()|new>, L<accession()|accession>
714 #
715 #=cut
716
717 #---------------
718 sub _set_id {
719 #---------------
720 my( $self, $desc ) = @_;
721
722 # New strategy: Assume only that the ID is the first white space
723 # delimited chunk. Not attempting to extract accession & database name.
724 # Clients will have to interpret it as necessary.
725 if($desc =~ /^>?(\S+)\s*(.*)/) {
726 my ($name, $desc) = ($1, $2);
727 $self->name($name);
728 $self->{'_description'} = $desc;
729 # Note that this description comes from the summary section of the
730 # BLAST report and so may be truncated. The full description will be
731 # set from the alignment section. We're setting description here in case
732 # the alignment section isn't being parsed.
733
734 # Assuming accession is delimited with | symbols (NCBI-style)
735 my @pieces = split(/\|/,$name);
736 my $acc = pop @pieces;
737 $self->accession( $acc );
738 }
739 else {
740 $self->warn("Can't locate sequence identifier in summary line.", "Line = $desc");
741 $desc = 'Unknown sequence ID' if not $desc;
742 $self->name($desc);
743 }
744 }
745
746
747 =head2 ambiguous_aln
748
749 Usage : $ambig_code = $hit_object->ambiguous_aln();
750 Purpose : Sets/Gets ambiguity code data member.
751 Example : (see usage)
752 Returns : String = 'q', 's', 'qs', '-'
753 : 'q' = query sequence contains overlapping sub-sequences
754 : while sbjct does not.
755 : 's' = sbjct sequence contains overlapping sub-sequences
756 : while query does not.
757 : 'qs' = query and sbjct sequence contains overlapping sub-sequences
758 : relative to each other.
759 : '-' = query and sbjct sequence do not contains multiple domains
760 : relative to each other OR both contain the same distribution
761 : of similar domains.
762 Argument : n/a
763 Throws : n/a
764 Status : Experimental
765
766 See Also : L<Bio::Search::BlastUtils::tile_hsps>, L<HSP Tiling and Ambiguous Alignments>
767
768 =cut
769
770 #--------------------
771 sub ambiguous_aln {
772 #--------------------
773 my $self = shift;
774 if(@_) { $self->{'_ambiguous_aln'} = shift; }
775 $self->{'_ambiguous_aln'} || '-';
776 }
777
778
779
780 =head2 overlap
781
782 Usage : $blast_object->overlap( [integer] );
783 Purpose : Gets/Sets the allowable amount overlap between different HSP sequences.
784 Example : $blast_object->overlap(5);
785 : $overlap = $blast_object->overlap;
786 Returns : Integer.
787 Argument : integer.
788 Throws : n/a
789 Status : Experimental
790 Comments : Any two HSPs whose sequences overlap by less than or equal
791 : to the overlap() number of resides will be considered separate HSPs
792 : and will not get tiled by Bio::Search::BlastUtils::_adjust_contigs().
793
794 See Also : L<Bio::Search::BlastUtils::_adjust_contigs()|Bio::Search::BlastUtils>, L<BUGS | BUGS>
795
796 =cut
797
798 #-------------
799 sub overlap {
800 #-------------
801 my $self = shift;
802 if(@_) { $self->{'_overlap'} = shift; }
803 defined $self->{'_overlap'} ? $self->{'_overlap'} : 0;
804 }
805
806
807
808
809
810
811 =head2 bits
812
813 Usage : $hit_object->bits();
814 Purpose : Gets the BLAST bit score of the best HSP for the current Blast hit.
815 Example : $bits = $hit_object->bits();
816 Returns : Integer
817 Argument : n/a
818 Throws : Exception if bit score is not set.
819 Comments : For BLAST1, the non-bit score is listed in the summary line.
820
821 See Also : L<score()|score>
822
823 =cut
824
825 #---------
826 sub bits {
827 #---------
828 my $self = shift;
829
830 # The check for $self->{'_bits'} is a remnant from the 'query' mode days
831 # in which the sbjct object would collect data from the description line only.
832
833 my ($bits);
834 if(not defined($self->{'_bits'})) {
835 $bits = $self->hsp->bits;
836 } else {
837 $bits = $self->{'_bits'};
838 }
839 return $bits;
840 }
841
842
843
844 =head2 n
845
846 Usage : $hit_object->n();
847 Purpose : Gets the N number for the current Blast hit.
848 : This is the number of HSPs in the set which was ascribed
849 : the lowest P-value (listed on the description line).
850 : This number is not the same as the total number of HSPs.
851 : To get the total number of HSPs, use num_hsps().
852 Example : $n = $hit_object->n();
853 Returns : Integer
854 Argument : n/a
855 Throws : Exception if HSPs have not been set (BLAST2 reports).
856 Comments : Note that the N parameter is not reported in gapped BLAST2.
857 : Calling n() on such reports will result in a call to num_hsps().
858 : The num_hsps() method will count the actual number of
859 : HSPs in the alignment listing, which may exceed N in
860 : some cases.
861
862 See Also : L<num_hsps()|num_hsps>
863
864 =cut
865
866 #-----
867 sub n {
868 #-----
869 my $self = shift;
870
871 # The check for $self->{'_n'} is a remnant from the 'query' mode days
872 # in which the sbjct object would collect data from the description line only.
873
874 my ($n);
875 if(not defined($self->{'_n'})) {
876 $n = $self->hsp->n;
877 } else {
878 $n = $self->{'_n'};
879 }
880 $n ||= $self->num_hsps;
881
882 return $n;
883 }
884
885
886
887 =head2 frame
888
889 Usage : $hit_object->frame();
890 Purpose : Gets the reading frame for the best HSP after HSP tiling.
891 : This is only valid for BLASTX and TBLASTN/X reports.
892 Example : $frame = $hit_object->frame();
893 Returns : Integer (-2 .. +2)
894 Argument : n/a
895 Throws : Exception if HSPs have not been set (BLAST2 reports).
896 Comments : This method requires that all HSPs be tiled. If they have not
897 : already been tiled, they will be tiled first automatically..
898 : If you don't want the tiled data, iterate through each HSP
899 : calling frame() on each (use hsps() to get all HSPs).
900
901 See Also : L<hsps()|hsps>
902
903 =cut
904
905 #----------'
906 sub frame {
907 #----------
908 my $self = shift;
909
910 Bio::Search::BlastUtils::tile_hsps($self) if not $self->{'_tile_hsps'};
911
912 # The check for $self->{'_frame'} is a remnant from the 'query' mode days
913 # in which the sbjct object would collect data from the description line only.
914
915 my ($frame);
916 if(not defined($self->{'_frame'})) {
917 $frame = $self->hsp->frame;
918 } else {
919 $frame = $self->{'_frame'};
920 }
921 return $frame;
922 }
923
924
925
926
927
928 =head2 p
929
930 Usage : $hit_object->p( [format] );
931 Purpose : Get the P-value for the best HSP of the given BLAST hit.
932 : (Note that P-values are not provided with NCBI Blast2 reports).
933 Example : $p = $sbjct->p;
934 : $p = $sbjct->p('exp'); # get exponent only.
935 : ($num, $exp) = $sbjct->p('parts'); # split sci notation into parts
936 Returns : Float or scientific notation number (the raw P-value, DEFAULT).
937 : Integer if format == 'exp' (the magnitude of the base 10 exponent).
938 : 2-element list (float, int) if format == 'parts' and P-value
939 : is in scientific notation (See Comments).
940 Argument : format: string of 'raw' | 'exp' | 'parts'
941 : 'raw' returns value given in report. Default. (1.2e-34)
942 : 'exp' returns exponent value only (34)
943 : 'parts' returns the decimal and exponent as a
944 : 2-element list (1.2, -34) (See Comments).
945 Throws : Warns if no P-value is defined. Uses expect instead.
946 Comments : Using the 'parts' argument is not recommended since it will not
947 : work as expected if the P-value is not in scientific notation.
948 : That is, floats are not converted into sci notation before
949 : splitting into parts.
950
951 See Also : L<expect()|expect>, L<signif()|signif>, L<Bio::Search::BlastUtils::get_exponent()|Bio::Search::BlastUtils>
952
953 =cut
954
955 #--------
956 sub p {
957 #--------
958 # Some duplication of logic for p(), expect() and signif() for the sake of performance.
959 my ($self, $fmt) = @_;
960
961 my $val = $self->{'_p'};
962
963 # $val can be zero.
964 if(not defined $val) {
965 # P-value not defined, must be a NCBI Blast2 report.
966 # Use expect instead.
967 $self->warn( "P-value not defined. Using expect() instead.");
968 $val = $self->{'_expect'};
969 }
970
971 return $val if not $fmt or $fmt =~ /^raw/i;
972 ## Special formats: exponent-only or as list.
973 return &Bio::Search::BlastUtils::get_exponent($val) if $fmt =~ /^exp/i;
974 return (split (/eE/, $val)) if $fmt =~ /^parts/i;
975
976 ## Default: return the raw P-value.
977 return $val;
978 }
979
980
981
982 =head2 expect
983
984 Usage : $hit_object->expect( [format] );
985 Purpose : Get the Expect value for the best HSP of the given BLAST hit.
986 Example : $e = $sbjct->expect;
987 : $e = $sbjct->expect('exp'); # get exponent only.
988 : ($num, $exp) = $sbjct->expect('parts'); # split sci notation into parts
989 Returns : Float or scientific notation number (the raw expect value, DEFAULT).
990 : Integer if format == 'exp' (the magnitude of the base 10 exponent).
991 : 2-element list (float, int) if format == 'parts' and Expect
992 : is in scientific notation (see Comments).
993 Argument : format: string of 'raw' | 'exp' | 'parts'
994 : 'raw' returns value given in report. Default. (1.2e-34)
995 : 'exp' returns exponent value only (34)
996 : 'parts' returns the decimal and exponent as a
997 : 2-element list (1.2, -34) (see Comments).
998 Throws : Exception if the Expect value is not defined.
999 Comments : Using the 'parts' argument is not recommended since it will not
1000 : work as expected if the expect value is not in scientific notation.
1001 : That is, floats are not converted into sci notation before
1002 : splitting into parts.
1003
1004 See Also : L<p()|p>, L<signif()|signif>, L<Bio::Search::BlastUtils::get_exponent()|Bio::Search::BlastUtils>
1005
1006 =cut
1007
1008 #-----------
1009 sub expect {
1010 #-----------
1011 # Some duplication of logic for p(), expect() and signif() for the sake of performance.
1012 my ($self, $fmt) = @_;
1013
1014 my $val;
1015
1016 # For Blast reports that list the P value on the description line,
1017 # getting the expect value requires fully parsing the HSP data.
1018 # For NCBI blast, there's no problem.
1019 if(not defined($self->{'_expect'})) {
1020 if( defined $self->{'_hsps'}) {
1021 $self->{'_expect'} = $val = $self->hsp->expect;
1022 } else {
1023 # If _expect is not set and _hsps are not set,
1024 # then this must be a P-value-based report that was
1025 # run without setting the HSPs (shallow parsing).
1026 $self->throw("Can't get expect value. HSPs have not been set.");
1027 }
1028 } else {
1029 $val = $self->{'_expect'};
1030 }
1031
1032 # $val can be zero.
1033 defined($val) or $self->throw("Can't get Expect value.");
1034
1035 return $val if not $fmt or $fmt =~ /^raw/i;
1036 ## Special formats: exponent-only or as list.
1037 return &Bio::Search::BlastUtils::get_exponent($val) if $fmt =~ /^exp/i;
1038 return (split (/eE/, $val)) if $fmt =~ /^parts/i;
1039
1040 ## Default: return the raw Expect-value.
1041 return $val;
1042 }
1043
1044
1045 =head2 hsps
1046
1047 Usage : $hit_object->hsps();
1048 Purpose : Get a list containing all HSP objects.
1049 : Get the numbers of HSPs for the current hit.
1050 Example : @hsps = $hit_object->hsps();
1051 : $num = $hit_object->hsps(); # alternatively, use num_hsps()
1052 Returns : Array context : list of Bio::Search::HSP::BlastHSP.pm objects.
1053 : Scalar context: integer (number of HSPs).
1054 : (Equivalent to num_hsps()).
1055 Argument : n/a. Relies on wantarray
1056 Throws : Exception if the HSPs have not been collected.
1057
1058 See Also : L<hsp()|hsp>, L<num_hsps()|num_hsps>
1059
1060 =cut
1061
1062 #---------
1063 sub hsps {
1064 #---------
1065 my $self = shift;
1066
1067 if (not ref $self->{'_hsps'}) {
1068 $self->throw("Can't get HSPs: data not collected.");
1069 }
1070
1071 return wantarray
1072 # returning list containing all HSPs.
1073 ? @{$self->{'_hsps'}}
1074 # returning number of HSPs.
1075 : scalar(@{$self->{'_hsps'}});
1076 }
1077
1078
1079
1080 =head2 hsp
1081
1082 Usage : $hit_object->hsp( [string] );
1083 Purpose : Get a single BlastHSP.pm object for the present BlastHit.pm object.
1084 Example : $hspObj = $hit_object->hsp; # same as 'best'
1085 : $hspObj = $hit_object->hsp('best');
1086 : $hspObj = $hit_object->hsp('worst');
1087 Returns : Object reference for a Bio::Search::HSP::BlastHSP.pm object.
1088 Argument : String (or no argument).
1089 : No argument (default) = highest scoring HSP (same as 'best').
1090 : 'best' or 'first' = highest scoring HSP.
1091 : 'worst' or 'last' = lowest scoring HSP.
1092 Throws : Exception if the HSPs have not been collected.
1093 : Exception if an unrecognized argument is used.
1094
1095 See Also : L<hsps()|hsps>, L<num_hsps>()
1096
1097 =cut
1098
1099 #----------
1100 sub hsp {
1101 #----------
1102 my( $self, $option ) = @_;
1103 $option ||= 'best';
1104
1105 if (not ref $self->{'_hsps'}) {
1106 $self->throw("Can't get HSPs: data not collected.");
1107 }
1108
1109 my @hsps = @{$self->{'_hsps'}};
1110
1111 return $hsps[0] if $option =~ /best|first|1/i;
1112 return $hsps[$#hsps] if $option =~ /worst|last/i;
1113
1114 $self->throw("Can't get HSP for: $option\n" .
1115 "Valid arguments: 'best', 'worst'");
1116 }
1117
1118
1119
1120 =head2 num_hsps
1121
1122 Usage : $hit_object->num_hsps();
1123 Purpose : Get the number of HSPs for the present Blast hit.
1124 Example : $nhsps = $hit_object->num_hsps();
1125 Returns : Integer
1126 Argument : n/a
1127 Throws : Exception if the HSPs have not been collected.
1128
1129 See Also : L<hsps()|hsps>
1130
1131 =cut
1132
1133 #-------------
1134 sub num_hsps {
1135 #-------------
1136 my $self = shift;
1137
1138 if (not defined $self->{'_hsps'}) {
1139 $self->throw("Can't get HSPs: data not collected.");
1140 }
1141
1142 return scalar(@{$self->{'_hsps'}});
1143 }
1144
1145
1146
1147 =head2 logical_length
1148
1149 Usage : $hit_object->logical_length( [seq_type] );
1150 : (mostly intended for internal use).
1151 Purpose : Get the logical length of the hit sequence.
1152 : For query sequence of BLASTX and TBLASTX reports and the hit
1153 : sequence of TBLASTN and TBLASTX reports, the returned length
1154 : is the length of the would-be amino acid sequence (length/3).
1155 : For all other BLAST flavors, this function is the same as length().
1156 Example : $len = $hit_object->logical_length();
1157 Returns : Integer
1158 Argument : seq_type = 'query' or 'hit' or 'sbjct' (default = 'query')
1159 ('sbjct' is synonymous with 'hit')
1160 Throws : n/a
1161 Comments : This is important for functions like frac_aligned_query()
1162 : which need to operate in amino acid coordinate space when dealing
1163 : with T?BLASTX type reports.
1164
1165 See Also : L<length()|length>, L<frac_aligned_query()|frac_aligned_query>, L<frac_aligned_hit()|frac_aligned_hit>
1166
1167 =cut
1168
1169 #--------------------
1170 sub logical_length {
1171 #--------------------
1172 my $self = shift;
1173 my $seqType = shift || 'query';
1174 $seqType = 'sbjct' if $seqType eq 'hit';
1175
1176 my $length;
1177
1178 # For the sbjct, return logical sbjct length
1179 if( $seqType eq 'sbjct' ) {
1180 $length = $self->{'_logical_length'} || $self->{'_length'};
1181 }
1182 else {
1183 # Otherwise, return logical query length
1184 $length = $self->{'_query_length'};
1185
1186 # Adjust length based on BLAST flavor.
1187 if($self->{'_blast_program'} =~ /T?BLASTX/ ) {
1188 $length /= 3;
1189 }
1190 }
1191 return $length;
1192 }
1193
1194
1195 =head2 length_aln
1196
1197 Usage : $hit_object->length_aln( [seq_type] );
1198 Purpose : Get the total length of the aligned region for query or sbjct seq.
1199 : This number will include all HSPs
1200 Example : $len = $hit_object->length_aln(); # default = query
1201 : $lenAln = $hit_object->length_aln('query');
1202 Returns : Integer
1203 Argument : seq_Type = 'query' or 'hit' or 'sbjct' (Default = 'query')
1204 ('sbjct' is synonymous with 'hit')
1205 Throws : Exception if the argument is not recognized.
1206 Comments : This method will report the logical length of the alignment,
1207 : meaning that for TBLAST[NX] reports, the length is reported
1208 : using amino acid coordinate space (i.e., nucleotides / 3).
1209 :
1210 : This method requires that all HSPs be tiled. If they have not
1211 : already been tiled, they will be tiled first automatically..
1212 : If you don't want the tiled data, iterate through each HSP
1213 : calling length() on each (use hsps() to get all HSPs).
1214
1215 See Also : L<length()|length>, L<frac_aligned_query()|frac_aligned_query>, L<frac_aligned_hit()|frac_aligned_hit>, L<gaps()|gaps>, L<Bio::Search::BlastUtils::tile_hsps()|Bio::Search::BlastUtils>, L<Bio::Search::HSP::BlastHSP::length()|Bio::Search::HSP::BlastHSP>
1216
1217 =cut
1218
1219 #---------------'
1220 sub length_aln {
1221 #---------------
1222 my( $self, $seqType ) = @_;
1223
1224 $seqType ||= 'query';
1225 $seqType = 'sbjct' if $seqType eq 'hit';
1226
1227 Bio::Search::BlastUtils::tile_hsps($self) if not $self->{'_tile_hsps'};
1228
1229 my $data = $self->{'_length_aln_'.$seqType};
1230
1231 ## If we don't have data, figure out what went wrong.
1232 if(!$data) {
1233 $self->throw("Can't get length aln for sequence type \"$seqType\"" .
1234 "Valid types are 'query', 'hit', 'sbjct' ('sbjct' = 'hit')");
1235 }
1236 $data;
1237 }
1238
1239
1240 =head2 gaps
1241
1242 Usage : $hit_object->gaps( [seq_type] );
1243 Purpose : Get the number of gaps in the aligned query, sbjct, or both sequences.
1244 : Data is summed across all HSPs.
1245 Example : $qgaps = $hit_object->gaps('query');
1246 : $hgaps = $hit_object->gaps('hit');
1247 : $tgaps = $hit_object->gaps(); # default = total (query + hit)
1248 Returns : scalar context: integer
1249 : array context without args: two-element list of integers
1250 : (queryGaps, sbjctGaps)
1251 : Array context can be forced by providing an argument of 'list' or 'array'.
1252 :
1253 : CAUTION: Calling this method within printf or sprintf is arrray context.
1254 : So this function may not give you what you expect. For example:
1255 : printf "Total gaps: %d", $hit->gaps();
1256 : Actually returns a two-element array, so what gets printed
1257 : is the number of gaps in the query, not the total
1258 :
1259 Argument : seq_type: 'query' | 'hit' or 'sbjct' | 'total' | 'list' (default = 'total')
1260 ('sbjct' is synonymous with 'hit')
1261 Throws : n/a
1262 Comments : If you need data for each HSP, use hsps() and then interate
1263 : through each HSP object.
1264 : This method requires that all HSPs be tiled. If they have not
1265 : already been tiled, they will be tiled first automatically..
1266 : Not relying on wantarray since that will fail in situations
1267 : such as printf "%d", $hit->gaps() in which you might expect to
1268 : be printing the total gaps, but evaluates to array context.
1269
1270 See Also : L<length_aln()|length_aln>
1271
1272 =cut
1273
1274 #----------
1275 sub gaps {
1276 #----------
1277 my( $self, $seqType ) = @_;
1278
1279 $seqType ||= (wantarray ? 'list' : 'total');
1280 $seqType = 'sbjct' if $seqType eq 'hit';
1281
1282 Bio::Search::BlastUtils::tile_hsps($self) if not $self->{'_tile_hsps'};
1283
1284 $seqType = lc($seqType);
1285
1286 if($seqType =~ /list|array/i) {
1287 return ($self->{'_gaps_query'}, $self->{'_gaps_sbjct'});
1288 }
1289
1290 if($seqType eq 'total') {
1291 return ($self->{'_gaps_query'} + $self->{'_gaps_sbjct'}) || 0;
1292 } else {
1293 return $self->{'_gaps_'.$seqType} || 0;
1294 }
1295 }
1296
1297
1298
1299 =head2 matches
1300
1301 Usage : $hit_object->matches( [class] );
1302 Purpose : Get the total number of identical or conserved matches
1303 : (or both) across all HSPs.
1304 : (Note: 'conservative' matches are indicated as 'positives'
1305 : in the Blast report.)
1306 Example : ($id,$cons) = $hit_object->matches(); # no argument
1307 : $id = $hit_object->matches('id');
1308 : $cons = $hit_object->matches('cons');
1309 Returns : Integer or a 2-element array of integers
1310 Argument : class = 'id' | 'cons' OR none.
1311 : If no argument is provided, both identical and conservative
1312 : numbers are returned in a two element list.
1313 : (Other terms can be used to refer to the conservative
1314 : matches, e.g., 'positive'. All that is checked is whether or
1315 : not the supplied string starts with 'id'. If not, the
1316 : conservative matches are returned.)
1317 Throws : Exception if the requested data cannot be obtained.
1318 Comments : If you need data for each HSP, use hsps() and then interate
1319 : through the HSP objects.
1320 : Does not rely on wantarray to return a list. Only checks for
1321 : the presence of an argument (no arg = return list).
1322
1323 See Also : L<Bio::Search::HSP::BlastHSP::matches()|Bio::Search::HSP::BlastHSP>, L<hsps()|hsps>
1324
1325 =cut
1326
1327 #---------------
1328 sub matches {
1329 #---------------
1330 my( $self, $arg) = @_;
1331 my(@data,$data);
1332
1333 if(!$arg) {
1334 @data = ($self->{'_totalIdentical'}, $self->{'_totalConserved'});
1335
1336 return @data if @data;
1337
1338 } else {
1339
1340 if($arg =~ /^id/i) {
1341 $data = $self->{'_totalIdentical'};
1342 } else {
1343 $data = $self->{'_totalConserved'};
1344 }
1345 return $data if $data;
1346 }
1347
1348 ## Something went wrong if we make it to here.
1349 $self->throw("Can't get identical or conserved data: no data.");
1350 }
1351
1352
1353 =head2 start
1354
1355 Usage : $sbjct->start( [seq_type] );
1356 Purpose : Gets the start coordinate for the query, sbjct, or both sequences
1357 : in the BlastHit object. If there is more than one HSP, the lowest start
1358 : value of all HSPs is returned.
1359 Example : $qbeg = $sbjct->start('query');
1360 : $sbeg = $sbjct->start('hit');
1361 : ($qbeg, $sbeg) = $sbjct->start();
1362 Returns : scalar context: integer
1363 : array context without args: list of two integers (queryStart, sbjctStart)
1364 : Array context can be "induced" by providing an argument of 'list' or 'array'.
1365 Argument : In scalar context: seq_type = 'query' or 'hit' or 'sbjct' (default = 'query')
1366 ('sbjct' is synonymous with 'hit')
1367 Throws : n/a
1368 Comments : This method requires that all HSPs be tiled. If there is more than one
1369 : HSP and they have not already been tiled, they will be tiled first automatically..
1370 : Remember that the start and end coordinates of all HSPs are
1371 : normalized so that start < end. Strand information can be
1372 : obtained by calling $hit->strand().
1373
1374 See Also : L<end()|end>, L<range()|range>, L<strand()|strand>, L<HSP Tiling and Ambiguous Alignments>, L<Bio::Search::HSP::BlastHSP::start|Bio::Search::HSP::BlastHSP>
1375
1376 =cut
1377
1378 #----------
1379 sub start {
1380 #----------
1381 my ($self, $seqType) = @_;
1382
1383 $seqType ||= (wantarray ? 'list' : 'query');
1384 $seqType = 'sbjct' if $seqType eq 'hit';
1385
1386 # If there is only one HSP, defer this call to the solitary HSP.
1387 if($self->num_hsps == 1) {
1388 return $self->hsp->start($seqType);
1389 } else {
1390 Bio::Search::BlastUtils::tile_hsps($self) if not $self->{'_tile_hsps'};
1391 if($seqType =~ /list|array/i) {
1392 return ($self->{'_queryStart'}, $self->{'_sbjctStart'});
1393 } else {
1394 ## Sensitive to member name changes.
1395 $seqType = "_\L$seqType\E";
1396 return $self->{$seqType.'Start'};
1397 }
1398 }
1399 }
1400
1401
1402 =head2 end
1403
1404 Usage : $sbjct->end( [seq_type] );
1405 Purpose : Gets the end coordinate for the query, sbjct, or both sequences
1406 : in the BlastHit object. If there is more than one HSP, the largest end
1407 : value of all HSPs is returned.
1408 Example : $qend = $sbjct->end('query');
1409 : $send = $sbjct->end('hit');
1410 : ($qend, $send) = $sbjct->end();
1411 Returns : scalar context: integer
1412 : array context without args: list of two integers (queryEnd, sbjctEnd)
1413 : Array context can be "induced" by providing an argument of 'list' or 'array'.
1414 Argument : In scalar context: seq_type = 'query' or 'sbjct'
1415 : (case insensitive). If not supplied, 'query' is used.
1416 Throws : n/a
1417 Comments : This method requires that all HSPs be tiled. If there is more than one
1418 : HSP and they have not already been tiled, they will be tiled first automatically..
1419 : Remember that the start and end coordinates of all HSPs are
1420 : normalized so that start < end. Strand information can be
1421 : obtained by calling $hit->strand().
1422
1423 See Also : L<start()|start>, L<range()|range>, L<strand()|strand>, L<HSP Tiling and Ambiguous Alignments>, L<Bio::Search::HSP::BlastHSP::end|Bio::Search::HSP::BlastHSP>
1424
1425 =cut
1426
1427 #----------
1428 sub end {
1429 #----------
1430 my ($self, $seqType) = @_;
1431
1432 $seqType ||= (wantarray ? 'list' : 'query');
1433 $seqType = 'sbjct' if $seqType eq 'hit';
1434
1435 # If there is only one HSP, defer this call to the solitary HSP.
1436 if($self->num_hsps == 1) {
1437 return $self->hsp->end($seqType);
1438 } else {
1439 Bio::Search::BlastUtils::tile_hsps($self) if not $self->{'_tile_hsps'};
1440 if($seqType =~ /list|array/i) {
1441 return ($self->{'_queryStop'}, $self->{'_sbjctStop'});
1442 } else {
1443 ## Sensitive to member name changes.
1444 $seqType = "_\L$seqType\E";
1445 return $self->{$seqType.'Stop'};
1446 }
1447 }
1448 }
1449
1450 =head2 range
1451
1452 Usage : $sbjct->range( [seq_type] );
1453 Purpose : Gets the (start, end) coordinates for the query or sbjct sequence
1454 : in the HSP alignment.
1455 Example : ($qbeg, $qend) = $sbjct->range('query');
1456 : ($sbeg, $send) = $sbjct->range('hit');
1457 Returns : Two-element array of integers
1458 Argument : seq_type = string, 'query' or 'hit' or 'sbjct' (default = 'query')
1459 ('sbjct' is synonymous with 'hit')
1460 Throws : n/a
1461
1462 See Also : L<start()|start>, L<end()|end>
1463
1464 =cut
1465
1466 #----------
1467 sub range {
1468 #----------
1469 my ($self, $seqType) = @_;
1470 $seqType ||= 'query';
1471 $seqType = 'sbjct' if $seqType eq 'hit';
1472 return ($self->start($seqType), $self->end($seqType));
1473 }
1474
1475
1476 =head2 frac_identical
1477
1478 Usage : $hit_object->frac_identical( [seq_type] );
1479 Purpose : Get the overall fraction of identical positions across all HSPs.
1480 : The number refers to only the aligned regions and does not
1481 : account for unaligned regions in between the HSPs, if any.
1482 Example : $frac_iden = $hit_object->frac_identical('query');
1483 Returns : Float (2-decimal precision, e.g., 0.75).
1484 Argument : seq_type: 'query' | 'hit' or 'sbjct' | 'total'
1485 : default = 'query' (but see comments below).
1486 : ('sbjct' is synonymous with 'hit')
1487 Throws : n/a
1488 Comments : Different versions of Blast report different values for the total
1489 : length of the alignment. This is the number reported in the
1490 : denominators in the stats section:
1491 : "Identical = 34/120 Positives = 67/120".
1492 : NCBI BLAST uses the total length of the alignment (with gaps)
1493 : WU-BLAST uses the length of the query sequence (without gaps).
1494 :
1495 : Therefore, when called with an argument of 'total',
1496 : this method will report different values depending on the
1497 : version of BLAST used. Total does NOT take into account HSP
1498 : tiling, so it should not be used.
1499 :
1500 : To get the fraction identical among only the aligned residues,
1501 : ignoring the gaps, call this method without an argument or
1502 : with an argument of 'query' or 'hit'.
1503 :
1504 : If you need data for each HSP, use hsps() and then iterate
1505 : through the HSP objects.
1506 : This method requires that all HSPs be tiled. If they have not
1507 : already been tiled, they will be tiled first automatically.
1508
1509 See Also : L<frac_conserved()|frac_conserved>, L<frac_aligned_query()|frac_aligned_query>, L<matches()|matches>, L<Bio::Search::BlastUtils::tile_hsps()|Bio::Search::BlastUtils>
1510
1511 =cut
1512
1513 #------------------
1514 sub frac_identical {
1515 #------------------
1516 my ($self, $seqType) = @_;
1517 $seqType ||= 'query';
1518 $seqType = 'sbjct' if $seqType eq 'hit';
1519
1520 ## Sensitive to member name format.
1521 $seqType = lc($seqType);
1522
1523 Bio::Search::BlastUtils::tile_hsps($self) if not $self->{'_tile_hsps'};
1524
1525 sprintf( "%.2f", $self->{'_totalIdentical'}/$self->{'_length_aln_'.$seqType});
1526 }
1527
1528
1529
1530 =head2 frac_conserved
1531
1532 Usage : $hit_object->frac_conserved( [seq_type] );
1533 Purpose : Get the overall fraction of conserved positions across all HSPs.
1534 : The number refers to only the aligned regions and does not
1535 : account for unaligned regions in between the HSPs, if any.
1536 Example : $frac_cons = $hit_object->frac_conserved('hit');
1537 Returns : Float (2-decimal precision, e.g., 0.75).
1538 Argument : seq_type: 'query' | 'hit' or 'sbjct' | 'total'
1539 : default = 'query' (but see comments below).
1540 : ('sbjct' is synonymous with 'hit')
1541 Throws : n/a
1542 Comments : Different versions of Blast report different values for the total
1543 : length of the alignment. This is the number reported in the
1544 : denominators in the stats section:
1545 : "Positives = 34/120 Positives = 67/120".
1546 : NCBI BLAST uses the total length of the alignment (with gaps)
1547 : WU-BLAST uses the length of the query sequence (without gaps).
1548 :
1549 : Therefore, when called with an argument of 'total',
1550 : this method will report different values depending on the
1551 : version of BLAST used. Total does NOT take into account HSP
1552 : tiling, so it should not be used.
1553 :
1554 : To get the fraction conserved among only the aligned residues,
1555 : ignoring the gaps, call this method without an argument or
1556 : with an argument of 'query' or 'hit'.
1557 :
1558 : If you need data for each HSP, use hsps() and then interate
1559 : through the HSP objects.
1560 : This method requires that all HSPs be tiled. If they have not
1561 : already been tiled, they will be tiled first automatically.
1562
1563 See Also : L<frac_identical()|frac_identical>, L<matches()|matches>, L<Bio::Search::BlastUtils::tile_hsps()|Bio::Search::BlastUtils>
1564
1565 =cut
1566
1567 #--------------------
1568 sub frac_conserved {
1569 #--------------------
1570 my ($self, $seqType) = @_;
1571 $seqType ||= 'query';
1572 $seqType = 'sbjct' if $seqType eq 'hit';
1573
1574 ## Sensitive to member name format.
1575 $seqType = lc($seqType);
1576
1577 Bio::Search::BlastUtils::tile_hsps($self) if not $self->{'_tile_hsps'};
1578
1579 sprintf( "%.2f", $self->{'_totalConserved'}/$self->{'_length_aln_'.$seqType});
1580 }
1581
1582
1583
1584
1585 =head2 frac_aligned_query
1586
1587 Usage : $hit_object->frac_aligned_query();
1588 Purpose : Get the fraction of the query sequence which has been aligned
1589 : across all HSPs (not including intervals between non-overlapping
1590 : HSPs).
1591 Example : $frac_alnq = $hit_object->frac_aligned_query();
1592 Returns : Float (2-decimal precision, e.g., 0.75).
1593 Argument : n/a
1594 Throws : n/a
1595 Comments : If you need data for each HSP, use hsps() and then interate
1596 : through the HSP objects.
1597 : To compute the fraction aligned, the logical length of the query
1598 : sequence is used, meaning that for [T]BLASTX reports, the
1599 : full length of the query sequence is converted into amino acids
1600 : by dividing by 3. This is necessary because of the way
1601 : the lengths of aligned sequences are computed.
1602 : This method requires that all HSPs be tiled. If they have not
1603 : already been tiled, they will be tiled first automatically.
1604
1605 See Also : L<frac_aligned_hit()|frac_aligned_hit>, L<logical_length()|logical_length>, L<length_aln()|length_aln>, L<Bio::Search::BlastUtils::tile_hsps()|Bio::Search::BlastUtils>
1606
1607 =cut
1608
1609 #----------------------
1610 sub frac_aligned_query {
1611 #----------------------
1612 my $self = shift;
1613
1614 Bio::Search::BlastUtils::tile_hsps($self) if not $self->{'_tile_hsps'};
1615
1616 sprintf( "%.2f", $self->{'_length_aln_query'}/$self->logical_length('query'));
1617 }
1618
1619
1620
1621 =head2 frac_aligned_hit
1622
1623 Usage : $hit_object->frac_aligned_hit();
1624 Purpose : Get the fraction of the hit (sbjct) sequence which has been aligned
1625 : across all HSPs (not including intervals between non-overlapping
1626 : HSPs).
1627 Example : $frac_alnq = $hit_object->frac_aligned_hit();
1628 Returns : Float (2-decimal precision, e.g., 0.75).
1629 Argument : n/a
1630 Throws : n/a
1631 Comments : If you need data for each HSP, use hsps() and then interate
1632 : through the HSP objects.
1633 : To compute the fraction aligned, the logical length of the sbjct
1634 : sequence is used, meaning that for TBLAST[NX] reports, the
1635 : full length of the sbjct sequence is converted into amino acids
1636 : by dividing by 3. This is necessary because of the way
1637 : the lengths of aligned sequences are computed.
1638 : This method requires that all HSPs be tiled. If they have not
1639 : already been tiled, they will be tiled first automatically.
1640
1641 See Also : L<frac_aligned_query()|frac_aligned_query>, L<matches()|matches>, , L<logical_length()|logical_length>, L<length_aln()|length_aln>, L<Bio::Search::BlastUtils::tile_hsps()|Bio::Search::BlastUtils>
1642
1643 =cut
1644
1645 #--------------------
1646 sub frac_aligned_hit {
1647 #--------------------
1648 my $self = shift;
1649
1650 Bio::Search::BlastUtils::tile_hsps($self) if not $self->{'_tile_hsps'};
1651
1652 sprintf( "%.2f", $self->{'_length_aln_sbjct'}/$self->logical_length('sbjct'));
1653 }
1654
1655
1656 ## These methods are being maintained for backward compatibility.
1657
1658 =head2 frac_aligned_sbjct
1659
1660 Same as L<frac_aligned_hit()|frac_aligned_hit>
1661
1662 =cut
1663
1664 #----------------
1665 sub frac_aligned_sbjct { my $self=shift; $self->frac_aligned_hit(@_); }
1666 #----------------
1667
1668 =head2 num_unaligned_sbjct
1669
1670 Same as L<num_unaligned_hit()|num_unaligned_hit>
1671
1672 =cut
1673
1674 #----------------
1675 sub num_unaligned_sbjct { my $self=shift; $self->num_unaligned_hit(@_); }
1676 #----------------
1677
1678
1679
1680 =head2 num_unaligned_hit
1681
1682 Usage : $hit_object->num_unaligned_hit();
1683 Purpose : Get the number of the unaligned residues in the hit sequence.
1684 : Sums across all all HSPs.
1685 Example : $num_unaln = $hit_object->num_unaligned_hit();
1686 Returns : Integer
1687 Argument : n/a
1688 Throws : n/a
1689 Comments : See notes regarding logical lengths in the comments for frac_aligned_hit().
1690 : They apply here as well.
1691 : If you need data for each HSP, use hsps() and then interate
1692 : through the HSP objects.
1693 : This method requires that all HSPs be tiled. If they have not
1694 : already been tiled, they will be tiled first automatically..
1695
1696 See Also : L<num_unaligned_query()|num_unaligned_query>, L<Bio::Search::BlastUtils::tile_hsps()|Bio::Search::BlastUtils>, L<frac_aligned_hit()|frac_aligned_hit>
1697
1698 =cut
1699
1700 #---------------------
1701 sub num_unaligned_hit {
1702 #---------------------
1703 my $self = shift;
1704
1705 Bio::Search::BlastUtils::tile_hsps($self) if not $self->{'_tile_hsps'};
1706
1707 my $num = $self->logical_length('sbjct') - $self->{'_length_aln_sbjct'};
1708 ($num < 0 ? 0 : $num );
1709 }
1710
1711
1712 =head2 num_unaligned_query
1713
1714 Usage : $hit_object->num_unaligned_query();
1715 Purpose : Get the number of the unaligned residues in the query sequence.
1716 : Sums across all all HSPs.
1717 Example : $num_unaln = $hit_object->num_unaligned_query();
1718 Returns : Integer
1719 Argument : n/a
1720 Throws : n/a
1721 Comments : See notes regarding logical lengths in the comments for frac_aligned_query().
1722 : They apply here as well.
1723 : If you need data for each HSP, use hsps() and then interate
1724 : through the HSP objects.
1725 : This method requires that all HSPs be tiled. If they have not
1726 : already been tiled, they will be tiled first automatically..
1727
1728 See Also : L<num_unaligned_hit()|num_unaligned_hit>, L<frac_aligned_query()|frac_aligned_query>, L<Bio::Search::BlastUtils::tile_hsps()|Bio::Search::BlastUtils>
1729
1730 =cut
1731
1732 #-----------------------
1733 sub num_unaligned_query {
1734 #-----------------------
1735 my $self = shift;
1736
1737 Bio::Search::BlastUtils::tile_hsps($self) if not $self->{'_tile_hsps'};
1738
1739 my $num = $self->logical_length('query') - $self->{'_length_aln_query'};
1740 ($num < 0 ? 0 : $num );
1741 }
1742
1743
1744
1745 =head2 seq_inds
1746
1747 Usage : $hit->seq_inds( seq_type, class, collapse );
1748 Purpose : Get a list of residue positions (indices) across all HSPs
1749 : for identical or conserved residues in the query or sbjct sequence.
1750 Example : @s_ind = $hit->seq_inds('query', 'identical');
1751 : @h_ind = $hit->seq_inds('hit', 'conserved');
1752 : @h_ind = $hit->seq_inds('hit', 'conserved', 1);
1753 Returns : Array of integers
1754 : May include ranges if collapse is non-zero.
1755 Argument : [0] seq_type = 'query' or 'hit' or 'sbjct' (default = 'query')
1756 : ('sbjct' is synonymous with 'hit')
1757 : [1] class = 'identical' or 'conserved' (default = 'identical')
1758 : (can be shortened to 'id' or 'cons')
1759 : (actually, anything not 'id' will evaluate to 'conserved').
1760 : [2] collapse = boolean, if non-zero, consecutive positions are merged
1761 : using a range notation, e.g., "1 2 3 4 5 7 9 10 11"
1762 : collapses to "1-5 7 9-11". This is useful for
1763 : consolidating long lists. Default = no collapse.
1764 Throws : n/a.
1765 Comments : Note that HSPs are not tiled for this. This could be a problem
1766 : for hits containing mutually exclusive HSPs.
1767 : TODO: Consider tiling and then reporting seq_inds for the
1768 : best HSP contig.
1769
1770 See Also : L<Bio::Search::HSP::BlastHSP::seq_inds()|Bio::Search::HSP::BlastHSP>
1771
1772 =cut
1773
1774 #-------------
1775 sub seq_inds {
1776 #-------------
1777 my ($self, $seqType, $class, $collapse) = @_;
1778
1779 $seqType ||= 'query';
1780 $class ||= 'identical';
1781 $collapse ||= 0;
1782
1783 $seqType = 'sbjct' if $seqType eq 'hit';
1784
1785 my (@inds, $hsp);
1786 foreach $hsp ($self->hsps) {
1787 # This will merge data for all HSPs together.
1788 push @inds, $hsp->seq_inds($seqType, $class);
1789 }
1790
1791 # Need to remove duplicates and sort the merged positions.
1792 if(@inds) {
1793 my %tmp = map { $_, 1 } @inds;
1794 @inds = sort {$a <=> $b} keys %tmp;
1795 }
1796
1797 $collapse ? &Bio::Search::BlastUtils::collapse_nums(@inds) : @inds;
1798 }
1799
1800
1801 =head2 iteration
1802
1803 Usage : $sbjct->iteration( );
1804 Purpose : Gets the iteration number in which the Hit was found.
1805 Example : $iteration_num = $sbjct->iteration();
1806 Returns : Integer greater than or equal to 1
1807 Non-PSI-BLAST reports will report iteration as 1, but this number
1808 is only meaningful for PSI-BLAST reports.
1809 Argument : none
1810 Throws : none
1811
1812 See Also : L<found_again()|found_again>
1813
1814 =cut
1815
1816 #----------------
1817 sub iteration { shift->{'_iteration'} }
1818 #----------------
1819
1820
1821 =head2 found_again
1822
1823 Usage : $sbjct->found_again;
1824 Purpose : Gets a boolean indicator whether or not the hit has
1825 been found in a previous iteration.
1826 This is only applicable to PSI-BLAST reports.
1827
1828 This method indicates if the hit was reported in the
1829 "Sequences used in model and found again" section of the
1830 PSI-BLAST report or if it was reported in the
1831 "Sequences not found previously or not previously below threshold"
1832 section of the PSI-BLAST report. Only for hits in iteration > 1.
1833
1834 Example : if( $sbjct->found_again()) { ... };
1835 Returns : Boolean (1 or 0) for PSI-BLAST report iterations greater than 1.
1836 Returns undef for PSI-BLAST report iteration 1 and non PSI_BLAST
1837 reports.
1838 Argument : none
1839 Throws : none
1840
1841 See Also : L<found_again()|found_again>
1842
1843 =cut
1844
1845 #----------------
1846 sub found_again { shift->{'_found_again'} }
1847 #----------------
1848
1849
1850 =head2 strand
1851
1852 Usage : $sbjct->strand( [seq_type] );
1853 Purpose : Gets the strand(s) for the query, sbjct, or both sequences
1854 : in the best HSP of the BlastHit object after HSP tiling.
1855 : Only valid for BLASTN, TBLASTX, BLASTX-query, TBLASTN-hit.
1856 Example : $qstrand = $sbjct->strand('query');
1857 : $sstrand = $sbjct->strand('hit');
1858 : ($qstrand, $sstrand) = $sbjct->strand();
1859 Returns : scalar context: integer '1', '-1', or '0'
1860 : array context without args: list of two strings (queryStrand, sbjctStrand)
1861 : Array context can be "induced" by providing an argument of 'list' or 'array'.
1862 Argument : In scalar context: seq_type = 'query' or 'hit' or 'sbjct' (default = 'query')
1863 ('sbjct' is synonymous with 'hit')
1864 Throws : n/a
1865 Comments : This method requires that all HSPs be tiled. If they have not
1866 : already been tiled, they will be tiled first automatically..
1867 : If you don't want the tiled data, iterate through each HSP
1868 : calling strand() on each (use hsps() to get all HSPs).
1869 :
1870 : Formerly (prior to 10/21/02), this method would return the
1871 : string "-1/1" for hits with HSPs on both strands.
1872 : However, now that strand and frame is properly being accounted
1873 : for during HSP tiling, it makes more sense for strand()
1874 : to return the strand data for the best HSP after tiling.
1875 :
1876 : If you really want to know about hits on opposite strands,
1877 : you should be iterating through the HSPs using methods on the
1878 : HSP objects.
1879 :
1880 : A possible use case where knowing whether a hit has HSPs
1881 : on both strands would be when filtering via SearchIO for hits with
1882 : this property. However, in this case it would be better to have a
1883 : dedicated method such as $hit->hsps_on_both_strands(). Similarly
1884 : for frame. This could be provided if there is interest.
1885
1886 See Also : B<Bio::Search::HSP::BlastHSP::strand>()
1887
1888 =cut
1889
1890 #----------'
1891 sub strand {
1892 #----------
1893 my ($self, $seqType) = @_;
1894
1895 Bio::Search::BlastUtils::tile_hsps($self) if not $self->{'_tile_hsps'};
1896
1897 $seqType ||= (wantarray ? 'list' : 'query');
1898 $seqType = 'sbjct' if $seqType eq 'hit';
1899
1900 my ($qstr, $hstr);
1901 # If there is only one HSP, defer this call to the solitary HSP.
1902 if($self->num_hsps == 1) {
1903 return $self->hsp->strand($seqType);
1904 }
1905 elsif( defined $self->{'_qstrand'}) {
1906 # Get the data computed during hsp tiling.
1907 $qstr = $self->{'_qstrand'};
1908 $hstr = $self->{'_sstrand'};
1909 }
1910 else {
1911 # otherwise, iterate through all HSPs collecting strand info.
1912 # This will return the string "-1/1" if there are HSPs on different strands.
1913 # NOTE: This was the pre-10/21/02 procedure which will no longer be used,
1914 # (unless the above elsif{} is commented out).
1915 my (%qstr, %hstr);
1916 foreach my $hsp( $self->hsps ) {
1917 my ( $q, $h ) = $hsp->strand();
1918 $qstr{ $q }++;
1919 $hstr{ $h }++;
1920 }
1921 $qstr = join( '/', sort keys %qstr);
1922 $hstr = join( '/', sort keys %hstr);
1923 }
1924
1925 if($seqType =~ /list|array/i) {
1926 return ($qstr, $hstr);
1927 } elsif( $seqType eq 'query' ) {
1928 return $qstr;
1929 } else {
1930 return $hstr;
1931 }
1932 }
1933
1934
1935 1;
1936 __END__
1937
1938 #####################################################################################
1939 # END OF CLASS #
1940 #####################################################################################
1941
1942
1943 =head1 FOR DEVELOPERS ONLY
1944
1945 =head2 Data Members
1946
1947 Information about the various data members of this module is provided for those
1948 wishing to modify or understand the code. Two things to bear in mind:
1949
1950 =over 4
1951
1952 =item 1 Do NOT rely on these in any code outside of this module.
1953
1954 All data members are prefixed with an underscore to signify that they are private.
1955 Always use accessor methods. If the accessor doesn't exist or is inadequate,
1956 create or modify an accessor (and let me know, too!). (An exception to this might
1957 be for BlastHSP.pm which is more tightly coupled to BlastHit.pm and
1958 may access BlastHit data members directly for efficiency purposes, but probably
1959 should not).
1960
1961 =item 2 This documentation may be incomplete and out of date.
1962
1963 It is easy for these data member descriptions to become obsolete as
1964 this module is still evolving. Always double check this info and search
1965 for members not described here.
1966
1967 =back
1968
1969 An instance of Bio::Search::Hit::BlastHit.pm is a blessed reference to a hash containing
1970 all or some of the following fields:
1971
1972 FIELD VALUE
1973 --------------------------------------------------------------
1974 _hsps : Array ref for a list of Bio::Search::HSP::BlastHSP.pm objects.
1975 :
1976 _db : Database identifier from the summary line.
1977 :
1978 _desc : Description data for the hit from the summary line.
1979 :
1980 _length : Total length of the hit sequence.
1981 :
1982 _score : BLAST score.
1983 :
1984 _bits : BLAST score (in bits). Matrix-independent.
1985 :
1986 _p : BLAST P value. Obtained from summary section. (Blast1/WU-Blast only)
1987 :
1988 _expect : BLAST Expect value. Obtained from summary section.
1989 :
1990 _n : BLAST N value (number of HSPs) (Blast1/WU-Blast2 only)
1991 :
1992 _frame : Reading frame for TBLASTN and TBLASTX analyses.
1993 :
1994 _totalIdentical: Total number of identical aligned monomers.
1995 :
1996 _totalConserved: Total number of conserved aligned monomers (a.k.a. "positives").
1997 :
1998 _overlap : Maximum number of overlapping residues between adjacent HSPs
1999 : before considering the alignment to be ambiguous.
2000 :
2001 _ambiguous_aln : Boolean. True if the alignment of all HSPs is ambiguous.
2002 :
2003 _length_aln_query : Length of the aligned region of the query sequence.
2004 :
2005 _length_aln_sbjct : Length of the aligned region of the sbjct sequence.
2006
2007
2008 =cut
2009
2010 1;