diff variant_effect_predictor/Bio/Search/Hit/BlastHit.pm @ 0:21066c0abaf5 draft

Uploaded
author willmclaren
date Fri, 03 Aug 2012 10:04:48 -0400
parents
children
line wrap: on
line diff
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/variant_effect_predictor/Bio/Search/Hit/BlastHit.pm	Fri Aug 03 10:04:48 2012 -0400
@@ -0,0 +1,2010 @@
+#-----------------------------------------------------------------
+# $Id: BlastHit.pm,v 1.13 2002/10/22 09:36:19 sac Exp $
+#
+# BioPerl module Bio::Search::Hit::BlastHit
+#
+# (This module was originally called Bio::Tools::Blast::Sbjct)
+#
+# Cared for by Steve Chervitz <sac@bioperl.org>
+#
+# You may distribute this module under the same terms as perl itself
+#-----------------------------------------------------------------
+
+## POD Documentation:
+
+=head1 NAME
+
+Bio::Search::Hit::BlastHit - Bioperl BLAST Hit object
+
+=head1 SYNOPSIS
+
+The construction of BlastHit objects is performed by
+Bio::SearchIO::blast::BlastHitFactory in a process that is
+orchestrated by the Blast parser (B<Bio::SearchIO::blast::blast>).
+The resulting BlastHits are then accessed via
+B<Bio::Search::Result::BlastResult>). Therefore, you do not need to
+use B<Bio::Search::Hit::BlastHit>) directly. If you need to construct
+BlastHits directly, see the new() function for details.
+
+For B<Bio::SearchIO> BLAST parsing usage examples, see the
+B<examples/search-blast> directory of the Bioperl distribution.
+
+
+=head1 DESCRIPTION
+
+The Bio::Search::Hit::BlastHit.pm module encapsulates data and methods
+for manipulating "hits" from a BLAST report. A BLAST hit is a
+collection of HSPs along with other metadata such as sequence name
+and score information. Hit objects are accessed via
+B<Bio::Search::Result::BlastResult> objects after parsing a BLAST report using
+the B<Bio::SearchIO> system.
+
+In Blast lingo, the "sbjct" sequences are all the sequences 
+in a target database which were compared against a "query" sequence.
+The terms "sbjct" and "hit" will be used interchangeably in this module.
+All methods that take 'sbjct' as an argument also support 'hit' as a
+synonym.
+
+This module supports BLAST versions 1.x and 2.x, gapped and ungapped,
+and PSI-BLAST.
+
+
+=head2 HSP Tiling and Ambiguous Alignments
+
+If a Blast hit has more than one HSP, the Bio::Search::Hit::BlastHit.pm
+object has the ability to merge overlapping HSPs into contiguous
+blocks. This permits the BlastHit object to sum data across all HSPs
+without counting data in the overlapping regions multiple times, which
+would happen if data from each overlapping HSP are simply summed.  HSP
+tiling is performed automatically when methods of the BlastHit object
+that rely on tiled data are invoked. These include
+L<frac_identical()|frac_identical>, L<frac_conserved()|frac_conserved>, L<gaps()|gaps>,
+L<frac_aligned_query()|frac_aligned_query>, L<frac_aligned_hit()|frac_aligned_hit>,
+L<num_unaligned_query()|num_unaligned_query>, L<num_unaligned_hit()|num_unaligned_hit>.
+
+It also permits the assessment of an "ambiguous alignment" if the
+query (or sbjct) sequences from different HSPs overlap 
+(see L<ambiguous_aln()|ambiguous_aln>). The existence
+of an overlap could indicate a biologically interesting region in the
+sequence, such as a repeated domain.  The BlastHit object uses the
+C<-OVERLAP> parameter to determine when two sequences overlap; if this is
+set to 2 -- the default -- then any two sbjct or query HSP sequences
+must overlap by more than two residues to get merged into the same
+contig and counted as an overlap. See the L<BUGS | BUGS> section below for
+"issues" with HSP tiling.
+
+
+The results of the HSP tiling is reported with the following ambiguity codes:
+
+   'q' = Query sequence contains multiple sub-sequences matching
+         a single region in the sbjct sequence. 
+
+   's' = Subject (BlastHit) sequence contains multiple sub-sequences matching
+         a single region in the query sequence. 
+
+   'qs' = Both query and sbjct sequences contain more than one
+          sub-sequence with similarity to the other sequence.
+
+
+For addition information about ambiguous BLAST alignments, see
+L<Bio::Search::BlastUtils::tile_hsps()|Bio::Search::BlastUtils> and 
+
+ http://www-genome.stanford.edu/Sacch3D/help/ambig_aln.html
+
+=head1 DEPENDENCIES
+
+Bio::Search::Hit::BlastHit.pm is a concrete class that inherits from
+B<Bio::Root::Root> and B<Bio::Search::Hit::HitI>.  and relies on
+B<Bio::Search::HSP::BlastHSP>.
+
+
+=head1 BUGS
+
+One consequence of the HSP tiling is that methods that rely on HSP
+tiling such as L<frac_identical()|frac_identical>, L<frac_conserved()|frac_conserved>, L<gaps()|gaps>
+etc. may report misleading numbers when C<-OVERLAP> is set to a large
+number.  For example, say we have two HSPs and the query sequence tile
+as follows:
+
+            1      8             22      30        40             60 
+ Full seq:  ------------------------------------------------------------
+                    *  ** *   **
+ HSP1:             ---------------                    (6 identical matches)
+                              **   **  **
+ HSP2:                        -------------           (6 identical matches)
+
+
+If C<-OVERLAP> is set to some number over 4, HSP1 and HSP2 will not be
+tiled into a single contig and their numbers of identical matches will
+be added, giving a total of 12, not 10 if they had be combined into
+one contig. This can lead to number greater than 1.0 for methods
+L<frac_identical()|frac_identical> and L<frac_conserved()|frac_conserved>. This is less of an issue
+with gapped Blast since it tends to combine HSPs that would be listed
+separately without gapping.  (Fractions E<gt>1.0 can be viewed as a
+signal for an interesting alignment that warrants further inspection,
+thus turning this bug into a feature :-).
+
+Using large values for C<-OVERLAP> can lead to incorrect numbers
+reported by methods that rely on HSP tiling but can be useful if you
+care more about detecting ambiguous alignments.  Setting C<-OVERLAP>
+to zero will lead to the most accurate numbers for the
+tiling-dependent methods but will be useless for detecting overlapping
+HSPs since all HSPs will appear to overlap.
+
+
+=head1 SEE ALSO
+
+ Bio::Search::HSP::BlastHSP.pm         - Blast HSP object.
+ Bio::Search::Result::BlastResult.pm   - Blast Result object.
+ Bio::Search::Hit::HitI.pm             - Interface implemented by BlastHit.pm
+ Bio::Root::Root.pm                    - Base class for BlastHit.pm
+
+Links:
+
+ http://bio.perl.org/Core/POD/Search/Hit/Blast/BlastHSP.pm.html
+
+ http://bio.perl.org/Projects/modules.html  - Online module documentation
+ http://bio.perl.org/Projects/Blast/        - Bioperl Blast Project     
+ http://bio.perl.org/                       - Bioperl Project Homepage
+
+
+=head1 FEEDBACK
+
+=head2 Mailing Lists 
+
+User feedback is an integral part of the evolution of this and other
+Bioperl modules.  Send your comments and suggestions preferably to one
+of the Bioperl mailing lists.  Your participation is much appreciated.
+
+    bioperl-l@bioperl.org              - General discussion
+    http://bio.perl.org/MailList.html  - About the mailing lists
+
+=head2 Reporting Bugs
+
+Report bugs to the Bioperl bug tracking system to help us keep track
+the bugs and their resolution. Bug reports can be submitted via email
+or the web:
+
+    bioperl-bugs@bio.perl.org                   
+    http://bio.perl.org/bioperl-bugs/           
+
+=head1 AUTHOR 
+
+Steve Chervitz E<lt>sac@bioperl.orgE<gt>
+
+See L<the FEEDBACK section | FEEDBACK> for where to send bug reports and comments.
+
+=head1 ACKNOWLEDGEMENTS
+
+This software was originally developed in the Department of Genetics
+at Stanford University. I would also like to acknowledge my
+colleagues at Affymetrix for useful feedback.
+
+=head1 COPYRIGHT
+
+Copyright (c) 1996-2001 Steve Chervitz. All Rights Reserved.
+
+=head1 DISCLAIMER
+
+This software is provided "as is" without warranty of any kind.
+
+=head1 APPENDIX
+
+The rest of the documentation details each of the object methods.
+Internal methods are usually preceded with a _
+
+=cut
+
+
+# Let the code begin...
+
+package Bio::Search::Hit::BlastHit;
+
+use strict;
+use Bio::Search::Hit::HitI;
+use Bio::Root::Root;
+require Bio::Search::BlastUtils;
+use vars qw( @ISA %SUMMARY_OFFSET $Revision);
+
+use overload 
+    '""' => \&to_string;
+
+@ISA = qw( Bio::Root::Root Bio::Search::Hit::HitI );
+
+$Revision = '$Id: BlastHit.pm,v 1.13 2002/10/22 09:36:19 sac Exp $';  #'
+
+
+=head2 new
+
+ Usage     : $hit = Bio::Search::Hit::BlastHit->new( %named_params );
+           : Bio::Search::Hit::BlastHit.pm objects are constructed 
+           : automatically by Bio::SearchIO::BlastHitFactory.pm, 
+           : so there is no need for direct instantiation.
+ Purpose   : Constructs a new BlastHit object and Initializes key variables 
+           : for the hit.
+ Returns   : A Bio::Search::Hit::BlastHit object
+ Argument  : Named Parameters:
+           : Parameter keys are case-insensitive.
+           :     -RAW_DATA   => array reference holding raw BLAST report data 
+           :                    for a single hit. This includes all lines 
+           :                    within the HSP alignment listing section of a 
+           :                    traditional BLAST or PSI-BLAST (non-XML) report,
+           :                    starting at (or just after) the leading '>'.
+	   :	 -HOLD_RAW_DATA => boolean, should -RAW_DATA be saved within the object.
+	   :	 -QUERY_LEN  => Length of the query sequence
+	   :	 -ITERATION  => integer (PSI-BLAST iteration number in which hit was found)
+	   :	 -OVERLAP    => integer (maximum overlap between adjacent
+           :                    HSPs when tiling)
+	   :	 -PROGRAM    => string (type of Blast: BLASTP, BLASTN, etc)
+	   :	 -SIGNIF     => significance
+	   :	 -IS_PVAL    => boolean, true if -SIGNIF contains a P-value
+	   :	 -SCORE      => raw BLAST score
+	   :	 -FOUND_AGAIN   => boolean, true if this was a hit from the
+           :                       section of a PSI-BLAST with iteration > 1 
+           :                       containing sequences that were also found 
+           :                       in iteration 1.
+ Comments  : This object accepts raw Blast report data not because it 
+           : is required for parsing, but in order to retrieve it
+           : (only available if -HOLD_RAW_DATA is set to true).
+
+See Also   : L<Bio::Search::BlastUtils::tile_hsps()|Bio::Search::BlastUtils>, L<Bio::Root::Root::new()|Bio::Root::Root>
+
+=cut
+
+#-------------------
+sub new {
+#-------------------
+    my ($class, @args ) = @_;
+    my $self = $class->SUPER::new( @args );
+
+    my ($raw_data, $signif, $is_pval, $hold_raw);
+
+    ($self->{'_blast_program'}, $self->{'_query_length'}, $raw_data, $hold_raw,
+     $self->{'_overlap'}, $self->{'_iteration'}, $signif, $is_pval, 
+     $self->{'_score'}, $self->{'_found_again'} ) = 
+       $self->_rearrange( [qw(PROGRAM
+			      QUERY_LEN
+			      RAW_DATA
+			      HOLD_RAW_DATA
+			      OVERLAP
+			      ITERATION
+			      SIGNIF
+			      IS_PVAL
+			      SCORE
+			      FOUND_AGAIN )], @args );
+
+    # TODO: Handle this in parser. Just pass in name parameter.
+    $self->_set_id( $raw_data->[0] );
+
+    if($is_pval) {
+        $self->{'_p'} = $signif;
+    } else {
+        $self->{'_expect'} = $signif;
+    }
+
+    if( $hold_raw ) {
+        $self->{'_hit_data'} = $raw_data;
+    }
+
+    return $self;
+}
+
+sub DESTROY {
+    my $self=shift; 
+    #print STDERR "-->DESTROYING $self\n";
+}
+
+
+#=================================================
+# Begin Bio::Search::Hit::HitI implementation
+#=================================================
+
+=head2 algorithm
+
+ Title   : algorithm
+ Usage   : $alg = $hit->algorithm();
+ Function: Gets the algorithm specification that was used to obtain the hit
+           For BLAST, the algorithm denotes what type of sequence was aligned 
+           against what (BLASTN: dna-dna, BLASTP prt-prt, BLASTX translated 
+           dna-prt, TBLASTN prt-translated dna, TBLASTX translated 
+           dna-translated dna).
+ Returns : a scalar string 
+ Args    : none
+
+=cut
+
+#----------------
+sub algorithm {
+#----------------
+    my ($self,@args) = @_;
+    return $self->{'_blast_program'};
+}
+
+=head2 name
+
+ Usage     : $hit->name([string]);
+ Purpose   : Set/Get a string to identify the hit.
+ Example   : $name = $hit->name;
+           : $hit->name('M81707');
+ Returns   : String consisting of the hit's name or undef if not set.
+ Comments  : The name is parsed out of the "Query=" line as the first chunk of 
+             non-whitespace text. If you want the rest of the line, use
+             $hit->description().
+
+See Also: L<accession()|accession>
+
+=cut
+
+#'
+
+#----------------
+sub name {
+#----------------
+    my $self = shift;
+    if (@_) { 
+        my $name = shift;
+        $name =~ s/^\s+|(\s+|,)$//g;
+        $self->{'_name'} = $name;
+    }
+    return $self->{'_name'};
+}
+
+=head2 description
+
+ Usage     : $hit_object->description( [integer] );
+ Purpose   : Set/Get a description string for the hit.
+             This is parsed out of the "Query=" line as everything after 
+             the first chunk of non-whitespace text. Use $hit->name()
+             to get the first chunk (the ID of the sequence).
+ Example   : $description = $hit->description;
+           : $desc_60char = $hit->description(60);
+ Argument  : Integer (optional) indicating the desired length of the
+           : description string to be returned.
+ Returns   : String consisting of the hit's description or undef if not set.
+
+=cut
+
+#'
+
+#----------------
+sub description {
+#----------------
+    my( $self, $len ) = @_; 
+    $len = (defined $len) ? $len : (CORE::length $self->{'_description'});
+    return substr( $self->{'_description'}, 0 ,$len ); 
+}
+
+=head2 accession
+
+ Title   : accession
+ Usage   : $acc = $hit->accession();
+ Function: Retrieve the accession (if available) for the hit
+ Returns : a scalar string (empty string if not set)
+ Args    : none
+ Comments: Accession numbers are extracted based on the assumption that they
+           are delimited by | characters (NCBI-style). If this is not the case,
+           use the name() method and parse it as necessary.
+
+See Also: L<name()|name>
+
+=cut
+
+#--------------------
+sub accession {
+#--------------------
+    my $self = shift;
+    if(@_) { $self->{'_accession'} = shift; }
+    $self->{'_accession'} || '';
+}
+
+=head2 raw_score
+
+ Usage     : $hit_object->raw_score();
+ Purpose   : Gets the BLAST score of the best HSP for the current Blast hit.
+ Example   : $score = $hit_object->raw_score();
+ Returns   : Integer
+ Argument  : n/a
+ Throws    : n/a
+
+See Also   : L<bits()|bits>
+
+=cut
+
+#----------
+sub raw_score { 
+#----------
+    my $self = shift;  
+
+    # The check for $self->{'_score'} is a remnant from the 'query' mode days
+    # in which the sbjct object would collect data from the description line only.
+
+    my ($score);
+    if(not defined($self->{'_score'})) {
+	$score = $self->hsp->score;
+    } else {
+	$score = $self->{'_score'}; 
+    } 
+    return $score;
+}
+
+
+=head2 length
+
+ Usage     : $hit_object->length();
+ Purpose   : Get the total length of the hit sequence.
+ Example   : $len = $hit_object->length();
+ Returns   : Integer 
+ Argument  : n/a
+ Throws    : n/a
+ Comments  : Developer note: when using the built-in length function within
+           : this module, call it as CORE::length().
+
+See Also   : L<logical_length()|logical_length>,  L<length_aln()|length_aln>
+
+=cut
+
+#-----------
+sub length {
+#-----------
+    my $self = shift;
+    return $self->{'_length'}; 
+}
+
+=head2 significance
+
+Equivalent to L<signif()|signif>
+
+=cut
+
+#----------------
+sub significance { shift->signif( @_ ); }
+#----------------
+
+
+=head2 next_hsp
+
+ Title    : next_hsp
+ Usage    : $hsp = $obj->next_hsp();
+ Function : returns the next available High Scoring Pair object
+ Example  : 
+ Returns  : Bio::Search::HSP::BlastHSP or undef if finished
+ Args     : none
+
+=cut
+
+#----------------
+sub next_hsp {
+#----------------
+    my $self = shift;
+
+    unless($self->{'_hsp_queue_started'}) {
+        $self->{'_hsp_queue'} = [$self->hsps()];
+        $self->{'_hsp_queue_started'} = 1;
+    }
+    pop @{$self->{'_hsp_queue'}};
+}
+
+#=================================================
+# End Bio::Search::Hit::HitI implementation
+#=================================================
+
+
+# Providing a more explicit method for getting name of hit
+# (corresponds with column name in HitTableWriter)
+#----------------
+sub hit_name {
+#----------------
+    my $self = shift;
+    $self->name( @_ );
+}
+
+# Older method Delegates to description()
+#----------------
+sub desc { 
+#----------------
+    my $self = shift;
+    return $self->description( @_ );
+}
+
+# Providing a more explicit method for getting description of hit
+# (corresponds with column name in HitTableWriter)
+#----------------
+sub hit_description { 
+#----------------
+    my $self = shift;
+    return $self->description( @_ );
+}
+
+=head2 score
+
+Equivalent to L<raw_score()|raw_score>
+
+=cut
+
+#----------------
+sub score { shift->raw_score( @_ ); }
+#----------------
+
+
+=head2 hit_length
+
+Equivalent to L<length()|length>
+
+=cut
+
+# Providing a more explicit method for getting length of hit
+#----------------
+sub hit_length { shift->length( @_ ); }
+#----------------
+
+
+=head2 signif
+
+ Usage     : $hit_object->signif( [format] );
+ Purpose   : Get the P or Expect value for the best HSP of the given BLAST hit.
+           : The value returned is the one which is reported in the description
+           : section of the Blast report. For Blast1 and WU-Blast2, this
+           : is a P-value, for Blast2, it is an Expect value.
+ Example   : $obj->signif()        # returns 1.3e-34
+           : $obj->signif('exp')   # returns -34
+           : $obj->signif('parts') # returns (1.3, -34)
+ Returns   : Float or scientific notation number (the raw P/Expect value, DEFAULT).
+           : Integer if format == 'exp' (the magnitude of the base 10 exponent).
+           : 2-element list (float, int) if format == 'parts' and P/Expect value
+           :                is in scientific notation (see Comments).
+ Argument  : format: string of 'raw' | 'exp' | 'parts'
+           :    'raw' returns value given in report. Default. (1.2e-34)
+           :    'exp' returns exponent value only (34)
+           :    'parts' returns the decimal and exponent as a 
+           :            2-element list (1.2, -34)  (see Comments).
+ Throws    : n/a
+ Comments  : The signif() method provides a way to deal with the fact that
+           : Blast1 and Blast2 formats (and WU- vs. NCBI-BLAST) differ in 
+           : what is reported in the description lines of each hit in the 
+           : Blast report. The signif() method frees any client code from 
+           : having to know if this is a P-value or an Expect value, 
+           : making it easier to write code that can process both 
+           : Blast1 and Blast2 reports. This is not necessarily a good thing, 
+           : since one should always know when one is working with P-values or
+           : Expect values (hence the deprecated status).
+           : Use of expect() is recommended since all hits will have an Expect value.
+           :
+           : Using the 'parts' argument is not recommended since it will not
+           : work as expected if the expect value is not in scientific notation.
+           : That is, floats are not converted into sci notation before
+           : splitting into parts.
+
+See Also   : L<p()|p>, L<expect()|expect>, L<Bio::Search::BlastUtils::get_exponent()|Bio::Search::BlastUtils>
+
+=cut
+
+#-------------
+sub signif {
+#-------------
+# Some duplication of logic for p(), expect() and signif() for the sake of performance.
+    my ($self, $fmt) = @_;
+
+    my $val = defined($self->{'_p'}) ? $self->{'_p'} : $self->{'_expect'};
+
+    # $val can be zero.
+    defined($val) or $self->throw("Can't get P- or Expect value: HSPs may not have been set.");
+
+    return $val if not $fmt or $fmt =~ /^raw/i;
+    ## Special formats: exponent-only or as list.
+    return &Bio::Search::BlastUtils::get_exponent($val) if $fmt =~ /^exp/i;
+    return (split (/eE/, $val)) if $fmt =~ /^parts/i;
+
+    ## Default: return the raw P/Expect-value.
+    return $val;
+}
+
+#----------------
+sub raw_hit_data {
+#----------------
+    my $self = shift;
+    my $data = '>';
+    # Need to add blank lines where we've removed them.
+    foreach( @{$self->{'_hit_data'}} ) {
+        if( $_ eq 'end') {
+            $data .= "\n";
+        }
+        else {
+            $data .= /^\s*(Score|Query)/ ? "\n$_" : $_;
+        }
+    }
+    return $data;
+}
+
+
+#=head2 _set_length
+#
+# Usage     : $hit_object->_set_length( "233" );
+# Purpose   : Set the total length of the hit sequence.
+# Example   : $hit_object->_set_length( $len );
+# Returns   : n/a
+# Argument  : Integer (only when setting). Any commas will be stripped out.
+# Throws    : n/a
+#
+#=cut
+
+#-----------
+sub _set_length {
+#-----------
+    my ($self, $len) = @_;
+    $len =~ s/,//g; # get rid of commas
+    $self->{'_length'} = $len;
+}
+
+#=head2 _set_description
+#
+# Usage     : Private method; called automatically during construction
+# Purpose   : Sets the description of the hit sequence.
+#	    : For sequence without descriptions, does not set any description.
+# Argument  : Array containing description (multiple lines).
+# Comments  : Processes the supplied description:
+#                1. Join all lines into one string.
+#                2. Remove sequence id at the beginning of description.
+#                3. Removes junk charactes at begin and end of description.
+#
+#=cut
+
+#--------------
+sub _set_description {
+#--------------
+    my( $self, @desc ) = @_;
+    my( $desc);
+    
+#    print STDERR "BlastHit: RAW DESC:\n@desc\n";
+    
+    $desc = join(" ", @desc);
+    
+    my $name = $self->name;
+
+    if($desc) {
+	$desc =~ s/^\s*\S+\s+//; # remove the sequence ID(s)
+	                         # This won't work if there's no description.
+	$desc =~ s/^\s*$name//;  # ...but this should.
+	$desc =~ s/^[\s!]+//;
+	$desc =~ s/ \d+$//;
+	$desc =~ s/\.+$//;
+	$self->{'_description'} = $desc;
+    }
+
+#    print STDERR "BlastHit: _set_description =  $desc\n";
+}
+
+=head2 to_string
+
+ Title   : to_string
+ Usage   : print $hit->to_string;
+ Function: Returns a string representation for the Blast Hit.
+           Primarily intended for debugging purposes.
+ Example : see usage
+ Returns : A string of the form:
+           [BlastHit] <name> <description>
+           e.g.:
+           [BlastHit] emb|Z46660|SC9725 S.cerevisiae chromosome XIII cosmid
+ Args    : None
+
+=cut
+
+#----------------
+sub to_string {
+#----------------
+    my $self = shift;
+    return "[BlastHit] " . $self->name . " " . $self->description;
+}
+
+
+#=head2 _set_id
+#
+# Usage     : Private method; automatically called by new()
+# Purpose   : Sets the name of the BlastHit sequence from the BLAST summary line.
+#           : The identifier is assumed to be the first
+#           : chunk of non-whitespace characters in the description line
+#           : Does not assume any semantics in the structure of the identifier
+#           : (Formerly, this method attempted to extract database name from
+#           : the seq identifiers, but this was prone to break).
+# Returns   : n/a
+# Argument  : String containing description line of the hit from Blast report
+#           : or first line of an alignment section (with or without the leading '>').
+# Throws    : Warning if cannot locate sequence ID.
+#
+#See Also   : L<new()|new>, L<accession()|accession>
+#
+#=cut
+
+#---------------
+sub _set_id {
+#---------------
+    my( $self, $desc ) = @_;
+
+    # New strategy: Assume only that the ID is the first white space
+    # delimited chunk. Not attempting to extract accession & database name.
+    # Clients will have to interpret it as necessary.
+    if($desc =~ /^>?(\S+)\s*(.*)/) {
+        my ($name, $desc) = ($1, $2);
+        $self->name($name);
+        $self->{'_description'} = $desc;
+	# Note that this description comes from the summary section of the
+	# BLAST report and so may be truncated. The full description will be
+	# set from the alignment section. We're setting description here in case
+	# the alignment section isn't being parsed.
+
+        # Assuming accession is delimited with | symbols (NCBI-style)
+        my @pieces = split(/\|/,$name);
+        my $acc = pop @pieces;
+        $self->accession( $acc );
+    }
+    else {
+        $self->warn("Can't locate sequence identifier in summary line.", "Line = $desc");
+        $desc = 'Unknown sequence ID' if not $desc;
+        $self->name($desc);
+    }
+}
+
+
+=head2 ambiguous_aln
+
+ Usage     : $ambig_code = $hit_object->ambiguous_aln();
+ Purpose   : Sets/Gets ambiguity code data member.
+ Example   : (see usage)
+ Returns   : String = 'q', 's', 'qs', '-'
+           :   'q'  = query sequence contains overlapping sub-sequences 
+           :          while sbjct does not.
+           :   's'  = sbjct sequence contains overlapping sub-sequences 
+           :          while query does not.
+           :   'qs' = query and sbjct sequence contains overlapping sub-sequences
+           :          relative to each other.
+           :   '-'  = query and sbjct sequence do not contains multiple domains 
+           :          relative to each other OR both contain the same distribution
+           :          of similar domains.
+ Argument  : n/a
+ Throws    : n/a
+ Status    : Experimental
+
+See Also   : L<Bio::Search::BlastUtils::tile_hsps>, L<HSP Tiling and Ambiguous Alignments>
+
+=cut
+
+#--------------------
+sub ambiguous_aln { 
+#--------------------
+    my $self = shift;
+    if(@_) { $self->{'_ambiguous_aln'} = shift; }
+    $self->{'_ambiguous_aln'} || '-';
+}
+
+
+
+=head2 overlap
+
+ Usage     : $blast_object->overlap( [integer] );
+ Purpose   : Gets/Sets the allowable amount overlap between different HSP sequences.
+ Example   : $blast_object->overlap(5);
+           : $overlap = $blast_object->overlap;
+ Returns   : Integer.
+ Argument  : integer.
+ Throws    : n/a
+ Status    : Experimental
+ Comments  : Any two HSPs whose sequences overlap by less than or equal
+           : to the overlap() number of resides will be considered separate HSPs
+           : and will not get tiled by Bio::Search::BlastUtils::_adjust_contigs().
+
+See Also   : L<Bio::Search::BlastUtils::_adjust_contigs()|Bio::Search::BlastUtils>, L<BUGS | BUGS>
+
+=cut
+
+#-------------
+sub overlap { 
+#-------------
+    my $self = shift; 
+    if(@_) { $self->{'_overlap'} = shift; }
+    defined $self->{'_overlap'} ? $self->{'_overlap'} : 0;
+}
+
+
+
+
+
+
+=head2 bits
+
+ Usage     : $hit_object->bits();
+ Purpose   : Gets the BLAST bit score of the best HSP for the current Blast hit.
+ Example   : $bits = $hit_object->bits();
+ Returns   : Integer
+ Argument  : n/a
+ Throws    : Exception if bit score is not set.
+ Comments  : For BLAST1, the non-bit score is listed in the summary line.
+
+See Also   : L<score()|score>
+
+=cut
+
+#---------
+sub bits { 
+#---------
+    my $self = shift; 
+
+    # The check for $self->{'_bits'} is a remnant from the 'query' mode days
+    # in which the sbjct object would collect data from the description line only.
+
+    my ($bits);
+    if(not defined($self->{'_bits'})) {
+	$bits = $self->hsp->bits;
+    } else {
+	$bits = $self->{'_bits'}; 
+    } 
+    return $bits;
+}
+
+
+
+=head2 n
+
+ Usage     : $hit_object->n();
+ Purpose   : Gets the N number for the current Blast hit.
+           : This is the number of HSPs in the set which was ascribed
+           : the lowest P-value (listed on the description line).
+           : This number is not the same as the total number of HSPs.
+           : To get the total number of HSPs, use num_hsps().
+ Example   : $n = $hit_object->n();
+ Returns   : Integer
+ Argument  : n/a
+ Throws    : Exception if HSPs have not been set (BLAST2 reports).
+ Comments  : Note that the N parameter is not reported in gapped BLAST2.
+           : Calling n() on such reports will result in a call to num_hsps().
+           : The num_hsps() method will count the actual number of
+           : HSPs in the alignment listing, which may exceed N in
+           : some cases.
+
+See Also   : L<num_hsps()|num_hsps>
+
+=cut
+
+#-----
+sub n { 
+#-----
+    my $self = shift; 
+
+    # The check for $self->{'_n'} is a remnant from the 'query' mode days
+    # in which the sbjct object would collect data from the description line only.
+
+    my ($n);
+    if(not defined($self->{'_n'})) {
+	$n = $self->hsp->n;
+    } else {
+	$n = $self->{'_n'}; 
+    } 
+    $n ||= $self->num_hsps;
+
+    return $n;
+}
+
+
+
+=head2 frame
+
+ Usage     : $hit_object->frame();
+ Purpose   : Gets the reading frame for the best HSP after HSP tiling.
+           : This is only valid for BLASTX and TBLASTN/X reports.
+ Example   : $frame = $hit_object->frame();
+ Returns   : Integer (-2 .. +2)
+ Argument  : n/a
+ Throws    : Exception if HSPs have not been set (BLAST2 reports).
+ Comments  : This method requires that all HSPs be tiled. If they have not
+           : already been tiled, they will be tiled first automatically..
+           : If you don't want the tiled data, iterate through each HSP
+           : calling frame() on each (use hsps() to get all HSPs).
+
+See Also   : L<hsps()|hsps>
+
+=cut
+
+#----------'
+sub frame { 
+#----------
+    my $self = shift; 
+
+    Bio::Search::BlastUtils::tile_hsps($self) if not $self->{'_tile_hsps'};
+
+    # The check for $self->{'_frame'} is a remnant from the 'query' mode days
+    # in which the sbjct object would collect data from the description line only.
+
+    my ($frame);
+    if(not defined($self->{'_frame'})) {
+	$frame = $self->hsp->frame;
+    } else {
+	$frame = $self->{'_frame'}; 
+    } 
+    return $frame;
+}
+
+
+
+
+
+=head2 p
+
+ Usage     : $hit_object->p( [format] );
+ Purpose   : Get the P-value for the best HSP of the given BLAST hit.
+           : (Note that P-values are not provided with NCBI Blast2 reports).
+ Example   : $p =  $sbjct->p;
+           : $p =  $sbjct->p('exp');  # get exponent only.
+           : ($num, $exp) =  $sbjct->p('parts');  # split sci notation into parts
+ Returns   : Float or scientific notation number (the raw P-value, DEFAULT).
+           : Integer if format == 'exp' (the magnitude of the base 10 exponent).
+           : 2-element list (float, int) if format == 'parts' and P-value
+           :                is in scientific notation (See Comments).
+ Argument  : format: string of 'raw' | 'exp' | 'parts'
+           :    'raw' returns value given in report. Default. (1.2e-34)
+           :    'exp' returns exponent value only (34)
+           :    'parts' returns the decimal and exponent as a 
+           :            2-element list (1.2, -34) (See Comments).
+ Throws    : Warns if no P-value is defined. Uses expect instead.
+ Comments  : Using the 'parts' argument is not recommended since it will not
+           : work as expected if the P-value is not in scientific notation.
+           : That is, floats are not converted into sci notation before
+           : splitting into parts.
+
+See Also   : L<expect()|expect>, L<signif()|signif>, L<Bio::Search::BlastUtils::get_exponent()|Bio::Search::BlastUtils>
+
+=cut
+
+#--------
+sub p { 
+#--------
+# Some duplication of logic for p(), expect() and signif() for the sake of performance.
+    my ($self, $fmt) = @_;
+
+    my $val = $self->{'_p'};
+
+    # $val can be zero.
+    if(not defined $val) {
+        # P-value not defined, must be a NCBI Blast2 report.
+        # Use expect instead.
+        $self->warn( "P-value not defined. Using expect() instead.");
+        $val = $self->{'_expect'};
+    }
+
+    return $val if not $fmt or $fmt =~ /^raw/i;
+    ## Special formats: exponent-only or as list.
+    return &Bio::Search::BlastUtils::get_exponent($val) if $fmt =~ /^exp/i;
+    return (split (/eE/, $val)) if $fmt =~ /^parts/i;
+
+    ## Default: return the raw P-value.
+    return $val;
+}
+
+
+
+=head2 expect
+
+ Usage     : $hit_object->expect( [format] );
+ Purpose   : Get the Expect value for the best HSP of the given BLAST hit.
+ Example   : $e =  $sbjct->expect;
+           : $e =  $sbjct->expect('exp');  # get exponent only.
+           : ($num, $exp) = $sbjct->expect('parts');  # split sci notation into parts
+ Returns   : Float or scientific notation number (the raw expect value, DEFAULT).
+           : Integer if format == 'exp' (the magnitude of the base 10 exponent).
+           : 2-element list (float, int) if format == 'parts' and Expect 
+           :                is in scientific notation (see Comments).
+ Argument  : format: string of 'raw' | 'exp' | 'parts'
+           :    'raw' returns value given in report. Default. (1.2e-34)
+           :    'exp' returns exponent value only (34)
+           :    'parts' returns the decimal and exponent as a 
+           :            2-element list (1.2, -34)  (see Comments).
+ Throws    : Exception if the Expect value is not defined.
+ Comments  : Using the 'parts' argument is not recommended since it will not
+           : work as expected if the expect value is not in scientific notation.
+           : That is, floats are not converted into sci notation before
+           : splitting into parts.
+
+See Also   : L<p()|p>, L<signif()|signif>, L<Bio::Search::BlastUtils::get_exponent()|Bio::Search::BlastUtils>
+
+=cut
+
+#-----------
+sub expect { 
+#-----------
+# Some duplication of logic for p(), expect() and signif() for the sake of performance.
+    my ($self, $fmt) = @_;
+
+    my $val;
+
+    # For Blast reports that list the P value on the description line,
+    # getting the expect value requires fully parsing the HSP data.
+    # For NCBI blast, there's no problem.
+    if(not defined($self->{'_expect'})) {
+        if( defined $self->{'_hsps'}) {
+            $self->{'_expect'} = $val = $self->hsp->expect;
+        } else {
+            # If _expect is not set and _hsps are not set, 
+            # then this must be a P-value-based report that was
+            # run without setting the HSPs (shallow parsing).
+            $self->throw("Can't get expect value. HSPs have not been set.");
+        }
+    } else {
+        $val = $self->{'_expect'};
+    }
+
+    # $val can be zero.
+    defined($val) or $self->throw("Can't get Expect value.");
+
+    return $val if not $fmt or $fmt =~ /^raw/i;
+    ## Special formats: exponent-only or as list.
+    return &Bio::Search::BlastUtils::get_exponent($val) if $fmt =~ /^exp/i;
+    return (split (/eE/, $val)) if $fmt =~ /^parts/i;
+
+    ## Default: return the raw Expect-value.
+    return $val;
+}
+
+
+=head2 hsps
+
+ Usage     : $hit_object->hsps();
+ Purpose   : Get a list containing all HSP objects.
+           : Get the numbers of HSPs for the current hit.
+ Example   : @hsps = $hit_object->hsps();
+           : $num  = $hit_object->hsps();  # alternatively, use num_hsps()
+ Returns   : Array context : list of Bio::Search::HSP::BlastHSP.pm objects.
+           : Scalar context: integer (number of HSPs).
+           :                 (Equivalent to num_hsps()).
+ Argument  : n/a. Relies on wantarray
+ Throws    : Exception if the HSPs have not been collected.
+
+See Also   : L<hsp()|hsp>, L<num_hsps()|num_hsps>
+
+=cut
+
+#---------
+sub hsps {
+#---------
+    my $self = shift;
+
+    if (not ref $self->{'_hsps'}) {
+	$self->throw("Can't get HSPs: data not collected.");
+    }
+
+    return wantarray 
+        #  returning list containing all HSPs.
+	? @{$self->{'_hsps'}}
+        #  returning number of HSPs.
+        : scalar(@{$self->{'_hsps'}});
+}
+
+
+
+=head2 hsp
+
+ Usage     : $hit_object->hsp( [string] );
+ Purpose   : Get a single BlastHSP.pm object for the present BlastHit.pm object.
+ Example   : $hspObj  = $hit_object->hsp;  # same as 'best'
+           : $hspObj  = $hit_object->hsp('best');
+           : $hspObj  = $hit_object->hsp('worst');
+ Returns   : Object reference for a Bio::Search::HSP::BlastHSP.pm object.
+ Argument  : String (or no argument).
+           :   No argument (default) = highest scoring HSP (same as 'best').
+           :   'best' or 'first' = highest scoring HSP.
+           :   'worst' or 'last' = lowest scoring HSP.
+ Throws    : Exception if the HSPs have not been collected.
+           : Exception if an unrecognized argument is used.
+
+See Also   : L<hsps()|hsps>, L<num_hsps>()
+
+=cut
+
+#----------
+sub hsp {
+#----------
+    my( $self, $option ) = @_;
+    $option ||= 'best';
+    
+    if (not ref $self->{'_hsps'}) {
+	$self->throw("Can't get HSPs: data not collected.");
+    }
+
+    my @hsps = @{$self->{'_hsps'}};
+    
+    return $hsps[0]      if $option =~ /best|first|1/i;
+    return $hsps[$#hsps] if $option =~ /worst|last/i;
+
+    $self->throw("Can't get HSP for: $option\n" .
+		 "Valid arguments: 'best', 'worst'");
+}
+
+
+
+=head2 num_hsps
+
+ Usage     : $hit_object->num_hsps();
+ Purpose   : Get the number of HSPs for the present Blast hit.
+ Example   : $nhsps = $hit_object->num_hsps();
+ Returns   : Integer
+ Argument  : n/a
+ Throws    : Exception if the HSPs have not been collected.
+
+See Also   : L<hsps()|hsps>
+
+=cut
+
+#-------------
+sub num_hsps {
+#-------------
+    my $self = shift;
+    
+    if (not defined $self->{'_hsps'}) {
+	$self->throw("Can't get HSPs: data not collected.");
+    }
+
+    return scalar(@{$self->{'_hsps'}});
+}
+
+
+
+=head2 logical_length
+
+ Usage     : $hit_object->logical_length( [seq_type] );
+           : (mostly intended for internal use).
+ Purpose   : Get the logical length of the hit sequence.
+           : For query sequence of BLASTX and TBLASTX reports and the hit
+           : sequence of TBLASTN and TBLASTX reports, the returned length 
+           : is the length of the would-be amino acid sequence (length/3).
+           : For all other BLAST flavors, this function is the same as length().
+ Example   : $len = $hit_object->logical_length();
+ Returns   : Integer 
+ Argument  : seq_type = 'query' or 'hit' or 'sbjct' (default = 'query')
+             ('sbjct' is synonymous with 'hit')
+ Throws    : n/a
+ Comments  : This is important for functions like frac_aligned_query()
+           : which need to operate in amino acid coordinate space when dealing
+           : with T?BLASTX type reports.
+
+See Also   : L<length()|length>, L<frac_aligned_query()|frac_aligned_query>, L<frac_aligned_hit()|frac_aligned_hit>
+
+=cut
+
+#--------------------
+sub logical_length {
+#--------------------
+    my $self = shift;
+    my $seqType = shift || 'query';
+    $seqType = 'sbjct' if $seqType eq 'hit';
+
+    my $length;
+
+    # For the sbjct, return logical sbjct length
+    if( $seqType eq 'sbjct' ) {
+	$length = $self->{'_logical_length'} || $self->{'_length'};
+    }
+    else {
+        # Otherwise, return logical query length
+        $length = $self->{'_query_length'};
+
+        # Adjust length based on BLAST flavor.
+        if($self->{'_blast_program'} =~ /T?BLASTX/ ) {
+            $length /= 3;
+        }
+    }
+    return $length;
+}
+
+
+=head2 length_aln
+
+ Usage     : $hit_object->length_aln( [seq_type] );
+ Purpose   : Get the total length of the aligned region for query or sbjct seq.
+           : This number will include all HSPs
+ Example   : $len    = $hit_object->length_aln(); # default = query
+           : $lenAln = $hit_object->length_aln('query');
+ Returns   : Integer 
+ Argument  : seq_Type = 'query' or 'hit' or 'sbjct' (Default = 'query')
+             ('sbjct' is synonymous with 'hit')
+ Throws    : Exception if the argument is not recognized.
+ Comments  : This method will report the logical length of the alignment,
+           : meaning that for TBLAST[NX] reports, the length is reported
+           : using amino acid coordinate space (i.e., nucleotides / 3).
+           : 
+           : This method requires that all HSPs be tiled. If they have not
+           : already been tiled, they will be tiled first automatically..
+           : If you don't want the tiled data, iterate through each HSP
+           : calling length() on each (use hsps() to get all HSPs).
+
+See Also   : L<length()|length>, L<frac_aligned_query()|frac_aligned_query>, L<frac_aligned_hit()|frac_aligned_hit>, L<gaps()|gaps>, L<Bio::Search::BlastUtils::tile_hsps()|Bio::Search::BlastUtils>, L<Bio::Search::HSP::BlastHSP::length()|Bio::Search::HSP::BlastHSP>
+
+=cut
+
+#---------------'
+sub length_aln {
+#---------------
+    my( $self, $seqType ) = @_;
+    
+    $seqType ||= 'query';
+    $seqType = 'sbjct' if $seqType eq 'hit';
+
+    Bio::Search::BlastUtils::tile_hsps($self) if not $self->{'_tile_hsps'};
+
+    my $data = $self->{'_length_aln_'.$seqType};
+    
+    ## If we don't have data, figure out what went wrong.
+    if(!$data) {
+	$self->throw("Can't get length aln for sequence type \"$seqType\"" . 
+		     "Valid types are 'query', 'hit', 'sbjct' ('sbjct' = 'hit')");
+    }		
+    $data;
+}    
+
+
+=head2 gaps
+
+ Usage     : $hit_object->gaps( [seq_type] );
+ Purpose   : Get the number of gaps in the aligned query, sbjct, or both sequences.
+           : Data is summed across all HSPs.
+ Example   : $qgaps = $hit_object->gaps('query');
+           : $hgaps = $hit_object->gaps('hit');
+           : $tgaps = $hit_object->gaps();    # default = total (query + hit)
+ Returns   : scalar context: integer
+           : array context without args: two-element list of integers  
+           :    (queryGaps, sbjctGaps)
+           : Array context can be forced by providing an argument of 'list' or 'array'.
+           :
+           : CAUTION: Calling this method within printf or sprintf is arrray context.
+           : So this function may not give you what you expect. For example:
+           :          printf "Total gaps: %d", $hit->gaps();
+           : Actually returns a two-element array, so what gets printed 
+           : is the number of gaps in the query, not the total
+           :
+ Argument  : seq_type: 'query' | 'hit' or 'sbjct' | 'total' | 'list'  (default = 'total')
+             ('sbjct' is synonymous with 'hit')
+ Throws    : n/a
+ Comments  : If you need data for each HSP, use hsps() and then interate
+           : through each HSP object.
+           : This method requires that all HSPs be tiled. If they have not
+           : already been tiled, they will be tiled first automatically..
+           : Not relying on wantarray since that will fail in situations 
+           : such as printf "%d", $hit->gaps() in which you might expect to 
+           : be printing the total gaps, but evaluates to array context.
+
+See Also   : L<length_aln()|length_aln>
+
+=cut
+
+#----------
+sub gaps {
+#----------
+    my( $self, $seqType ) = @_;
+
+    $seqType ||= (wantarray ? 'list' : 'total');
+    $seqType = 'sbjct' if $seqType eq 'hit';
+
+    Bio::Search::BlastUtils::tile_hsps($self) if not $self->{'_tile_hsps'};
+
+    $seqType = lc($seqType);
+
+    if($seqType =~ /list|array/i) {
+	return ($self->{'_gaps_query'}, $self->{'_gaps_sbjct'});
+    }
+
+    if($seqType eq 'total') {
+	return ($self->{'_gaps_query'} + $self->{'_gaps_sbjct'}) || 0;
+    } else {
+	return $self->{'_gaps_'.$seqType} || 0;
+    }
+}    
+
+
+
+=head2 matches
+
+ Usage     : $hit_object->matches( [class] );
+ Purpose   : Get the total number of identical or conserved matches 
+           : (or both) across all HSPs.
+           : (Note: 'conservative' matches are indicated as 'positives' 
+	   :         in the Blast report.)
+ Example   : ($id,$cons) = $hit_object->matches(); # no argument
+           : $id = $hit_object->matches('id');
+           : $cons = $hit_object->matches('cons'); 
+ Returns   : Integer or a 2-element array of integers 
+ Argument  : class = 'id' | 'cons' OR none. 
+           : If no argument is provided, both identical and conservative 
+           : numbers are returned in a two element list.
+           : (Other terms can be used to refer to the conservative
+           :  matches, e.g., 'positive'. All that is checked is whether or
+           :  not the supplied string starts with 'id'. If not, the 
+           : conservative matches are returned.)
+ Throws    : Exception if the requested data cannot be obtained.
+ Comments  : If you need data for each HSP, use hsps() and then interate
+           : through the HSP objects.
+           : Does not rely on wantarray to return a list. Only checks for
+           : the presence of an argument (no arg = return list).
+
+See Also   : L<Bio::Search::HSP::BlastHSP::matches()|Bio::Search::HSP::BlastHSP>, L<hsps()|hsps>
+
+=cut
+
+#---------------
+sub matches {
+#---------------
+    my( $self, $arg) = @_;
+    my(@data,$data);
+
+    if(!$arg) {
+	@data = ($self->{'_totalIdentical'}, $self->{'_totalConserved'});
+
+	return @data if @data;
+
+    } else {
+
+	if($arg =~ /^id/i) { 
+	    $data = $self->{'_totalIdentical'};
+	} else {
+	    $data = $self->{'_totalConserved'};
+	}
+	return $data if $data;
+    }
+    
+    ## Something went wrong if we make it to here.
+    $self->throw("Can't get identical or conserved data: no data.");
+}
+
+
+=head2 start
+
+ Usage     : $sbjct->start( [seq_type] );
+ Purpose   : Gets the start coordinate for the query, sbjct, or both sequences
+           : in the BlastHit object. If there is more than one HSP, the lowest start
+           : value of all HSPs is returned.
+ Example   : $qbeg = $sbjct->start('query');
+           : $sbeg = $sbjct->start('hit');
+           : ($qbeg, $sbeg) = $sbjct->start();
+ Returns   : scalar context: integer 
+           : array context without args: list of two integers (queryStart, sbjctStart)
+           : Array context can be "induced" by providing an argument of 'list' or 'array'.
+ Argument  : In scalar context: seq_type = 'query' or 'hit' or 'sbjct' (default = 'query')
+             ('sbjct' is synonymous with 'hit')
+ Throws    : n/a
+ Comments  : This method requires that all HSPs be tiled. If there is more than one
+           : HSP and they have not already been tiled, they will be tiled first automatically..
+           : Remember that the start and end coordinates of all HSPs are 
+           : normalized so that start < end. Strand information can be
+           : obtained by calling $hit->strand().
+
+See Also   : L<end()|end>, L<range()|range>, L<strand()|strand>, L<HSP Tiling and Ambiguous Alignments>, L<Bio::Search::HSP::BlastHSP::start|Bio::Search::HSP::BlastHSP>
+
+=cut
+
+#----------
+sub start {
+#----------
+    my ($self, $seqType) = @_;
+
+    $seqType ||= (wantarray ? 'list' : 'query');
+    $seqType = 'sbjct' if $seqType eq 'hit';
+
+    # If there is only one HSP, defer this call to the solitary HSP.
+    if($self->num_hsps == 1) {
+	return $self->hsp->start($seqType);
+    } else {
+	Bio::Search::BlastUtils::tile_hsps($self) if not $self->{'_tile_hsps'};
+	if($seqType =~ /list|array/i) {
+	    return ($self->{'_queryStart'}, $self->{'_sbjctStart'});
+	} else {
+	    ## Sensitive to member name changes.
+	    $seqType = "_\L$seqType\E";
+	    return $self->{$seqType.'Start'};
+	}
+    }
+}
+
+
+=head2 end
+
+ Usage     : $sbjct->end( [seq_type] );
+ Purpose   : Gets the end coordinate for the query, sbjct, or both sequences
+           : in the BlastHit object. If there is more than one HSP, the largest end
+           : value of all HSPs is returned.
+ Example   : $qend = $sbjct->end('query');
+           : $send = $sbjct->end('hit');
+           : ($qend, $send) = $sbjct->end();
+ Returns   : scalar context: integer
+           : array context without args: list of two integers (queryEnd, sbjctEnd)
+           : Array context can be "induced" by providing an argument of 'list' or 'array'.
+ Argument  : In scalar context: seq_type = 'query' or 'sbjct'
+           :  (case insensitive). If not supplied, 'query' is used.
+ Throws    : n/a
+ Comments  : This method requires that all HSPs be tiled. If there is more than one
+           : HSP and they have not already been tiled, they will be tiled first automatically..
+           : Remember that the start and end coordinates of all HSPs are 
+           : normalized so that start < end. Strand information can be
+           : obtained by calling $hit->strand().
+
+See Also   : L<start()|start>, L<range()|range>, L<strand()|strand>, L<HSP Tiling and Ambiguous Alignments>, L<Bio::Search::HSP::BlastHSP::end|Bio::Search::HSP::BlastHSP>
+
+=cut
+
+#----------
+sub end {
+#----------
+    my ($self, $seqType) = @_;
+
+    $seqType ||= (wantarray ? 'list' : 'query');
+    $seqType = 'sbjct' if $seqType eq 'hit';
+
+    # If there is only one HSP, defer this call to the solitary HSP.
+    if($self->num_hsps == 1) {
+	return $self->hsp->end($seqType);
+    } else {
+	Bio::Search::BlastUtils::tile_hsps($self) if not $self->{'_tile_hsps'};
+	if($seqType =~ /list|array/i) {
+	    return ($self->{'_queryStop'}, $self->{'_sbjctStop'});
+	} else {
+	    ## Sensitive to member name changes.
+	    $seqType = "_\L$seqType\E";
+	    return $self->{$seqType.'Stop'};
+	}
+    }
+}
+
+=head2 range
+
+ Usage     : $sbjct->range( [seq_type] );
+ Purpose   : Gets the (start, end) coordinates for the query or sbjct sequence
+           : in the HSP alignment.
+ Example   : ($qbeg, $qend) = $sbjct->range('query');
+           : ($sbeg, $send) = $sbjct->range('hit');
+ Returns   : Two-element array of integers 
+ Argument  : seq_type = string, 'query' or 'hit' or 'sbjct'  (default = 'query')
+             ('sbjct' is synonymous with 'hit')
+ Throws    : n/a
+
+See Also   : L<start()|start>, L<end()|end>
+
+=cut
+
+#----------
+sub range {
+#----------
+    my ($self, $seqType) = @_;
+    $seqType ||= 'query';
+    $seqType = 'sbjct' if $seqType eq 'hit';
+    return ($self->start($seqType), $self->end($seqType));
+}
+
+
+=head2 frac_identical
+
+ Usage     : $hit_object->frac_identical( [seq_type] );
+ Purpose   : Get the overall fraction of identical positions across all HSPs.
+           : The number refers to only the aligned regions and does not
+           : account for unaligned regions in between the HSPs, if any.
+ Example   : $frac_iden = $hit_object->frac_identical('query');
+ Returns   : Float (2-decimal precision, e.g., 0.75).
+ Argument  : seq_type: 'query' | 'hit' or 'sbjct' | 'total'
+           : default = 'query' (but see comments below).
+           : ('sbjct' is synonymous with 'hit')
+ Throws    : n/a
+ Comments  : Different versions of Blast report different values for the total
+           : length of the alignment. This is the number reported in the
+           : denominators in the stats section:
+           : "Identical = 34/120 Positives = 67/120".
+           : NCBI BLAST uses the total length of the alignment (with gaps)
+           : WU-BLAST uses the length of the query sequence (without gaps).
+           :
+           : Therefore, when called with an argument of 'total',
+           : this method will report different values depending on the
+           : version of BLAST used. Total does NOT take into account HSP
+           : tiling, so it should not be used.
+           :
+           : To get the fraction identical among only the aligned residues,
+           : ignoring the gaps, call this method without an argument or 
+           : with an argument of 'query' or 'hit'.
+           :
+           : If you need data for each HSP, use hsps() and then iterate
+           : through the HSP objects.
+           : This method requires that all HSPs be tiled. If they have not
+           : already been tiled, they will be tiled first automatically.
+
+See Also   : L<frac_conserved()|frac_conserved>, L<frac_aligned_query()|frac_aligned_query>, L<matches()|matches>, L<Bio::Search::BlastUtils::tile_hsps()|Bio::Search::BlastUtils>
+
+=cut
+
+#------------------
+sub frac_identical {
+#------------------
+    my ($self, $seqType) = @_;
+    $seqType ||= 'query';
+    $seqType = 'sbjct' if $seqType eq 'hit';
+
+    ## Sensitive to member name format.
+    $seqType = lc($seqType);
+
+    Bio::Search::BlastUtils::tile_hsps($self) if not $self->{'_tile_hsps'};
+
+    sprintf( "%.2f", $self->{'_totalIdentical'}/$self->{'_length_aln_'.$seqType});
+}
+
+
+
+=head2 frac_conserved
+
+ Usage     : $hit_object->frac_conserved( [seq_type] );
+ Purpose   : Get the overall fraction of conserved positions across all HSPs.
+           : The number refers to only the aligned regions and does not
+           : account for unaligned regions in between the HSPs, if any.
+ Example   : $frac_cons = $hit_object->frac_conserved('hit');
+ Returns   : Float (2-decimal precision, e.g., 0.75).
+ Argument  : seq_type: 'query' | 'hit' or 'sbjct' | 'total'
+           : default = 'query' (but see comments below).
+           : ('sbjct' is synonymous with 'hit')
+ Throws    : n/a
+ Comments  : Different versions of Blast report different values for the total
+           : length of the alignment. This is the number reported in the
+           : denominators in the stats section:
+           : "Positives = 34/120 Positives = 67/120".
+           : NCBI BLAST uses the total length of the alignment (with gaps)
+           : WU-BLAST uses the length of the query sequence (without gaps).
+           :
+           : Therefore, when called with an argument of 'total',
+           : this method will report different values depending on the
+           : version of BLAST used. Total does NOT take into account HSP
+           : tiling, so it should not be used.
+           :
+           : To get the fraction conserved among only the aligned residues,
+           : ignoring the gaps, call this method without an argument or 
+           : with an argument of 'query' or 'hit'.
+           :
+           : If you need data for each HSP, use hsps() and then interate
+           : through the HSP objects.
+           : This method requires that all HSPs be tiled. If they have not
+           : already been tiled, they will be tiled first automatically.
+
+See Also   : L<frac_identical()|frac_identical>, L<matches()|matches>, L<Bio::Search::BlastUtils::tile_hsps()|Bio::Search::BlastUtils>
+
+=cut
+
+#--------------------
+sub frac_conserved {
+#--------------------
+    my ($self, $seqType) = @_;
+    $seqType ||= 'query';
+    $seqType = 'sbjct' if $seqType eq 'hit';
+
+    ## Sensitive to member name format.
+    $seqType = lc($seqType);
+
+    Bio::Search::BlastUtils::tile_hsps($self) if not $self->{'_tile_hsps'};
+
+    sprintf( "%.2f", $self->{'_totalConserved'}/$self->{'_length_aln_'.$seqType});
+}
+
+
+
+
+=head2 frac_aligned_query
+
+ Usage     : $hit_object->frac_aligned_query();
+ Purpose   : Get the fraction of the query sequence which has been aligned
+           : across all HSPs (not including intervals between non-overlapping
+           : HSPs).
+ Example   : $frac_alnq = $hit_object->frac_aligned_query();
+ Returns   : Float (2-decimal precision, e.g., 0.75).
+ Argument  : n/a
+ Throws    : n/a
+ Comments  : If you need data for each HSP, use hsps() and then interate
+           : through the HSP objects.
+           : To compute the fraction aligned, the logical length of the query
+           : sequence is used, meaning that for [T]BLASTX reports, the 
+           : full length of the query sequence is converted into amino acids
+           : by dividing by 3. This is necessary because of the way 
+           : the lengths of aligned sequences are computed.
+           : This method requires that all HSPs be tiled. If they have not
+           : already been tiled, they will be tiled first automatically.
+
+See Also   : L<frac_aligned_hit()|frac_aligned_hit>, L<logical_length()|logical_length>, L<length_aln()|length_aln>,  L<Bio::Search::BlastUtils::tile_hsps()|Bio::Search::BlastUtils>
+
+=cut
+
+#----------------------
+sub frac_aligned_query {
+#----------------------
+    my $self = shift;
+
+    Bio::Search::BlastUtils::tile_hsps($self) if not $self->{'_tile_hsps'};
+
+    sprintf( "%.2f", $self->{'_length_aln_query'}/$self->logical_length('query'));
+}
+
+
+
+=head2 frac_aligned_hit
+
+ Usage     : $hit_object->frac_aligned_hit();
+ Purpose   : Get the fraction of the hit (sbjct) sequence which has been aligned
+           : across all HSPs (not including intervals between non-overlapping
+           : HSPs).
+ Example   : $frac_alnq = $hit_object->frac_aligned_hit();
+ Returns   : Float (2-decimal precision, e.g., 0.75).
+ Argument  : n/a
+ Throws    : n/a
+ Comments  : If you need data for each HSP, use hsps() and then interate
+           : through the HSP objects.
+           : To compute the fraction aligned, the logical length of the sbjct
+           : sequence is used, meaning that for TBLAST[NX] reports, the 
+           : full length of the sbjct sequence is converted into amino acids
+           : by dividing by 3. This is necessary because of the way 
+           : the lengths of aligned sequences are computed.
+           : This method requires that all HSPs be tiled. If they have not
+           : already been tiled, they will be tiled first automatically.
+
+See Also   : L<frac_aligned_query()|frac_aligned_query>, L<matches()|matches>, , L<logical_length()|logical_length>, L<length_aln()|length_aln>,  L<Bio::Search::BlastUtils::tile_hsps()|Bio::Search::BlastUtils>
+
+=cut
+
+#--------------------
+sub frac_aligned_hit {
+#--------------------
+    my $self = shift;
+
+    Bio::Search::BlastUtils::tile_hsps($self) if not $self->{'_tile_hsps'};
+
+    sprintf( "%.2f", $self->{'_length_aln_sbjct'}/$self->logical_length('sbjct'));
+}
+
+
+## These methods are being maintained for backward compatibility. 
+
+=head2 frac_aligned_sbjct
+
+Same as L<frac_aligned_hit()|frac_aligned_hit>
+
+=cut
+
+#----------------
+sub frac_aligned_sbjct {  my $self=shift; $self->frac_aligned_hit(@_); }
+#----------------
+
+=head2 num_unaligned_sbjct
+
+Same as L<num_unaligned_hit()|num_unaligned_hit>
+
+=cut
+
+#----------------
+sub num_unaligned_sbjct {  my $self=shift; $self->num_unaligned_hit(@_); }
+#----------------
+
+
+
+=head2 num_unaligned_hit
+
+ Usage     : $hit_object->num_unaligned_hit();
+ Purpose   : Get the number of the unaligned residues in the hit sequence.
+           : Sums across all all HSPs.
+ Example   : $num_unaln = $hit_object->num_unaligned_hit();
+ Returns   : Integer
+ Argument  : n/a
+ Throws    : n/a
+ Comments  : See notes regarding logical lengths in the comments for frac_aligned_hit().
+           : They apply here as well.
+           : If you need data for each HSP, use hsps() and then interate
+           : through the HSP objects.
+           : This method requires that all HSPs be tiled. If they have not
+           : already been tiled, they will be tiled first automatically..
+
+See Also   : L<num_unaligned_query()|num_unaligned_query>,  L<Bio::Search::BlastUtils::tile_hsps()|Bio::Search::BlastUtils>, L<frac_aligned_hit()|frac_aligned_hit>
+
+=cut
+
+#---------------------
+sub num_unaligned_hit {
+#---------------------
+    my $self = shift;
+
+    Bio::Search::BlastUtils::tile_hsps($self) if not $self->{'_tile_hsps'};
+
+    my $num = $self->logical_length('sbjct') - $self->{'_length_aln_sbjct'};
+    ($num < 0 ? 0 : $num );
+}
+
+
+=head2 num_unaligned_query
+
+ Usage     : $hit_object->num_unaligned_query();
+ Purpose   : Get the number of the unaligned residues in the query sequence.
+           : Sums across all all HSPs.
+ Example   : $num_unaln = $hit_object->num_unaligned_query();
+ Returns   : Integer
+ Argument  : n/a
+ Throws    : n/a
+ Comments  : See notes regarding logical lengths in the comments for frac_aligned_query().
+           : They apply here as well.
+           : If you need data for each HSP, use hsps() and then interate
+           : through the HSP objects.
+           : This method requires that all HSPs be tiled. If they have not
+           : already been tiled, they will be tiled first automatically..
+
+See Also   : L<num_unaligned_hit()|num_unaligned_hit>, L<frac_aligned_query()|frac_aligned_query>,  L<Bio::Search::BlastUtils::tile_hsps()|Bio::Search::BlastUtils>
+
+=cut
+
+#-----------------------
+sub num_unaligned_query {
+#-----------------------
+    my $self = shift;
+
+    Bio::Search::BlastUtils::tile_hsps($self) if not $self->{'_tile_hsps'};
+
+    my $num = $self->logical_length('query') - $self->{'_length_aln_query'};
+    ($num < 0 ? 0 : $num );
+}
+
+
+
+=head2 seq_inds
+
+ Usage     : $hit->seq_inds( seq_type, class, collapse );
+ Purpose   : Get a list of residue positions (indices) across all HSPs
+           : for identical or conserved residues in the query or sbjct sequence.
+ Example   : @s_ind = $hit->seq_inds('query', 'identical');
+           : @h_ind = $hit->seq_inds('hit', 'conserved');
+           : @h_ind = $hit->seq_inds('hit', 'conserved', 1);
+ Returns   : Array of integers 
+           : May include ranges if collapse is non-zero.
+ Argument  : [0] seq_type  = 'query' or 'hit' or 'sbjct'  (default = 'query')
+           :                 ('sbjct' is synonymous with 'hit')
+           : [1] class     = 'identical' or 'conserved' (default = 'identical')
+           :              (can be shortened to 'id' or 'cons')
+           :              (actually, anything not 'id' will evaluate to 'conserved').
+           : [2] collapse  = boolean, if non-zero, consecutive positions are merged
+           :             using a range notation, e.g., "1 2 3 4 5 7 9 10 11" 
+           :             collapses to "1-5 7 9-11". This is useful for 
+           :             consolidating long lists. Default = no collapse.
+ Throws    : n/a.
+ Comments  : Note that HSPs are not tiled for this. This could be a problem 
+           : for hits containing mutually exclusive HSPs. 
+           : TODO: Consider tiling and then reporting seq_inds for the 
+           : best HSP contig.
+
+See Also   : L<Bio::Search::HSP::BlastHSP::seq_inds()|Bio::Search::HSP::BlastHSP>
+
+=cut
+
+#-------------
+sub seq_inds {
+#-------------
+    my ($self, $seqType, $class, $collapse) = @_;
+
+    $seqType  ||= 'query';
+    $class ||= 'identical';
+    $collapse ||= 0;
+
+    $seqType = 'sbjct' if $seqType eq 'hit';
+
+    my (@inds, $hsp);
+    foreach $hsp ($self->hsps) {
+	# This will merge data for all HSPs together.
+	push @inds, $hsp->seq_inds($seqType, $class);
+    }
+    
+    # Need to remove duplicates and sort the merged positions.
+    if(@inds) {
+	my %tmp = map { $_, 1 } @inds;
+	@inds = sort {$a <=> $b} keys %tmp;
+    }
+
+    $collapse ?  &Bio::Search::BlastUtils::collapse_nums(@inds) : @inds; 
+}
+
+
+=head2 iteration
+
+ Usage     : $sbjct->iteration( );
+ Purpose   : Gets the iteration number in which the Hit was found.
+ Example   : $iteration_num = $sbjct->iteration();
+ Returns   : Integer greater than or equal to 1
+             Non-PSI-BLAST reports will report iteration as 1, but this number
+             is only meaningful for PSI-BLAST reports.
+ Argument  : none
+ Throws    : none
+
+See Also   : L<found_again()|found_again>
+
+=cut
+
+#----------------
+sub iteration { shift->{'_iteration'} }
+#----------------
+
+
+=head2 found_again
+
+ Usage     : $sbjct->found_again;
+ Purpose   : Gets a boolean indicator whether or not the hit has
+             been found in a previous iteration.
+             This is only applicable to PSI-BLAST reports.
+
+ 	     This method indicates if the hit was reported in the 
+ 	     "Sequences used in model and found again" section of the
+ 	     PSI-BLAST report or if it was reported in the
+ 	     "Sequences not found previously or not previously below threshold"
+ 	     section of the PSI-BLAST report. Only for hits in iteration > 1.
+
+ Example   : if( $sbjct->found_again()) { ... };
+ Returns   : Boolean (1 or 0) for PSI-BLAST report iterations greater than 1.
+             Returns undef for PSI-BLAST report iteration 1 and non PSI_BLAST
+             reports.
+ Argument  : none
+ Throws    : none
+
+See Also   : L<found_again()|found_again>
+
+=cut
+
+#----------------
+sub found_again { shift->{'_found_again'} }
+#----------------
+
+
+=head2 strand
+
+ Usage     : $sbjct->strand( [seq_type] );
+ Purpose   : Gets the strand(s) for the query, sbjct, or both sequences
+           : in the best HSP of the BlastHit object after HSP tiling.
+           : Only valid for BLASTN, TBLASTX, BLASTX-query, TBLASTN-hit.
+ Example   : $qstrand = $sbjct->strand('query');
+           : $sstrand = $sbjct->strand('hit');
+           : ($qstrand, $sstrand) = $sbjct->strand();
+ Returns   : scalar context: integer '1', '-1', or '0'
+           : array context without args: list of two strings (queryStrand, sbjctStrand)
+           : Array context can be "induced" by providing an argument of 'list' or 'array'.
+ Argument  : In scalar context: seq_type = 'query' or 'hit' or 'sbjct' (default = 'query')
+             ('sbjct' is synonymous with 'hit')
+ Throws    : n/a
+ Comments  : This method requires that all HSPs be tiled. If they have not
+           : already been tiled, they will be tiled first automatically..
+           : If you don't want the tiled data, iterate through each HSP
+           : calling strand() on each (use hsps() to get all HSPs).
+           :
+           : Formerly (prior to 10/21/02), this method would return the
+           : string "-1/1" for hits with HSPs on both strands.
+           : However, now that strand and frame is properly being accounted
+           : for during HSP tiling, it makes more sense for strand()
+           : to return the strand data for the best HSP after tiling.
+           :
+           : If you really want to know about hits on opposite strands,
+           : you should be iterating through the HSPs using methods on the
+           : HSP objects.
+           :
+           : A possible use case where knowing whether a hit has HSPs 
+           : on both strands would be when filtering via SearchIO for hits with 
+           : this property. However, in this case it would be better to have a
+           : dedicated method such as $hit->hsps_on_both_strands(). Similarly
+           : for frame. This could be provided if there is interest.
+
+See Also   : B<Bio::Search::HSP::BlastHSP::strand>()
+
+=cut
+
+#----------'
+sub strand {
+#----------
+    my ($self, $seqType) = @_;
+
+    Bio::Search::BlastUtils::tile_hsps($self) if not $self->{'_tile_hsps'};
+
+    $seqType ||= (wantarray ? 'list' : 'query');
+    $seqType = 'sbjct' if $seqType eq 'hit';
+
+    my ($qstr, $hstr);
+    # If there is only one HSP, defer this call to the solitary HSP.
+    if($self->num_hsps == 1) {
+	return $self->hsp->strand($seqType);
+    } 
+    elsif( defined $self->{'_qstrand'}) {
+        # Get the data computed during hsp tiling.
+        $qstr = $self->{'_qstrand'};
+        $hstr = $self->{'_sstrand'};
+    }
+    else {
+	# otherwise, iterate through all HSPs collecting strand info.
+        # This will return the string "-1/1" if there are HSPs on different strands.
+        # NOTE: This was the pre-10/21/02 procedure which will no longer be used,
+        # (unless the above elsif{} is commented out).
+        my (%qstr, %hstr);
+        foreach my $hsp( $self->hsps ) {
+            my ( $q, $h ) = $hsp->strand();
+            $qstr{ $q }++;
+            $hstr{ $h }++;
+        }
+        $qstr = join( '/', sort keys %qstr);
+        $hstr = join( '/', sort keys %hstr);
+    }
+
+    if($seqType =~ /list|array/i) {
+        return ($qstr, $hstr);
+    } elsif( $seqType eq 'query' ) {
+        return $qstr;
+    } else {
+        return $hstr;
+    }
+}
+
+
+1;
+__END__
+
+#####################################################################################
+#                                END OF CLASS                                       #
+#####################################################################################
+
+
+=head1 FOR DEVELOPERS ONLY
+
+=head2 Data Members
+
+Information about the various data members of this module is provided for those 
+wishing to modify or understand the code. Two things to bear in mind: 
+
+=over 4
+
+=item 1 Do NOT rely on these in any code outside of this module. 
+
+All data members are prefixed with an underscore to signify that they are private.
+Always use accessor methods. If the accessor doesn't exist or is inadequate, 
+create or modify an accessor (and let me know, too!). (An exception to this might
+be for BlastHSP.pm which is more tightly coupled to BlastHit.pm and
+may access BlastHit data members directly for efficiency purposes, but probably 
+should not).
+
+=item 2 This documentation may be incomplete and out of date.
+
+It is easy for these data member descriptions to become obsolete as 
+this module is still evolving. Always double check this info and search 
+for members not described here.
+
+=back
+
+An instance of Bio::Search::Hit::BlastHit.pm is a blessed reference to a hash containing
+all or some of the following fields:
+
+ FIELD           VALUE
+ --------------------------------------------------------------
+ _hsps          : Array ref for a list of Bio::Search::HSP::BlastHSP.pm objects.
+		:
+ _db            : Database identifier from the summary line.
+		:
+ _desc          : Description data for the hit from the summary line.
+		:
+ _length        : Total length of the hit sequence. 
+		:
+ _score         : BLAST score.
+		:
+ _bits          : BLAST score (in bits). Matrix-independent.
+		:
+ _p             : BLAST P value. Obtained from summary section. (Blast1/WU-Blast only)
+		:
+ _expect        : BLAST Expect value. Obtained from summary section.
+		:
+ _n             : BLAST N value (number of HSPs) (Blast1/WU-Blast2 only)
+		:
+ _frame         : Reading frame for TBLASTN and TBLASTX analyses.
+		:
+ _totalIdentical: Total number of identical aligned monomers.
+		:
+ _totalConserved: Total number of conserved aligned monomers (a.k.a. "positives").
+		:
+ _overlap       : Maximum number of overlapping residues between adjacent HSPs
+		: before considering the alignment to be ambiguous. 
+		:
+ _ambiguous_aln : Boolean. True if the alignment of all HSPs is ambiguous.
+		:
+ _length_aln_query : Length of the aligned region of the query sequence.
+		   :
+ _length_aln_sbjct : Length of the aligned region of the sbjct sequence.
+
+
+=cut
+
+1;