diff variant_effect_predictor/Bio/Search/HSP/BlastHSP.pm @ 0:21066c0abaf5 draft

Uploaded
author willmclaren
date Fri, 03 Aug 2012 10:04:48 -0400
parents
children
line wrap: on
line diff
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/variant_effect_predictor/Bio/Search/HSP/BlastHSP.pm	Fri Aug 03 10:04:48 2012 -0400
@@ -0,0 +1,1735 @@
+#-----------------------------------------------------------------
+# $Id: BlastHSP.pm,v 1.20 2002/12/24 15:45:33 jason Exp $
+#
+# BioPerl module Bio::Search::HSP::BlastHSP
+#
+# (This module was originally called Bio::Tools::Blast::HSP)
+#
+# Cared for by Steve Chervitz <sac@bioperl.org>
+#
+# You may distribute this module under the same terms as perl itself
+#-----------------------------------------------------------------
+
+## POD Documentation:
+
+=head1 NAME
+
+Bio::Search::HSP::BlastHSP - Bioperl BLAST High-Scoring Pair object
+
+=head1 SYNOPSIS
+
+The construction of BlastHSP objects is performed by
+Bio::Factory::BlastHitFactory in a process that is
+orchestrated by the Blast parser (B<Bio::SearchIO::psiblast>).
+The resulting BlastHSPs are then accessed via
+B<Bio::Search::Hit::BlastHit>). Therefore, you do not need to
+use B<Bio::Search::HSP::BlastHSP>) directly. If you need to construct
+BlastHSPs directly, see the new() function for details.
+
+For B<Bio::SearchIO> BLAST parsing usage examples, see the
+B<examples/search-blast> directory of the Bioperl distribution.
+
+=head1 DESCRIPTION
+
+A Bio::Search::HSP::BlastHSP object provides an interface to data
+obtained in a single alignment section of a Blast report (known as a
+"High-scoring Segment Pair"). This is essentially a pairwise
+alignment with score information.
+
+BlastHSP objects are accessed via B<Bio::Search::Hit::BlastHit>
+objects after parsing a BLAST report using the B<Bio::SearchIO>
+system.
+
+=head2 Start and End coordinates
+
+Sequence endpoints are swapped so that start is always less than
+end. This affects For TBLASTN/X hits on the minus strand. Strand
+information can be recovered using the strand() method. This
+normalization step is standard Bioperl practice. It also facilitates
+use of range information by methods such as match().
+
+=over 1
+
+=item * Supports BLAST versions 1.x and 2.x, gapped and ungapped.
+
+=back
+
+Bio::Search::HSP::BlastHSP.pm has the ability to extract a list of all
+residue indices for identical and conservative matches along both
+query and sbjct sequences. Since this degree of detail is not always
+needed, this behavior does not occur during construction of the BlastHSP
+object.  These data will automatically be collected as necessary as
+the BlastHSP.pm object is used.
+
+=head1 DEPENDENCIES
+
+Bio::Search::HSP::BlastHSP.pm is a concrete class that inherits from
+B<Bio::SeqFeature::SimilarityPair> and B<Bio::Search::HSP::HSPI>.
+B<Bio::Seq> and B<Bio::SimpleAlign> are employed for creating 
+sequence and alignment objects, respectively.
+
+=head2 Relationship to SimpleAlign.pm & Seq.pm
+
+BlastHSP.pm can provide the query or sbjct sequence as a B<Bio::Seq>
+object via the L<seq()|seq> method. The BlastHSP.pm object can also create a
+two-sequence B<Bio::SimpleAlign> alignment object using the the query
+and sbjct sequences via the L<get_aln()|get_aln> method. Creation of alignment
+objects is not automatic when constructing the BlastHSP.pm object since
+this level of functionality is not always required and would generate
+a lot of extra overhead when crunching many reports.
+
+
+=head1 FEEDBACK
+
+=head2 Mailing Lists 
+
+User feedback is an integral part of the evolution of this and other
+Bioperl modules.  Send your comments and suggestions preferably to one
+of the Bioperl mailing lists.  Your participation is much appreciated.
+
+    bioperl-l@bioperl.org               - General discussion
+    http://bio.perl.org/MailList.html   - About the mailing lists
+
+=head2 Reporting Bugs
+
+Report bugs to the Bioperl bug tracking system to help us keep track
+the bugs and their resolution. Bug reports can be submitted via email
+or the web:
+
+    bioperl-bugs@bio.perl.org                   
+    http://bugzilla.bioperl.org/     
+
+=head1 AUTHOR 
+
+Steve Chervitz E<lt>sac@bioperl.orgE<gt>
+
+See L<the FEEDBACK section | FEEDBACK> for where to send bug reports and comments.
+
+=head1 ACKNOWLEDGEMENTS
+
+This software was originally developed in the Department of Genetics
+at Stanford University. I would also like to acknowledge my
+colleagues at Affymetrix for useful feedback.
+
+=head1 SEE ALSO
+
+ Bio::Search::Hit::BlastHit.pm          - Blast hit object.
+ Bio::Search::Result::BlastResult.pm    - Blast Result object.
+ Bio::Seq.pm                            - Biosequence object  
+
+=head2 Links:
+
+ http://bio.perl.org/Core/POD/Tools/Blast/BlastHit.pm.html
+
+ http://bio.perl.org/Projects/modules.html  - Online module documentation
+ http://bio.perl.org/Projects/Blast/        - Bioperl Blast Project     
+ http://bio.perl.org/                       - Bioperl Project Homepage
+
+=head1 COPYRIGHT
+
+Copyright (c) 1996-2001 Steve Chervitz. All Rights Reserved.
+
+=head1 DISCLAIMER
+
+This software is provided "as is" without warranty of any kind.
+
+=cut
+
+
+# END of main POD documentation.
+
+=head1 APPENDIX
+
+The rest of the documentation details each of the object methods.
+Internal methods are usually preceded with a _
+
+=cut
+
+# Let the code begin...
+
+package Bio::Search::HSP::BlastHSP;
+
+use strict;
+use Bio::SeqFeature::SimilarityPair;
+use Bio::SeqFeature::Similarity;
+use Bio::Search::HSP::HSPI; 
+
+use vars qw( @ISA $GAP_SYMBOL $Revision %STRAND_SYMBOL );
+
+use overload 
+    '""' => \&to_string;
+
+$Revision = '$Id: BlastHSP.pm,v 1.20 2002/12/24 15:45:33 jason Exp $';  #'
+
+@ISA = qw(Bio::SeqFeature::SimilarityPair Bio::Search::HSP::HSPI);
+
+$GAP_SYMBOL    = '-';          # Need a more general way to handle gap symbols.
+%STRAND_SYMBOL = ('Plus' => 1, 'Minus' => -1 );
+
+
+=head2 new
+
+ Usage     : $hsp = Bio::Search::HSP::BlastHSP->new( %named_params );
+           : Bio::Search::HSP::BlastHSP.pm objects are constructed 
+           : automatically by Bio::SearchIO::BlastHitFactory.pm, 
+           : so there is no need for direct instantiation.
+ Purpose   : Constructs a new BlastHSP object and Initializes key variables 
+           : for the HSP.
+ Returns   : A Bio::Search::HSP::BlastHSP object
+ Argument  : Named parameters:
+           : Parameter keys are case-insensitive.
+           :      -RAW_DATA  => array ref containing raw BLAST report data for 
+           :                    for a single HSP. This includes all lines 
+           :                    of the HSP alignment from a traditional BLAST
+                                or PSI-BLAST (non-XML) report,
+           :      -RANK         => integer (1..n).
+           :      -PROGRAM      => string ('TBLASTN', 'BLASTP', etc.).
+           :      -QUERY_NAME   => string, id of query sequence
+           :      -HIT_NAME     => string, id of hit sequence
+           :
+ Comments  : Having the raw data allows this object to do lazy parsing of 
+           : the raw HSP data (i.e., not parsed until needed).
+           :
+           : Note that there is a fair amount of basic parsing that is
+           : currently performed in this module that would be more appropriate
+           : to do within a separate factory object.
+           : This parsing code will likely be relocated and more initialization
+           : parameters will be added to new().
+           :
+See Also   : B<Bio::SeqFeature::SimilarityPair::new()>, B<Bio::SeqFeature::Similarity::new()>
+
+=cut
+
+#----------------
+sub new {
+#----------------
+    my ($class, @args ) = @_;
+
+    my $self = $class->SUPER::new( @args );
+    # Initialize placeholders
+    $self->{'_queryGaps'} = $self->{'_sbjctGaps'} = 0;
+    my ($raw_data, $qname, $hname, $qlen, $hlen);
+
+    ($self->{'_prog'}, $self->{'_rank'}, $raw_data,
+     $qname, $hname) = 
+      $self->_rearrange([qw( PROGRAM
+			     RANK
+			     RAW_DATA
+			     QUERY_NAME
+			     HIT_NAME
+			   )], @args );
+    
+    # _set_data() does a fair amount of parsing. 
+    # This will likely change (see comment above.)
+    $self->_set_data( @{$raw_data} );
+    # Store the aligned query as sequence feature
+    my ($qb, $hb) = ($self->start());
+    my ($qe, $he) = ($self->end());
+    my ($qs, $hs) = ($self->strand());
+    my ($qf,$hf) = ($self->query->frame(),
+		    $self->hit->frame);
+
+    $self->query( Bio::SeqFeature::Similarity->new (-start   =>$qb, 
+						    -end     =>$qe, 
+						    -strand  =>$qs, 
+						    -bits    =>$self->bits,
+						    -score   =>$self->score, 
+						    -frame   =>$qf,
+						    -seq_id  => $qname,
+						    -source  =>$self->{'_prog'} ));
+    
+    $self->hit( Bio::SeqFeature::Similarity->new (-start   =>$hb, 
+						  -end     =>$he, 
+						  -strand  =>$hs, 
+						  -bits    =>$self->bits,
+						  -score   =>$self->score,
+                                                  -frame   =>$hf, 
+						  -seq_id  => $hname,
+						  -source  =>$self->{'_prog'} ));
+
+    # set lengths
+    $self->query->seqlength($qlen); # query
+    $self->hit->seqlength($hlen); # subject
+
+    $self->query->frac_identical($self->frac_identical('query'));
+    $self->hit->frac_identical($self->frac_identical('hit'));
+    return $self;
+}
+
+#sub DESTROY {
+#    my $self = shift; 
+#    #print STDERR "--->DESTROYING $self\n";
+#}
+
+
+# Title   : _id_str; 
+# Purpose : Intended for internal use only to provide a string for use
+#           within exception messages to help users figure out which 
+#           query/hit caused the problem.
+# Returns : Short string with name of query and hit seq 
+sub _id_str {
+    my $self = shift;
+    if( not defined $self->{'_id_str'}) {
+        my $qname = $self->query->seqname;
+        my $hname = $self->hit->seqname;
+        $self->{'_id_str'} = "QUERY=\"$qname\" HIT=\"$hname\"";
+    }
+    return $self->{'_id_str'};
+}
+
+#=================================================
+# Begin Bio::Search::HSP::HSPI implementation
+#=================================================
+
+=head2 algorithm
+
+ Title   : algorithm
+ Usage   : $alg = $hsp->algorithm();
+ Function: Gets the algorithm specification that was used to obtain the hsp
+           For BLAST, the algorithm denotes what type of sequence was aligned 
+           against what (BLASTN: dna-dna, BLASTP prt-prt, BLASTX translated 
+           dna-prt, TBLASTN prt-translated dna, TBLASTX translated 
+           dna-translated dna).
+ Returns : a scalar string 
+ Args    : none
+
+=cut
+
+#----------------
+sub algorithm {
+#----------------
+    my ($self,@args) = @_;
+    return $self->{'_prog'};
+}
+
+
+
+
+=head2 signif()
+
+ Usage     : $hsp_obj->signif()
+ Purpose   : Get the P-value or Expect value for the HSP.
+ Returns   : Float (0.001 or 1.3e-43)
+           : Returns P-value if it is defined, otherwise, Expect value.
+ Argument  : n/a
+ Throws    : n/a
+ Comments  : Provided for consistency with BlastHit::signif()
+           : Support for returning the significance data in different
+           : formats (e.g., exponent only), is not provided for HSP objects.
+           : This is only available for the BlastHit or Blast object.
+
+See Also   : L<p()|p>, L<expect()|expect>, L<Bio::Search::Hit::BlastHit::signif()|Bio::Search::Hit::BlastHit>
+
+=cut
+
+#-----------
+sub signif { 
+#-----------
+    my $self = shift; 
+    my $val ||= defined($self->{'_p'}) ? $self->{'_p'} : $self->{'_expect'};
+    $val; 
+}
+
+
+
+=head2 evalue
+
+ Usage     : $hsp_obj->evalue()
+ Purpose   : Get the Expect value for the HSP.
+ Returns   : Float (0.001 or 1.3e-43)
+ Argument  : n/a
+ Throws    : n/a
+ Comments  : Support for returning the expectation data in different
+           : formats (e.g., exponent only), is not provided for HSP objects.
+           : This is only available for the BlastHit or Blast object.
+
+See Also   : L<p()|p>
+
+=cut
+
+#----------
+sub evalue { shift->{'_expect'} }
+#----------
+
+
+=head2 p
+
+ Usage     : $hsp_obj->p()
+ Purpose   : Get the P-value for the HSP.
+ Returns   : Float (0.001 or 1.3e-43) or undef if not defined.
+ Argument  : n/a
+ Throws    : n/a
+ Comments  : P-value is not defined with NCBI Blast2 reports.
+           : Support for returning the expectation data in different
+           : formats (e.g., exponent only) is not provided for HSP objects.
+           : This is only available for the BlastHit or Blast object.
+
+See Also   : L<expect()|expect>
+
+=cut
+
+#-----
+sub p { my $self = shift; $self->{'_p'}; }
+#-----
+
+# alias
+sub pvalue { shift->p(@_); }
+
+=head2 length
+
+ Usage     : $hsp->length( [seq_type] )
+ Purpose   : Get the length of the aligned portion of the query or sbjct.
+ Example   : $hsp->length('query')
+ Returns   : integer
+ Argument  : seq_type: 'query' | 'hit' or 'sbjct' | 'total'  (default = 'total')
+             ('sbjct' is synonymous with 'hit')
+ Throws    : n/a
+ Comments  : 'total' length is the full length of the alignment
+           : as reported in the denominators in the alignment section: 
+           : "Identical = 34/120 Positives = 67/120".
+
+See Also   : L<gaps()|gaps>
+
+=cut
+
+#-----------
+sub length {
+#-----------
+## Developer note: when using the built-in length function within
+##                 this module, call it as CORE::length().
+    my( $self, $seqType ) = @_;
+    $seqType  ||= 'total';
+    $seqType = 'sbjct' if $seqType eq 'hit';
+
+    $seqType ne 'total' and $self->_set_seq_data() unless $self->{'_set_seq_data'};
+
+    ## Sensitive to member name format.
+    $seqType = "_\L$seqType\E";
+    $self->{$seqType.'Length'};
+}
+
+
+
+=head2 gaps
+
+ Usage     : $hsp->gaps( [seq_type] )
+ Purpose   : Get the number of gaps in the query, sbjct, or total alignment.
+           : Also can return query gaps and sbjct gaps as a two-element list
+           : when in array context.
+ Example   : $total_gaps      = $hsp->gaps();
+           : ($qgaps, $sgaps) = $hsp->gaps();
+           : $qgaps           = $hsp->gaps('query');
+ Returns   : scalar context: integer
+           : array context without args: (int, int) = ('queryGaps', 'sbjctGaps')
+ Argument  : seq_type: 'query' or 'hit' or 'sbjct' or 'total'  
+           :  ('sbjct' is synonymous with 'hit')
+           : (default = 'total', scalar context)
+           : Array context can be "induced" by providing an argument of 'list' or 'array'.
+ Throws    : n/a
+
+See Also   : L<length()|length>, L<matches()|matches>
+
+=cut
+
+#---------
+sub gaps {
+#---------
+    my( $self, $seqType ) = @_;
+    
+    $self->_set_seq_data() unless $self->{'_set_seq_data'};
+
+    $seqType  ||= (wantarray ? 'list' : 'total');
+    $seqType = 'sbjct' if $seqType eq 'hit';
+    
+    if($seqType =~ /list|array/i) {
+	return (($self->{'_queryGaps'} || 0), ($self->{'_sbjctGaps'} || 0));
+    }
+    
+    if($seqType eq 'total') {
+	return ($self->{'_queryGaps'} + $self->{'_sbjctGaps'}) || 0;
+    } else {
+	## Sensitive to member name format.
+	$seqType = "_\L$seqType\E";
+	return $self->{$seqType.'Gaps'} || 0;
+    }
+}
+
+
+=head2 frac_identical
+
+ Usage     : $hsp_object->frac_identical( [seq_type] );
+ Purpose   : Get the fraction of identical positions within the given HSP.
+ Example   : $frac_iden = $hsp_object->frac_identical('query');
+ Returns   : Float (2-decimal precision, e.g., 0.75).
+ Argument  : seq_type: 'query' or 'hit' or 'sbjct' or 'total'
+           :  ('sbjct' is synonymous with 'hit') 
+           : default = 'total' (but see comments below).
+ Throws    : n/a
+ Comments  : Different versions of Blast report different values for the total
+           : length of the alignment. This is the number reported in the
+           : denominators in the stats section:
+           : "Identical = 34/120 Positives = 67/120".
+           : NCBI-BLAST uses the total length of the alignment (with gaps)
+           : WU-BLAST uses the length of the query sequence (without gaps).
+           : Therefore, when called without an argument or an argument of 'total',
+           : this method will report different values depending on the
+           : version of BLAST used.
+           : 
+           : To get the fraction identical among only the aligned residues,
+           : ignoring the gaps, call this method with an argument of 'query'
+           : or 'sbjct' ('sbjct' is synonymous with 'hit').
+
+See Also   : L<frac_conserved()|frac_conserved>, L<num_identical()|num_identical>, L<matches()|matches>
+
+=cut
+
+#-------------------
+sub frac_identical {
+#-------------------
+# The value is calculated as opposed to storing it from the parsed results.
+# This saves storage and also permits flexibility in determining for which
+# sequence (query or sbjct) the figure is to be calculated.
+
+    my( $self, $seqType ) = @_;
+    $seqType ||= 'total';
+    $seqType = 'sbjct' if $seqType eq 'hit';
+
+    if($seqType ne 'total') {
+      $self->_set_seq_data() unless $self->{'_set_seq_data'};
+    }
+    ## Sensitive to member name format.
+    $seqType = "_\L$seqType\E";
+
+    sprintf( "%.2f", $self->{'_numIdentical'}/$self->{$seqType.'Length'});
+}
+
+
+=head2 frac_conserved
+
+ Usage     : $hsp_object->frac_conserved( [seq_type] );
+ Purpose   : Get the fraction of conserved positions within the given HSP.
+           : (Note: 'conservative' positions are called 'positives' in the
+	   : Blast report.)
+ Example   : $frac_cons = $hsp_object->frac_conserved('query');
+ Returns   : Float (2-decimal precision, e.g., 0.75).
+ Argument  : seq_type: 'query' or 'hit' or 'sbjct' or 'total'
+           :  ('sbjct' is synonymous with 'hit') 
+           : default = 'total' (but see comments below).
+ Throws    : n/a
+ Comments  : Different versions of Blast report different values for the total
+           : length of the alignment. This is the number reported in the
+           : denominators in the stats section:
+           : "Identical = 34/120 Positives = 67/120".
+           : NCBI-BLAST uses the total length of the alignment (with gaps)
+           : WU-BLAST uses the length of the query sequence (without gaps).
+           : Therefore, when called without an argument or an argument of 'total',
+           : this method will report different values depending on the
+           : version of BLAST used.
+           :
+           : To get the fraction conserved among only the aligned residues,
+           : ignoring the gaps, call this method with an argument of 'query'
+           : or 'sbjct'.
+
+See Also   : L<frac_conserved()|frac_conserved>, L<num_conserved()|num_conserved>, L<matches()|matches>
+
+=cut
+
+#--------------------
+sub frac_conserved {
+#--------------------
+# The value is calculated as opposed to storing it from the parsed results.
+# This saves storage and also permits flexibility in determining for which
+# sequence (query or sbjct) the figure is to be calculated.
+ 
+    my( $self, $seqType ) = @_;
+    $seqType ||= 'total';
+    $seqType = 'sbjct' if $seqType eq 'hit';
+
+    if($seqType ne 'total') {
+      $self->_set_seq_data() unless $self->{'_set_seq_data'};
+    }
+
+    ## Sensitive to member name format.
+    $seqType = "_\L$seqType\E";
+
+    sprintf( "%.2f", $self->{'_numConserved'}/$self->{$seqType.'Length'});
+}
+
+=head2 query_string
+
+ Title   : query_string
+ Usage   : my $qseq = $hsp->query_string;
+ Function: Retrieves the query sequence of this HSP as a string
+ Returns : string
+ Args    : none
+
+
+=cut
+
+#----------------
+sub query_string{ shift->seq_str('query'); }
+#----------------
+
+=head2 hit_string
+
+ Title   : hit_string
+ Usage   : my $hseq = $hsp->hit_string;
+ Function: Retrieves the hit sequence of this HSP as a string
+ Returns : string
+ Args    : none
+
+
+=cut
+
+#----------------
+sub hit_string{ shift->seq_str('hit'); }
+#----------------
+
+
+=head2 homology_string
+
+ Title   : homology_string
+ Usage   : my $homo_string = $hsp->homology_string;
+ Function: Retrieves the homology sequence for this HSP as a string.
+         : The homology sequence is the string of symbols in between the 
+         : query and hit sequences in the alignment indicating the degree
+         : of conservation (e.g., identical, similar, not similar).
+ Returns : string
+ Args    : none
+
+=cut
+
+#----------------
+sub homology_string{ shift->seq_str('match'); }
+#----------------
+
+#=================================================
+# End Bio::Search::HSP::HSPI implementation
+#=================================================
+
+# Older method delegating to method defined in HSPI.
+
+=head2 expect
+
+See L<Bio::Search::HSP::HSPI::expect()|Bio::Search::HSP::HSPI>
+
+=cut
+
+#----------
+sub expect { shift->evalue( @_ ); }
+#----------
+
+
+=head2 rank
+
+ Usage     : $hsp->rank( [string] );
+ Purpose   : Get the rank of the HSP within a given Blast hit.
+ Example   : $rank = $hsp->rank;
+ Returns   : Integer (1..n) corresponding to the order in which the HSP
+             appears in the BLAST report.
+
+=cut
+
+#'
+
+#----------
+sub rank { shift->{'_rank'} }
+#----------
+
+# For backward compatibility
+#----------
+sub name { shift->rank }
+#----------
+
+=head2 to_string
+
+ Title   : to_string
+ Usage   : print $hsp->to_string;
+ Function: Returns a string representation for the Blast HSP.
+           Primarily intended for debugging purposes.
+ Example : see usage
+ Returns : A string of the form:
+           [BlastHSP] <rank>
+           e.g.:
+           [BlastHit] 1
+ Args    : None
+
+=cut
+
+#----------
+sub to_string {
+#----------
+    my $self = shift;
+    return "[BlastHSP] " . $self->rank();
+}
+
+
+#=head2 _set_data (Private method)
+#
+# Usage     : called automatically during object construction.
+# Purpose   : Parses the raw HSP section from a flat BLAST report and
+#             sets the query sequence, sbjct sequence, and the "match" data
+#           : which consists of the symbols between the query and sbjct lines
+#           : in the alignment.
+# Argument  : Array (all lines for a single, complete HSP, from a raw, 
+#             flat (i.e., non-XML) BLAST report)
+# Throws    : Propagates any exceptions from the methods called ("See Also")
+#
+#See Also   : L<_set_seq()|_set_seq>, L<_set_score_stats()|_set_score_stats>, L<_set_match_stats()|_set_match_stats>, L<_initialize()|_initialize>
+#
+#=cut
+
+#--------------
+sub _set_data {
+#--------------
+    my $self = shift;
+    my @data = @_;
+    my @queryList  = ();  # 'Query' = SEQUENCE USED TO QUERY THE DATABASE.
+    my @sbjctList  = ();  # 'Sbjct' = HOMOLOGOUS SEQUENCE FOUND IN THE DATABASE.
+    my @matchList  = ();
+    my $matchLine  = 0;   # Alternating boolean: when true, load 'match' data.
+    my @linedat = ();
+    
+    #print STDERR "BlastHSP: set_data()\n";
+
+    my($line, $aln_row_len, $length_diff);
+    $length_diff = 0;
+
+    # Collecting data for all lines in the alignment
+    # and then storing the collections for possible processing later.
+    #
+    # Note that "match" lines may not be properly padded with spaces.
+    # This loop now properly handles such cases:
+    # Query: 1141 PSLVELTIRDCPRLEVGPMIRSLPKFPMLKKLDLAVANIIEEDLDVIGSLEELVIXXXXX 1200
+    #             PSLVELTIRDCPRLEVGPMIRSLPKFPMLKKLDLAVANIIEEDLDVIGSLEELVI
+    # Sbjct: 1141 PSLVELTIRDCPRLEVGPMIRSLPKFPMLKKLDLAVANIIEEDLDVIGSLEELVILSLKL 1200
+
+    foreach $line( @data ) {
+	next if $line =~ /^\s*$/;
+
+	if( $line =~ /^ ?Score/ ) {
+	    $self->_set_score_stats( $line );
+	} elsif( $line =~ /^ ?(Identities|Positives|Strand)/ ) {
+	    $self->_set_match_stats( $line );
+	} elsif( $line =~ /^ ?Frame = ([\d+-]+)/ ) {
+	  # Version 2.0.8 has Frame information on a separate line.
+	  # Storing frame according to SeqFeature::Generic::frame()
+	  # which does not contain strand info (use strand()).
+	  my $frame = abs($1) - 1;
+	  $self->frame( $frame );
+	} elsif( $line =~ /^(Query:?[\s\d]+)([^\s\d]+)/ ) {
+	    push @queryList, $line;
+	    $self->{'_match_indent'} = CORE::length $1;
+	    $aln_row_len = (CORE::length $1) + (CORE::length $2);
+	    $matchLine = 1;
+	} elsif( $matchLine ) {
+	    # Pad the match line with spaces if necessary.
+	    $length_diff = $aln_row_len - CORE::length $line;
+	    $length_diff and $line .= ' 'x $length_diff;
+	    push @matchList, $line;
+	    $matchLine = 0;
+	} elsif( $line =~ /^Sbjct/ ) {
+	    push @sbjctList, $line;
+	}
+    }
+    # Storing the query and sbjct lists in case they are needed later.
+    # We could make this conditional to save memory.
+    $self->{'_queryList'} = \@queryList; 
+    $self->{'_sbjctList'} = \@sbjctList; 
+
+    # Storing the match list in case it is needed later.
+    $self->{'_matchList'} = \@matchList;
+
+    if(not defined ($self->{'_numIdentical'})) {
+        my $id_str = $self->_id_str;
+        $self->throw( -text  => "Can't parse match statistics. Possibly a new or unrecognized Blast format. ($id_str)");
+    }
+
+    if(!scalar @queryList or !scalar @sbjctList) {
+        my $id_str = $self->_id_str;
+        $self->throw( "Can't find query or sbjct alignment lines. Possibly unrecognized Blast format. ($id_str)");
+    }
+}
+
+
+#=head2 _set_score_stats (Private method)
+#
+# Usage     : called automatically by _set_data()
+# Purpose   : Sets various score statistics obtained from the HSP listing.
+# Argument  : String with any of the following formats:
+#           : blast2:  Score = 30.1 bits (66), Expect = 9.2
+#           : blast2:  Score = 158.2 bits (544), Expect(2) = e-110
+#           : blast1:  Score = 410 (144.3 bits), Expect = 1.7e-40, P = 1.7e-40
+#           : blast1:  Score = 55 (19.4 bits), Expect = 5.3, Sum P(3) = 0.99
+# Throws    : Exception if the stats cannot be parsed, probably due to a change
+#           : in the Blast report format.
+#
+#See Also   : L<_set_data()|_set_data>
+#
+#=cut
+
+#--------------------
+sub _set_score_stats {
+#--------------------
+    my ($self, $data) = @_;
+
+    my ($expect, $p);
+
+    if($data =~ /Score = +([\d.e+-]+) bits \(([\d.e+-]+)\), +Expect = +([\d.e+-]+)/) {
+	# blast2 format n = 1
+	$self->bits($1);
+	$self->score($2);
+	$expect            = $3;
+    } elsif($data =~ /Score = +([\d.e+-]+) bits \(([\d.e+-]+)\), +Expect\((\d+)\) = +([\d.e+-]+)/) {
+	# blast2 format n > 1
+	$self->bits($1);
+	$self->score($2);
+	$self->{'_n'}      = $3;
+	$expect            = $4;
+
+    } elsif($data =~ /Score = +([\d.e+-]+) \(([\d.e+-]+) bits\), +Expect = +([\d.e+-]+), P = +([\d.e-]+)/) {
+	# blast1 format, n = 1
+	$self->score($1);
+	$self->bits($2);
+	$expect            = $3;
+	$p                 = $4;
+
+    } elsif($data =~ /Score = +([\d.e+-]+) \(([\d.e+-]+) bits\), +Expect = +([\d.e+-]+), +Sum P\((\d+)\) = +([\d.e-]+)/) {
+	# blast1 format, n > 1
+	$self->score($1);
+	$self->bits($2);
+	$expect            = $3;
+	$self->{'_n'}      = $4;
+	$p                 = $5;
+
+    } else {
+        my $id_str = $self->_id_str;
+	$self->throw(-class => 'Bio::Root::Exception',
+		     -text => "Can't parse score statistics: unrecognized format. ($id_str)", 
+		     -value => $data);
+    }
+    $expect = "1$expect" if $expect =~ /^e/i;    
+    $p      = "1$p"      if defined $p and $p=~ /^e/i; 
+
+    $self->{'_expect'} = $expect;
+    $self->{'_p'}      = $p || undef;    
+    $self->significance( $p || $expect );
+}
+
+
+#=head2 _set_match_stats (Private method)
+#
+# Usage     : Private method; called automatically by _set_data()
+# Purpose   : Sets various matching statistics obtained from the HSP listing.
+# Argument  : blast2: Identities = 23/74 (31%), Positives = 29/74 (39%), Gaps = 17/74 (22%)
+#           : blast2: Identities = 57/98 (58%), Positives = 74/98 (75%)
+#           : blast1: Identities = 87/204 (42%), Positives = 126/204 (61%)
+#           : blast1: Identities = 87/204 (42%), Positives = 126/204 (61%), Frame = -3
+#           : WU-blast: Identities = 310/553 (56%), Positives = 310/553 (56%), Strand = Minus / Plus
+# Throws    : Exception if the stats cannot be parsed, probably due to a change
+#           : in the Blast report format.
+# Comments  : The "Gaps = " data in the HSP header has a different meaning depending
+#           : on the type of Blast: for BLASTP, this number is the total number of
+#           : gaps in query+sbjct; for TBLASTN, it is the number of gaps in the
+#           : query sequence only. Thus, it is safer to collect the data
+#           : separately by examining the actual sequence strings as is done
+#           : in _set_seq().
+#
+#See Also   : L<_set_data()|_set_data>, L<_set_seq()|_set_seq>
+#
+#=cut
+
+#--------------------
+sub _set_match_stats {
+#--------------------
+    my ($self, $data) = @_;
+
+    if($data =~ m!Identities = (\d+)/(\d+)!) {
+      # blast1 or 2 format
+      $self->{'_numIdentical'} = $1;
+      $self->{'_totalLength'}  = $2;
+    }
+    
+    if($data =~ m!Positives = (\d+)/(\d+)!) {
+      # blast1 or 2 format
+      $self->{'_numConserved'} = $1;
+      $self->{'_totalLength'}  = $2;
+    }
+    
+    if($data =~ m!Frame = ([\d+-]+)!) { 
+      $self->frame($1); 
+    }
+
+    # Strand data is not always present in this line.
+    # _set_seq() will also set strand information.
+    if($data =~ m!Strand = (\w+) / (\w+)!) { 
+	$self->{'_queryStrand'} = $1; 
+	$self->{'_sbjctStrand'} = $2; 
+    }
+
+#    if($data =~ m!Gaps = (\d+)/(\d+)!) {
+#	 $self->{'_totalGaps'} = $1;
+#    } else {
+#	 $self->{'_totalGaps'} = 0;
+#    }
+}
+
+
+
+#=head2 _set_seq_data (Private method)
+#
+# Usage     : called automatically when sequence data is requested.
+# Purpose   : Sets the HSP sequence data for both query and sbjct sequences.
+#           : Includes: start, stop, length, gaps, and raw sequence.
+# Argument  : n/a
+# Throws    : Propagates any exception thrown by _set_match_seq()
+# Comments  : Uses raw data stored by _set_data() during object construction.
+#           : These data are not always needed, so it is conditionally
+#           : executed only upon demand by methods such as gaps(), _set_residues(),
+#           : etc. _set_seq() does the dirty work.
+#
+#See Also   : L<_set_seq()|_set_seq>
+#
+#=cut
+
+#-----------------
+sub _set_seq_data {
+#-----------------
+    my $self = shift;
+
+    $self->_set_seq('query', @{$self->{'_queryList'}});
+    $self->_set_seq('sbjct', @{$self->{'_sbjctList'}});
+
+    # Liberate some memory.
+    @{$self->{'_queryList'}} = @{$self->{'_sbjctList'}} = ();
+    undef $self->{'_queryList'};
+    undef $self->{'_sbjctList'};
+
+    $self->{'_set_seq_data'} = 1;
+}
+
+
+
+#=head2 _set_seq (Private method)
+#
+# Usage     : called automatically by _set_seq_data()
+#           : $hsp_obj->($seq_type, @data);
+# Purpose   : Sets sequence information for both the query and sbjct sequences.
+#           : Directly counts the number of gaps in each sequence (if gapped Blast).
+# Argument  : $seq_type = 'query' or 'sbjct'
+#           : @data = all seq lines with the form:
+#           : Query: 61  SPHNVKDRKEQNGSINNAISPTATANTSGSQQINIDSALRDRSSNVAAQPSLSDASSGSN 120
+# Throws    : Exception if data strings cannot be parsed, probably due to a change
+#           : in the Blast report format.
+# Comments  : Uses first argument to determine which data members to set
+#           : making this method sensitive data member name changes.
+#           : Behavior is dependent on the type of BLAST analysis (TBLASTN, BLASTP, etc).
+# Warning   : Sequence endpoints are normalized so that start < end. This affects HSPs
+#           : for TBLASTN/X hits on the minus strand. Normalization facilitates use
+#           : of range information by methods such as match().
+#
+#See Also   : L<_set_seq_data()|_set_seq_data>, L<matches()|matches>, L<range()|range>, L<start()|start>, L<end()|end>
+#
+#=cut
+
+#-------------
+sub _set_seq {
+#-------------
+    my $self      = shift;
+    my $seqType   = shift;
+    my @data      = @_;
+    my @ranges    = ();
+    my @sequence  = ();
+    my $numGaps   = 0;
+
+    foreach( @data ) {
+        if( m/(\d+) *([^\d\s]+) *(\d+)/) {
+            push @ranges, ( $1, $3 ) ;
+            push @sequence, $2;
+        #print STDERR "_set_seq found sequence \"$2\"\n";
+	} else {
+	    $self->warn("Bad sequence data: $_");
+	}
+    }
+
+    if( !(scalar(@sequence) and scalar(@ranges))) {
+        my $id_str = $self->_id_str;
+	$self->throw("Can't set sequence: missing data. Possibly unrecognized Blast format. ($id_str)");
+   }
+ 
+    # Sensitive to member name changes.
+    $seqType = "_\L$seqType\E";
+    $self->{$seqType.'Start'} = $ranges[0];
+    $self->{$seqType.'Stop'}  = $ranges[ $#ranges ];
+    $self->{$seqType.'Seq'}   = \@sequence;
+	
+    $self->{$seqType.'Length'} = abs($ranges[ $#ranges ] - $ranges[0]) + 1;
+
+    # Adjust lengths for BLASTX, TBLASTN, TBLASTX sequences
+    # Converting nucl coords to amino acid coords.
+
+    my $prog = $self->algorithm;
+    if($prog eq 'TBLASTN' and $seqType eq '_sbjct') {
+	$self->{$seqType.'Length'} /= 3;
+    } elsif($prog eq 'BLASTX' and $seqType eq '_query') {
+	$self->{$seqType.'Length'} /= 3;
+    } elsif($prog eq 'TBLASTX') {
+	$self->{$seqType.'Length'} /= 3;
+    }
+
+    if( $prog ne 'BLASTP' ) {
+        $self->{$seqType.'Strand'} = 'Plus' if $prog =~ /BLASTN/;
+        $self->{$seqType.'Strand'} = 'Plus' if ($prog =~ /BLASTX/ and $seqType eq '_query');
+        # Normalize sequence endpoints so that start < end.
+        # Reverse complement or 'minus strand' HSPs get flipped here.
+        if($self->{$seqType.'Start'} > $self->{$seqType.'Stop'}) {
+            ($self->{$seqType.'Start'}, $self->{$seqType.'Stop'}) = 
+                ($self->{$seqType.'Stop'}, $self->{$seqType.'Start'});
+            $self->{$seqType.'Strand'} = 'Minus';
+        }
+    }
+
+    ## Count number of gaps in each seq. Only need to do this for gapped Blasts.
+#    if($self->{'_gapped'}) {
+	my $seqstr = join('', @sequence);
+	$seqstr =~ s/\s//g;
+        my $num_gaps = CORE::length($seqstr) - $self->{$seqType.'Length'};
+	$self->{$seqType.'Gaps'} = $num_gaps if $num_gaps > 0;
+#    }
+}
+
+
+#=head2 _set_residues (Private method)
+#
+# Usage     : called automatically when residue data is requested.
+# Purpose   : Sets the residue numbers representing the identical and
+#           : conserved positions. These data are obtained by analyzing the
+#           : symbols between query and sbjct lines of the alignments.
+# Argument  : n/a
+# Throws    : Propagates any exception thrown by _set_seq_data() and _set_match_seq().
+# Comments  : These data are not always needed, so it is conditionally
+#           : executed only upon demand by methods such as seq_inds().
+#           : Behavior is dependent on the type of BLAST analysis (TBLASTN, BLASTP, etc).
+#
+#See Also   : L<_set_seq_data()|_set_seq_data>, L<_set_match_seq()|_set_match_seq>, seq_inds()
+#
+#=cut
+
+#------------------
+sub _set_residues {
+#------------------
+    my $self      = shift;
+    my @sequence  = ();
+
+    $self->_set_seq_data() unless $self->{'_set_seq_data'};
+
+    # Using hashes to avoid saving duplicate residue numbers.
+    my %identicalList_query = ();
+    my %identicalList_sbjct = ();
+    my %conservedList_query = ();
+    my %conservedList_sbjct = ();
+    
+    my $aref = $self->_set_match_seq() if not ref $self->{'_matchSeq'};
+    $aref  ||= $self->{'_matchSeq'};
+    my $seqString = join('', @$aref );
+
+    my $qseq = join('',@{$self->{'_querySeq'}});
+    my $sseq = join('',@{$self->{'_sbjctSeq'}});
+    my $resCount_query = $self->{'_queryStop'} || 0;
+    my $resCount_sbjct = $self->{'_sbjctStop'} || 0;
+
+    my $prog = $self->algorithm;
+    if($prog !~ /^BLASTP|^BLASTN/) {
+	if($prog eq 'TBLASTN') {
+	    $resCount_sbjct /= 3;
+	} elsif($prog eq 'BLASTX') {
+	    $resCount_query /= 3;
+	} elsif($prog eq 'TBLASTX') {
+	    $resCount_query /= 3;
+	    $resCount_sbjct /= 3;
+	}
+    }
+
+    my ($mchar, $schar, $qchar);
+    while( $mchar = chop($seqString) ) {
+	($qchar, $schar) = (chop($qseq), chop($sseq));
+	if( $mchar eq '+' ) { 
+	    $conservedList_query{ $resCount_query } = 1; 
+	    $conservedList_sbjct{ $resCount_sbjct } = 1; 
+	} elsif( $mchar ne ' ' ) { 
+	    $identicalList_query{ $resCount_query } = 1; 
+	    $identicalList_sbjct{ $resCount_sbjct } = 1;
+	}
+	$resCount_query-- if $qchar ne $GAP_SYMBOL;
+	$resCount_sbjct-- if $schar ne $GAP_SYMBOL;
+    }
+    $self->{'_identicalRes_query'} = \%identicalList_query;
+    $self->{'_conservedRes_query'} = \%conservedList_query;
+    $self->{'_identicalRes_sbjct'} = \%identicalList_sbjct;
+    $self->{'_conservedRes_sbjct'} = \%conservedList_sbjct;
+
+}
+
+
+
+
+#=head2 _set_match_seq (Private method)
+#
+# Usage     : $hsp_obj->_set_match_seq()
+# Purpose   : Set the 'match' sequence for the current HSP (symbols in between
+#           : the query and sbjct lines.)				
+# Returns   : Array reference holding the match sequences lines.
+# Argument  : n/a
+# Throws    : Exception if the _matchList field is not set.
+# Comments  : The match information is not always necessary. This method
+#           : allows it to be conditionally prepared.
+#           : Called by _set_residues>() and seq_str().
+#
+#See Also   : L<_set_residues()|_set_residues>, L<seq_str()|seq_str>
+#
+#=cut
+
+#-------------------
+sub _set_match_seq {
+#-------------------
+    my $self = shift;
+
+    if( ! ref($self->{'_matchList'}) ) {
+        my $id_str = $self->_id_str;
+        $self->throw("Can't set HSP match sequence: No data ($id_str)");
+    }
+    
+    my @data = @{$self->{'_matchList'}};
+
+    my(@sequence);
+    foreach( @data ) {
+	chomp($_);
+	## Remove leading spaces; (note: aln may begin with a space
+	## which is why we can't use s/^ +//).
+	s/^ {$self->{'_match_indent'}}//;   
+	push @sequence, $_;
+    }
+    # Liberate some memory.
+    @{$self->{'_matchList'}} = undef;
+    $self->{'_matchList'} = undef;
+
+    $self->{'_matchSeq'} = \@sequence;
+
+    return $self->{'_matchSeq'};
+}
+
+
+=head2 n
+
+ Usage     : $hsp_obj->n()
+ Purpose   : Get the N value (num HSPs on which P/Expect is based).
+           : This value is not defined with NCBI Blast2 with gapping.
+ Returns   : Integer or null string if not defined.
+ Argument  : n/a
+ Throws    : n/a
+ Comments  : The 'N' value is listed in parenthesis with P/Expect value:
+           : e.g., P(3) = 1.2e-30  ---> (N = 3).
+           : Not defined in NCBI Blast2 with gaps.
+           : This typically is equal to the number of HSPs but not always.
+           : To obtain the number of HSPs, use Bio::Search::Hit::BlastHit::num_hsps().
+
+See Also   : L<Bio::SeqFeature::SimilarityPair::score()|Bio::SeqFeature::SimilarityPair>
+
+=cut
+
+#-----
+sub n { my $self = shift; $self->{'_n'} || ''; }
+#-----
+
+
+=head2 matches
+
+ Usage     : $hsp->matches([seq_type], [start], [stop]);
+ Purpose   : Get the total number of identical and conservative matches 
+           : in the query or sbjct sequence for the given HSP. Optionally can
+           : report data within a defined interval along the seq.
+           : (Note: 'conservative' matches are called 'positives' in the
+	   : Blast report.)
+ Example   : ($id,$cons) = $hsp_object->matches('hit');
+           : ($id,$cons) = $hsp_object->matches('query',300,400);
+ Returns   : 2-element array of integers 
+ Argument  : (1) seq_type = 'query' or 'hit' or 'sbjct' (default = query)
+           :  ('sbjct' is synonymous with 'hit') 
+           : (2) start = Starting coordinate (optional)
+           : (3) stop  = Ending coordinate (optional)
+ Throws    : Exception if the supplied coordinates are out of range.
+ Comments  : Relies on seq_str('match') to get the string of alignment symbols
+           : between the query and sbjct lines which are used for determining
+           : the number of identical and conservative matches.
+
+See Also   : L<length()|length>, L<gaps()|gaps>, L<seq_str()|seq_str>, L<Bio::Search::Hit::BlastHit::_adjust_contigs()|Bio::Search::Hit::BlastHit>
+
+=cut
+
+#-----------
+sub matches {
+#-----------
+    my( $self, %param ) = @_;
+    my(@data);
+    my($seqType, $beg, $end) = ($param{-SEQ}, $param{-START}, $param{-STOP});
+    $seqType ||= 'query';
+    $seqType = 'sbjct' if $seqType eq 'hit';
+
+    my($start,$stop);
+
+    if(!defined $beg && !defined $end) {
+	## Get data for the whole alignment.
+	push @data, ($self->{'_numIdentical'}, $self->{'_numConserved'});
+    } else {
+	## Get the substring representing the desired sub-section of aln.
+	$beg ||= 0;
+	$end ||= 0;
+	($start,$stop) = $self->range($seqType);
+	if($beg == 0) { $beg = $start; $end = $beg+$end; }
+	elsif($end == 0) { $end = $stop; $beg = $end-$beg; }
+
+	if($end >= $stop) { $end = $stop; } ##ML changed from if (end >stop)
+	else { $end += 1;}   ##ML moved from commented position below, makes
+                             ##more sense here
+#	if($end > $stop) { $end = $stop; }
+	if($beg < $start) { $beg = $start; }
+#	else { $end += 1;}
+
+#	my $seq = substr($self->seq_str('match'), $beg-$start, ($end-$beg));
+
+	## ML: START fix for substr out of range error ------------------
+	my $seq = "";
+        my $prog = $self->algorithm;
+	if (($prog eq 'TBLASTN') and ($seqType eq 'sbjct'))
+	{
+	    $seq = substr($self->seq_str('match'),
+			  int(($beg-$start)/3), int(($end-$beg+1)/3));
+
+	} elsif (($prog eq 'BLASTX') and ($seqType eq 'query'))
+	{
+	    $seq = substr($self->seq_str('match'),
+			  int(($beg-$start)/3), int(($end-$beg+1)/3));
+	} else {
+	    $seq = substr($self->seq_str('match'), 
+			  $beg-$start, ($end-$beg));
+	}
+	## ML: End of fix for  substr out of range error -----------------
+
+	
+	## ML: debugging code
+	## This is where we get our exception.  Try printing out the values going
+	## into this:
+	##
+#	 print STDERR 
+#	     qq(*------------MY EXCEPTION --------------------\nSeq: ") , 
+#	     $self->seq_str("$seqType"), qq("\n),$self->rank,",(  index:";
+#	 print STDERR  $beg-$start, ", len: ", $end-$beg," ), (HSPRealLen:", 
+#	     CORE::length $self->seq_str("$seqType");
+#	 print STDERR ", HSPCalcLen: ", $stop - $start +1 ," ), 
+#	     ( beg: $beg, end: $end ), ( start: $start, stop: stop )\n";
+	 ## ML: END DEBUGGING CODE----------
+
+	if(!CORE::length $seq) {
+            my $id_str = $self->_id_str;
+	    $self->throw("Undefined $seqType sub-sequence ($beg,$end). Valid range = $start - $stop ($id_str)");
+	}
+	## Get data for a substring.
+#	printf "Collecting HSP subsection data: beg,end = %d,%d; start,stop = %d,%d\n%s<---\n", $beg, $end, $start, $stop, $seq;
+#	printf "Original match seq:\n%s\n",$self->seq_str('match');
+	$seq =~ s/ //g;  # remove space (no info).
+	my $len_cons = CORE::length $seq;
+	$seq =~ s/\+//g;  # remove '+' characters (conservative substitutions)
+	my $len_id = CORE::length $seq;
+	push @data, ($len_id, $len_cons);
+#	printf "  HSP = %s\n  id = %d; cons = %d\n", $self->rank, $len_id, $len_cons; <STDIN>;
+    }
+    @data;
+}
+
+
+=head2 num_identical
+
+ Usage     : $hsp_object->num_identical();
+ Purpose   : Get the number of identical positions within the given HSP.
+ Example   : $num_iden = $hsp_object->num_identical();
+ Returns   : integer
+ Argument  : n/a
+ Throws    : n/a
+
+See Also   : L<num_conserved()|num_conserved>, L<frac_identical()|frac_identical>
+
+=cut
+
+#-------------------
+sub num_identical {
+#-------------------
+    my( $self) = shift;
+
+    $self->{'_numIdentical'};
+}
+
+
+=head2 num_conserved
+
+ Usage     : $hsp_object->num_conserved();
+ Purpose   : Get the number of conserved positions within the given HSP.
+ Example   : $num_iden = $hsp_object->num_conserved();
+ Returns   : integer
+ Argument  : n/a
+ Throws    : n/a
+
+See Also   : L<num_identical()|num_identical>, L<frac_conserved()|frac_conserved>
+
+=cut
+
+#-------------------
+sub num_conserved {
+#-------------------
+    my( $self) = shift;
+
+    $self->{'_numConserved'};
+}
+
+
+
+=head2 range
+
+ Usage     : $hsp->range( [seq_type] );
+ Purpose   : Gets the (start, end) coordinates for the query or sbjct sequence
+           : in the HSP alignment.
+ Example   : ($query_beg, $query_end) = $hsp->range('query');
+           : ($hit_beg, $hit_end) = $hsp->range('hit');
+ Returns   : Two-element array of integers 
+ Argument  : seq_type = string, 'query' or 'hit' or 'sbjct'  (default = 'query')
+           :  ('sbjct' is synonymous with 'hit') 
+ Throws    : n/a
+
+See Also   : L<start()|start>, L<end()|end>
+
+=cut
+
+#----------
+sub range {
+#----------
+    my ($self, $seqType) = @_;
+
+    $self->_set_seq_data() unless $self->{'_set_seq_data'};
+
+    $seqType ||= 'query';
+    $seqType = 'sbjct' if $seqType eq 'hit';
+
+    ## Sensitive to member name changes.
+    $seqType = "_\L$seqType\E";
+
+    return ($self->{$seqType.'Start'},$self->{$seqType.'Stop'});
+}
+
+=head2 start
+
+ Usage     : $hsp->start( [seq_type] );
+ Purpose   : Gets the start coordinate for the query, sbjct, or both sequences
+           : in the HSP alignment.
+           : NOTE: Start will always be less than end. 
+           : To determine strand, use $hsp->strand()
+ Example   : $query_beg = $hsp->start('query');
+           : $hit_beg = $hsp->start('hit');
+           : ($query_beg, $hit_beg) = $hsp->start();
+ Returns   : scalar context: integer
+           : array context without args: list of two integers
+ Argument  : In scalar context: seq_type = 'query' or 'hit' or 'sbjct' (default= 'query')
+           :  ('sbjct' is synonymous with 'hit') 
+           : Array context can be "induced" by providing an argument of 'list' or 'array'.
+ Throws    : n/a
+
+See Also   : L<end()|end>, L<range()|range>
+
+=cut
+
+#----------
+sub start {
+#----------
+    my ($self, $seqType) = @_;
+
+    $seqType ||= (wantarray ? 'list' : 'query');
+    $seqType = 'sbjct' if $seqType eq 'hit';
+
+    $self->_set_seq_data() unless $self->{'_set_seq_data'};
+
+    if($seqType =~ /list|array/i) {
+	return ($self->{'_queryStart'}, $self->{'_sbjctStart'});
+    } else {
+	## Sensitive to member name changes.
+	$seqType = "_\L$seqType\E";
+	return $self->{$seqType.'Start'};
+    }
+}
+
+=head2 end
+
+ Usage     : $hsp->end( [seq_type] );
+ Purpose   : Gets the end coordinate for the query, sbjct, or both sequences
+           : in the HSP alignment.
+           : NOTE: Start will always be less than end. 
+           : To determine strand, use $hsp->strand()
+ Example   : $query_end = $hsp->end('query');
+           : $hit_end = $hsp->end('hit');
+           : ($query_end, $hit_end) = $hsp->end();
+ Returns   : scalar context: integer
+           : array context without args: list of two integers
+ Argument  : In scalar context: seq_type = 'query' or 'hit' or 'sbjct' (default= 'query')
+           :  ('sbjct' is synonymous with 'hit') 
+           : Array context can be "induced" by providing an argument of 'list' or 'array'.
+ Throws    : n/a
+
+See Also   : L<start()|start>, L<range()|range>, L<strand()|strand>
+
+=cut
+
+#----------
+sub end {
+#----------
+    my ($self, $seqType) = @_;
+
+    $seqType ||= (wantarray ? 'list' : 'query');
+    $seqType = 'sbjct' if $seqType eq 'hit';
+
+    $self->_set_seq_data() unless $self->{'_set_seq_data'};
+
+    if($seqType =~ /list|array/i) {
+	return ($self->{'_queryStop'}, $self->{'_sbjctStop'});
+    } else {
+	## Sensitive to member name changes.
+	$seqType = "_\L$seqType\E";
+	return $self->{$seqType.'Stop'};
+    }
+}
+
+
+
+=head2 strand
+
+ Usage     : $hsp_object->strand( [seq_type] )
+ Purpose   : Get the strand of the query or sbjct sequence.
+ Example   : print $hsp->strand('query');
+           : ($query_strand, $hit_strand) = $hsp->strand();
+ Returns   : -1, 0, or 1
+           : -1 = Minus strand, +1 = Plus strand
+           : Returns 0 if strand is not defined, which occurs
+           : for BLASTP reports, and the query of TBLASTN
+           : as well as the hit if BLASTX reports.
+           : In scalar context without arguments, returns queryStrand value.
+           : In array context without arguments, returns a two-element list 
+           :    of strings (queryStrand, sbjctStrand).
+           : Array context can be "induced" by providing an argument of 'list' or 'array'.
+ Argument  : seq_type: 'query' or 'hit' or 'sbjct' or undef
+           :  ('sbjct' is synonymous with 'hit') 
+ Throws    : n/a
+
+See Also   : B<_set_seq()>, B<_set_match_stats()>
+
+=cut
+
+#-----------
+sub strand {
+#-----------
+    my( $self, $seqType ) = @_;
+
+    # Hack to deal with the fact that SimilarityPair calls strand()
+    # which will lead to an error because parsing hasn't yet occurred.
+    # See SimilarityPair::new().
+    return if $self->{'_initializing'};
+
+    $seqType  ||= (wantarray ? 'list' : 'query');
+    $seqType = 'sbjct' if $seqType eq 'hit';
+
+    ## Sensitive to member name format.
+    $seqType = "_\L$seqType\E";
+
+    # $seqType could be '_list'.
+    $self->{'_queryStrand'} or $self->_set_seq_data() unless $self->{'_set_seq_data'};
+
+    my $prog = $self->algorithm;
+
+    if($seqType  =~ /list|array/i) {
+        my ($qstr, $hstr);
+        if( $prog eq 'BLASTP') {
+            $qstr = 0;
+            $hstr = 0;
+        }
+        elsif( $prog eq 'TBLASTN') {
+            $qstr = 0;
+            $hstr = $STRAND_SYMBOL{$self->{'_sbjctStrand'}};
+        }
+        elsif( $prog eq 'BLASTX') {
+            $qstr = $STRAND_SYMBOL{$self->{'_queryStrand'}};
+            $hstr = 0;
+        }
+        else {
+            $qstr = $STRAND_SYMBOL{$self->{'_queryStrand'}} if defined $self->{'_queryStrand'};
+            $hstr = $STRAND_SYMBOL{$self->{'_sbjctStrand'}} if defined $self->{'_sbjctStrand'};
+        }
+        $qstr ||= 0;
+        $hstr ||= 0;  
+	return ($qstr, $hstr);
+    }
+    local $^W = 0;
+    $STRAND_SYMBOL{$self->{$seqType.'Strand'}} || 0;
+}
+
+
+=head2 seq
+
+ Usage     : $hsp->seq( [seq_type] );
+ Purpose   : Get the query or sbjct sequence as a Bio::Seq.pm object.
+ Example   : $seqObj = $hsp->seq('query');
+ Returns   : Object reference for a Bio::Seq.pm object.
+ Argument  : seq_type = 'query' or 'hit' or 'sbjct' (default = 'query').
+           :  ('sbjct' is synonymous with 'hit') 
+ Throws    : Propagates any exception that occurs during construction
+           : of the Bio::Seq.pm object.
+ Comments  : The sequence is returned in an array of strings corresponding
+           : to the strings in the original format of the Blast alignment.
+           : (i.e., same spacing).
+
+See Also   : L<seq_str()|seq_str>, L<seq_inds()|seq_inds>, B<Bio::Seq>
+
+=cut
+
+#-------
+sub seq {
+#-------
+    my($self,$seqType) = @_; 
+    $seqType ||= 'query';
+    $seqType = 'sbjct' if $seqType eq 'hit';
+    my $str = $self->seq_str($seqType);
+	
+    require Bio::Seq;
+
+    new Bio::Seq (-ID   => $self->to_string,
+		  -SEQ  => $str,
+		  -DESC => "$seqType sequence",
+		  );
+}
+
+=head2 seq_str
+
+ Usage     : $hsp->seq_str( seq_type );
+ Purpose   : Get the full query, sbjct, or 'match' sequence as a string.
+           : The 'match' sequence is the string of symbols in between the 
+           : query and sbjct sequences.
+ Example   : $str = $hsp->seq_str('query');
+ Returns   : String
+ Argument  : seq_Type = 'query' or 'hit' or 'sbjct' or 'match'
+           :  ('sbjct' is synonymous with 'hit') 
+ Throws    : Exception if the argument does not match an accepted seq_type.
+ Comments  : Calls _set_seq_data() to set the 'match' sequence if it has
+           : not been set already.
+
+See Also   : L<seq()|seq>, L<seq_inds()|seq_inds>, B<_set_match_seq()>
+
+=cut
+
+#------------
+sub seq_str {  
+#------------
+    my($self,$seqType) = @_; 
+
+    $seqType ||= 'query';
+    $seqType = 'sbjct' if $seqType eq 'hit';
+    ## Sensitive to member name changes.
+    $seqType = "_\L$seqType\E";
+
+    $self->_set_seq_data() unless $self->{'_set_seq_data'};
+
+    if($seqType =~ /sbjct|query/) {
+	my $seq = join('',@{$self->{$seqType.'Seq'}}); 
+	$seq =~ s/\s+//g;
+	return $seq;
+
+    } elsif( $seqType =~ /match/i) {
+	# Only need to call _set_match_seq() if the match seq is requested.
+	my $aref = $self->_set_match_seq() unless ref $self->{'_matchSeq'};
+	$aref =  $self->{'_matchSeq'};
+
+	return join('',@$aref); 
+
+    } else {
+        my $id_str = $self->_id_str;
+	$self->throw(-class => 'Bio::Root::BadParameter',
+		     -text => "Invalid or undefined sequence type: $seqType ($id_str)\n" . 
+		               "Valid types: query, sbjct, match",
+		     -value => $seqType);
+    }
+}
+
+=head2 seq_inds
+
+ Usage     : $hsp->seq_inds( seq_type, class, collapse );
+ Purpose   : Get a list of residue positions (indices) for all identical 
+           : or conserved residues in the query or sbjct sequence.
+ Example   : @s_ind = $hsp->seq_inds('query', 'identical');
+           : @h_ind = $hsp->seq_inds('hit', 'conserved');
+           : @h_ind = $hsp->seq_inds('hit', 'conserved', 1);
+ Returns   : List of integers 
+           : May include ranges if collapse is true.
+ Argument  : seq_type  = 'query' or 'hit' or 'sbjct'  (default = query)
+           :  ('sbjct' is synonymous with 'hit') 
+           : class     = 'identical' or 'conserved' (default = identical)
+           :              (can be shortened to 'id' or 'cons')
+           :              (actually, anything not 'id' will evaluate to 'conserved').
+           : collapse  = boolean, if true, consecutive positions are merged
+           :             using a range notation, e.g., "1 2 3 4 5 7 9 10 11" 
+           :             collapses to "1-5 7 9-11". This is useful for 
+           :             consolidating long lists. Default = no collapse.
+ Throws    : n/a.
+ Comments  : Calls _set_residues() to set the 'match' sequence if it has
+           : not been set already.
+
+See Also   : L<seq()|seq>, B<_set_residues()>, L<Bio::Search::BlastUtils::collapse_nums()|Bio::Search::BlastUtils>, L<Bio::Search::Hit::BlastHit::seq_inds()|Bio::Search::Hit::BlastHit>
+
+=cut
+
+#---------------
+sub seq_inds {
+#---------------
+    my ($self, $seqType, $class, $collapse) = @_;
+
+    $seqType  ||= 'query';
+    $class ||= 'identical';
+    $collapse ||= 0;
+    $seqType = 'sbjct' if $seqType eq 'hit';
+
+    $self->_set_residues() unless defined $self->{'_identicalRes_query'};
+
+    $seqType  = ($seqType !~ /^q/i ? 'sbjct' : 'query');
+    $class = ($class !~ /^id/i ? 'conserved' : 'identical');
+
+    ## Sensitive to member name changes.
+    $seqType  = "_\L$seqType\E";
+    $class = "_\L$class\E";
+
+    my @ary = sort { $a <=> $b } keys %{ $self->{"${class}Res$seqType"}};
+
+    require Bio::Search::BlastUtils if $collapse;
+
+    return $collapse ? &Bio::Search::BlastUtils::collapse_nums(@ary) : @ary;
+}
+
+
+=head2 get_aln
+
+ Usage     : $hsp->get_aln()
+ Purpose   : Get a Bio::SimpleAlign object constructed from the query + sbjct 
+           : sequences of the present HSP object.
+ Example   : $aln_obj = $hsp->get_aln();
+ Returns   : Object reference for a Bio::SimpleAlign.pm object.
+ Argument  : n/a.
+ Throws    : Propagates any exception ocurring during the construction of
+           : the Bio::SimpleAlign object.
+ Comments  : Requires Bio::SimpleAlign.
+           : The Bio::SimpleAlign object is constructed from the query + sbjct 
+           : sequence objects obtained by calling seq().
+           : Gap residues are included (see $GAP_SYMBOL). 
+
+See Also   : L<seq()|seq>, L<Bio::SimpleAlign>
+
+=cut
+
+#------------
+sub get_aln {
+#------------
+    my $self = shift;
+
+    require Bio::SimpleAlign;
+    require Bio::LocatableSeq;
+    my $qseq = $self->seq('query');
+    my $sseq = $self->seq('sbjct');
+
+    my $type = $self->algorithm =~ /P$|^T/ ? 'amino' : 'dna';
+    my $aln = new Bio::SimpleAlign();
+    $aln->add_seq(new Bio::LocatableSeq(-seq => $qseq->seq(),
+					-id  => 'query_'.$qseq->display_id(),
+					-start => 1,
+					-end   => CORE::length($qseq)));
+		  
+    $aln->add_seq(new Bio::LocatableSeq(-seq => $sseq->seq(),
+					-id  => 'hit_'.$sseq->display_id(),
+					-start => 1,
+					-end   => CORE::length($sseq)));
+		  
+    return $aln;
+}
+
+
+1;
+__END__
+
+
+=head1 FOR DEVELOPERS ONLY
+
+=head2 Data Members
+
+Information about the various data members of this module is provided for those 
+wishing to modify or understand the code. Two things to bear in mind: 
+
+=over 4
+
+=item 1 Do NOT rely on these in any code outside of this module. 
+
+All data members are prefixed with an underscore to signify that they are private.
+Always use accessor methods. If the accessor doesn't exist or is inadequate, 
+create or modify an accessor (and let me know, too!). 
+
+=item 2 This documentation may be incomplete and out of date.
+
+It is easy for these data member descriptions to become obsolete as 
+this module is still evolving. Always double check this info and search 
+for members not described here.
+
+=back
+
+An instance of Bio::Search::HSP::BlastHSP.pm is a blessed reference to a hash containing
+all or some of the following fields:
+
+ FIELD           VALUE
+ --------------------------------------------------------------
+ (member names are mostly self-explanatory)
+
+ _score              :
+ _bits               :
+ _p                  :
+ _n                  : Integer. The 'N' value listed in parenthesis with P/Expect value:
+                     : e.g., P(3) = 1.2e-30  ---> (N = 3).
+                     : Not defined in NCBI Blast2 with gaps.
+                     : To obtain the number of HSPs, use Bio::Search::Hit::BlastHit::num_hsps().
+ _expect             :
+ _queryLength        : 
+ _queryGaps          : 
+ _queryStart         :
+ _queryStop          :
+ _querySeq           :
+ _sbjctLength        :
+ _sbjctGaps          :
+ _sbjctStart         :
+ _sbjctStop          :
+ _sbjctSeq           :
+ _matchSeq           : String. Contains the symbols between the query and sbjct lines
+                       which indicate identical (letter) and conserved ('+') matches
+                       or a mismatch (' ').
+ _numIdentical       :
+ _numConserved       :
+ _identicalRes_query :
+ _identicalRes_sbjct :
+ _conservedRes_query :
+ _conservedRes_sbjct :
+ _match_indent       : The number of leading space characters on each line containing
+                       the match symbols. _match_indent is 13 in this example:
+                         Query:   285 QNSAPWGLARISHRERLNLGSFNKYLYDDDAG
+                                      Q +APWGLARIS       G+ + Y YD+ AG
+                         ^^^^^^^^^^^^^ 
+
+
+=cut
+
+1;
+