diff variant_effect_predictor/Bio/Tools/Blast/HSP.pm @ 0:21066c0abaf5 draft

Uploaded
author willmclaren
date Fri, 03 Aug 2012 10:04:48 -0400
parents
children
line wrap: on
line diff
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/variant_effect_predictor/Bio/Tools/Blast/HSP.pm	Fri Aug 03 10:04:48 2012 -0400
@@ -0,0 +1,1863 @@
+#----------------------------------------------------------------------------
+# PACKAGE : Bio::Tools::Blast::HSP
+# AUTHOR  : Steve Chervitz (sac@bioperl.org)
+# CREATED : March 1996
+# STATUS  : Alpha
+# REVISION: $Id: HSP.pm,v 1.18 2002/10/22 07:38:48 lapp Exp $
+#
+# For the latest version and documentation, visit the distribution site:
+#    http://genome-www.stanford.edu/perlOOP/bioperl/blast/
+#
+# To generate documentation, run this module through pod2html
+# (preferably from Perl v5.004 or better).
+#
+# Copyright (c) 1996-2000 Steve Chervitz. All Rights Reserved.
+#           This module is free software; you can redistribute it and/or 
+#           modify it under the same terms as Perl itself.
+#----------------------------------------------------------------------------
+
+package Bio::Tools::Blast::HSP;
+
+use Bio::Root::Global qw(:devel);
+use Bio::Root::Object ();
+#use Bio::Root::Err   qw(:std);
+
+@ISA = qw( Bio::Root::Object);  
+
+use strict;
+use vars qw($ID $GAP_SYMBOL @SCORE_CUTOFFS $Revision %STRAND_SYMBOL);
+$ID       = 'Bio::Tools::Blast::HSP';
+$Revision = '$Id: HSP.pm,v 1.18 2002/10/22 07:38:48 lapp Exp $';  #'
+
+$GAP_SYMBOL    = '-';          # Need a more general way to handle gap symbols.
+@SCORE_CUTOFFS = ( 100, 30 );  # Bit score cutoffs (see homol_score()).
+%STRAND_SYMBOL = ('Plus' => 1, 'Minus' => -1);
+
+## POD Documentation:
+
+=head1 NAME
+
+Bio::Tools::Blast::HSP - Bioperl BLAST High-Scoring Segment Pair object
+
+=head1 SYNOPSIS
+
+=head2 Object Creation
+
+The construction of HSP objects is handled by Bio::Tools::Blast:: Sbjct.pm. 
+You should not need to use this package directly. See L<_initialize()|_initialize>
+for a description of constructor parameters.
+
+    require Bio::Tools::Blast::HSP;
+
+    $hspObj = eval{ new Bio::Tools::Blast::HSP(-DATA    =>\@hspData, 
+					       -PARENT  =>$sbjct_object, 
+					       -NAME    =>$hspCount,
+ 		                               -PROGRAM =>'TBLASTN',
+					       );
+		};
+
+@hspData includes the raw BLAST report data for a specific HSP,
+and is prepared by Bio::Tools::Blast::Sbjct.pm. 
+
+=head1 INSTALLATION
+
+This module is included with the central Bioperl distribution:
+
+   http://bio.perl.org/Core/Latest
+   ftp://bio.perl.org/pub/DIST
+
+Follow the installation instructions included in the README file.
+
+=head1 DESCRIPTION
+
+The Bio::Tools::Blast::HSP.pm module encapsulates data and methods for 
+manipulating, parsing, and analyzing HSPs ("High-scoring Segment Pairs")
+derived from BLAST sequence analysis.
+
+This module is a utility module used by the B<Bio::Tools::Blast::Sbjct.pm>
+and is not intended for separate use. Please see documentation for 
+B<Bio::Tools::Blast.pm> for some basic information about using
+HSP objects (L<Links:>).
+
+=over 0
+
+=item * Supports BLAST versions 1.x and 2.x, gapped and ungapped.
+
+=back
+
+Bio::Tools::Blast::HSP.pm has the ability to extract a list of all
+residue indices for identical and conservative matches along both
+query and sbjct sequences. Since this degree of detail is not always
+needed, this behavior does not occur during construction of the HSP
+object.  These data will automatically be collected as necessary as
+the HSP.pm object is used.
+
+=head1 DEPENDENCIES
+
+Bio::Tools::Blast::HSP.pm is a concrete class that inherits from
+B<Bio::Root::Object.pm> and relies on B<Bio::Tools::Sbjct.pm> as a
+container for HSP.pm objects.  B<Bio::Seq.pm> and B<Bio::UnivAln.pm>
+are employed for creating sequence and alignment objects,
+respectively.
+
+
+=head2 Relationship to UnivAln.pm & Seq.pm
+
+HSP.pm can provide the query or sbjct sequence as a B<Bio::Seq.pm>
+object via the L<seq()|seq> method. The HSP.pm object can also create a
+two-sequence B<Bio::UnivAln.pm> alignment object using the the query
+and sbjct sequences via the L<get_aln()|get_aln> method. Creation of alignment
+objects is not automatic when constructing the HSP.pm object since
+this level of functionality is not always required and would generate
+a lot of extra overhead when crunching many reports.
+
+
+=head1 FEEDBACK
+
+=head2 Mailing Lists 
+
+User feedback is an integral part of the evolution of this and other
+Bioperl modules.  Send your comments and suggestions preferably to one
+of the Bioperl mailing lists.  Your participation is much appreciated.
+
+    bioperl-l@bioperl.org          - General discussion
+    http://bio.perl.org/MailList.html             - About the mailing lists
+
+=head2 Reporting Bugs
+
+Report bugs to the Bioperl bug tracking system to help us keep track
+the bugs and their resolution. Bug reports can be submitted via email
+or the web:
+
+    bioperl-bugs@bio.perl.org                   
+    http://bugzilla.bioperl.org/           
+
+=head1 AUTHOR
+
+Steve Chervitz, E<lt>sac@bioperl.orgE<gt>
+
+=head1 SEE ALSO
+
+ Bio::Tools::Blast::Sbjct.pm   - Blast hit object.
+ Bio::Tools::Blast.pm          - Blast object.
+ Bio::Seq.pm                   - Biosequence object  
+ Bio::UnivAln.pm               - Biosequence alignment object.
+ Bio::Root::Object.pm          - Proposed base class for all Bioperl objects.
+
+=head2 Links:
+
+ http://bio.perl.org/Core/POD/Tools/Blast/Sbjct.pm.html
+
+ http://bio.perl.org/Projects/modules.html  - Online module documentation
+ http://bio.perl.org/Projects/Blast/        - Bioperl Blast Project     
+ http://bio.perl.org/                       - Bioperl Project Homepage
+
+=head1 COPYRIGHT
+
+Copyright (c) 1996-98 Steve Chervitz. All Rights Reserved.
+This module is free software; you can redistribute it and/or 
+modify it under the same terms as Perl itself.
+
+=cut
+
+
+
+#
+##
+###
+#### END of main POD documentation.
+###
+##
+#
+
+=head1 APPENDIX
+
+Methods beginning with a leading underscore are considered private
+and are intended for internal use by this module. They are
+B<not> considered part of the public interface and are described here
+for documentation purposes only.
+
+=cut
+
+#####################################################################################
+##                                 CONSTRUCTOR                                     ##
+#####################################################################################
+
+=head2 _initialize
+
+ Usage     : n/a; automatically called by Bio::Root::Object::new()
+           : Bio::Tools::Blast::HSP.pm objects are constructed 
+           : automatically by Bio::Tools::Sbjct.pm, so there is no need
+           : for direct consumption.
+ Purpose   : Initializes HSP data and calls private methods to extract 
+           : the data for a given HSP.
+           : Calls superclass constructor first (Bio::Root::Object.pm).
+ Returns   : n/a
+ Argument  : Named parameters passed from new():
+           : All tags must be uppercase (does not call _rearrange()).
+           :      -DATA    => array ref containing raw data for one HSP.
+           :      -PARENT  => Sbjct.pm object ref.
+           :      -NAME    => integer (1..n).
+           :      -PROGRAM => string ('TBLASTN', 'BLASTP', etc.).
+
+See Also   : L<_set_data()|_set_data>, B<Bio::Root::Object::new()>, B<Bio::Tools::Blast::Sbjct::_set_hsps()>
+
+=cut
+
+#----------------
+sub _initialize {
+#----------------
+    my( $self, %param ) = @_;
+    
+    $self->SUPER::_initialize( %param );
+
+    # The gapped and program booleans may be needed after the HSP object
+    # is built.
+#    $self->{'_gapped'} = $param{-GAPPED} || 0;
+    $self->{'_prog'} = $param{-PROGRAM} || 0;  
+    $self->_set_data( @{$param{-DATA}} );
+}
+
+#####################################################################################
+##                                  ACCESSORS                                      ##
+#####################################################################################
+
+
+=head2 _set_data
+
+ Usage     : n/a; called automatically during object construction.
+ Purpose   : Sets the query sequence, sbjct sequence, and the "match" data
+           : which consists of the symbols between the query and sbjct lines
+           : in the alignment.
+ Argument  : Array (all lines from a single, complete HSP, one line per element)
+ Throws    : Propagates any exceptions from the methods called ("See Also")
+
+See Also   : L<_set_seq()|_set_seq>, L<_set_residues()|_set_residues>, L<_set_score_stats()|_set_score_stats>, L<_set_match_stats()|_set_match_stats>, L<_initialize()|_initialize>
+
+=cut
+
+#--------------
+sub _set_data {
+#--------------
+    my $self = shift;
+    my @data = @_;
+    my @queryList  = ();  # 'Query' = SEQUENCE USED TO QUERY THE DATABASE.
+    my @sbjctList  = ();  # 'Sbjct' = HOMOLOGOUS SEQUENCE FOUND IN THE DATABASE.
+    my @matchList  = ();
+    my $matchLine  = 0;   # Alternating boolean: when true, load 'match' data.
+    my @linedat = ();
+    
+    $DEBUG and print STDERR "$ID: set_data()\n";
+
+    my($line, $aln_row_len, $length_diff);
+    $length_diff = 0;
+
+    # Collecting data for all lines in the alignment
+    # and then storing the collections for possible processing later.
+    #
+    # Note that "match" lines may not be properly padded with spaces.
+    # This loop now properly handles such cases:
+    # Query: 1141 PSLVELTIRDCPRLEVGPMIRSLPKFPMLKKLDLAVANIIEEDLDVIGSLEELVIXXXXX 1200
+    #             PSLVELTIRDCPRLEVGPMIRSLPKFPMLKKLDLAVANIIEEDLDVIGSLEELVI
+    # Sbjct: 1141 PSLVELTIRDCPRLEVGPMIRSLPKFPMLKKLDLAVANIIEEDLDVIGSLEELVILSLKL 1200
+
+    foreach $line( @data ) {
+	next if $line =~ /^\s*$/;
+
+	if( $line =~ /^ ?Score/ ) {
+	    $self->_set_score_stats( $line );
+	} elsif( $line =~ /^ ?(Identities|Positives|Strand)/ ) {
+	    $self->_set_match_stats( $line );
+	} elsif( $line =~ /^ ?Frame = ([\d+-]+)/ ) {
+	    # Version 2.0.8 has Frame information on a separate line.
+	   $self->{'_frame'} = $1;
+	} elsif( $line =~ /^(Query:?[\s\d]+)([^\s\d]+)/ ) {
+	    push @queryList, $line;
+	    $self->{'_match_indent'} = CORE::length $1;
+	    $aln_row_len = (CORE::length $1) + (CORE::length $2);
+	    $matchLine = 1;
+	} elsif( $matchLine ) {
+	    # Pad the match line with spaces if necessary.
+	    $length_diff = $aln_row_len - CORE::length $line;
+	    $length_diff and $line .= ' 'x $length_diff;
+	    push @matchList, $line;
+	    $matchLine = 0;
+	} elsif( $line =~ /^Sbjct/ ) {
+	    push @sbjctList, $line;
+	}
+    }
+
+    # Storing the query and sbjct lists in case they are needed later.
+    # We could make this conditional to save memory.
+    $self->{'_queryList'} = \@queryList; 
+    $self->{'_sbjctList'} = \@sbjctList; 
+
+    # Storing the match list in case it is needed later.
+    $self->{'_matchList'} = \@matchList;
+
+    if(not defined ($self->{'_numIdentical'})) {
+      $self->throw("Can't parse match statistics.",
+		   "Possibly a new or unrecognized Blast format.");
+    }
+
+    if(!scalar @queryList or !scalar @sbjctList) {
+        $self->throw("Can't find query or sbjct alignment lines.",
+	  	     "Possibly unrecognized Blast format.");
+      }
+}
+
+
+
+=head2 _set_score_stats
+
+ Usage     : n/a; called automatically by _set_data()
+ Purpose   : Sets various score statistics obtained from the HSP listing.
+ Argument  : String with any of the following formats:
+           : blast2:  Score = 30.1 bits (66), Expect = 9.2
+           : blast2:  Score = 158.2 bits (544), Expect(2) = e-110
+           : blast1:  Score = 410 (144.3 bits), Expect = 1.7e-40, P = 1.7e-40
+           : blast1:  Score = 55 (19.4 bits), Expect = 5.3, Sum P(3) = 0.99
+ Throws    : Exception if the stats cannot be parsed, probably due to a change
+           : in the Blast report format.
+
+See Also   : L<_set_data()|_set_data>
+
+=cut
+
+#--------------------
+sub _set_score_stats {
+#--------------------
+    my ($self, $data) = @_;
+
+    my ($expect, $p);
+
+    if($data =~ /Score = +([\d.e+-]+) bits \(([\d.e+-]+)\), +Expect = +([\d.e+-]+)/) {
+	# blast2 format n = 1
+	$self->{'_bits'}   = $1;
+	$self->{'_score'}  = $2;
+	$expect            = $3;
+    } elsif($data =~ /Score = +([\d.e+-]+) bits \(([\d.e+-]+)\), +Expect\((\d+)\) = +([\d.e+-]+)/) {
+	# blast2 format n > 1
+	$self->{'_bits'}   = $1;
+	$self->{'_score'}  = $2;
+	$self->{'_n'}      = $3;
+	$expect            = $4;
+
+    } elsif($data =~ /Score = +([\d.e+-]+) \(([\d.e+-]+) bits\), +Expect = +([\d.e+-]+), P = +([\d.e-]+)/) {
+	# blast1 format, n = 1
+	$self->{'_score'}  = $1;
+	$self->{'_bits'}   = $2;
+	$expect            = $3;
+	$p                 = $4;
+
+    } elsif($data =~ /Score = +([\d.e+-]+) \(([\d.e+-]+) bits\), +Expect = +([\d.e+-]+), +Sum P\((\d+)\) = +([\d.e-]+)/) {
+	# blast1 format, n > 1
+	$self->{'_score'}  = $1;
+	$self->{'_bits'}   = $2;
+	$expect            = $3;
+	$self->{'_n'}      = $4;
+	$p                 = $5;
+
+    } else {
+	$self->throw("Can't parse score statistics: unrecognized format.", "$data");
+    }
+
+    $expect = "1$expect" if $expect =~ /^e/i;    
+    $p      = "1$p"      if defined $p and $p=~ /^e/i; 
+
+    $self->{'_expect'} = $expect;
+    $self->{'_p'}      = $p || undef;
+
+}
+
+
+
+=head2 _set_match_stats
+
+ Usage     : n/a; called automatically by _set_data()
+ Purpose   : Sets various matching statistics obtained from the HSP listing.
+ Argument  : blast2: Identities = 23/74 (31%), Positives = 29/74 (39%), Gaps = 17/74 (22%)
+           : blast2: Identities = 57/98 (58%), Positives = 74/98 (75%)
+           : blast1: Identities = 87/204 (42%), Positives = 126/204 (61%)
+           : blast1: Identities = 87/204 (42%), Positives = 126/204 (61%), Frame = -3
+           : WU-blast: Identities = 310/553 (56%), Positives = 310/553 (56%), Strand = Minus / Plus
+ Throws    : Exception if the stats cannot be parsed, probably due to a change
+           : in the Blast report format.
+ Comments  : The "Gaps = " data in the HSP header has a different meaning depending
+           : on the type of Blast: for BLASTP, this number is the total number of
+           : gaps in query+sbjct; for TBLASTN, it is the number of gaps in the
+           : query sequence only. Thus, it is safer to collect the data
+           : separately by examining the actual sequence strings as is done
+           : in _set_seq().
+
+See Also   : L<_set_data()|_set_data>, L<_set_seq()|_set_seq>
+
+=cut
+
+#--------------------
+sub _set_match_stats {
+#--------------------
+    my ($self, $data) = @_;
+
+    if($data =~ m!Identities = (\d+)/(\d+)!) {
+      # blast1 or 2 format
+      $self->{'_numIdentical'} = $1;
+      $self->{'_totalLength'}  = $2;
+    }
+    
+    if($data =~ m!Positives = (\d+)/(\d+)!) {
+      # blast1 or 2 format
+      $self->{'_numConserved'} = $1;
+      $self->{'_totalLength'}  = $2;
+    }
+    
+    if($data =~ m!Frame = ([\d+-]+)!) { 
+      $self->{'_frame'} = $1; 
+    }
+
+    # Strand data is not always present in this line.
+    # _set_seq() will also set strand information.
+    if($data =~ m!Strand = (\w+) / (\w+)!) { 
+	$self->{'_queryStrand'} = $1; 
+	$self->{'_sbjctStrand'} = $2; 
+    }
+
+#    if($data =~ m!Gaps = (\d+)/(\d+)!) {
+#	 $self->{'_totalGaps'} = $1;
+#    } else {
+#	 $self->{'_totalGaps'} = 0;
+#    }
+}
+
+
+
+=head2 _set_seq_data
+
+ Usage     : n/a; called automatically when sequence data is requested.
+ Purpose   : Sets the HSP sequence data for both query and sbjct sequences.
+           : Includes: start, stop, length, gaps, and raw sequence.
+ Argument  : n/a
+ Throws    : Propagates any exception thrown by _set_match_seq()
+ Comments  : Uses raw data stored by _set_data() during object construction.
+           : These data are not always needed, so it is conditionally
+           : executed only upon demand by methods such as gaps(), _set_residues(),
+           : etc. _set_seq() does the dirty work.
+
+See Also   : L<_set_seq()|_set_seq>
+
+=cut
+
+sub _set_seq_data {
+    my $self = shift;
+
+    $self->_set_seq('query', @{$self->{'_queryList'}});
+    $self->_set_seq('sbjct', @{$self->{'_sbjctList'}});
+
+    # Liberate some memory.
+    @{$self->{'_queryList'}} = @{$self->{'_sbjctList'}} = ();
+    undef $self->{'_queryList'};
+    undef $self->{'_sbjctList'};
+
+    $self->{'_set_seq_data'} = 1;
+}
+
+
+
+=head2 _set_seq
+
+ Usage     : n/a; called automatically by _set_seq_data()
+           : $hsp_obj->($seq_type, @data);
+ Purpose   : Sets sequence information for both the query and sbjct sequences.
+           : Directly counts the number of gaps in each sequence (if gapped Blast).
+ Argument  : $seq_type = 'query' or 'sbjct'
+           : @data = all seq lines with the form:
+           : Query: 61  SPHNVKDRKEQNGSINNAISPTATANTSGSQQINIDSALRDRSSNVAAQPSLSDASSGSN 120
+ Throws    : Exception if data strings cannot be parsed, probably due to a change
+           : in the Blast report format.
+ Comments  : Uses first argument to determine which data members to set
+           : making this method sensitive data member name changes.
+           : Behavior is dependent on the type of BLAST analysis (TBLASTN, BLASTP, etc).
+ Warning   : Sequence endpoints are normalized so that start < end. This affects HSPs
+           : for TBLASTN/X hits on the minus strand. Normalization facilitates use
+           : of range information by methods such as match().
+
+See Also   : L<_set_seq_data()|_set_seq_data>, L<matches()|matches>, L<range()|range>, L<start()|start>, L<end()|end>
+
+=cut
+
+#-------------
+sub _set_seq {
+#-------------
+    my $self      = shift;
+    my $seqType   = shift;
+    my @data      = @_;
+    my @ranges    = ();
+    my @sequence  = ();
+    my $numGaps   = 0;
+
+    foreach( @data ) {
+	if( m/(\d+) *(\D+) *(\d+)/) {
+	    push @ranges, ( $1, $3 ) ;
+	    push @sequence, $2;
+	} else {
+	    $self->warn("Bad sequence data: $_");
+	}
+    }
+
+    (scalar(@sequence) and scalar(@ranges)) || $self->throw("Can't set sequence: missing data",
+							    "Possibly unrecognized Blast format.");
+ 
+    # Sensitive to member name changes.
+    $seqType = "_\L$seqType\E";
+    $self->{$seqType.'Start'} = $ranges[0];
+    $self->{$seqType.'Stop'}  = $ranges[ $#ranges ];
+    $self->{$seqType.'Seq'}   = \@sequence;
+	
+    $self->{$seqType.'Length'} = abs($ranges[ $#ranges ] - $ranges[0]) + 1;
+
+    # Adjust lengths for BLASTX, TBLASTN, TBLASTX sequences
+    # Converting nucl coords to amino acid coords.
+
+    my $prog = $self->{'_prog'};
+    if($prog eq 'TBLASTN' and $seqType eq '_sbjct') {
+	$self->{$seqType.'Length'} /= 3;
+    } elsif($prog eq 'BLASTX' and $seqType eq '_query') {
+	$self->{$seqType.'Length'} /= 3;
+    } elsif($prog eq 'TBLASTX') {
+	$self->{$seqType.'Length'} /= 3;
+    }
+
+    $self->{$seqType.'Strand'} = 'Plus' if $prog =~ /BLAST[NX]/;
+
+    # Normalize sequence endpoints so that start < end.
+    # Reverse complement or 'minus strand' HSPs get flipped here.
+    if($self->{$seqType.'Start'} > $self->{$seqType.'Stop'}) {
+	($self->{$seqType.'Start'}, $self->{$seqType.'Stop'}) = 
+	    ($self->{$seqType.'Stop'}, $self->{$seqType.'Start'});
+	$self->{$seqType.'Strand'} = 'Minus';
+    } 
+
+    ## Count number of gaps in each seq. Only need to do this for gapped Blasts.
+#    if($self->{'_gapped'}) {
+	my $seqstr = join('', @sequence);
+	$seqstr =~ s/\s//g;
+        my $num_gaps = CORE::length($seqstr) - $self->{$seqType.'Length'};
+	$self->{$seqType.'Gaps'} = $num_gaps if $num_gaps > 0;
+#    }
+}
+
+
+=head2 _set_residues
+
+ Usage     : n/a; called automatically when residue data is requested.
+ Purpose   : Sets the residue numbers representing the identical and
+           : conserved positions. These data are obtained by analyzing the
+           : symbols between query and sbjct lines of the alignments.
+ Argument  : n/a
+ Throws    : Propagates any exception thrown by _set_seq_data() and _set_match_seq().
+ Comments  : These data are not always needed, so it is conditionally
+           : executed only upon demand by methods such as seq_inds().
+           : Behavior is dependent on the type of BLAST analysis (TBLASTN, BLASTP, etc).
+
+See Also   : L<_set_seq_data()|_set_seq_data>, L<_set_match_seq()|_set_match_seq>, L<seq_inds()|seq_inds>
+
+=cut
+
+#------------------
+sub _set_residues {
+#------------------
+    my $self      = shift;
+    my @sequence  = ();
+
+    $self->_set_seq_data() unless $self->{'_set_seq_data'};
+
+    # Using hashes to avoid saving duplicate residue numbers.
+    my %identicalList_query = ();
+    my %identicalList_sbjct = ();
+    my %conservedList_query = ();
+    my %conservedList_sbjct = ();
+    
+    my $aref = $self->_set_match_seq() if not ref $self->{'_matchSeq'};
+    $aref  ||= $self->{'_matchSeq'};
+    my $seqString = join('', @$aref );
+
+    my $qseq = join('',@{$self->{'_querySeq'}});
+    my $sseq = join('',@{$self->{'_sbjctSeq'}});
+    my $resCount_query = $self->{'_queryStop'} || 0;
+    my $resCount_sbjct = $self->{'_sbjctStop'} || 0;
+
+    my $prog = $self->{'_prog'};
+    if($prog !~ /^BLASTP|^BLASTN/) {
+	if($prog eq 'TBLASTN') {
+	    $resCount_sbjct /= 3;
+	} elsif($prog eq 'BLASTX') {
+	    $resCount_query /= 3;
+	} elsif($prog eq 'TBLASTX') {
+	    $resCount_query /= 3;
+	    $resCount_sbjct /= 3;
+	}
+    }
+
+    my ($mchar, $schar, $qchar);
+    while( $mchar = chop($seqString) ) {
+	($qchar, $schar) = (chop($qseq), chop($sseq));
+	if( $mchar eq '+' ) { 
+	    $conservedList_query{ $resCount_query } = 1; 
+	    $conservedList_sbjct{ $resCount_sbjct } = 1; 
+	} elsif( $mchar ne ' ' ) { 
+	    $identicalList_query{ $resCount_query } = 1; 
+	    $identicalList_sbjct{ $resCount_sbjct } = 1;
+	}
+	$resCount_query-- if $qchar ne $GAP_SYMBOL;
+	$resCount_sbjct-- if $schar ne $GAP_SYMBOL;
+    }
+    $self->{'_identicalRes_query'} = \%identicalList_query;
+    $self->{'_conservedRes_query'} = \%conservedList_query;
+    $self->{'_identicalRes_sbjct'} = \%identicalList_sbjct;
+    $self->{'_conservedRes_sbjct'} = \%conservedList_sbjct;
+
+}
+
+
+
+
+=head2 _set_match_seq
+
+ Usage     : n/a. Internal method.
+           : $hsp_obj->_set_match_seq()
+ Purpose   : Set the 'match' sequence for the current HSP (symbols in between
+	   : the query and sbjct lines.)				
+ Returns   : Array reference holding the match sequences lines.
+ Argument  : n/a
+ Throws    : Exception if the _matchList field is not set.
+ Comments  : The match information is not always necessary. This method
+           : allows it to be conditionally prepared.
+           : Called by _set_residues>() and seq_str().
+
+See Also   : L<_set_residues()|_set_residues>, L<seq_str()|seq_str>
+
+=cut
+
+#-------------------
+sub _set_match_seq {
+#-------------------
+    my $self = shift;
+
+## DEBUGGING CODE:
+#    if($self->parent->name eq '1AK5_' and $self->parent->parent->name eq 'YAR073W') {
+#	print "\n_set_match_seq() called for HSP ", $self->name, " of hit ${\$self->parent->name} in query ${\$self->parent->parent->name}"; <STDIN>;
+ #   }
+
+    ref($self->{'_matchList'}) || $self->throw("Can't set HSP match sequence: No data");
+    
+    my @data = @{$self->{'_matchList'}};
+
+    my(@sequence);
+    foreach( @data ) {
+	chomp($_);
+	## Remove leading spaces; (note: aln may begin with a space
+	## which is why we can't use s/^ +//).
+	s/^ {$self->{'_match_indent'}}//;   
+	push @sequence, $_;
+    }
+    # Liberate some memory.
+    @{$self->{'_matchList'}} = undef;
+    $self->{'_matchList'} = undef;
+
+    $self->{'_matchSeq'} = \@sequence;
+
+## DEBUGGING CODE:
+#    if($self->parent->name eq '1AK5_' and $self->parent->parent->name eq 'YAR073W') {
+#	print "RETURNING: $self->{'_matchSeq'}:\n   @{$self->{'_matchSeq'}}";<STDIN>;
+#    }
+
+    $self->{'_matchSeq'};
+}
+
+
+
+=head2 score
+
+ Usage     : $hsp_obj->score()
+ Purpose   : Get the Blast score for the HSP.
+ Returns   : Integer
+ Argument  : n/a
+ Throws    : n/a
+
+See Also   : L<bits()|bits>
+
+=cut
+
+#---------
+sub score { my $self = shift; $self->{'_score'}; }
+#---------
+
+
+
+=head2 bits
+
+ Usage     : $hsp_obj->bits()
+ Purpose   : Get the Blast score in bits for the HSP.
+ Returns   : Float
+ Argument  : n/a
+ Throws    : n/a
+
+
+See Also   : L<score()|score>
+
+=cut
+
+#--------
+sub bits { my $self = shift; $self->{'_bits'}; }
+#--------
+
+
+
+=head2 n
+
+ Usage     : $hsp_obj->n()
+ Purpose   : Get the N value (num HSPs on which P/Expect is based).
+           : This value is not defined with NCBI Blast2 with gapping.
+ Returns   : Integer or null string if not defined.
+ Argument  : n/a
+ Throws    : n/a
+ Comments  : The 'N' value is listed in parenthesis with P/Expect value:
+           : e.g., P(3) = 1.2e-30  ---> (N = 3).
+           : Not defined in NCBI Blast2 with gaps.
+           : This typically is equal to the number of HSPs but not always.
+           : To obtain the number of HSPs, use Bio::Tools::Blast::Sbjct::num_hsps().
+
+See Also   : L<score()|score>
+
+=cut
+
+#-----
+sub n { my $self = shift; $self->{'_n'} || ''; }
+#-----
+
+
+
+=head2 frame
+
+ Usage     : $hsp_obj->frame()
+ Purpose   : Get the reading frame number (-/+ 1, 2, 3) (TBLASTN/X only).
+ Returns   : Integer or null string if not defined.
+ Argument  : n/a
+ Throws    : n/a
+
+=cut
+
+#---------
+sub frame { my $self = shift; $self->{'_frame'} || ''; }
+#---------
+
+
+
+=head2 signif()
+
+ Usage     : $hsp_obj->signif()
+ Purpose   : Get the P-value or Expect value for the HSP.
+ Returns   : Float (0.001 or 1.3e-43)
+           : Returns P-value if it is defined, otherwise, Expect value.
+ Argument  : n/a
+ Throws    : n/a
+ Comments  : Provided for consistency with Sbjct::signif()
+           : Support for returning the significance data in different
+           : formats (e.g., exponent only), is not provided for HSP objects.
+           : This is only available for the Sbjct or Blast object.
+
+See Also   : L<p()|p>, L<expect()|expect>, B<Bio::Tools::Blast::Sbjct::signif()>
+
+=cut
+
+#-----------
+sub signif { 
+#-----------
+    my $self = shift; 
+    my $val ||= defined($self->{'_p'}) ? $self->{'_p'} : $self->{'_expect'};
+    $val; 
+}
+
+
+
+=head2 expect
+
+ Usage     : $hsp_obj->expect()
+ Purpose   : Get the Expect value for the HSP.
+ Returns   : Float (0.001 or 1.3e-43)
+ Argument  : n/a
+ Throws    : n/a
+ Comments  : Support for returning the expectation data in different
+           : formats (e.g., exponent only), is not provided for HSP objects.
+           : This is only available for the Sbjct or Blast object.
+
+See Also   : L<p()|p>
+
+=cut
+
+#----------
+sub expect { my $self = shift; $self->{'_expect'}; }
+#----------
+
+
+
+=head2 p
+
+ Usage     : $hsp_obj->p()
+ Purpose   : Get the P-value for the HSP.
+ Returns   : Float (0.001 or 1.3e-43) or undef if not defined.
+ Argument  : n/a
+ Throws    : n/a
+ Comments  : P-value is not defined with NCBI Blast2 reports.
+           : Support for returning the expectation data in different
+           : formats (e.g., exponent only) is not provided for HSP objects.
+           : This is only available for the Sbjct or Blast object.
+
+See Also   : L<expect()|expect>
+
+=cut
+
+#-----
+sub p { my $self = shift; $self->{'_p'}; }
+#-----
+
+
+=head2 length
+
+ Usage     : $hsp->length( [seq_type] )
+ Purpose   : Get the length of the aligned portion of the query or sbjct.
+ Example   : $hsp->length('query')
+ Returns   : integer
+ Argument  : seq_type: 'query' | 'sbjct' | 'total'  (default = 'total')
+ Throws    : n/a
+ Comments  : 'total' length is the full length of the alignment
+           : as reported in the denominators in the alignment section: 
+           : "Identical = 34/120 Positives = 67/120".
+           : Developer note: when using the built-in length function within
+           : this module, call it as CORE::length().
+
+See Also   : L<gaps()|gaps>
+
+=cut
+
+#-----------
+sub length {
+#-----------
+    my( $self, $type ) = @_;
+    $type  ||= 'total';
+
+    $type ne 'total' and $self->_set_seq_data() unless $self->{'_set_seq_data'};
+
+    ## Sensitive to member name format.
+    $type = "_\L$type\E";
+    $self->{$type.'Length'};
+}
+
+
+
+=head2 gaps
+
+ Usage     : $hsp->gaps( [seq_type] )
+ Purpose   : Get the number of gaps in the query, sbjct, or total alignment.
+           : Also can return query gaps and sbjct gaps as a two-element list
+           : when in array context.
+ Example   : $total_gaps      = $hsp->gaps();
+           : ($qgaps, $sgaps) = $hsp->gaps();
+           : $qgaps           = $hsp->gaps('query');
+ Returns   : scalar context: integer
+           : array context without args: (int, int)  =  ('queryGaps', 'sbjctGaps')
+ Argument  : seq_type: 'query' | 'sbjct' | 'total'  
+           : (default = 'total', scalar context)
+           : Array context can be "induced" by providing an argument of 'list' or 'array'.
+ Throws    : n/a
+
+See Also   : L<length()|length>, L<matches()|matches>
+
+=cut
+
+#---------
+sub gaps {
+#---------
+    my( $self, $seqType ) = @_;
+
+    $self->_set_seq_data() unless $self->{'_set_seq_data'};
+
+    $seqType  ||= (wantarray ? 'list' : 'total');
+
+    if($seqType =~ /list|array/i) {
+	return (($self->{'_queryGaps'} || 0), ($self->{'_sbjctGaps'} || 0));
+    }
+
+    if($seqType eq 'total') {
+	return ($self->{'_queryGaps'} + $self->{'_sbjctGaps'}) || 0;
+    } else {
+	## Sensitive to member name format.
+	$seqType = "_\L$seqType\E";
+	return $self->{$seqType.'Gaps'} || 0;
+    }
+}
+
+
+
+=head2 matches
+
+ Usage     : $hsp->matches([seq_type], [start], [stop]);
+ Purpose   : Get the total number of identical and conservative matches 
+           : in the query or sbjct sequence for the given HSP. Optionally can
+           : report data within a defined interval along the seq.
+           : (Note: 'conservative' matches are called 'positives' in the
+	   : Blast report.)
+ Example   : ($id,$cons) = $hsp_object->matches('sbjct');
+           : ($id,$cons) = $hsp_object->matches('query',300,400);
+ Returns   : 2-element array of integers 
+ Argument  : (1) seq_type = 'query' | 'sbjct' (default = query)
+           : (2) start = Starting coordinate (optional)
+           : (3) stop  = Ending coordinate (optional)
+ Throws    : Exception if the supplied coordinates are out of range.
+ Comments  : Relies on seq_str('match') to get the string of alignment symbols
+           : between the query and sbjct lines which are used for determining
+           : the number of identical and conservative matches.
+
+See Also   : L<length()|length>, L<gaps()|gaps>, L<seq_str()|seq_str>, B<Bio::Tools::Blast::Sbjct::_adjust_contigs()>
+
+=cut
+
+#-----------
+sub matches {
+#-----------
+    my( $self, %param ) = @_;
+    my(@data);
+    my($seqType, $beg, $end) = ($param{-SEQ}, $param{-START}, $param{-STOP});
+    $seqType ||= 'query';
+
+    if(!defined $beg && !defined $end) {
+	## Get data for the whole alignment.
+	push @data, ($self->{'_numIdentical'}, $self->{'_numConserved'});
+    } else {
+	## Get the substring representing the desired sub-section of aln.
+	$beg ||= 0;
+	$end ||= 0;
+	my($start,$stop) = $self->range($seqType);
+	if($beg == 0) { $beg = $start; $end = $beg+$end; }
+	elsif($end == 0) { $end = $stop; $beg = $end-$beg; }
+
+	if($end >= $stop) { $end = $stop; } ##ML changed from if (end >stop)
+	else { $end += 1;}   ##ML moved from commented position below, makes
+                             ##more sense here
+#	if($end > $stop) { $end = $stop; }
+	if($beg < $start) { $beg = $start; }
+#	else { $end += 1;}
+
+#	my $seq = substr($self->seq_str('match'), $beg-$start, ($end-$beg));
+
+	## ML: START fix for substr out of range error ------------------
+	my $seq = "";
+	if (($self->{'_prog'} eq 'TBLASTN') and ($seqType eq 'sbjct'))
+	{
+	    $seq = substr($self->seq_str('match'),
+			  int(($beg-$start)/3), int(($end-$beg+1)/3));
+
+	} elsif (($self->{'_prog'} eq 'BLASTX') and ($seqType eq 'query'))
+	{
+	    $seq = substr($self->seq_str('match'),
+			  int(($beg-$start)/3), int(($end-$beg+1)/3));
+	} else {
+	    $seq = substr($self->seq_str('match'), 
+			  $beg-$start, ($end-$beg));
+	}
+	## ML: End of fix for  substr out of range error -----------------
+
+	
+	## ML: debugging code
+	## This is where we get our exception.  Try printing out the values going
+	## into this:
+	##
+#	 print STDERR 
+#	     qq(*------------MY EXCEPTION --------------------\nSeq: ") , 
+#	     $self->seq_str("$seqType"), qq("\n),$self->name,",(  index:";
+#	 print STDERR  $beg-$start, ", len: ", $end-$beg," ), (HSPRealLen:", 
+#	     CORE::length $self->seq_str("$seqType");
+#	 print STDERR ", HSPCalcLen: ", $stop - $start +1 ," ), 
+#	     ( beg: $beg, end: $end ), ( start: $start, stop: stop )\n";
+	 ## ML: END DEBUGGING CODE----------
+
+	if(!CORE::length $seq) {
+	    $self->throw("Undefined sub-sequence ($beg,$end). Valid range = $start - $stop");
+	}
+	## Get data for a substring.
+#	printf "Collecting HSP subsection data: beg,end = %d,%d; start,stop = %d,%d\n%s<---\n", $beg, $end, $start, $stop, $seq;
+#	printf "Original match seq:\n%s\n",$self->seq_str('match');
+	$seq =~ s/ //g;  # remove space (no info).
+	my $len_cons = CORE::length $seq;
+	$seq =~ s/\+//g;  # remove '+' characters (conservative substitutions)
+	my $len_id = CORE::length $seq;
+	push @data, ($len_id, $len_cons);
+#	printf "  HSP = %s\n  id = %d; cons = %d\n", $self->name, $len_id, $len_cons; <STDIN>;
+    }
+    @data;
+}
+
+
+
+=head2 frac_identical
+
+ Usage     : $hsp_object->frac_identical( [seq_type] );
+ Purpose   : Get the fraction of identical positions within the given HSP.
+ Example   : $frac_iden = $hsp_object->frac_identical('query');
+ Returns   : Float (2-decimal precision, e.g., 0.75).
+ Argument  : seq_type: 'query' | 'sbjct' | 'total'
+           : default = 'total' (but see comments below).
+ Throws    : n/a
+ Comments  : Different versions of Blast report different values for the total
+           : length of the alignment. This is the number reported in the
+           : denominators in the stats section:
+           : "Identical = 34/120 Positives = 67/120".
+           : BLAST-GP uses the total length of the alignment (with gaps)
+           : WU-BLAST uses the length of the query sequence (without gaps).
+           : Therefore, when called without an argument or an argument of 'total',
+           : this method will report different values depending on the
+           : version of BLAST used.
+           : 
+           : To get the fraction identical among only the aligned residues,
+           : ignoring the gaps, call this method with an argument of 'query'
+           : or 'sbjct'.
+
+See Also   : L<frac_conserved()|frac_conserved>, L<num_identical()|num_identical>, L<matches()|matches>
+
+=cut
+
+#-------------------
+sub frac_identical {
+#-------------------
+# The value is calculated as opposed to storing it from the parsed results.
+# This saves storage and also permits flexibility in determining for which
+# sequence (query or sbjct) the figure is to be calculated.
+
+    my( $self, $seqType ) = @_;
+    $seqType ||= 'total';
+
+    if($seqType ne 'total') {
+      $self->_set_seq_data() unless $self->{'_set_seq_data'};
+    }
+    ## Sensitive to member name format.
+    $seqType = "_\L$seqType\E";
+
+    sprintf( "%.2f", $self->{'_numIdentical'}/$self->{$seqType.'Length'});
+}
+
+
+=head2 frac_conserved
+
+ Usage     : $hsp_object->frac_conserved( [seq_type] );
+ Purpose   : Get the fraction of conserved positions within the given HSP.
+           : (Note: 'conservative' positions are called 'positives' in the
+	   : Blast report.)
+ Example   : $frac_cons = $hsp_object->frac_conserved('query');
+ Returns   : Float (2-decimal precision, e.g., 0.75).
+ Argument  : seq_type: 'query' | 'sbjct' 
+           : default = 'total' (but see comments below).
+ Throws    : n/a
+ Comments  : Different versions of Blast report different values for the total
+           : length of the alignment. This is the number reported in the
+           : denominators in the stats section:
+           : "Identical = 34/120 Positives = 67/120".
+           : BLAST-GP uses the total length of the alignment (with gaps)
+           : WU-BLAST uses the length of the query sequence (without gaps).
+           : Therefore, when called without an argument or an argument of 'total',
+           : this method will report different values depending on the
+           : version of BLAST used.
+           :
+           : To get the fraction conserved among only the aligned residues,
+           : ignoring the gaps, call this method with an argument of 'query'
+           : or 'sbjct'.
+
+See Also   : L<frac_conserved()|frac_conserved>, L<num_conserved()|num_conserved>, L<matches()|matches>
+
+=cut
+
+#--------------------
+sub frac_conserved {
+#--------------------
+# The value is calculated as opposed to storing it from the parsed results.
+# This saves storage and also permits flexibility in determining for which
+# sequence (query or sbjct) the figure is to be calculated.
+ 
+    my( $self, $seqType ) = @_;
+    $seqType ||= 'total';
+
+    if($seqType ne 'total') {
+      $self->_set_seq_data() unless $self->{'_set_seq_data'};
+    }
+
+    ## Sensitive to member name format.
+    $seqType = "_\L$seqType\E";
+
+    sprintf( "%.2f", $self->{'_numConserved'}/$self->{$seqType.'Length'});
+}
+
+
+=head2 num_identical
+
+ Usage     : $hsp_object->num_identical();
+ Purpose   : Get the number of identical positions within the given HSP.
+ Example   : $num_iden = $hsp_object->num_identical();
+ Returns   : integer
+ Argument  : n/a
+ Throws    : n/a
+
+See Also   : L<num_conserved()|num_conserved>, L<frac_identical()|frac_identical>
+
+=cut
+
+#-------------------
+sub num_identical {
+#-------------------
+    my( $self) = shift;
+
+    $self->{'_numIdentical'};
+}
+
+
+=head2 num_conserved
+
+ Usage     : $hsp_object->num_conserved();
+ Purpose   : Get the number of conserved positions within the given HSP.
+ Example   : $num_iden = $hsp_object->num_conserved();
+ Returns   : integer
+ Argument  : n/a
+ Throws    : n/a
+
+See Also   : L<num_identical()|num_identical>, L<frac_conserved()|frac_conserved>
+
+=cut
+
+#-------------------
+sub num_conserved {
+#-------------------
+    my( $self) = shift;
+
+    $self->{'_numConserved'};
+}
+
+
+
+=head2 range
+
+ Usage     : $hsp->range( [seq_type] );
+ Purpose   : Gets the (start, end) coordinates for the query or sbjct sequence
+           : in the HSP alignment.
+ Example   : ($qbeg, $qend) = $hsp->range('query');
+           : ($sbeg, $send) = $hsp->range('sbjct');
+ Returns   : Two-element array of integers 
+ Argument  : seq_type = string, 'query' or 'sbjct'  (default = 'query')
+           : (case insensitive).
+ Throws    : n/a
+
+See Also   : L<start()|start>, L<end()|end>
+
+=cut
+
+#----------
+sub range {
+#----------
+    my ($self, $seqType) = @_;
+
+    $self->_set_seq_data() unless $self->{'_set_seq_data'};
+
+    $seqType ||= 'query';
+    ## Sensitive to member name changes.
+    $seqType = "_\L$seqType\E";
+
+    return ($self->{$seqType.'Start'},$self->{$seqType.'Stop'});
+}
+
+=head2 start
+
+ Usage     : $hsp->start( [seq_type] );
+ Purpose   : Gets the start coordinate for the query, sbjct, or both sequences
+           : in the HSP alignment.
+ Example   : $qbeg = $hsp->start('query');
+           : $sbeg = $hsp->start('sbjct');
+           : ($qbeg, $sbeg) = $hsp->start();
+ Returns   : scalar context: integer
+           : array context without args: list of two integers
+ Argument  : In scalar context: seq_type = 'query' or 'sbjct'
+           :  (case insensitive). If not supplied, 'query' is used.
+           : Array context can be "induced" by providing an argument of 'list' or 'array'.
+ Throws    : n/a
+
+See Also   : L<end()|end>, L<range()|range>
+
+=cut
+
+#----------
+sub start {
+#----------
+    my ($self, $seqType) = @_;
+
+    $seqType ||= (wantarray ? 'list' : 'query');
+
+    $self->_set_seq_data() unless $self->{'_set_seq_data'};
+
+    if($seqType =~ /list|array/i) {
+	return ($self->{'_queryStart'}, $self->{'_sbjctStart'});
+    } else {
+	## Sensitive to member name changes.
+	$seqType = "_\L$seqType\E";
+	return $self->{$seqType.'Start'};
+    }
+}
+
+=head2 end
+
+ Usage     : $hsp->end( [seq_type] );
+ Purpose   : Gets the end coordinate for the query, sbjct, or both sequences
+           : in the HSP alignment.
+ Example   : $qbeg = $hsp->end('query');
+           : $sbeg = $hsp->end('sbjct');
+           : ($qbeg, $sbeg) = $hsp->end();
+ Returns   : scalar context: integer
+           : array context without args: list of two integers
+ Argument  : In scalar context: seq_type = 'query' or 'sbjct'
+           :  (case insensitive). If not supplied, 'query' is used.
+           : Array context can be "induced" by providing an argument of 'list' or 'array'.
+ Throws    : n/a
+
+See Also   : L<start()|start>, L<range()|range>
+
+=cut
+
+#----------
+sub end {
+#----------
+    my ($self, $seqType) = @_;
+
+    $seqType ||= (wantarray ? 'list' : 'query');
+
+    $self->_set_seq_data() unless $self->{'_set_seq_data'};
+
+    if($seqType =~ /list|array/i) {
+	return ($self->{'_queryStop'}, $self->{'_sbjctStop'});
+    } else {
+	## Sensitive to member name changes.
+	$seqType = "_\L$seqType\E";
+	return $self->{$seqType.'Stop'};
+    }
+}
+
+
+
+=head2 strand
+
+ Usage     : $hsp_object->strand( [seq_type] )
+ Purpose   : Get the strand of the query or sbjct sequence.
+ Example   : print $hsp->strand('query');
+           : ($qstrand, $sstrand) = $hsp->strand();
+ Returns   : -1, 0, or 1
+           : -1 = Minus strand, +1 = Plus strand
+           : Returns 0 if strand is not defined, which occurs
+           : for non-TBLASTN/X reports.
+           : In scalar context without arguments, returns queryStrand value.
+           : In array context without arguments, returns a two-element list 
+           :    of strings (queryStrand, sbjctStrand).
+           : Array context can be "induced" by providing an argument of 'list' or 'array'.
+ Argument  : seq_type: 'query' | 'sbjct' or undef
+ Throws    : n/a
+
+See Also   : L<_set_seq()|_set_seq>, L<_set_match_stats()|_set_match_stats>
+
+=cut
+
+#-----------
+sub strand {
+#-----------
+    my( $self, $seqType ) = @_;
+    $seqType  ||= (wantarray ? 'list' : 'query');
+
+    return '' if $seqType eq 'query' and $self->{'_prog'} eq 'TBLASTN';
+
+    ## Sensitive to member name format.
+    $seqType = "_\L$seqType\E";
+
+    # $seqType could be '_list'.
+    $self->{'_queryStrand'} or $self->_set_seq_data() unless $self->{'_set_seq_data'};
+
+    if($seqType  =~ /list|array/i) {
+	return ('','') unless defined $self->{'_queryStrand'};
+	return ($self->{'_queryStrand'}, $self->{'_sbjctStrand'});
+    }
+    local $^W = 0;
+    $STRAND_SYMBOL{$self->{$seqType.'Strand'}} || 0;
+}
+
+
+#####################################################################################
+##                            INSTANCE METHODS                                     ##
+#####################################################################################
+
+
+=head2 seq
+
+ Usage     : $hsp->seq( [seq_type] );
+ Purpose   : Get the query or sbjct sequence as a Bio::Seq.pm object.
+ Example   : $seqObj = $hsp->seq('query');
+ Returns   : Object reference for a Bio::Seq.pm object.
+ Argument  : seq_type = 'query' or 'sbjct' (default = 'query').
+ Throws    : Propagates any exception that occurs during construction
+           : of the Bio::Seq.pm object.
+ Comments  : The sequence is returned in an array of strings corresponding
+           : to the strings in the original format of the Blast alignment.
+           : (i.e., same spacing).
+
+See Also   : L<seq_str()|seq_str>, L<seq_inds()|seq_inds>, B<Bio::Seq.pm>
+
+=cut
+
+#-------
+sub seq {  
+#-------
+    my($self,$seqType) = @_; 
+    $seqType ||= 'query';
+    my $str = $self->seq_str($seqType);
+    my $num =  $self->name;
+    my $name = $seqType =~ /query/i 
+	            ? $self->parent->parent->name 
+	            : $self->parent->name; 
+	
+    require Bio::Seq;
+
+    new Bio::Seq (-ID   => $name,
+		  -SEQ  => $str,
+		  -DESC => "Blast HSP #$num, $seqType sequence",
+		  );
+}
+
+
+
+=head2 seq_str
+
+ Usage     : $hsp->seq_str( seq_type );
+ Purpose   : Get the full query, sbjct, or 'match' sequence as a string.
+           : The 'match' sequence is the string of symbols in between the 
+           : query and sbjct sequences.
+ Example   : $str = $hsp->seq_str('query');
+ Returns   : String
+ Argument  : seq_Type = 'query' or 'sbjct' or 'match'
+ Throws    : Exception if the argument does not match an accepted seq_type.
+ Comments  : Calls _set_residues() to set the 'match' sequence if it has
+           : not been set already.
+
+See Also   : L<seq()|seq>, L<seq_inds()|seq_inds>, L<_set_match_seq()|_set_match_seq>
+
+=cut
+
+#------------
+sub seq_str {  
+#------------
+    my($self,$seqType) = @_; 
+
+    ## Sensitive to member name changes.
+    $seqType = "_\L$seqType\E";
+
+    $self->_set_seq_data() unless $self->{'_set_seq_data'};
+
+    if($seqType =~ /sbjct|query/) {
+	my $seq = join('',@{$self->{$seqType.'Seq'}}); 
+	$seq =~ s/\s+//g;
+	return $seq;
+
+    } elsif( $seqType =~ /match/i) {
+	# Only need to call _set_match_seq() if the match seq is requested.
+	my $aref = $self->_set_match_seq() unless ref $self->{'_matchSeq'};
+	$aref =  $self->{'_matchSeq'};
+
+## DEBUGGING CODE:
+#	if($self->parent->name eq '1AK5_' and $self->parent->parent->name eq 'YAR073W') {
+#	    print "seq_str():\n   @$aref";<STDIN>;
+#	}
+
+	return join('',@$aref); 
+
+    } else {
+	$self->throw("Invalid or undefined sequence type: $seqType",
+		     "Valid types: query, sbjct, match");
+    }
+}
+
+
+
+
+=head2 seq_inds
+
+ Usage     : $hsp->seq_inds( seq_type, class, collapse );
+ Purpose   : Get a list of residue positions (indices) for all identical 
+           : or conserved residues in the query or sbjct sequence.
+ Example   : @ind = $hsp->seq_inds('query', 'identical');
+           : @ind = $hsp->seq_inds('sbjct', 'conserved');
+           : @ind = $hsp->seq_inds('sbjct', 'conserved', 1);
+ Returns   : List of integers 
+           : May include ranges if collapse is true.
+ Argument  : seq_type  = 'query' or 'sbjct'  (default = query)
+           : class     = 'identical' or 'conserved' (default = identical)
+           :              (can be shortened to 'id' or 'cons')
+           :              (actually, anything not 'id' will evaluate to 'conserved').
+           : collapse  = boolean, if true, consecutive positions are merged
+           :             using a range notation, e.g., "1 2 3 4 5 7 9 10 11" 
+           :             collapses to "1-5 7 9-11". This is useful for 
+           :             consolidating long lists. Default = no collapse.
+ Throws    : n/a.
+ Comments  : Calls _set_residues() to set the 'match' sequence if it has
+           : not been set already.
+
+See Also   : L<seq()|seq>, L<_set_residues()|_set_residues>, L<collapse_nums()|collapse_nums>, B<Bio::Tools::Blast::Sbjct::seq_inds()>
+
+=cut
+
+#---------------
+sub seq_inds {
+#---------------
+    my ($self, $seq, $class, $collapse) = @_;
+
+    $seq  ||= 'query';
+    $class ||= 'identical';
+    $collapse ||= 0;
+
+    $self->_set_residues() unless defined $self->{'_identicalRes_query'};
+
+    $seq  = ($seq !~ /^q/i ? 'sbjct' : 'query');
+    $class = ($class !~ /^id/i ? 'conserved' : 'identical');
+
+    ## Sensitive to member name changes.
+    $seq  = "_\L$seq\E";
+    $class = "_\L$class\E";
+
+    my @ary = sort { $a <=> $b } keys %{ $self->{"${class}Res$seq"}};
+
+    return $collapse ? &collapse_nums(@ary) : @ary;
+}
+
+
+
+
+=head2 get_aln
+
+ Usage     : $hsp->get_aln()
+ Purpose   : Get a Bio::UnivAln.pm object constructed from the query + sbjct 
+           : sequences of the present HSP object.
+ Example   : $aln_obj = $hsp->get_aln();
+ Returns   : Object reference for a Bio::UnivAln.pm object.
+ Argument  : n/a.
+ Throws    : Propagates any exception ocurring during the construction of
+           : the Bio::UnivAln object.
+ Comments  : Requires Bio::UnivAln.pm.
+           : The Bio::UnivAln.pm object is constructed from the query + sbjct 
+           : sequence objects obtained by calling seq().
+           : Gap residues are included (see $GAP_SYMBOL). It is important that 
+           : Bio::UnivAln.pm recognizes the gaps correctly. A strategy for doing 
+           : this is being considered. Currently it is hard-wired.
+
+See Also   : L<seq()|seq>, B<Bio::UnivAln.pm>
+
+=cut
+
+#------------
+sub get_aln {
+#------------
+    my $self = shift;
+    
+    require Bio::UnivAln;
+    
+    my $qseq = $self->seq('query');
+    my $sseq = $self->seq('sbjct');
+
+    my $desc = sprintf "HSP #%s of query %s vs. sbjct %s", 
+	      $self->name, $self->parent->parent->name, $self->parent->name;
+
+    my $type = $self->{'_prog'} =~ /P$|^T/ ? 'amino' : 'dna';
+
+    Bio::UnivAln->new( -seqs => [$qseq, $sseq],
+		       -desc => $desc,
+		       -type => $type,
+		       );
+}
+
+
+=head2 display
+
+ Usage     : $sbjct_object->display( %named_parameters );
+ Purpose   : Display information about Bio::Tools::Blast::Sbjct.pm data members
+           : including:  length, gaps, score, significance value, 
+           :             sequences and sequence indices.
+ Example   : $object->display(-SHOW=>'stats');
+ Argument  : Named parameters: (TAGS CAN BE UPPER OR LOWER CASE)
+           :          -SHOW  => 'hsp',
+           :          -WHERE => filehandle (default = STDOUT)
+ Returns   : n/a
+ Status    : Experimental
+ Comments  : For more control over the display of sequence data,
+           : use seq(), seq_str(), seq_inds().
+
+See Also   : L<_display_seq()|_display_seq>, L<seq()|seq>, L<seq_str()|seq_str>, L<seq_inds()|seq_inds>, L<_display_matches()|_display_matches>, B<Bio::Root::Object::display()>
+
+=cut
+
+#-----------
+sub display {
+#-----------
+    my( $self, %param ) = @_;
+    
+     my $sbjctName = $self->parent->name();
+    my $queryName = $self->parent->parent->name();
+    my $layout = $self->parent->parent->_layout();
+    
+    my $OUT = $self->set_display(%param);
+    
+    printf( $OUT "%-15s: %d\n", "LENGTH TOTAL", $self->length('total') );
+    printf( $OUT "%-15s: %d\n", "LENGTH QUERY", $self->length('query') );
+    printf( $OUT "%-15s: %d\n", "LENGTH SBJCT", $self->length('sbjct') );
+    printf( $OUT "%-15s: %d\n", "GAPS QUERY", $self->gaps('query') );
+    printf( $OUT "%-15s: %d\n", "GAPS SBJCT", $self->gaps('sbjct') );
+    printf( $OUT "%-15s: %d\n", "SCORE", $self->{'_score'} );
+    printf( $OUT "%-15s: %0.1f\n", "BITS", $self->{'_bits'} );
+    if($layout == 1) {
+	printf( $OUT "%-15s: %.1e\n", "P-VAL", $self->{'_p'} );
+	printf( $OUT "%-15s: %.1e\n", "EXPECT", $self->{'_expect'} );
+    } else {
+	printf( $OUT "%-15s: %.1e\n", "EXPECT", $self->{'_expect'} );
+    }
+
+    my $queryLength = $self->length('query'); 
+
+    printf( $OUT "%-15s: %d (%0.0f%%)\n", "IDENTICAL", $self->{'_numIdentical'}, 
+	   $self->{'_numIdentical'}/$queryLength * 100 );
+    printf( $OUT "%-15s: %d (%0.0f%%)  %s \n", "CONSERVED", $self->{'_numConserved'}, 
+	   $self->{'_numConserved'}/$queryLength * 100,
+	   "includes identical" );
+    
+    $self->_display_seq('query', $queryName, $OUT);
+    $self->_display_seq('sbjct', $sbjctName, $OUT);
+    $self->_display_matches($queryName, $sbjctName, $OUT);
+}
+
+
+
+
+=head2 _display_seq
+
+ Usage     : n/a; called automatically by display()
+ Purpose   : Display information about query and sbjct HSP sequences.
+           : Prints the start, stop coordinates and the actual sequence.
+ Example   : n/a
+ Argument  : 
+ Returns   : printf call.
+ Status    : Experimental
+ Comments  : For more control, use seq(), seq_str(), or seq_inds().
+
+See Also   : L<display()|display>, L<seq()|seq>, L<seq_str()|seq_str>, L<seq_inds()|seq_inds>, L<_display_matches()|_display_matches>
+
+=cut
+
+#------------------
+sub _display_seq {
+#------------------
+    my( $self, $seqType, $name, $OUT ) = @_;
+    
+    $self->_set_seq_data() unless $self->{'_set_seq_data'};
+
+    # Sensitive to member name changes.
+    my $mem = "_\L$seqType\E";
+    printf( $OUT "\n%10s: %s\n%10s  %s\n", "\U$seqType\E", "$name", "-----",
+	   ('-'x ((CORE::length $name) + 2)) );
+    printf( $OUT "%13s: %d\n", "START", $self->{$mem.'Start'} );
+    printf( $OUT "%13s: %d\n", "STOP", $self->{$mem.'Stop'} );
+    printf( $OUT "%13s: \n", "SEQ" );
+    foreach( @{ $self->{$mem.'Seq'}} ) {
+	printf( $OUT "%15s%s\n", "", $_);
+    }
+}
+
+
+=head2 _display_matches
+
+ Usage     : n/a; called automatically by display()
+ Purpose   : Display information about identical and conserved positions
+           : within both the query and sbjct sequences.
+ Example   : n/a
+ Argument  :
+ Returns   : printf call.
+ Status    : Experimental
+ Comments  : For more control, use seq_inds().
+
+See Also   : L<display()|display>, L<seq_inds()|seq_inds>, L<_display_seq()|_display_seq>, 
+
+=cut
+
+#--------------------
+sub _display_matches {
+#--------------------
+    my( $self, $queryName, $sbjctName, $OUT) = @_;
+    my($resNum, $count);
+    
+    $self->_set_residues() unless defined $self->{'_identicalRes_query'};
+
+    printf( $OUT "\n%10s:    \n%10s\n", "HITS", "-----" );
+    foreach( @{ $self->{'_matchSeq'}} ) {
+	printf( $OUT "%15s%s\n", "", $_ );
+    }		
+    
+    print $OUT "\n\U$queryName\E\n------------\n";
+    printf( $OUT "\n%5s%s:\n%5s%s\n\t", "", "IDENTICAL RESIDUES IN $queryName (n=$self->{'_numIdentical'})", 
+	   "", "--------------------------------------------" );
+    $count = 0;
+    foreach $resNum ( sort keys %{ $self->{'_identicalRes_query' }} ) {
+	$count++;
+	print $OUT "$resNum";
+	$count > 0 and print $OUT +( $count % 15 ? ", " : "\n\t");
+    }
+    
+    print $OUT "\n";
+
+    my $justConserved = ($self->{'_numConserved'})-($self->{'_numIdentical'});
+    printf( $OUT "\n%5s%s:\n%5s%s\n\t", "","CONSERVED RESIDUES IN $queryName (n=$justConserved)", 
+	   "", "--------------------------------------------" );
+    $count = 0;
+    foreach $resNum ( sort keys %{ $self->{'_conservedRes_query' }} ) { 
+	$count++;
+	print $OUT "$resNum";
+	$count > 0 and print $OUT +( $count % 15 ? ", " : "\n\t");
+    }
+    
+    
+    print $OUT "\n\n\U$sbjctName\E\n------------\n";
+    printf( $OUT "\n%5s%s:\n%5s%s\n\t", "", "IDENTICAL RESIDUES IN $sbjctName (n=$self->{'_numIdentical'})", 
+	   "", "--------------------------------------------" );
+    $count = 0;
+    foreach $resNum ( sort keys %{ $self->{'_identicalRes_sbjct' }} ) {
+	$count++;
+	print $OUT "$resNum";
+	$count > 0 and print $OUT +( $count % 15 ? ", " : "\n\t");
+    }
+    
+    print $OUT "\n";
+    $justConserved = ($self->{'_numConserved'})-($self->{'_numIdentical'});
+    printf( $OUT "\n%5s%s:\n%5s%s\n\t", "","CONSERVED RESIDUES IN $sbjctName (n=$justConserved)", 
+	   "", "--------------------------------------------" );
+    $count       = 0;
+    foreach $resNum ( sort keys %{ $self->{'_conservedRes_sbjct' }} ) { 
+	$count++;
+	print $OUT "$resNum";
+	$count > 0 and print $OUT +( $count % 15 ? ", " : "\n\t");
+    }
+}
+
+
+
+
+=head2 homol_data
+
+ Usage     : $data = $hsp_object->homo_data( %named_params );
+ Purpose   : Gets similarity data for a single HSP.
+ Returns   : String:
+           : "Homology data" for each HSP is in the format:
+           :  "<integer> <start> <stop>"
+           :  where integer is the value returned by homol_score().
+ Argument  : Named params:  (UPPER OR LOWERCASE TAGS)
+           :   currently just one param is used:
+           :   -SEQ =>'query' or 'sbjct'
+ Throws    : n/a
+ Status    : Experimental
+ Comments  : This is a very experimental method used for obtaining a
+           : coarse indication of:
+           :   1) how strong the similarity is between the sequences in the HSP,
+           :   3) the endpoints of the alignment (sequence monomer numbers)
+
+See Also   : L<homol_score()|homol_score>, B<Bio::Tools::Blast.::homol_data()>, B<Bio::Tools::Blast::Sbjct::homol_data()>
+
+=cut
+
+#---------------
+sub homol_data {
+#---------------
+    my ($self, %param) = @_;
+    my $seq = $param{-SEQ} || $param{'-seq'} || 'sbjct';  # 'query' or 'sbjct'
+    my $homolScore = $self->homol_score();
+    # Sensitive to member name changes.
+    $seq = "_\L$seq\E";
+    
+    $self->_set_seq_data() unless $self->{'_set_seq_data'};
+    return ( $homolScore.' '.$self->{$seq.'Start'}.' '.$self->{$seq.'Stop'});
+}
+
+
+=head2 homol_score
+
+ Usage     : $self->homol_score();
+ Purpose   : Get a homology score (integer 1 - 3) as a coarse representation of 
+           : the strength of the similarity independent of sequence composition. 
+           : Based on the Blast bit score.
+ Example   : $hscore = $hsp->homol_score();
+ Returns   : Integer 
+ Argument  : n/a
+ Throws    : n/a
+ Status    : Experimental
+ Comments  : See @Bio::Tools::Blast::HSP::SCORE_CUTOFFS for the specific values. 
+           : Currently,  BIT_SCORE  HOMOL_SCORE
+           :             ---------  -----------
+           :             >=100     --> 3 
+           :               30-100  --> 2
+           :             < 30      --> 1
+
+See Also   : L<homol_data()|homol_data>
+
+=cut
+
+#----------------
+sub homol_score {
+#----------------
+    my $self = shift;
+    
+    if( $self->{'_bits'} >= $SCORE_CUTOFFS[0] ) { 1 }
+    elsif($self->{'_bits'} < $SCORE_CUTOFFS[0] and 
+	  $self->{'_bits'} >= $SCORE_CUTOFFS[1] ) { 2 }
+    else  { 3 }
+}
+
+
+#####################################################################################
+##                                  CLASS METHODS                                  ##
+#####################################################################################
+
+=head1 CLASS METHODS
+
+=head2 collapse_nums
+
+ Usage     : @cnums = collapse_nums( @numbers );
+ Purpose   : Collapses a list of numbers into a set of ranges of consecutive terms:
+           : Useful for condensing long lists of consecutive numbers.
+           :  EXPANDED:
+           :     1 2 3 4 5 6 10 12 13 14 15 17 18 20 21 22 24 26 30 31 32
+           :  COLLAPSED:
+           :     1-6 10 12-15 17 18 20-22 24 26 30-32
+ Argument  : List of numbers and sorted numerically.
+ Returns   : List of numbers mixed with ranges of numbers (see above).
+ Throws    : n/a
+ Comments  : Probably belongs in a more general utility class.
+
+See Also   : L<seq_inds()|seq_inds>
+
+=cut
+
+#------------------
+sub collapse_nums {
+#------------------
+# This is not the slickest connectivity algorithm, but will do for now.
+    my @a = @_;
+    my ($from, $to, $i, @ca, $consec);
+    
+    $consec = 0;
+    for($i=0; $i < @a; $i++) {
+	not $from and do{ $from = $a[$i]; next; };
+	if($a[$i] == $a[$i-1]+1) {
+	    $to = $a[$i];
+	    $consec++;
+	} else {
+	    if($consec == 1) { $from .= ",$to"; }
+	    else { $from .= $consec>1 ? "\-$to" : ""; }
+	    push @ca, split(',', $from);
+	    $from =  $a[$i];
+	    $consec = 0;
+	    $to = undef;
+	}
+    }
+    if(defined $to) {
+	if($consec == 1) { $from .= ",$to"; }
+	else { $from .= $consec>1 ? "\-$to" : ""; }
+    }
+    push @ca, split(',', $from) if $from;
+
+    @ca;
+}
+
+
+1;
+__END__
+
+#####################################################################################
+#                                  END OF CLASS
+#####################################################################################
+
+=head1 FOR DEVELOPERS ONLY
+
+=head2 Data Members
+
+Information about the various data members of this module is provided for those 
+wishing to modify or understand the code. Two things to bear in mind: 
+
+=over 4
+
+=item 1 Do NOT rely on these in any code outside of this module. 
+
+All data members are prefixed with an underscore to signify that they are private.
+Always use accessor methods. If the accessor doesn't exist or is inadequate, 
+create or modify an accessor (and let me know, too!). 
+
+=item 2 This documentation may be incomplete and out of date.
+
+It is easy for these data member descriptions to become obsolete as 
+this module is still evolving. Always double check this info and search 
+for members not described here.
+
+=back
+
+An instance of Bio::Tools::Blast::HSP.pm is a blessed reference to a hash containing
+all or some of the following fields:
+
+ FIELD           VALUE
+ --------------------------------------------------------------
+ (member names are mostly self-explanatory)
+
+ _score              :
+ _bits               :
+ _p                  :
+ _n                  : Integer. The 'N' value listed in parenthesis with P/Expect value:
+                     : e.g., P(3) = 1.2e-30  ---> (N = 3).
+                     : Not defined in NCBI Blast2 with gaps.
+                     : To obtain the number of HSPs, use Bio::Tools::Blast::Sbjct::num_hsps().
+ _expect             :
+ _queryLength        : 
+ _queryGaps          : 
+ _queryStart         :
+ _queryStop          :
+ _querySeq           :
+ _sbjctLength        :
+ _sbjctGaps          :
+ _sbjctStart         :
+ _sbjctStop          :
+ _sbjctSeq           :
+ _matchSeq           : String. Contains the symbols between the query and sbjct lines
+                       which indicate identical (letter) and conserved ('+') matches
+                       or a mismatch (' ').
+ _numIdentical       :
+ _numConserved       :
+ _identicalRes_query :
+ _identicalRes_sbjct :
+ _conservedRes_query :
+ _conservedRes_sbjct :
+ _match_indent       : The number of leading space characters on each line containing
+                       the match symbols. _match_indent is 13 in this example:
+                         Query:   285 QNSAPWGLARISHRERLNLGSFNKYLYDDDAG
+                                      Q +APWGLARIS       G+ + Y YD+ AG
+                         ^^^^^^^^^^^^^ 
+
+ INHERITED DATA MEMBERS 
+
+ _name          : From Bio::Root::Object.pm. 
+		:
+ _parent        : From Bio::Root::Object.pm. This member contains a reference to the
+		: Bio::Tools::Blast::Sbjct.pm object to which this hit belongs.
+
+
+=cut
+
+1;
+