Mercurial > repos > mahtabm > ensembl
diff variant_effect_predictor/Bio/Tools/Blast/Sbjct.pm @ 0:1f6dce3d34e0
Uploaded
author | mahtabm |
---|---|
date | Thu, 11 Apr 2013 02:01:53 -0400 |
parents | |
children |
line wrap: on
line diff
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/variant_effect_predictor/Bio/Tools/Blast/Sbjct.pm Thu Apr 11 02:01:53 2013 -0400 @@ -0,0 +1,2417 @@ +#------------------------------------------------------------------------------ +# PACKAGE : Bio::Tools::Blast::Sbjct +# AUTHOR : Steve Chervitz (sac@bioperl.org) +# CREATED : 7 October 1996 +# STATUS : Alpha +# REVISION: $Id: Sbjct.pm,v 1.20 2002/10/22 07:38:48 lapp Exp $ +# +# For the latest version and documentation, visit the distribution site: +# http://genome-www.stanford.edu/perlOOP/bioperl/blast/ +# +# To generate documentation, run this module through pod2html +# (preferably from Perl v5.004 or better). +# +# Copyright (c) 1996-2000 Steve Chervitz. All Rights Reserved. +# This module is free software; you can redistribute it and/or +# modify it under the same terms as Perl itself. +#------------------------------------------------------------------------------ + +package Bio::Tools::Blast::Sbjct; + +use Bio::Root::Global qw(:devel); +use Bio::Root::Object (); + +@ISA = qw( Bio::Root::Object Exporter ); + +use strict; +use vars qw($ID %SUMMARY_OFFSET $Revision); +$ID = 'Bio::Tools::Blast::Sbjct'; +$Revision = '$Id: Sbjct.pm,v 1.20 2002/10/22 07:38:48 lapp Exp $'; #' + +my $_prog = ''; +my $_signif_fmt = ''; + +## POD Documentation: + +=head1 NAME + +Bio::Tools::Blast::Sbjct - Bioperl BLAST "Hit" object + +=head1 SYNOPSIS + +=head2 Object Creation + +The construction of HSP objects is handled by B<Bio::Tools::Blast>. +You should not need to use this package directly. See L<_initialize()|_initialize> +for a description of constructor parameters. + + require Bio::Tools::Blast::Sbjct; + + $hit = new Bio::Tools::Blast::Sbjct (-DATA =>\@hitData, + -PARENT =>$self, + -NAME =>5, + -RANK =>5, + -RANK_BY =>'order', + -MAKE =>'query' (or 'sbjct'), + -OVERLAP =>2, + -PROGRAM =>'TBLASTN' + ); + +@hitData includes the summary line for the hit as element [0], plus +all lines from the HSP alignment section of the BLAST report for +the present hit. + +=head1 INSTALLATION + +This module is included with the central Bioperl distribution: + + http://bio.perl.org/Core/Latest + ftp://bio.perl.org/pub/DIST + +Follow the installation instructions included in the README file. + +=head1 DESCRIPTION + +The Bio::Tools::Blast::Sbjct.pm module encapsulates data and methods for +parsing and manipulating "hits" from a BLAST report. +This module is a utility module used by the Bio::Tools::Blast.pm +and is not intended for separate use. + +In Blast lingo, the "sbjct" sequences are all the sequences +in a target database which were compared against a "query" sequence. +The terms "sbjct" and "hit" will be used interchangeably in this and related modules. + +This module supports BLAST versions 1.x and 2.x, gapped and ungapped. + + +=head2 HSP Tiling and Ambiguous Alignments + +If a Blast hit has more than one HSP, the Bio::Tools::Blast::Sbjct.pm +object has the ability to merge overlapping HSPs into contiguous +blocks. This permits the Sbjct object to sum data across all HSPs +without counting data in the overlapping regions multiple times, which +would happen if data from each overlapping HSP are simply summed. HSP +tiling is performed automatically when methods of the Sbjct object +that rely on tiled data are invoked. These include +L<frac_identical()|frac_identical>, L<frac_conserved()|frac_conserved>, L<gaps()|gaps>, +L<frac_aligned_query()|frac_aligned_query>, L<frac_aligned_hit()|frac_aligned_hit>, +L<num_unaligned_query()|num_unaligned_query>, L<num_unaligned_hit()|num_unaligned_hit>. + +It also permits the assessment of an "ambiguous alignment" if the +query (or sbjct) sequences from different HSPs overlap. The existence +of an overlap could indicate a biologically interesting region in the +sequence, such as a repeated domain. The Sbjct object uses the +-OVERLAP parameter to determine when two sequences overlap; if this is +set to 2 -- the default -- then any two sbjct or query HSP sequences +must overlap by more than two residues to get merged into the same +contig and counted as an overlap. See the L<BUGS | BUGS> section below for +"issues" with HSP tiling. + + +The results of the HSP tiling is reported with the following ambiguity codes: + + 'q' = Query sequence contains multiple sub-sequences matching + a single region in the sbjct sequence. + + 's' = Sbjct sequence contains multiple sub-sequences matching + a single region in the query sequence. + + 'qs' = Both query and sbjct sequences contain more than one + sub-sequence with similarity to the other sequence. + + +For addition information about ambiguous BLAST alignments, see +L<_tile_hsps()|_tile_hsps> and + + http://www-genome.stanford.edu/Sacch3D/help/ambig_aln.html + +=head1 DEPENDENCIES + +Bio::Tools::Blast::Sbjct.pm is a concrete class that inherits from B<Bio::Root::Object> +and relies on two other modules: + +=over 4 + +=item B<Bio::Tools::Blast::HSP> + +Encapsulates a single high-scoring segment pair within a hit. + +=item B<Bio::Tools::Blast> + +Provides a container for Sbjct.pm objects. + +=back + + +Bio::Tools::Blast::Sbjct.pm does not currently inherit from +Bio::Root::Vector.pm since Bio::Root::Vector.pm may be re-designed to +make it usable via delegation. Thus, a Blast.pm object would manage a +vector of Sbjct.pm objects. Stay tuned. + + +=head1 BUGS + +One consequence of the HSP tiling is that methods that rely on HSP +tiling such as L<frac_identical()|frac_identical>, L<frac_conserved()|frac_conserved>, L<gaps()|gaps> +etc. may report misleading numbers when C<-OVERLAP> is set to a large +number. For example, say we have two HSPs and the query sequence tile +as follows: + + 1 8 22 30 40 60 + Full seq: ------------------------------------------------------------ + * ** * ** + HSP1: --------------- (6 identical matches) + ** ** ** + HSP2: ------------- (6 identical matches) + + +If C<-OVERLAP> is set to some number over 4, HSP1 and HSP2 will not be +tiled into a single contig and their numbers of identical matches will +be added, giving a total of 12, not 10 if they had be combined into +one contig. This can lead to number greater than 1.0 for methods +L<frac_identical()|frac_identical> and L<frac_conserved()|frac_conserved>. This is less of an issue +with gapped Blast since it tends to combine HSPs that would be listed +separately without gapping. (Fractions E<gt>1.0 can be viewed as a +signal for an interesting alignment that warrants further inspection, +thus turning this bug into a feature). + +Using large values for C<-OVERLAP> can lead to incorrect numbers +reported by methods that rely on HSP tiling but can be useful if you +care more about detecting ambiguous alignments. Setting C<-OVERLAP> +to zero will lead to the most accurate numbers for the +tiling-dependent methods but will be useless for detecting overlapping +HSPs since all HSPs will appear to overlap. + + +=head1 SEE ALSO + + Bio::Tools::Blast::HSP.pm - Blast HSP object. + Bio::Tools::Blast.pm - Blast object. + Bio::Root::Object.pm - Proposed base class for all Bioperl objects. + +Links: + + http://bio.perl.org/Core/POD/Tools/Blast/HSP.pm.html + + http://bio.perl.org/Projects/modules.html - Online module documentation + http://bio.perl.org/Projects/Blast/ - Bioperl Blast Project + http://bio.perl.org/ - Bioperl Project Homepage + + +=head1 FEEDBACK + +=head2 Mailing Lists + +User feedback is an integral part of the evolution of this and other +Bioperl modules. Send your comments and suggestions preferably to one +of the Bioperl mailing lists. Your participation is much appreciated. + + bioperl-l@bioperl.org - General discussion + http://bio.perl.org/MailList.html - About the mailing lists + +=head2 Reporting Bugs + +Report bugs to the Bioperl bug tracking system to help us keep track +the bugs and their resolution. Bug reports can be submitted via email +or the web: + + bioperl-bugs@bio.perl.org + http://bugzilla.bioperl.org/ + +=head1 AUTHOR + +Steve Chervitz, sac@bioperl.org + +See the L<FEEDBACK | FEEDBACK> section for where to send bug reports and comments. + +=head1 COPYRIGHT + +Copyright (c) 1996-2000 Steve Chervitz. All Rights Reserved. +This module is free software; you can redistribute it and/or +modify it under the same terms as Perl itself. + +=cut + + + +# +## +### +#### END of main POD documentation. +### +## +#' + + +=head1 APPENDIX + +Methods beginning with a leading underscore are considered private +and are intended for internal use by this module. They are +B<not> considered part of the public interface and are described here +for documentation purposes only. + +=cut + +##################################################################################### +## CONSTRUCTOR ## +##################################################################################### + +=head2 _initialize + + Usage : n/a; automatically called by Bio::Root::Object::new() + : Bio::Tools::Blast::Sbjct.pm objects are constructed + : automatically by Bio::Tools::Blast.pm, so there is no need + : for direct consumption. + Purpose : Initializes key varaiables and calls methods to parse a single Blast hit. + : Constructs Bio::Tools::Blast::HSP.pm objects for each + : high-scoring segment pair (HSP). + : Calls superclass constructor first (Bio::Root::Object.pm). + Returns : n/a + Argument : Named Parameters passed from new() + : ALL TAGS MUST BE UPPERCASE (does not call _rearrange()). + : -DATA => array reference holding all data for a single hit. + : The first element should hold the description + : line (from the desctiption section at the top of + : the report), remaining lines should hold all lines + : within the HSP alignment listing section of report. + : -PARENT => object reference to a Bio::Tools::Blast.pm object. + : -NAME => string (typically same as -RANK, just a temporary + : name to use until the actual name of hit is parsed), + : -RANK => integer, + : -RANK_BY => 'order', + : -OVERLAP => integer (maximum overlap between adjacent + : HSPs when tiling) + : -PROGRAM => string (type of Blast blastp, blastn, etc). + +See Also : L<_set_id()|_set_id>, L<_set_hsps()|_set_hsps>, L<_tile_hsps()|_tile_hsps>, B<Bio::Root::Object::new()>, B<Bio::Tools::Blast::_set_hit_db()> + +=cut + +#------------------- +sub _initialize { +#------------------- + my( $self, %param ) = @_; + + # $make not currently used. + my $make = $self->SUPER::_initialize( %param ); + + # Set various class data. + $_prog = $param{-PROGRAM} || ''; + $_signif_fmt = $param{-SIGNIF_FMT}; + + $self->{'_rank'} = $param{-RANK} || ''; + $self->_set_id( $param{-DATA}->[0]); + $self->_set_hsps( @{$param{-DATA}} ); + + $self->{'_overlap'} = $param{-OVERLAP} || 0; +} + +#-------------- +sub destroy { +#-------------- + my $self=shift; + if($self->{'_hsps'}) { + foreach($self->hsps) { + $_->destroy; + undef $_; + } + undef $self->{'_hsps'}; + } + $DEBUG==2 && print STDERR "DESTROYING $self ${\$self->name}"; + $self->SUPER::destroy; +} + +##################################################################################### +## ACCESSORS ## +##################################################################################### + + +=head2 rank + + Usage : $sbjct->rank( integer or string ); + Purpose : Sets/Gets the rank of the current Sbjct object relative to + : other Sbjct objects managed by a given Blast object. + Example : $sbjct->rank(1); + Returns : The current rank value. + Argument : Integer or string to be used for ranking the hit + : relative to other hits. + Throws : n/a + Comments : The rank usually corresponds to the order the listing + : of hits in the BLAST report from lowest to highest p-value. + : Rank need not be restricted to this value. + : rank() may be provided by a delegated or inherited + : iterator class in the future (such as Bio::Root::Vector.pm). + +=cut + +#----------- +sub rank { +#----------- + my $self = shift; + if(@_) {$self->{'_rank'} = shift; } + $self->{'_rank'}; +} + + + +=head2 _set_id + + Usage : n/a; automatically called by _initialize() + Purpose : Sets the name of the Sbjct sequence from the BLAST summary line. + : The identifier is assumed to be the first + : chunk of non-whitespace characters in the description line + : Does not assume any semantics in the structure of the identifier + : (Formerly, this method attempted to extract database name from + : the seq identifiers, but this was prone to break). + Returns : n/a + Argument : String containing description line of the hit from Blast report + : or first line of an alignment section. + Throws : Warning if cannot locate sequence ID. + +See Also : L<_initialize()|_initialize>, B<Bio::Tools::Blast::database()> + +=cut + +#--------------- +sub _set_id { +#--------------- + my( $self, $desc ) = @_; + my ($seqID1, $seqID2, $dbID, @seqDat); + + local $_ = $desc; + my @linedat = split(); + my $data = $linedat[0]; + +# New strategy: Assume only that the ID is the first white space +# delimited chunk. Not attempting to extract database name. +# Clients will have to interpret it as necessary. + if($data =~ /^(\S+)\s*/) { + $self->name($1); + } + else { + $self->warn("Can't locate sequence identifier in summary line.", "Line = $data"); + $data = 'Unknown sequence ID' if not $data; + $self->name($data); + } + $self->{'_db'} = '-'; + +# Old strategy: assumes semantics in the identifier +# and tries to separate out database and id components. +# Too fancy and fragile! SAC, 2000-02-18 + +# # Proceeding from more standard (NCBI-like) to less standard. +# if($data =~ /(\S+?)[:\|]+(\S+?)[:\|]+(\S*)/) { +# # matches: database|id1|id2 or database||id1||id2 or database:id1:id2 +# $dbID = $1; +# $seqID1 = $2; +# $seqID2 = $3; +# if($seqID2 eq $seqID1) { undef($seqID2); } +# +# } elsif($data =~ /(\S+?)[:\|]+(\S+)/) { +# # matches: database|id1 or database:id1 +# $dbID = $1; +# $seqID1 = $2; +# +# } elsif($data =~ /^(\S+)\s+([gb|emb|dbj|sp|pir])\s+(\S+)*/) { +# # matches: id1 database id2 +# $seqID1 = $1; +# $dbID = $2; +# $seqID2 = $3; +# +# } elsif($data =~ /^(\S+)\s*/) { +# $seqID1 = $1; +# } +# +# ## Combine the multiple IDs. +# $seqID2 = scalar($seqID2) ? "/$seqID2" : ''; +# +# if( !scalar $seqID1) { +# $self->warn("Can't locate sequence identifier in summary line.", "Line = $data"); +# $self->name('Unknown sequence ID'); +# } else { +# $self->name($seqID1.$seqID2); +# } +# $self->{'_db'} = $dbID || '-'; +} + + +=head2 _set_hsps + + Usage : n/a; called automatically during object construction. + Purpose : Creates HSP.pm objects for each HSP in a BLAST hit alignment. + : Also collects the full description of the hit from the + : HSP alignment section. + Returns : n/a + Argument : List of strings containing raw BLAST report data for + : a single hit's HSP alignment data. + Throws : Warnings for each HSP.pm object that fails to be constructed. + : Exception if no HSP.pm objects can be constructed. + : Exception if can't parse length data for hit sequence. + Comments : Requires Bio::Tools::Blast::HSP.pm. + : Sets the description using the full string present in + : the alignment data. + : Also sets Expect and P-values for the Sbjct object by + : copying from the HSP object. + : While this sacrifices some memory efficiency, it + : improves access speed for these critical data. + +See Also : L<_initialize()|_initialize>, L<_set_desc()|_set_desc> + +=cut + +#-------------- +sub _set_hsps { +#-------------- + + my( $self, @data ) = @_; + my $start = 0; + my $hspCount = 0; + + require Bio::Tools::Blast::HSP; + +# printf STDERR "$ID _set_hsps(). DATA (%d lines) =\n@data\n", scalar(@data); <STDIN>; + + my( @hspData, @hspList, @errs, @bad_names ); + my($line, $set_desc, @desc); + $set_desc = 0; + + hit_loop: + foreach $line( @data ) { + + if( $line =~ /^\s*Length = ([\d,]+)/ ) { + $self->_set_desc(@desc); + $set_desc = 1; + ($self->{'_length'} = $1) =~ s/,//g; # get rid of commas + next hit_loop; + } elsif( !$set_desc) { + $line =~ s/^\s+|\s+$//g; + push @desc, $line; + next hit_loop; + } elsif( $line =~ /^\s*Score/ ) { + ## This block is for setting multiple HSPs. + + if( not scalar @hspData ) { + $start = 1; + push @hspData, $line; + next hit_loop; + + } elsif( scalar @hspData) { + $hspCount++; + $DEBUG and do{ print STDERR +( $hspCount % 10 ? "+" : "+\n" ); }; + +# print STDERR "\n$ID: setting HSP: ${\$self->name}\n"; + my $hspObj = eval { new Bio::Tools::Blast::HSP(-DATA =>\@hspData, + -PARENT =>$self, + -NAME =>$hspCount, + -PROGRAM =>$_prog, + ); + }; + if($@) { +# print "$ID: ERROR:\n$@";<STDIN>; + push @errs, $@; + push @bad_names, "#$hspCount"; + $hspObj->destroy if ref $hspObj; + undef $hspObj; + } else { + push @hspList, $hspObj; + if (!defined($self->{'_expect'}) || $hspObj->expect() < $self->{'_expect'}) { + $self->{'_expect'} = $hspObj->expect(); + } + if (!defined($self->{'_p'}) || $hspObj->p() < $self->{'_p'}) { + $self->{'_p'} = $hspObj->p(); + } + } + @hspData = (); + push @hspData, $line; + next; + } else { + push @hspData, $line; + } + } elsif( $start ) { + ## This block is for setting the last HSP (which may be the first as well!). + if( $line =~ /^(end|>|Parameters|CPU|Database:)/ ) { + $hspCount++; + $DEBUG and do{ print STDERR +( $hspCount % 10 ? "+" : "+\n" ); }; + +# print STDERR "\n$ID: setting HSP: ${\$self->name}\n"; + + my $hspObj = eval { new Bio::Tools::Blast::HSP(-DATA =>\@hspData, + -PARENT =>$self, + -NAME =>$hspCount, + -PROGRAM =>$_prog, + ); + }; + if($@) { +# print "$ID: ERROR:\n$@";<STDIN>; + push @errs, $@; + push @bad_names, "#$hspCount"; + $hspObj->destroy if ref $hspObj; + undef $hspObj; + } else { + push @hspList, $hspObj; + if (!defined($self->{'_expect'}) || $hspObj->expect() < $self->{'_expect'}) { + $self->{'_expect'} = $hspObj->expect(); + } + if (!defined($self->{'_p'}) || $hspObj->p() < $self->{'_p'}) { + $self->{'_p'} = $hspObj->p(); + } + } + } else { + push @hspData, $line; + } + } + } + +# print STDERR "\n--------> Done building HSPs for ${\$self->name}\n"; + + $self->{'_length'} or $self->throw( "Can't determine hit sequence length."); + + # Adjust logical length based on BLAST flavor. + if($_prog =~ /TBLAST[NX]/) { + $self->{'_logical_length'} = $self->{'_length'} / 3; + } + + # Handling errors as done in Blast.pm. (as of version 0.073) + + if(@errs) { + my ($str); + # When there are many errors, in most of the cases, they are + # caused by the same problem. Only need to see full data for + # the first one. + if(@errs > 2) { + $str = "SHOWING FIRST EXCEPTION ONLY:\n$errs[0]"; + $self->clear_err(); # clearing the existing set of errors. + # Not necessary, unless the -RECORD_ERR =>1 + # constructor option was used for Blast object. + } else { + $str = join("\n",@errs); + } + + if( not scalar @hspList) { + $self->throw("Failed to create any HSP objects for $hspCount potential HSP(s).", + "\n\nTRAPPED EXCEPTION(S):\n$str\nEND TRAPPED EXCEPTION(S)\n" + ); + } else { + $self->warn(sprintf("Could not create HSP objects for %d HSP(s): %s", scalar(@errs), join(', ',@bad_names)), + "\n\nTRAPPED EXCEPTION(S):\n$str\nEND TRAPPED EXCEPTION(S)\n" + ); + } + + } else { + $self->{'_hsps'} = \@hspList; + } +} + +=head2 _set_desc + + Usage : n/a; called automatically by _set_hsps() + Purpose : Sets the description of the hit sequence. + : For sequence without descriptions, sets description to "-". + Argument : Array containing description (multiple lines). + Comments : _set_hsps() calls this method with the data from the + : HSP alignment listing, which contains the complete description. + : (Formerly, this was called from the _set_desc_data() method initially.) + +See Also : _set_hsps() + +=cut + +#-------------- +sub _set_desc { +#-------------- + my( $self, @desc ) = @_; + my( $desc); + +# print "$ID: RAW DESC:\n@desc";<STDIN>; + + $desc = join(" ", @desc); + + if($desc) { + $desc =~ s/^\s*\S+\s+//; # remove the sequence ID(s) + $desc =~ s/^[\s!]+//; + $desc =~ s/ \d+$//; + $desc =~ s/\.+$//; + $self->{'_desc'} = $desc || '-'; + } else { + $self->{'_desc'} = '-'; + } + +# print "$ID: _set_desc = $desc";<STDIN>; +} + + +=head2 _tile_hsps + + Usage : n/a; called automatically during object construction or + : as needed by methods that rely on having tiled data. + Purpose : Collect statistics about the aligned sequences in a set of HSPs. + : Calculates the following data across all HSPs: + : -- total alignment length + : -- total identical residues + : -- total conserved residues + Returns : n/a + Argument : n/a + Throws : n/a + Status : Experimental + Comments : + : This method performs more careful summing of data across + : all HSPs in the Sbjct object. Simply summing the data from all HSPs + : will overestimate the actual length of the alignment if there is + : overlap between different HSPs (often the case). + : The strategy is to tile the HSPs and sum over the + : contigs, collecting data separately from overlapping and + : non-overlapping regions of each HSP. To facilitate this, the + : HSP.pm object now permits extraction of data from sub-sections + : of an HSP. + : + : Additional useful information is collected from the results + : of the tiling. It is possible that sub-sequences in + : different HSPs will overlap significantly. In this case, it + : is impossible to create a single unambiguous alignment by + : concatenating the HSPs. The ambiguity may indicate the + : presence of multiple, similar domains in one or both of the + : aligned sequences. This ambiguity is recorded using the + : ambiguous_aln() method. + : + : This method does not attempt to discern biologically + : significant vs. insignificant overlaps. The allowable amount of + : overlap can be set with the overlap() method or with the -OVERLAP + : parameter used when constructing the Blast & Sbjct objects. + : + : For a given hit, both the query and the sbjct sequences are + : tiled independently. + : + : -- If only query sequence HSPs overlap, + : this may suggest multiple domains in the sbjct. + : -- If only sbjct sequence HSPs overlap, + : this may suggest multiple domains in the query. + : -- If both query & sbjct sequence HSPs overlap, + : this suggests multiple domains in both. + : -- If neither query & sbjct sequence HSPs overlap, + : this suggests either no multiple domains in either + : sequence OR that both sequences have the same + : distribution of multiple similar domains. + : + : This method can deal with the special case of when multiple + : HSPs exactly overlap. + : + : Efficiency concerns: + : Speed will be an issue for sequences with numerous HSPs. + : + Bugs : Currently, _tile_hsps() does not properly account for + : the number of non-tiled but overlapping HSPs, which becomes a problem + : as overlap() grows. Large values overlap() may thus lead to + : incorrect statistics for some hits. For best results, keep overlap() + : below 5 (DEFAULT IS 2). For more about this, see the "HSP Tiling and + : Ambiguous Alignments" section. + +See Also : L<_adjust_contigs()|_adjust_contigs>, L<ambiguous_aln()|ambiguous_aln>, L<overlap()|overlap>, L<frac_identical()|frac_identical>, L<frac_conserved()|frac_conserved>, L<frac_aligned_query()|frac_aligned_query>, L<frac_aligned_hit()|frac_aligned_hit>, L<num_unaligned_query()|num_unaligned_query>, L<num_unaligned_hit()|num_unaligned_hit>, L<HSP Tiling and Ambiguous Alignments> + +=cut + +#-------------- +sub _tile_hsps { +#-------------- + my $self = shift; +# my $gapped = $self->parent->gapped || 0; # no special treatment + + $self->{'_tile_hsps'} = 1; + $self->{'_gaps_query'} = 0; + $self->{'_gaps_sbjct'} = 0; + + ## Simple summation scheme. Valid if there is only one HSP. + if((defined($self->{'_n'}) and $self->{'_n'} == 1) or $self->num_hsps == 1) { + my $hsp = $self->hsp; + $self->{'_length_aln_query'} = $hsp->length('query'); + $self->{'_length_aln_sbjct'} = $hsp->length('sbjct'); + $self->{'_length_aln_total'} = $hsp->length('total'); + ($self->{'_totalIdentical'},$self->{'_totalConserved'}) = $hsp->matches(); + $self->{'_gaps_query'} = $hsp->gaps('query'); + $self->{'_gaps_sbjct'} = $hsp->gaps('sbjct'); + +# print "_tile_hsps(): single HSP, easy stats.\n"; + return; + } else { +# print STDERR "$ID: _tile_hsps: summing multiple HSPs\n"; + $self->{'_length_aln_query'} = 0; + $self->{'_length_aln_sbjct'} = 0; + $self->{'_length_aln_total'} = 0; + $self->{'_totalIdentical'} = 0; + $self->{'_totalConserved'} = 0; + } + + ## More than one HSP. Must tile HSPs. +# printf "\nTiling HSPs for %s (BLAST: %s)\n",$self->name, $self->parent->name; + my($hsp, $qstart, $qstop, $sstart, $sstop); + my(@qcontigs, @scontigs); + my $qoverlap = 0; + my $soverlap = 0; + my $max_overlap = $self->{'_overlap'}; + + foreach $hsp ($self->hsps()) { +# printf " HSP: %s\n%s\n",$hsp->name, $hsp->str('query'); +# printf " Length = %d; Identical = %d; Conserved = %d; Conserved(1-10): %d",$hsp->length, $hsp->length(-TYPE=>'iden'), $hsp->length(-TYPE=>'cons'), $hsp->length(-TYPE=>'cons',-START=>0,-STOP=>10); <STDIN>; + ($qstart, $qstop) = $hsp->range('query'); + ($sstart, $sstop) = $hsp->range('sbjct'); + + my ($qgaps, $sgaps) = $hsp->gaps(); + $self->{'_gaps_query'} += $qgaps; + $self->{'_gaps_sbjct'} += $sgaps; + + $self->{'_length_aln_total'} += $hsp->length; + ## Collect contigs in the query sequence. + $qoverlap = &_adjust_contigs('query', $hsp, $qstart, $qstop, \@qcontigs, $max_overlap); + + ## Collect contigs in the sbjct sequence (needed for domain data and gapped Blast). + $soverlap = &_adjust_contigs('sbjct', $hsp, $sstart, $sstop, \@scontigs, $max_overlap); + + ## Collect overall start and stop data for query and sbjct over all HSPs. + if(not defined $self->{'_queryStart'}) { + $self->{'_queryStart'} = $qstart; + $self->{'_queryStop'} = $qstop; + $self->{'_sbjctStart'} = $sstart; + $self->{'_sbjctStop'} = $sstop; + } else { + $self->{'_queryStart'} = ($qstart < $self->{'_queryStart'} ? $qstart : $self->{'_queryStart'}); + $self->{'_queryStop'} = ($qstop > $self->{'_queryStop'} ? $qstop : $self->{'_queryStop'}); + $self->{'_sbjctStart'} = ($sstart < $self->{'_sbjctStart'} ? $sstart : $self->{'_sbjctStart'}); + $self->{'_sbjctStop'} = ($sstop > $self->{'_sbjctStop'} ? $sstop : $self->{'_sbjctStop'}); + } + } + + ## Collect data across the collected contigs. + +# print "\nQUERY CONTIGS:\n"; +# print " gaps = $self->{'_gaps_query'}\n"; + + foreach(@qcontigs) { +# print " query contig: $_->{'start'} - $_->{'stop'}\n"; +# print " iden = $_->{'iden'}; cons = $_->{'cons'}\n"; + $self->{'_length_aln_query'} += $_->{'stop'} - $_->{'start'} + 1; + $self->{'_totalIdentical'} += $_->{'iden'}; + $self->{'_totalConserved'} += $_->{'cons'}; + } + + ## Collect data for sbjct contigs. Important for gapped Blast. + ## The totalIdentical and totalConserved numbers will be the same + ## as determined for the query contigs. + +# print "\nSBJCT CONTIGS:\n"; +# print " gaps = $self->{'_gaps_sbjct'}\n"; + + foreach(@scontigs) { +# print " sbjct contig: $_->{'start'} - $_->{'stop'}\n"; +# print " iden = $_->{'iden'}; cons = $_->{'cons'}\n"; + $self->{'_length_aln_sbjct'} += $_->{'stop'} - $_->{'start'} + 1; + } +# <STDIN>; + + if($qoverlap) { + if($soverlap) { $self->ambiguous_aln('qs'); +# print "\n*** AMBIGUOUS ALIGNMENT: Query and Sbjct\n\n"; + } + else { $self->ambiguous_aln('q'); +# print "\n*** AMBIGUOUS ALIGNMENT: Query\n\n"; + } + } elsif($soverlap) { + $self->ambiguous_aln('s'); +# print "\n*** AMBIGUOUS ALIGNMENT: Sbjct\n\n"; + } + + # Adjust length based on BLAST flavor. + my $prog = $self->parent->program; + if($prog eq 'TBLASTN') { + $self->{'_length_aln_sbjct'} /= 3; + } elsif($prog eq 'BLASTX' ) { + $self->{'_length_aln_query'} /= 3; + } elsif($prog eq 'TBLASTX') { + $self->{'_length_aln_query'} /= 3; + $self->{'_length_aln_sbjct'} /= 3; + } +} + + + +=head2 _adjust_contigs + + Usage : n/a; called automatically during object construction. + Purpose : Builds HSP contigs for a given BLAST hit. + : Utility method called by _tile_hsps() + Returns : + Argument : + Throws : Exceptions propagated from Bio::Tools::Blast::HSP::matches() + : for invalid sub-sequence ranges. + Status : Experimental + Comments : This method does not currently support gapped alignments. + : Also, it does not keep track of the number of HSPs that + : overlap within the amount specified by overlap(). + : This will lead to significant tracking errors for large + : overlap values. + +See Also : L<overlap()|overlap>, L<_tile_hsps()|_tile_hsps>, B<Bio::Tools::Blast::HSP>::matches + +=cut + +#------------------- +sub _adjust_contigs { +#------------------- + my ($seqType, $hsp, $start, $stop, $contigs_ref, $max_overlap) = @_; + + my $overlap = 0; + my ($numID, $numCons); + +# print "Testing $seqType data: HSP (${\$hsp->name}); $start, $stop\n"; + foreach(@$contigs_ref) { +# print " Contig: $_->{'start'} - $_->{'stop'}, iden= $_->{'iden'}, cons= $_->{'cons'}\n"; + + ## Test special case of a nested HSP. Skip it. + if($start >= $_->{'start'} and $stop <= $_->{'stop'}) { +# print "----> Nested HSP. Skipping.\n"; + $overlap = 1; + next; + } + + ## Test for overlap at beginning of contig. + if($start < $_->{'start'} and $stop > ($_->{'start'} + $max_overlap)) { +# print "----> Overlaps beg: existing beg,end: $_->{'start'},$_->{'stop'}, new beg,end: $start,$stop\n";<STDIN>; + # Collect stats over the non-overlapping region. + eval { + ($numID, $numCons) = $hsp->matches(-SEQ =>$seqType, + -START =>$start, + -STOP =>$_->{'start'}-1); + }; + if($@) { warn "\a\n$@\n"; } + else { + $_->{'start'} = $start; # Assign a new start coordinate to the contig + $_->{'iden'} += $numID; # and add new data to #identical, #conserved. + $_->{'cons'} += $numCons; + $overlap = 1; + } + } + + ## Test for overlap at end of contig. + if($stop > $_->{'stop'} and $start < ($_->{'stop'} - $max_overlap)) { +# print "----> Overlaps end: existing beg,end: $_->{'start'},$_->{'stop'}, new beg,end: $start,$stop\n";<STDIN>; + # Collect stats over the non-overlapping region. + eval { + ($numID,$numCons) = $hsp->matches(-SEQ =>$seqType, + -START =>$_->{'stop'}, + -STOP =>$stop); + }; + if($@) { warn "\a\n$@\n"; } + else { + $_->{'stop'} = $stop; # Assign a new stop coordinate to the contig + $_->{'iden'} += $numID; # and add new data to #identical, #conserved. + $_->{'cons'} += $numCons; + $overlap = 1; + } + } + $overlap && do { +# print " New Contig data:\n"; +# print " Contig: $_->{'start'} - $_->{'stop'}, iden= $_->{'iden'}, cons= $_->{'cons'}\n"; + last; + }; + } + ## If there is no overlap, add the complete HSP data. + !$overlap && do { +# print "No overlap. Adding new contig.\n"; + ($numID,$numCons) = $hsp->matches(-SEQ=>$seqType); + push @$contigs_ref, {'start'=>$start, 'stop'=>$stop, + 'iden'=>$numID, 'cons'=>$numCons }; + }; +# <STDIN>; + $overlap; +} + + +=head2 ambiguous_aln + + Usage : $ambig_code = $sbjct_object->ambiguous_aln(); + Purpose : Sets/Gets ambiguity code data member. + Example : (see usage) + Returns : String = 'q', 's', 'qs', '-' + : 'q' = query sequence contains overlapping sub-sequences + : while sbjct does not. + : 's' = sbjct sequence contains overlapping sub-sequences + : while query does not. + : 'qs' = query and sbjct sequence contains overlapping sub-sequences + : relative to each other. + : '-' = query and sbjct sequence do not contains multiple domains + : relative to each other OR both contain the same distribution + : of similar domains. + Argument : n/a + Throws : n/a + Status : Experimental + +See Also : L<_tile_hsps()|_tile_hsps>, L<HSP Tiling and Ambiguous Alignments> + +=cut + +#-------------------- +sub ambiguous_aln { +#-------------------- + my $self = shift; + if(@_) { $self->{'_ambiguous_aln'} = shift; } + $self->{'_ambiguous_aln'} || '-'; +} + + + +=head2 overlap + + Usage : $blast_object->overlap( [integer] ); + Purpose : Gets/Sets the allowable amount overlap between different HSP sequences. + Example : $blast_object->overlap(5); + : $overlap = $blast_object->overlap(); + Returns : Integer. + Argument : integer. + Throws : n/a + Status : Experimental + Comments : Any two HSPs whose sequences overlap by less than or equal + : to the overlap() number of resides will be considered separate HSPs + : and will not get tiled by _adjust_contigs(). + +See Also : L<_adjust_contigs()|_adjust_contigs>, L<BUGS | BUGS> + +=cut + +#------------- +sub overlap { +#------------- + my $self = shift; + if(@_) { $self->{'_overlap'} = shift; } + defined $self->{'_overlap'} ? $self->{'_overlap'} : 0; +} + + + + +=head2 score + + Usage : $sbjct_object->score(); + Purpose : Gets the BLAST score of the best HSP for the current Blast hit. + Example : $score = $sbjct_object->score(); + Returns : Integer + Argument : n/a + Throws : n/a + +See Also : L<bits()|bits> + +=cut + +#---------- +sub score { +#---------- + my $self = shift; + + # The check for $self->{'_score'} is a remnant from the 'query' mode days + # in which the sbjct object would collect data from the description line only. + + my ($score); + if(not defined($self->{'_score'})) { + $score = $self->hsp->score; + } else { + $score = $self->{'_score'}; + } + return $score; +} + + + +=head2 bits + + Usage : $sbjct_object->bits(); + Purpose : Gets the BLAST bit score of the best HSP for the current Blast hit. + Example : $bits = $sbjct_object->bits(); + Returns : Integer + Argument : n/a + Throws : Exception if bit score is not set. + Comments : For BLAST1, the non-bit score is listed in the summary line. + +See Also : L<score()|score> + +=cut + +#--------- +sub bits { +#--------- + my $self = shift; + + # The check for $self->{'_bits'} is a remnant from the 'query' mode days + # in which the sbjct object would collect data from the description line only. + + my ($bits); + if(not defined($self->{'_bits'})) { + $bits = $self->hsp->bits; + } else { + $bits = $self->{'_bits'}; + } + return $bits; +} + + + +=head2 n + + Usage : $sbjct_object->n(); + Purpose : Gets the N number for the current Blast hit. + : This is the number of HSPs in the set which was ascribed + : the lowest P-value (listed on the description line). + : This number is not the same as the total number of HSPs. + : To get the total number of HSPs, use num_hsps(). + Example : $n = $sbjct_object->n(); + Returns : Integer + Argument : n/a + Throws : Exception if HSPs have not been set (BLAST2 reports). + Comments : Note that the N parameter is not reported in gapped BLAST2. + : Calling n() on such reports will result in a call to num_hsps(). + : The num_hsps() method will count the actual number of + : HSPs in the alignment listing, which may exceed N in + : some cases. + +See Also : L<num_hsps()|num_hsps> + +=cut + +#----- +sub n { +#----- + my $self = shift; + + # The check for $self->{'_n'} is a remnant from the 'query' mode days + # in which the sbjct object would collect data from the description line only. + + my ($n); + if(not defined($self->{'_n'})) { + $n = $self->hsp->n; + } else { + $n = $self->{'_n'}; + } + $n ||= $self->num_hsps; + + return $n; +} + + + +=head2 frame + + Usage : $sbjct_object->frame(); + Purpose : Gets the reading frame for the hit sequence (TBLASTN/X only). + Example : $frame = $sbjct_object->frame(); + Returns : Integer (-3 .. +3). + Argument : n/a + Throws : Exception if HSPs have not been set (BLAST2 reports). + +See Also : L<hsps()|hsps> + +=cut + +#---------- +sub frame { +#---------- + my $self = shift; + + # The check for $self->{'_frame'} is a remnant from the 'query' mode days + # in which the sbjct object would collect data from the description line only. + + my ($frame); + if(not defined($self->{'_frame'})) { + $frame = $self->hsp->frame; + } else { + $frame = $self->{'_frame'}; + } + return $frame; +} + + + + + +=head2 p + + Usage : $sbjct_object->p( [format] ); + Purpose : Get the P-value for the given BLAST hit. + : (Note that P-values are not provided with NCBI Blast2 reports). + Example : $p = $sbjct->p; + : $p = $sbjct->p('exp'); # get exponent only. + : ($num, $exp) = $sbjct->p('parts'); # split sci notation into parts + Returns : Float or scientific notation number (the raw P-value, DEFAULT). + : Integer if format == 'exp' (the magnitude of the base 10 exponent). + : 2-element list (float, int) if format == 'parts' and P-value + : is in scientific notation (See Comments). + Argument : format: string of 'raw' | 'exp' | 'parts' + : 'raw' returns value given in report. Default. (1.2e-34) + : 'exp' returns exponent value only (34) + : 'parts' returns the decimal and exponent as a + : 2-element list (1.2, -34) (See Comments). + Throws : Exception if the P-value is not defined, which will occur + : with any NCBI Blast2 report. + Comments : Using the 'parts' argument is not recommended since it will not + : work as expected if the P-value is not in scientific notation. + : That is, floats are not converted into sci notation before + : splitting into parts. + +See Also : L<expect()|expect>, L<signif()|signif>, L<get_exponent()|get_exponent> + +=cut + +#-------- +sub p { +#-------- +# Some duplication of logic for p(), expect() and signif() for the sake of performance. + my ($self, $fmt) = @_; + + my ($val); + $fmt ||= $_signif_fmt; + + # $val can be zero. + if(not defined($val = $self->{'_p'})) { + ## P-value not defined, must be a NCBI Blast2 report. + my $note = ''; + if($self->parent->_layout() == 2) { + $note = "Blast2 does not report P-values. Use expect() instead."; + } + $self->throw("Can't get P-value: undefined.", $note); + } + + return $val if not $fmt or $fmt =~ /^raw/i; + ## Special formats: exponent-only or as list. + return &get_exponent($val) if $fmt =~ /^exp/i; + return (split (/eE/, $val)) if $fmt =~ /^parts/i; + + ## Default: return the raw P-value. + return $val; +} + + + +=head2 expect + + Usage : $sbjct_object->expect( [format] ); + Purpose : Get the Expect value for the given BLAST hit. + Example : $e = $sbjct->expect; + : $e = $sbjct->expect('exp'); # get exponent only. + : ($num, $exp) = $sbjct->expect('parts'); # split sci notation into parts + Returns : Float or scientific notation number (the raw expect value, DEFAULT). + : Integer if format == 'exp' (the magnitude of the base 10 exponent). + : 2-element list (float, int) if format == 'parts' and Expect + : is in scientific notation (see Comments). + Argument : format: string of 'raw' | 'exp' | 'parts' + : 'raw' returns value given in report. Default. (1.2e-34) + : 'exp' returns exponent value only (34) + : 'parts' returns the decimal and exponent as a + : 2-element list (1.2, -34) (see Comments). + Throws : Exception if the Expect value is not defined. + Comments : Using the 'parts' argument is not recommended since it will not + : work as expected if the expect value is not in scientific notation. + : That is, floats are not converted into sci notation before + : splitting into parts. + +See Also : L<p()|p>, L<signif()|signif>, L<get_exponent()|get_exponent> + +=cut + +#----------- +sub expect { +#----------- +# Some duplication of logic for p(), expect() and signif() for the sake of performance. + my ($self, $fmt) = @_; + + my $val = $self->{'_expect'}; + $fmt ||= $_signif_fmt; + + # $val can be zero. + defined($val) or $self->throw("Can't get Expect value: HSPs may not have been set."); + + return $val if not $fmt or $fmt =~ /^raw/i; + ## Special formats: exponent-only or as list. + return &get_exponent($val) if $fmt =~ /^exp/i; + return (split (/eE/, $val)) if $fmt =~ /^parts/i; + + ## Default: return the raw Expect-value. + return $val; +} + + + +=head2 signif + + Usage : $sbjct_object->signif( [format] ); + Purpose : Get the P or Expect value for the given BLAST hit. + : The value returned is the one which is reported in the description + : section of the Blast report. For Blast1 and WU-Blast2, this + : is a P-value, for Blast2, it is an Expect value. + Example : $obj->signif() # returns 1.3e-34 + : $obj->signif('exp') # returns -34 + : $obj->signif('parts') # returns (1.3, -34) + Returns : Float or scientific notation number (the raw P/Expect value, DEFAULT). + : Integer if format == 'exp' (the magnitude of the base 10 exponent). + : 2-element list (float, int) if format == 'parts' and P/Expect value + : is in scientific notation (see Comments). + Argument : format: string of 'raw' | 'exp' | 'parts' + : 'raw' returns value given in report. Default. (1.2e-34) + : 'exp' returns exponent value only (34) + : 'parts' returns the decimal and exponent as a + : 2-element list (1.2, -34) (see Comments). + Throws : n/a + Status : Deprecated. Use p() or expect(). + Comments : The signif() method provides a way to deal with the fact that + : Blast1 and Blast2 formats differ in what is reported in the + : description lines of each hit in the Blast report. The signif() + : method frees any client code from having to know if this is a P-value + : or an Expect value, making it easier to write code that can process + : both Blast1 and Blast2 reports. This is not necessarily a good thing, since + : one should always know when one is working with P-values or + : Expect values (hence the deprecated status). + : Use of expect() is recommended since all hits will have an Expect value. + : + : Using the 'parts' argument is not recommended since it will not + : work as expected if the expect value is not in scientific notation. + : That is, floats are not converted into sci notation before + : splitting into parts. + +See Also : L<p()|p>, L<expect()|expect>, L<get_exponent()|get_exponent> + +=cut + +#------------- +sub signif { +#------------- +# Some duplication of logic for p(), expect() and signif() for the sake of performance. + my ($self, $fmt) = @_; + + my $val = defined($self->{'_p'}) ? $self->{'_p'} : $self->{'_expect'}; + $fmt ||= $_signif_fmt; + + # $val can be zero. + defined($val) or $self->throw("Can't get P- or Expect value: HSPs may not have been set."); + + return $val if not $fmt or $fmt =~ /^raw/i; + ## Special formats: exponent-only or as list. + return &get_exponent($val) if $fmt =~ /^exp/i; + return (split (/eE/, $val)) if $fmt =~ /^parts/i; + + ## Default: return the raw P/Expect-value. + return $val; +} + + + +=head2 desc + + Usage : $sbjct_object->desc( [integer] ); + Purpose : Get the description for the given BLAST hit. + Example : (see usage) + Returns : String + Argument : Integer (optional) indicating the desired length of the + : description string to be returned. + Throws : n/a + +See Also : L<_set_desc()|_set_desc> + +=cut + +#--------- +sub desc { +#--------- + my( $self, $len ) = @_; + $len = (defined $len) ? $len : (CORE::length $self->{'_desc'}); + substr( $self->{'_desc'}, 0 ,$len ); +} + + + +=head2 database + + Usage : $sbjct_object->database(); + Purpose : Get the name of the database for the hit sequence. + Example : (see usage) + Returns : String + Argument : n/a + Throws : n/a + Status : Deprecated. Use Bio::Tools::Blast::database() + Extracting database name from the seq identifier is error prone. + Comments : Database id should be the same for all hits in a given + : BLAST report, however, they do not always have the same + : name as the database name extraced by the Blast.pm object. + : The Sbjct.pm database id is obtained from the summary line. + +=cut + +#-------------- +sub database { + my $self = shift; + $self->warn("Bio::Tools::Sbjct::database() is deprecated.\nNo useful information is provided by this method.\nUse Bio::Tools::Blast::database().\n"); + return $self->{'_db'}; +} +#-------------- + + + + +=head2 hsps + + Usage : $sbjct_object->hsps(); + Purpose : Get a list containing all HSP objects. + : Get the numbers of HSPs for the current hit. + Example : @hsps = $sbjct_object->hsps(); + : $num = $sbjct_object->hsps(); # alternatively, use num_hsps() + Returns : Array context : list of Bio::Tools::Blast::HSP.pm objects. + : Scalar context: integer (number of HSPs). + : (Equivalent to num_hsps()). + Argument : n/a. Relies on wantarray + Throws : Exception if the HSPs have not been collected. + +See Also : L<hsp()|hsp>, L<num_hsps()|num_hsps>, L<_set_hsps()|_set_hsps> + +=cut + +#--------- +sub hsps { +#--------- + my $self = shift; + + if (not ref $self->{'_hsps'}) { + $self->throw("Can't get HSPs: data not collected."); + } + + return wantarray + # returning list containing all HSPs. + ? @{$self->{'_hsps'}} + # returning number of HSPs. + : scalar(@{$self->{'_hsps'}}); +} + + + +=head2 hsp + + Usage : $sbjct_object->hsp( [string] ); + Purpose : Get a single HSP.pm object for the present Sbjct.pm object. + Example : $hspObj = $sbjct_object->hsp; # same as 'best' + : $hspObj = $sbjct_object->hsp('best'); + : $hspObj = $sbjct_object->hsp('worst'); + Returns : Object reference for a Bio::Tools::Blast::HSP.pm object. + Argument : String (or no argument). + : No argument (default) = highest scoring HSP (same as 'best'). + : 'best' or 'first' = highest scoring HSP. + : 'worst' or 'last' = lowest scoring HSP. + Throws : Exception if the HSPs have not been collected. + : Exception if an unrecognized argument is used. + +See Also : L<hsps()|hsps>, L<num_hsps()|num_hsps>, L<_set_hsps()|_set_hsps> + +=cut + +#---------- +sub hsp { +#---------- + my( $self, $option ) = @_; + $option ||= 'best'; + + if (not ref $self->{'_hsps'}) { + $self->throw("Can't get HSPs: data not collected."); + } + + my @hsps = @{$self->{'_hsps'}}; + + return $hsps[0] if $option =~ /best|first|1/i; + return $hsps[$#hsps] if $option =~ /worst|last/i; + + $self->throw("Can't get HSP for: $option", + "Valid arguments: 'best', 'worst'"); +} + + + +=head2 num_hsps + + Usage : $sbjct_object->num_hsps(); + Purpose : Get the number of HSPs for the present Blast hit. + Example : $nhsps = $sbjct_object->num_hsps(); + Returns : Integer + Argument : n/a + Throws : Exception if the HSPs have not been collected. + +See Also : L<hsps()|hsps> + +=cut + +#------------- +sub num_hsps { +#------------- + my $self = shift; + + if (not defined $self->{'_hsps'}) { + $self->throw("Can't get HSPs: data not collected."); + } + + return scalar(@{$self->{'_hsps'}}); +} + + + +=head2 length + + Usage : $sbjct_object->length(); + Purpose : Get the total length of the hit sequence. + Example : $len = $sbjct_object->length(); + Returns : Integer + Argument : n/a + Throws : n/a + Comments : Developer note: when using the built-in length function within + : this module, call it as CORE::length(). + +See Also : L<logical_length()|logical_length>, L<length_aln()|length_aln> + +=cut + +#----------- +sub length { +#----------- + my $self = shift; + $self->{'_length'}; +} + + +=head2 logical_length + + Usage : $sbjct_object->logical_length( [seq_type] ); + : (mostly intended for internal use). + Purpose : Get the logical length of the hit sequence. + : If the Blast is a TBLASTN or TBLASTX, the returned length + : is the length of the would-be amino acid sequence (length/3). + : For all other BLAST flavors, this function is the same as length(). + Example : $len = $sbjct_object->logical_length(); + Returns : Integer + Argument : seq_type = 'query' or 'sbjct' (default = 'query') + Throws : n/a + Comments : This is important for functions like frac_aligned_query() + : which need to operate in amino acid coordinate space when dealing + : with [T]BLAST[NX] type reports. + +See Also : L<length()|length>, L<frac_aligned_query()|frac_aligned_query>, L<frac_aligned_hit()|frac_aligned_hit> + +=cut + +#-------------------- +sub logical_length { +#-------------------- + my $self = shift; + my $seqType = shift || 'query'; + + # Return logical sbjct length + $seqType eq 'sbjct' and return + $self->{'_logical_length'} || $self->{'_length'}; + + # Otherwise, return logical query length + my $qlen = $self->parent->length; + + # Adjust length based on BLAST flavor. + my $prog = $self->parent->program; + if($prog =~ /T?BLASTX/ ) { + $qlen /= 3; + } + return $qlen; +} + + + +=head2 length_aln + + Usage : $sbjct_object->length_aln( [seq_type] ); + Purpose : Get the total length of the aligned region for query or sbjct seq. + : This number will include all HSPs + Example : $len = $sbjct_object->length_aln(); # default = query + : $lenAln = $sbjct_object->length_aln('query'); + Returns : Integer + Argument : seq_Type = 'query' | 'sbjct' (Default = 'query') + Throws : Exception if the argument is not recognized. + Comments : This method will report the logical length of the alignment, + : meaning that for TBLAST[NX] reports, the length is reported + : using amino acid coordinate space (i.e., nucleotides / 3). + : + : This method requires that all HSPs be tiled. If they have not + : already been tiled, they will be tiled first. + : If you don't want the tiled data, iterate through each HSP + : calling length() on each (use hsps() to get the HSPs). + +See Also : L<length()|length>, L<frac_aligned_query()|frac_aligned_query>, L<frac_aligned_hit()|frac_aligned_hit>, L<gaps()|gaps>, L<_tile_hsps()|_tile_hsps>, B<Bio::Tools::Blast::HSP::length()> + +=cut + +#---------------' +sub length_aln { +#--------------- + my( $self, $type ) = @_; + + $type ||= 'query'; + + $self->_tile_hsps() if not $self->{'_tile_hsps'}; + + my $data = $self->{'_length_aln_'.$type}; + + ## If we don't have data, figure out what went wrong. + if(!$data) { + $self->throw("Can't get length aln for sequence type \"$type\"", + "Valid types are 'query', 'sbjct'"); + } + $data; +} + + +=head2 gaps + + Usage : $sbjct_object->gaps( [seq_type] ); + Purpose : Get the number of gaps in the aligned query, sbjct, or both sequences. + : Data is summed across all HSPs. + Example : $qgaps = $sbjct_object->gaps('query'); + : $sgaps = $sbjct_object->gaps('sbjct'); + : $tgaps = $sbjct_object->gaps(); # default = total (query + sbjct) + Returns : scalar context: integer + : array context without args: two-element list of integers + : (queryGaps, sbjctGaps) + : Array context can be "induced" by providing an argument of 'list' or 'array'. + Argument : seq_type: 'query' | 'sbjct' | 'total' | 'list' (default = 'total') + Throws : n/a + Comments : If you need data for each HSP, use hsps() and then interate + : through each HSP object. + : This method requires that all HSPs be tiled. If they have not + : already been tiled, they will be tiled first. + : Not relying on wantarray since that will fail in situations + : such as printf "%d", $hit->gaps() in which you might expect to + : be printing the total gaps, but evaluates to array context. + +See Also : L<length_aln()|length_aln> + +=cut + +#---------- +sub gaps { +#---------- + my( $self, $seqType ) = @_; + + $seqType ||= (wantarray ? 'list' : 'total'); + + $self->_tile_hsps() if not $self->{'_tile_hsps'}; + + $seqType = lc($seqType); + + if($seqType =~ /list|array/i) { + return ($self->{'_gaps_query'}, $self->{'_gaps_sbjct'}); + } + + if($seqType eq 'total') { + return ($self->{'_gaps_query'} + $self->{'_gaps_sbjct'}) || 0; + } else { + return $self->{'_gaps_'.$seqType} || 0; + } +} + + + +=head2 matches + + Usage : $sbjct_object->matches( [class] ); + Purpose : Get the total number of identical or conserved matches + : (or both) across all HSPs. + : (Note: 'conservative' matches are indicated as 'positives' + : in the Blast report.) + Example : ($id,$cons) = $sbjct_object->matches(); # no argument + : $id = $sbjct_object->matches('id'); + : $cons = $sbjct_object->matches('cons'); + Returns : Integer or a 2-element array of integers + Argument : class = 'id' | 'cons' OR none. + : If no argument is provided, both identical and conservative + : numbers are returned in a two element list. + : (Other terms can be used to refer to the conservative + : matches, e.g., 'positive'. All that is checked is whether or + : not the supplied string starts with 'id'. If not, the + : conservative matches are returned.) + Throws : Exception if the requested data cannot be obtained. + Comments : If you need data for each HSP, use hsps() and then interate + : through the HSP objects. + : Does not rely on wantarray to return a list. Only checks for + : the presence of an argument (no arg = return list). + +See Also : B<Bio::Tools::Blast::HSP::matches()>, L<hsps()|hsps> + +=cut + +#--------------- +sub matches { +#--------------- + my( $self, $arg) = @_; + my(@data,$data); + + if(!$arg) { + @data = ($self->{'_totalIdentical'}, $self->{'_totalConserved'}); + + return @data if @data; + + } else { + + if($arg =~ /^id/i) { + $data = $self->{'_totalIdentical'}; + } else { + $data = $self->{'_totalConserved'}; + } + return $data if $data; + } + + ## Something went wrong if we make it to here. + $self->throw("Can't get identical or conserved data: no data."); +} + + +=head2 start + + Usage : $sbjct->start( [seq_type] ); + Purpose : Gets the start coordinate for the query, sbjct, or both sequences + : in the Sbjct object. If there is more than one HSP, the lowest start + : value of all HSPs is returned. + Example : $qbeg = $sbjct->start('query'); + : $sbeg = $sbjct->start('sbjct'); + : ($qbeg, $sbeg) = $sbjct->start(); + Returns : scalar context: integer + : array context without args: list of two integers (queryStart, sbjctStart) + : Array context can be "induced" by providing an argument of 'list' or 'array'. + Argument : In scalar context: seq_type = 'query' or 'sbjct' + : (case insensitive). If not supplied, 'query' is used. + Throws : n/a + Comments : This method requires that all HSPs be tiled. If there is more than one + : HSP and they have not already been tiled, they will be tiled first. + : Remember that the start and end coordinates of all HSPs are + : normalized so that start < end. Strand information can only be + : obtained on an HSP-by-HSP basis by calling $hsp->strand(). + +See Also : L<end()|end>, L<range()|range>, L<HSP Tiling and Ambiguous Alignments>, B<Bio::Tools::Blast::HSP::start>() + +=cut + +#---------- +sub start { +#---------- + my ($self, $seqType) = @_; + + $seqType ||= (wantarray ? 'list' : 'query'); + + # If there is only one HSP, defer this call to the solitary HSP. + if($self->num_hsps == 1) { + return $self->hsp->start($seqType); + } else { + $self->_tile_hsps() if not $self->{'_tile_hsps'}; + if($seqType =~ /list|array/i) { + return ($self->{'_queryStart'}, $self->{'_sbjctStart'}); + } else { + ## Sensitive to member name changes. + $seqType = "_\L$seqType\E"; + return $self->{$seqType.'Start'}; + } + } +} + + +=head2 end + + Usage : $sbjct->end( [seq_type] ); + Purpose : Gets the end coordinate for the query, sbjct, or both sequences + : in the Sbjct object. If there is more than one HSP, the largest end + : value of all HSPs is returned. + Example : $qend = $sbjct->end('query'); + : $send = $sbjct->end('sbjct'); + : ($qend, $send) = $sbjct->end(); + Returns : scalar context: integer + : array context without args: list of two integers (queryEnd, sbjctEnd) + : Array context can be "induced" by providing an argument of 'list' or 'array'. + Argument : In scalar context: seq_type = 'query' or 'sbjct' + : (case insensitive). If not supplied, 'query' is used. + Throws : n/a + Comments : This method requires that all HSPs be tiled. If there is more than one + : HSP and they have not already been tiled, they will be tiled first. + : Remember that the start and end coordinates of all HSPs are + : normalized so that start < end. Strand information can only be + : obtained on an HSP-by-HSP basis by calling $hsp->strand(). + +See Also : L<start()|start>, L<range()|range>, L<HSP Tiling and Ambiguous Alignments>, B<Bio::Tools::Blast::HSP::end>() + +=cut + +#---------- +sub end { +#---------- + my ($self, $seqType) = @_; + + $seqType ||= (wantarray ? 'list' : 'query'); + + # If there is only one HSP, defer this call to the solitary HSP. + if($self->num_hsps == 1) { + return $self->hsp->end($seqType); + } else { + $self->_tile_hsps() if not $self->{'_tile_hsps'}; + if($seqType =~ /list|array/i) { + return ($self->{'_queryStop'}, $self->{'_sbjctStop'}); + } else { + ## Sensitive to member name changes. + $seqType = "_\L$seqType\E"; + return $self->{$seqType.'Stop'}; + } + } +} + +=head2 range + + Usage : $sbjct->range( [seq_type] ); + Purpose : Gets the (start, end) coordinates for the query or sbjct sequence + : in the HSP alignment. + Example : ($qbeg, $qend) = $sbjct->range('query'); + : ($sbeg, $send) = $sbjct->range('sbjct'); + Returns : Two-element array of integers + Argument : seq_type = string, 'query' or 'sbjct' (default = 'query') + : (case insensitive). + Throws : n/a + +See Also : L<start()|start>, L<end()|end> + +=cut + +#---------- +sub range { +#---------- + my ($self, $seqType) = @_; + $seqType ||= 'query'; + return ($self->start($seqType), $self->end($seqType)); +} + + +=head2 frac_identical + + Usage : $sbjct_object->frac_identical( [seq_type] ); + Purpose : Get the overall fraction of identical positions across all HSPs. + : The number refers to only the aligned regions and does not + : account for unaligned regions in between the HSPs, if any. + Example : $frac_iden = $sbjct_object->frac_identical('query'); + Returns : Float (2-decimal precision, e.g., 0.75). + Argument : seq_type: 'query' | 'sbjct' | 'total' + : default = 'total' (but see comments below). + Throws : n/a + Comments : Different versions of Blast report different values for the total + : length of the alignment. This is the number reported in the + : denominators in the stats section: + : "Identical = 34/120 Positives = 67/120". + : BLAST-GP uses the total length of the alignment (with gaps) + : WU-BLAST uses the length of the query sequence (without gaps). + : Therefore, when called without an argument or an argument of 'total', + : this method will report different values depending on the + : version of BLAST used. + : + : To get the fraction identical among only the aligned residues, + : ignoring the gaps, call this method with an argument of 'query' + : or 'sbjct'. + : + : If you need data for each HSP, use hsps() and then iterate + : through the HSP objects. + : This method requires that all HSPs be tiled. If they have not + : already been tiled, they will be tiled first. + +See Also : L<frac_conserved()|frac_conserved>, L<frac_aligned_query()|frac_aligned_query>, L<matches()|matches>, L<_tile_hsps()|_tile_hsps> + +=cut + +#------------------ +sub frac_identical { +#------------------ + my ($self, $seqType) = @_; + $seqType ||= 'total'; + + ## Sensitive to member name format. + $seqType = lc($seqType); + + $self->_tile_hsps() if not $self->{'_tile_hsps'}; + + sprintf( "%.2f", $self->{'_totalIdentical'}/$self->{'_length_aln_'.$seqType}); +} + + + +=head2 frac_conserved + + Usage : $sbjct_object->frac_conserved( [seq_type] ); + Purpose : Get the overall fraction of conserved positions across all HSPs. + : The number refers to only the aligned regions and does not + : account for unaligned regions in between the HSPs, if any. + Example : $frac_cons = $sbjct_object->frac_conserved('sbjct'); + Returns : Float (2-decimal precision, e.g., 0.75). + Argument : seq_type: 'query' | 'sbjct' | 'total' + : default = 'total' (but see comments below). + Throws : n/a + Comments : Different versions of Blast report different values for the total + : length of the alignment. This is the number reported in the + : denominators in the stats section: + : "Identical = 34/120 Positives = 67/120". + : BLAST-GP uses the total length of the alignment (with gaps) + : WU-BLAST uses the length of the query sequence (without gaps). + : Therefore, when called without an argument or an argument of 'total', + : this method will report different values depending on the + : version of BLAST used. + : + : To get the fraction conserved among only the aligned residues, + : ignoring the gaps, call this method with an argument of 'query' + : or 'sbjct'. + : + : If you need data for each HSP, use hsps() and then interate + : through the HSP objects. + : This method requires that all HSPs be tiled. If they have not + : already been tiled, they will be tiled first. + +See Also : L<frac_identical()|frac_identical>, L<matches()|matches>, L<_tile_hsps()|_tile_hsps> + +=cut + +#-------------------- +sub frac_conserved { +#-------------------- + my ($self, $seqType) = @_; + $seqType ||= 'total'; + + ## Sensitive to member name format. + $seqType = lc($seqType); + + $self->_tile_hsps() if not $self->{'_tile_hsps'}; + + sprintf( "%.2f", $self->{'_totalConserved'}/$self->{'_length_aln_'.$seqType}); +} + + + + +=head2 frac_aligned_query + + Usage : $sbjct_object->frac_aligned_query(); + Purpose : Get the fraction of the query sequence which has been aligned + : across all HSPs (not including intervals between non-overlapping + : HSPs). + Example : $frac_alnq = $sbjct_object->frac_aligned_query(); + Returns : Float (2-decimal precision, e.g., 0.75). + Argument : n/a + Throws : n/a + Comments : If you need data for each HSP, use hsps() and then interate + : through the HSP objects. + : To compute the fraction aligned, the logical length of the query + : sequence is used, meaning that for [T]BLASTX reports, the + : full length of the query sequence is converted into amino acids + : by dividing by 3. This is necessary because of the way + : the lengths of aligned sequences are computed. + : This method requires that all HSPs be tiled. If they have not + : already been tiled, they will be tiled first. + +See Also : L<frac_aligned_hit()|frac_aligned_hit>, L<_tile_hsps()|_tile_hsps>, L<logical_length()|logical_length>, L<length_aln()|length_aln> + +=cut + +#---------------------- +sub frac_aligned_query { +#---------------------- + my $self = shift; + + $self->_tile_hsps() if not $self->{'_tile_hsps'}; + + sprintf( "%.2f", $self->{'_length_aln_query'}/$self->logical_length('query')); +} + + + +=head2 frac_aligned_hit + + Usage : $sbjct_object->frac_aligned_hit(); + Purpose : Get the fraction of the hit (sbjct) sequence which has been aligned + : across all HSPs (not including intervals between non-overlapping + : HSPs). + Example : $frac_alnq = $sbjct_object->frac_aligned_hit(); + Returns : Float (2-decimal precision, e.g., 0.75). + Argument : n/a + Throws : n/a + Comments : If you need data for each HSP, use hsps() and then interate + : through the HSP objects. + : To compute the fraction aligned, the logical length of the sbjct + : sequence is used, meaning that for TBLAST[NX] reports, the + : full length of the sbjct sequence is converted into amino acids + : by dividing by 3. This is necessary because of the way + : the lengths of aligned sequences are computed. + : This method requires that all HSPs be tiled. If they have not + : already been tiled, they will be tiled first. + +See Also : L<frac_aligned_query()|frac_aligned_query>, L<matches()|matches>, L<_tile_hsps()|_tile_hsps>, L<logical_length()|logical_length>, L<length_aln()|length_aln> + +=cut + +#-------------------- +sub frac_aligned_hit { +#-------------------- + my $self = shift; + + $self->_tile_hsps() if not $self->{'_tile_hsps'}; + + sprintf( "%.2f", $self->{'_length_aln_sbjct'}/$self->logical_length('sbjct')); +} + +# Safety-net methods for those who try don't read or remember the API. +# Redirecting to the proper method. These 'sbjct' versions may be more +# consistent with the API of this module since there are numerous other +# instances of using 'sbjct' in arguments. However, 'sbjct' is a bit tech-ee. + +#----------------------- +sub frac_aligned_sbjct { my $self=shift; $self->frac_aligned_hit(@_); } +#----------------------- +sub num_unaligned_sbjct { my $self=shift; $self->num_unaligned_hit(@_); } +#----------------------- + + + +=head2 num_unaligned_hit + + Usage : $sbjct_object->num_unaligned_hit(); + Purpose : Get the number of the unaligned residues in the hit sequence. + : Sums across all all HSPs. + Example : $num_unaln = $sbjct_object->num_unaligned_hit(); + Returns : Integer + Argument : n/a + Throws : n/a + Comments : See notes regarding logical lengths in the comments for frac_aligned_hit(). + : They apply here as well. + : If you need data for each HSP, use hsps() and then interate + : through the HSP objects. + : This method requires that all HSPs be tiled. If they have not + : already been tiled, they will be tiled first. + +See Also : L<num_unaligned_query()|num_unaligned_query>, L<_tile_hsps()|_tile_hsps>, L<frac_aligned_hit()|frac_aligned_hit> + +=cut + +#--------------------- +sub num_unaligned_hit { +#--------------------- + my $self = shift; + + $self->_tile_hsps() if not $self->{'_tile_hsps'}; + + my $num = $self->logical_length('sbjct') - $self->{'_length_aln_sbjct'}; + ($num < 0 ? 0 : $num ); +} + + +=head2 num_unaligned_query + + Usage : $sbjct_object->num_unaligned_query(); + Purpose : Get the number of the unaligned residues in the query sequence. + : Sums across all all HSPs. + Example : $num_unaln = $sbjct_object->num_unaligned_query(); + Returns : Integer + Argument : n/a + Throws : n/a + Comments : See notes regarding logical lengths in the comments for frac_aligned_query(). + : They apply here as well. + : If you need data for each HSP, use hsps() and then interate + : through the HSP objects. + : This method requires that all HSPs be tiled. If they have not + : already been tiled, they will be tiled first. + +See Also : L<num_unaligned_hit()|num_unaligned_hit>, L<_tile_hsps()|_tile_hsps>, L<frac_aligned_query()|frac_aligned_query> + +=cut + +#----------------------- +sub num_unaligned_query { +#----------------------- + my $self = shift; + + $self->_tile_hsps() if not $self->{'_tile_hsps'}; + + my $num = $self->logical_length('query') - $self->{'_length_aln_query'}; + ($num < 0 ? 0 : $num ); +} + + + +=head2 seq_inds + + Usage : $hit->seq_inds( seq_type, class, collapse ); + Purpose : Get a list of residue positions (indices) across all HSPs + : for identical or conserved residues in the query or sbjct sequence. + Example : @ind = $hit->seq_inds('query', 'identical'); + : @ind = $hit->seq_inds('sbjct', 'conserved'); + : @ind = $hit->seq_inds('sbjct', 'conserved', 1); + Returns : Array of integers + : May include ranges if collapse is non-zero. + Argument : seq_type = 'query' or 'sbjct' (default = query) + : class = 'identical' or 'conserved' (default = identical) + : (can be shortened to 'id' or 'cons') + : (actually, anything not 'id' will evaluate to 'conserved'). + : collapse = boolean, if non-zero, consecutive positions are merged + : using a range notation, e.g., "1 2 3 4 5 7 9 10 11" + : collapses to "1-5 7 9-11". This is useful for + : consolidating long lists. Default = no collapse. + Throws : n/a. + +See Also : B<Bio::Tools::Blast::HSP::seq_inds()> + +=cut + +#------------- +sub seq_inds { +#------------- + my ($self, $seq, $class, $collapse) = @_; + + $seq ||= 'query'; + $class ||= 'identical'; + $collapse ||= 0; + + my (@inds, $hsp); + foreach $hsp ($self->hsps) { + # This will merge data for all HSPs together. + push @inds, $hsp->seq_inds($seq, $class); + } + + # Need to remove duplicates and sort the merged positions. + if(@inds) { + my %tmp = map { $_, 1 } @inds; + @inds = sort {$a <=> $b} keys %tmp; + } + + require Bio::Tools::Blast::HSP; + + $collapse ? &Bio::Tools::Blast::HSP::collapse_nums(@inds) : @inds; +} + + +##################################################################################### +## INSTANCE METHODS ## +##################################################################################### + + +=head2 display + + Usage : $sbjct_object->display( %named_parameters ); + Purpose : Display information about Bio::Tools::Blast::Sbjct.pm data members + Example : $object->display(-SHOW=>'stats'); + Argument : Named parameters: -SHOW => 'hsp', + : -WHERE => filehandle (default = STDOUT) + Returns : n/a + Status : Deprecated, Buggy. + : Use Blast::table() or Blast::table_tiled() instead. + +See Also : L<_display_stats()|_display_stats>, L<_display_hsps()|_display_hsps>, B<Bio::Root::Object>::display + +=cut + +#------------ +sub display { +#------------ + my( $self, %param) = @_; + + $param{-HEADER} = 0; + $self->SUPER::display(%param); + + $self->show =~ /hsp/i and $self->_display_hsps( %param); +} + + +=head2 _display_stats + + Usage : n/a; called automatically by display() + Purpose : Display information about Bio::Tools::Blast.pm data members. + : Not tab-delimited. + : Prints the rank, name, database, score, p, n, length + : of the hit sequence, length of the aligned region, + : fraction identical, fraction conserved, and the fraction aligned + : for both the query and hit sequences. + Example : n/a + Argument : one argument = filehandle object. + Returns : printf call. + Status : Deprecated, Buggy. + : Use Blast::table() or Blast::table_tiled() instead. + +See Also : L<display()|display> + +=cut + +#------------------- +sub _display_stats { +#------------------- + my( $self, $OUT) = @_; + my $layout = $self->parent->_layout(); + + if($layout == 1) { + printf( $OUT "%-3d %-20s %-11s %-5d %-5d %-9.1e %-9.1e %-4d %-3d %-5d %-5d %-5s %-6.2f %-6.2f %-4d(%.2f) %-4d(%.2f)\n", + $self->rank(), $self->name(), + ($self->database() || 'UNKNOWN DB') , + $self->score(),$self->bits(),$self->p(),$self->expect(), + $self->gaps(), $self->n(), + $self->length(), $self->length_aln('query'), + $self->ambiguous_aln(), + $self->frac_aligned_query, $self->frac_aligned_hit, + $self->matches('iden'), $self->frac_identical('query'), + $self->matches('cons'), $self->frac_conserved('query')); + } else { + printf( $OUT "%-3d %-20s %-11s %-5d %-5d %-9.1e %-4d %-3d %-5d %-5d %-5s %-6.2f %-6.2f %-4d(%.2f) %-4d(%.2f)\n", + $self->rank(), $self->name(), + ($self->database() || 'UNKNOWN DB'), + $self->score(),$self->bits(),$self->expect(), + $self->gaps(), $self->num_hsps, + $self->length(), $self->length_aln('query'), + $self->ambiguous_aln(), + $self->frac_aligned_query, $self->frac_aligned_hit, + $self->matches('iden'), $self->frac_identical('query'), + $self->matches('cons'), $self->frac_conserved('query') ); + } + +} + + +=head2 _display_hsps + + Usage : n/a; called automatically by display() + Purpose : Display information about each HSP in the current BLAST hit. + Example : n/a + Argument : one argument = filehandle object. + Returns : printf call. + Status : Experimental + +See Also : L<display()|display>, B<Bio::Tools::Blast::HSP::display()> + +=cut + +#---------------- +sub _display_hsps { +#---------------- + my( $self, %param) = @_; + my $OUT = $self->fh(); + my $hspCount = 0; + my $reply = undef; + + not defined $self->{'_hsps'} and do{ print $OUT "\nHSP data not loaded.\n\n"; return; }; + +# print $OUT "\n",$self->num_hsps, " HSPs\n\n"; + + my($hsp); + foreach $hsp ( $self->hsps() ) { + $hspCount++; + print $OUT "\n ", '-'x25, "< HSP #$hspCount >", '-'x25, "\n"; + + $hsp->display( %param ); + + if( $hspCount < $self->num_hsps ) { + print "\n\n\t--------> <RET> FOR NEXT HSP, q TO QUIT <--------\n"; + chop( $reply = <STDIN>); + $reply =~ /^q/i and return; + } + } +} + + + +=head2 homol_data + + Usage : $data = $sbjct_object->homo_data( %named_params ); + Purpose : Gets specific similarity data about all HSPs. + Returns : String + Argument : named parameters forwarded to Bio::Tools::Blast::HSP::homol_data(). + Throws : n/a + Status : Experimental + Comments : This is an experimental method used for obtaining an + : indication of: + : 1) how many HSPs are in a Blast alignment + : 2) how strong the similarity is between sequences in the HSP + : 3) the endpoints of the alignment (sequence monomer numbers) + : "Homology data" for each HSP is in the format: + : "<integer> <start> <stop>" + : Data for different HSPs are tab-delimited. + +See Also : B<Bio::Tools::Blast::homol_data()>, B<Bio::Tools::Blast::HSP::homol_data()> + +=cut + +#--------------- +sub homol_data { +#--------------- + my ($self,%param) = @_; + my $data = $self->name(); + + foreach ($self->hsps) { + $data .="\t".$_->homol_data(%param); + } + ## Record ambiguous alignment status. + $data .= "\t".$self->ambiguous_aln(); + $data; +} + + +=head2 is_signif + + Usage : $sbjct_object->is_signif(); + Purpose : Determine if the given BLAST hit is significant. + Example : + Returns : Boolean + Argument : n/a + Throws : n/a + Comments : Uses criteria defined in the parent Blast.pm object + : to assess significance. Currently, only relies on + : P-value and length criteria. + : This mehtod is largely obsolete since are hits are now by + : definition significant. + +=cut + +#--------------- +sub is_signif { +#--------------- + my $self = shift; + return ($self->{'_significance'} <= $self->parent->signif and + $self->length > $self->parent->signif_len); +} + +##################################################################################### +## CLASS METHODS ## +##################################################################################### + +=head1 CLASS METHODS + +=head2 get_exponent + + Usage : &get_exponent( number ); + Purpose : Determines the power of 10 exponent of an integer, float, + : or scientific notation number. + Example : &get_exponent("4.0e-206"); + : &get_exponent("0.00032"); + : &get_exponent("10."); + : &get_exponent("1000.0"); + : &get_exponent("e+83"); + Argument : Float, Integer, or scientific notation number + Returns : Integer representing the exponent part of the number (+ or -). + : If argument == 0 (zero), return value is "-999". + Comments : Exponents are rounded up (less negative) if the mantissa is >= 5. + : Exponents are rounded down (more negative) if the mantissa is <= -5. + : This method probably belongs in a more general utility class. + +=cut + +#------------------ +sub get_exponent { +#------------------ + my $data = shift; + + my($num, $exp) = split /[eE]/, $data; + + if( defined $exp) { + $num = 1 if not $num; + $num >= 5 and $exp++; + $num <= -5 and $exp--; + } elsif( $num == 0) { + $exp = -999; + } elsif( not $num =~ /\./) { + $exp = CORE::length($num) -1; + } else { + $exp = 0; + $num .= '0' if $num =~ /\.$/; + my ($c); + my $rev = 0; + if($num !~ /^0/) { + $num = reverse($num); + $rev = 1; + } + do { $c = chop($num); + $c == 0 && $exp++; + } while( $c ne '.'); + + $exp = -$exp if $num == 0 and not $rev; + $exp -= 1 if $rev; + } + return $exp; +} + +1; +__END__ + +##################################################################################### +# END OF CLASS # +##################################################################################### + + +=head1 FOR DEVELOPERS ONLY + +=head2 Data Members + +Information about the various data members of this module is provided for those +wishing to modify or understand the code. Two things to bear in mind: + +=over 4 + +=item 1 Do NOT rely on these in any code outside of this module. + +All data members are prefixed with an underscore to signify that they are private. +Always use accessor methods. If the accessor doesn't exist or is inadequate, +create or modify an accessor (and let me know, too!). (An exception to this might +be for HSP.pm which is more tightly coupled to Sbjct.pm and +may access Sbjct data members directly for efficiency purposes, but probably +should not). + +=item 2 This documentation may be incomplete and out of date. + +It is easy for these data member descriptions to become obsolete as +this module is still evolving. Always double check this info and search +for members not described here. + +=back + +An instance of Bio::Tools::Blast::Sbjct.pm is a blessed reference to a hash containing +all or some of the following fields: + + FIELD VALUE + -------------------------------------------------------------- + _hsps : Array ref for a list of Bio::Tools::Blast::HSP.pm objects. + : + _db : Database identifier from the summary line. + : + _desc : Description data for the hit from the summary line. + : + _length : Total length of the hit sequence. + : + _score : BLAST score. + : + _bits : BLAST score (in bits). Matrix-independent. + : + _p : BLAST P value. Obtained from summary section. (Blast1/WU-Blast only) + : + _expect : BLAST Expect value. Obtained from summary section. + : + _n : BLAST N value (number of HSPs) (Blast1/WU-Blast2 only) + : + _frame : Reading frame for TBLASTN and TBLASTX analyses. + : + _totalIdentical: Total number of identical aligned monomers. + : + _totalConserved: Total number of conserved aligned monomers (a.k.a. "positives"). + : + _overlap : Maximum number of overlapping residues between adjacent HSPs + : before considering the alignment to be ambiguous. + : + _ambiguous_aln : Boolean. True if the alignment of all HSPs is ambiguous. + : + _length_aln_query : Length of the aligned region of the query sequence. + : + _length_aln_sbjct : Length of the aligned region of the sbjct sequence. + + + INHERITED DATA MEMBERS + ---------------------- + _name : From Bio::Root::Object.pm. String representing the name of the + : sbjct sequence obtained from the BLAST report. + : + _parent : From Bio::Root::Object.pm. This member contains a reference to the + : Bio::Tools::Blast.pm object to which this hit belongs. + + +=cut + +1;