Mercurial > repos > mahtabm > ensembl
comparison variant_effect_predictor/Bio/Tools/Blast/HSP.pm @ 0:1f6dce3d34e0
Uploaded
| author | mahtabm |
|---|---|
| date | Thu, 11 Apr 2013 02:01:53 -0400 |
| parents | |
| children |
comparison
equal
deleted
inserted
replaced
| -1:000000000000 | 0:1f6dce3d34e0 |
|---|---|
| 1 #---------------------------------------------------------------------------- | |
| 2 # PACKAGE : Bio::Tools::Blast::HSP | |
| 3 # AUTHOR : Steve Chervitz (sac@bioperl.org) | |
| 4 # CREATED : March 1996 | |
| 5 # STATUS : Alpha | |
| 6 # REVISION: $Id: HSP.pm,v 1.18 2002/10/22 07:38:48 lapp Exp $ | |
| 7 # | |
| 8 # For the latest version and documentation, visit the distribution site: | |
| 9 # http://genome-www.stanford.edu/perlOOP/bioperl/blast/ | |
| 10 # | |
| 11 # To generate documentation, run this module through pod2html | |
| 12 # (preferably from Perl v5.004 or better). | |
| 13 # | |
| 14 # Copyright (c) 1996-2000 Steve Chervitz. All Rights Reserved. | |
| 15 # This module is free software; you can redistribute it and/or | |
| 16 # modify it under the same terms as Perl itself. | |
| 17 #---------------------------------------------------------------------------- | |
| 18 | |
| 19 package Bio::Tools::Blast::HSP; | |
| 20 | |
| 21 use Bio::Root::Global qw(:devel); | |
| 22 use Bio::Root::Object (); | |
| 23 #use Bio::Root::Err qw(:std); | |
| 24 | |
| 25 @ISA = qw( Bio::Root::Object); | |
| 26 | |
| 27 use strict; | |
| 28 use vars qw($ID $GAP_SYMBOL @SCORE_CUTOFFS $Revision %STRAND_SYMBOL); | |
| 29 $ID = 'Bio::Tools::Blast::HSP'; | |
| 30 $Revision = '$Id: HSP.pm,v 1.18 2002/10/22 07:38:48 lapp Exp $'; #' | |
| 31 | |
| 32 $GAP_SYMBOL = '-'; # Need a more general way to handle gap symbols. | |
| 33 @SCORE_CUTOFFS = ( 100, 30 ); # Bit score cutoffs (see homol_score()). | |
| 34 %STRAND_SYMBOL = ('Plus' => 1, 'Minus' => -1); | |
| 35 | |
| 36 ## POD Documentation: | |
| 37 | |
| 38 =head1 NAME | |
| 39 | |
| 40 Bio::Tools::Blast::HSP - Bioperl BLAST High-Scoring Segment Pair object | |
| 41 | |
| 42 =head1 SYNOPSIS | |
| 43 | |
| 44 =head2 Object Creation | |
| 45 | |
| 46 The construction of HSP objects is handled by Bio::Tools::Blast:: Sbjct.pm. | |
| 47 You should not need to use this package directly. See L<_initialize()|_initialize> | |
| 48 for a description of constructor parameters. | |
| 49 | |
| 50 require Bio::Tools::Blast::HSP; | |
| 51 | |
| 52 $hspObj = eval{ new Bio::Tools::Blast::HSP(-DATA =>\@hspData, | |
| 53 -PARENT =>$sbjct_object, | |
| 54 -NAME =>$hspCount, | |
| 55 -PROGRAM =>'TBLASTN', | |
| 56 ); | |
| 57 }; | |
| 58 | |
| 59 @hspData includes the raw BLAST report data for a specific HSP, | |
| 60 and is prepared by Bio::Tools::Blast::Sbjct.pm. | |
| 61 | |
| 62 =head1 INSTALLATION | |
| 63 | |
| 64 This module is included with the central Bioperl distribution: | |
| 65 | |
| 66 http://bio.perl.org/Core/Latest | |
| 67 ftp://bio.perl.org/pub/DIST | |
| 68 | |
| 69 Follow the installation instructions included in the README file. | |
| 70 | |
| 71 =head1 DESCRIPTION | |
| 72 | |
| 73 The Bio::Tools::Blast::HSP.pm module encapsulates data and methods for | |
| 74 manipulating, parsing, and analyzing HSPs ("High-scoring Segment Pairs") | |
| 75 derived from BLAST sequence analysis. | |
| 76 | |
| 77 This module is a utility module used by the B<Bio::Tools::Blast::Sbjct.pm> | |
| 78 and is not intended for separate use. Please see documentation for | |
| 79 B<Bio::Tools::Blast.pm> for some basic information about using | |
| 80 HSP objects (L<Links:>). | |
| 81 | |
| 82 =over 0 | |
| 83 | |
| 84 =item * Supports BLAST versions 1.x and 2.x, gapped and ungapped. | |
| 85 | |
| 86 =back | |
| 87 | |
| 88 Bio::Tools::Blast::HSP.pm has the ability to extract a list of all | |
| 89 residue indices for identical and conservative matches along both | |
| 90 query and sbjct sequences. Since this degree of detail is not always | |
| 91 needed, this behavior does not occur during construction of the HSP | |
| 92 object. These data will automatically be collected as necessary as | |
| 93 the HSP.pm object is used. | |
| 94 | |
| 95 =head1 DEPENDENCIES | |
| 96 | |
| 97 Bio::Tools::Blast::HSP.pm is a concrete class that inherits from | |
| 98 B<Bio::Root::Object.pm> and relies on B<Bio::Tools::Sbjct.pm> as a | |
| 99 container for HSP.pm objects. B<Bio::Seq.pm> and B<Bio::UnivAln.pm> | |
| 100 are employed for creating sequence and alignment objects, | |
| 101 respectively. | |
| 102 | |
| 103 | |
| 104 =head2 Relationship to UnivAln.pm & Seq.pm | |
| 105 | |
| 106 HSP.pm can provide the query or sbjct sequence as a B<Bio::Seq.pm> | |
| 107 object via the L<seq()|seq> method. The HSP.pm object can also create a | |
| 108 two-sequence B<Bio::UnivAln.pm> alignment object using the the query | |
| 109 and sbjct sequences via the L<get_aln()|get_aln> method. Creation of alignment | |
| 110 objects is not automatic when constructing the HSP.pm object since | |
| 111 this level of functionality is not always required and would generate | |
| 112 a lot of extra overhead when crunching many reports. | |
| 113 | |
| 114 | |
| 115 =head1 FEEDBACK | |
| 116 | |
| 117 =head2 Mailing Lists | |
| 118 | |
| 119 User feedback is an integral part of the evolution of this and other | |
| 120 Bioperl modules. Send your comments and suggestions preferably to one | |
| 121 of the Bioperl mailing lists. Your participation is much appreciated. | |
| 122 | |
| 123 bioperl-l@bioperl.org - General discussion | |
| 124 http://bio.perl.org/MailList.html - About the mailing lists | |
| 125 | |
| 126 =head2 Reporting Bugs | |
| 127 | |
| 128 Report bugs to the Bioperl bug tracking system to help us keep track | |
| 129 the bugs and their resolution. Bug reports can be submitted via email | |
| 130 or the web: | |
| 131 | |
| 132 bioperl-bugs@bio.perl.org | |
| 133 http://bugzilla.bioperl.org/ | |
| 134 | |
| 135 =head1 AUTHOR | |
| 136 | |
| 137 Steve Chervitz, E<lt>sac@bioperl.orgE<gt> | |
| 138 | |
| 139 =head1 SEE ALSO | |
| 140 | |
| 141 Bio::Tools::Blast::Sbjct.pm - Blast hit object. | |
| 142 Bio::Tools::Blast.pm - Blast object. | |
| 143 Bio::Seq.pm - Biosequence object | |
| 144 Bio::UnivAln.pm - Biosequence alignment object. | |
| 145 Bio::Root::Object.pm - Proposed base class for all Bioperl objects. | |
| 146 | |
| 147 =head2 Links: | |
| 148 | |
| 149 http://bio.perl.org/Core/POD/Tools/Blast/Sbjct.pm.html | |
| 150 | |
| 151 http://bio.perl.org/Projects/modules.html - Online module documentation | |
| 152 http://bio.perl.org/Projects/Blast/ - Bioperl Blast Project | |
| 153 http://bio.perl.org/ - Bioperl Project Homepage | |
| 154 | |
| 155 =head1 COPYRIGHT | |
| 156 | |
| 157 Copyright (c) 1996-98 Steve Chervitz. All Rights Reserved. | |
| 158 This module is free software; you can redistribute it and/or | |
| 159 modify it under the same terms as Perl itself. | |
| 160 | |
| 161 =cut | |
| 162 | |
| 163 | |
| 164 | |
| 165 # | |
| 166 ## | |
| 167 ### | |
| 168 #### END of main POD documentation. | |
| 169 ### | |
| 170 ## | |
| 171 # | |
| 172 | |
| 173 =head1 APPENDIX | |
| 174 | |
| 175 Methods beginning with a leading underscore are considered private | |
| 176 and are intended for internal use by this module. They are | |
| 177 B<not> considered part of the public interface and are described here | |
| 178 for documentation purposes only. | |
| 179 | |
| 180 =cut | |
| 181 | |
| 182 ##################################################################################### | |
| 183 ## CONSTRUCTOR ## | |
| 184 ##################################################################################### | |
| 185 | |
| 186 =head2 _initialize | |
| 187 | |
| 188 Usage : n/a; automatically called by Bio::Root::Object::new() | |
| 189 : Bio::Tools::Blast::HSP.pm objects are constructed | |
| 190 : automatically by Bio::Tools::Sbjct.pm, so there is no need | |
| 191 : for direct consumption. | |
| 192 Purpose : Initializes HSP data and calls private methods to extract | |
| 193 : the data for a given HSP. | |
| 194 : Calls superclass constructor first (Bio::Root::Object.pm). | |
| 195 Returns : n/a | |
| 196 Argument : Named parameters passed from new(): | |
| 197 : All tags must be uppercase (does not call _rearrange()). | |
| 198 : -DATA => array ref containing raw data for one HSP. | |
| 199 : -PARENT => Sbjct.pm object ref. | |
| 200 : -NAME => integer (1..n). | |
| 201 : -PROGRAM => string ('TBLASTN', 'BLASTP', etc.). | |
| 202 | |
| 203 See Also : L<_set_data()|_set_data>, B<Bio::Root::Object::new()>, B<Bio::Tools::Blast::Sbjct::_set_hsps()> | |
| 204 | |
| 205 =cut | |
| 206 | |
| 207 #---------------- | |
| 208 sub _initialize { | |
| 209 #---------------- | |
| 210 my( $self, %param ) = @_; | |
| 211 | |
| 212 $self->SUPER::_initialize( %param ); | |
| 213 | |
| 214 # The gapped and program booleans may be needed after the HSP object | |
| 215 # is built. | |
| 216 # $self->{'_gapped'} = $param{-GAPPED} || 0; | |
| 217 $self->{'_prog'} = $param{-PROGRAM} || 0; | |
| 218 $self->_set_data( @{$param{-DATA}} ); | |
| 219 } | |
| 220 | |
| 221 ##################################################################################### | |
| 222 ## ACCESSORS ## | |
| 223 ##################################################################################### | |
| 224 | |
| 225 | |
| 226 =head2 _set_data | |
| 227 | |
| 228 Usage : n/a; called automatically during object construction. | |
| 229 Purpose : Sets the query sequence, sbjct sequence, and the "match" data | |
| 230 : which consists of the symbols between the query and sbjct lines | |
| 231 : in the alignment. | |
| 232 Argument : Array (all lines from a single, complete HSP, one line per element) | |
| 233 Throws : Propagates any exceptions from the methods called ("See Also") | |
| 234 | |
| 235 See Also : L<_set_seq()|_set_seq>, L<_set_residues()|_set_residues>, L<_set_score_stats()|_set_score_stats>, L<_set_match_stats()|_set_match_stats>, L<_initialize()|_initialize> | |
| 236 | |
| 237 =cut | |
| 238 | |
| 239 #-------------- | |
| 240 sub _set_data { | |
| 241 #-------------- | |
| 242 my $self = shift; | |
| 243 my @data = @_; | |
| 244 my @queryList = (); # 'Query' = SEQUENCE USED TO QUERY THE DATABASE. | |
| 245 my @sbjctList = (); # 'Sbjct' = HOMOLOGOUS SEQUENCE FOUND IN THE DATABASE. | |
| 246 my @matchList = (); | |
| 247 my $matchLine = 0; # Alternating boolean: when true, load 'match' data. | |
| 248 my @linedat = (); | |
| 249 | |
| 250 $DEBUG and print STDERR "$ID: set_data()\n"; | |
| 251 | |
| 252 my($line, $aln_row_len, $length_diff); | |
| 253 $length_diff = 0; | |
| 254 | |
| 255 # Collecting data for all lines in the alignment | |
| 256 # and then storing the collections for possible processing later. | |
| 257 # | |
| 258 # Note that "match" lines may not be properly padded with spaces. | |
| 259 # This loop now properly handles such cases: | |
| 260 # Query: 1141 PSLVELTIRDCPRLEVGPMIRSLPKFPMLKKLDLAVANIIEEDLDVIGSLEELVIXXXXX 1200 | |
| 261 # PSLVELTIRDCPRLEVGPMIRSLPKFPMLKKLDLAVANIIEEDLDVIGSLEELVI | |
| 262 # Sbjct: 1141 PSLVELTIRDCPRLEVGPMIRSLPKFPMLKKLDLAVANIIEEDLDVIGSLEELVILSLKL 1200 | |
| 263 | |
| 264 foreach $line( @data ) { | |
| 265 next if $line =~ /^\s*$/; | |
| 266 | |
| 267 if( $line =~ /^ ?Score/ ) { | |
| 268 $self->_set_score_stats( $line ); | |
| 269 } elsif( $line =~ /^ ?(Identities|Positives|Strand)/ ) { | |
| 270 $self->_set_match_stats( $line ); | |
| 271 } elsif( $line =~ /^ ?Frame = ([\d+-]+)/ ) { | |
| 272 # Version 2.0.8 has Frame information on a separate line. | |
| 273 $self->{'_frame'} = $1; | |
| 274 } elsif( $line =~ /^(Query:?[\s\d]+)([^\s\d]+)/ ) { | |
| 275 push @queryList, $line; | |
| 276 $self->{'_match_indent'} = CORE::length $1; | |
| 277 $aln_row_len = (CORE::length $1) + (CORE::length $2); | |
| 278 $matchLine = 1; | |
| 279 } elsif( $matchLine ) { | |
| 280 # Pad the match line with spaces if necessary. | |
| 281 $length_diff = $aln_row_len - CORE::length $line; | |
| 282 $length_diff and $line .= ' 'x $length_diff; | |
| 283 push @matchList, $line; | |
| 284 $matchLine = 0; | |
| 285 } elsif( $line =~ /^Sbjct/ ) { | |
| 286 push @sbjctList, $line; | |
| 287 } | |
| 288 } | |
| 289 | |
| 290 # Storing the query and sbjct lists in case they are needed later. | |
| 291 # We could make this conditional to save memory. | |
| 292 $self->{'_queryList'} = \@queryList; | |
| 293 $self->{'_sbjctList'} = \@sbjctList; | |
| 294 | |
| 295 # Storing the match list in case it is needed later. | |
| 296 $self->{'_matchList'} = \@matchList; | |
| 297 | |
| 298 if(not defined ($self->{'_numIdentical'})) { | |
| 299 $self->throw("Can't parse match statistics.", | |
| 300 "Possibly a new or unrecognized Blast format."); | |
| 301 } | |
| 302 | |
| 303 if(!scalar @queryList or !scalar @sbjctList) { | |
| 304 $self->throw("Can't find query or sbjct alignment lines.", | |
| 305 "Possibly unrecognized Blast format."); | |
| 306 } | |
| 307 } | |
| 308 | |
| 309 | |
| 310 | |
| 311 =head2 _set_score_stats | |
| 312 | |
| 313 Usage : n/a; called automatically by _set_data() | |
| 314 Purpose : Sets various score statistics obtained from the HSP listing. | |
| 315 Argument : String with any of the following formats: | |
| 316 : blast2: Score = 30.1 bits (66), Expect = 9.2 | |
| 317 : blast2: Score = 158.2 bits (544), Expect(2) = e-110 | |
| 318 : blast1: Score = 410 (144.3 bits), Expect = 1.7e-40, P = 1.7e-40 | |
| 319 : blast1: Score = 55 (19.4 bits), Expect = 5.3, Sum P(3) = 0.99 | |
| 320 Throws : Exception if the stats cannot be parsed, probably due to a change | |
| 321 : in the Blast report format. | |
| 322 | |
| 323 See Also : L<_set_data()|_set_data> | |
| 324 | |
| 325 =cut | |
| 326 | |
| 327 #-------------------- | |
| 328 sub _set_score_stats { | |
| 329 #-------------------- | |
| 330 my ($self, $data) = @_; | |
| 331 | |
| 332 my ($expect, $p); | |
| 333 | |
| 334 if($data =~ /Score = +([\d.e+-]+) bits \(([\d.e+-]+)\), +Expect = +([\d.e+-]+)/) { | |
| 335 # blast2 format n = 1 | |
| 336 $self->{'_bits'} = $1; | |
| 337 $self->{'_score'} = $2; | |
| 338 $expect = $3; | |
| 339 } elsif($data =~ /Score = +([\d.e+-]+) bits \(([\d.e+-]+)\), +Expect\((\d+)\) = +([\d.e+-]+)/) { | |
| 340 # blast2 format n > 1 | |
| 341 $self->{'_bits'} = $1; | |
| 342 $self->{'_score'} = $2; | |
| 343 $self->{'_n'} = $3; | |
| 344 $expect = $4; | |
| 345 | |
| 346 } elsif($data =~ /Score = +([\d.e+-]+) \(([\d.e+-]+) bits\), +Expect = +([\d.e+-]+), P = +([\d.e-]+)/) { | |
| 347 # blast1 format, n = 1 | |
| 348 $self->{'_score'} = $1; | |
| 349 $self->{'_bits'} = $2; | |
| 350 $expect = $3; | |
| 351 $p = $4; | |
| 352 | |
| 353 } elsif($data =~ /Score = +([\d.e+-]+) \(([\d.e+-]+) bits\), +Expect = +([\d.e+-]+), +Sum P\((\d+)\) = +([\d.e-]+)/) { | |
| 354 # blast1 format, n > 1 | |
| 355 $self->{'_score'} = $1; | |
| 356 $self->{'_bits'} = $2; | |
| 357 $expect = $3; | |
| 358 $self->{'_n'} = $4; | |
| 359 $p = $5; | |
| 360 | |
| 361 } else { | |
| 362 $self->throw("Can't parse score statistics: unrecognized format.", "$data"); | |
| 363 } | |
| 364 | |
| 365 $expect = "1$expect" if $expect =~ /^e/i; | |
| 366 $p = "1$p" if defined $p and $p=~ /^e/i; | |
| 367 | |
| 368 $self->{'_expect'} = $expect; | |
| 369 $self->{'_p'} = $p || undef; | |
| 370 | |
| 371 } | |
| 372 | |
| 373 | |
| 374 | |
| 375 =head2 _set_match_stats | |
| 376 | |
| 377 Usage : n/a; called automatically by _set_data() | |
| 378 Purpose : Sets various matching statistics obtained from the HSP listing. | |
| 379 Argument : blast2: Identities = 23/74 (31%), Positives = 29/74 (39%), Gaps = 17/74 (22%) | |
| 380 : blast2: Identities = 57/98 (58%), Positives = 74/98 (75%) | |
| 381 : blast1: Identities = 87/204 (42%), Positives = 126/204 (61%) | |
| 382 : blast1: Identities = 87/204 (42%), Positives = 126/204 (61%), Frame = -3 | |
| 383 : WU-blast: Identities = 310/553 (56%), Positives = 310/553 (56%), Strand = Minus / Plus | |
| 384 Throws : Exception if the stats cannot be parsed, probably due to a change | |
| 385 : in the Blast report format. | |
| 386 Comments : The "Gaps = " data in the HSP header has a different meaning depending | |
| 387 : on the type of Blast: for BLASTP, this number is the total number of | |
| 388 : gaps in query+sbjct; for TBLASTN, it is the number of gaps in the | |
| 389 : query sequence only. Thus, it is safer to collect the data | |
| 390 : separately by examining the actual sequence strings as is done | |
| 391 : in _set_seq(). | |
| 392 | |
| 393 See Also : L<_set_data()|_set_data>, L<_set_seq()|_set_seq> | |
| 394 | |
| 395 =cut | |
| 396 | |
| 397 #-------------------- | |
| 398 sub _set_match_stats { | |
| 399 #-------------------- | |
| 400 my ($self, $data) = @_; | |
| 401 | |
| 402 if($data =~ m!Identities = (\d+)/(\d+)!) { | |
| 403 # blast1 or 2 format | |
| 404 $self->{'_numIdentical'} = $1; | |
| 405 $self->{'_totalLength'} = $2; | |
| 406 } | |
| 407 | |
| 408 if($data =~ m!Positives = (\d+)/(\d+)!) { | |
| 409 # blast1 or 2 format | |
| 410 $self->{'_numConserved'} = $1; | |
| 411 $self->{'_totalLength'} = $2; | |
| 412 } | |
| 413 | |
| 414 if($data =~ m!Frame = ([\d+-]+)!) { | |
| 415 $self->{'_frame'} = $1; | |
| 416 } | |
| 417 | |
| 418 # Strand data is not always present in this line. | |
| 419 # _set_seq() will also set strand information. | |
| 420 if($data =~ m!Strand = (\w+) / (\w+)!) { | |
| 421 $self->{'_queryStrand'} = $1; | |
| 422 $self->{'_sbjctStrand'} = $2; | |
| 423 } | |
| 424 | |
| 425 # if($data =~ m!Gaps = (\d+)/(\d+)!) { | |
| 426 # $self->{'_totalGaps'} = $1; | |
| 427 # } else { | |
| 428 # $self->{'_totalGaps'} = 0; | |
| 429 # } | |
| 430 } | |
| 431 | |
| 432 | |
| 433 | |
| 434 =head2 _set_seq_data | |
| 435 | |
| 436 Usage : n/a; called automatically when sequence data is requested. | |
| 437 Purpose : Sets the HSP sequence data for both query and sbjct sequences. | |
| 438 : Includes: start, stop, length, gaps, and raw sequence. | |
| 439 Argument : n/a | |
| 440 Throws : Propagates any exception thrown by _set_match_seq() | |
| 441 Comments : Uses raw data stored by _set_data() during object construction. | |
| 442 : These data are not always needed, so it is conditionally | |
| 443 : executed only upon demand by methods such as gaps(), _set_residues(), | |
| 444 : etc. _set_seq() does the dirty work. | |
| 445 | |
| 446 See Also : L<_set_seq()|_set_seq> | |
| 447 | |
| 448 =cut | |
| 449 | |
| 450 sub _set_seq_data { | |
| 451 my $self = shift; | |
| 452 | |
| 453 $self->_set_seq('query', @{$self->{'_queryList'}}); | |
| 454 $self->_set_seq('sbjct', @{$self->{'_sbjctList'}}); | |
| 455 | |
| 456 # Liberate some memory. | |
| 457 @{$self->{'_queryList'}} = @{$self->{'_sbjctList'}} = (); | |
| 458 undef $self->{'_queryList'}; | |
| 459 undef $self->{'_sbjctList'}; | |
| 460 | |
| 461 $self->{'_set_seq_data'} = 1; | |
| 462 } | |
| 463 | |
| 464 | |
| 465 | |
| 466 =head2 _set_seq | |
| 467 | |
| 468 Usage : n/a; called automatically by _set_seq_data() | |
| 469 : $hsp_obj->($seq_type, @data); | |
| 470 Purpose : Sets sequence information for both the query and sbjct sequences. | |
| 471 : Directly counts the number of gaps in each sequence (if gapped Blast). | |
| 472 Argument : $seq_type = 'query' or 'sbjct' | |
| 473 : @data = all seq lines with the form: | |
| 474 : Query: 61 SPHNVKDRKEQNGSINNAISPTATANTSGSQQINIDSALRDRSSNVAAQPSLSDASSGSN 120 | |
| 475 Throws : Exception if data strings cannot be parsed, probably due to a change | |
| 476 : in the Blast report format. | |
| 477 Comments : Uses first argument to determine which data members to set | |
| 478 : making this method sensitive data member name changes. | |
| 479 : Behavior is dependent on the type of BLAST analysis (TBLASTN, BLASTP, etc). | |
| 480 Warning : Sequence endpoints are normalized so that start < end. This affects HSPs | |
| 481 : for TBLASTN/X hits on the minus strand. Normalization facilitates use | |
| 482 : of range information by methods such as match(). | |
| 483 | |
| 484 See Also : L<_set_seq_data()|_set_seq_data>, L<matches()|matches>, L<range()|range>, L<start()|start>, L<end()|end> | |
| 485 | |
| 486 =cut | |
| 487 | |
| 488 #------------- | |
| 489 sub _set_seq { | |
| 490 #------------- | |
| 491 my $self = shift; | |
| 492 my $seqType = shift; | |
| 493 my @data = @_; | |
| 494 my @ranges = (); | |
| 495 my @sequence = (); | |
| 496 my $numGaps = 0; | |
| 497 | |
| 498 foreach( @data ) { | |
| 499 if( m/(\d+) *(\D+) *(\d+)/) { | |
| 500 push @ranges, ( $1, $3 ) ; | |
| 501 push @sequence, $2; | |
| 502 } else { | |
| 503 $self->warn("Bad sequence data: $_"); | |
| 504 } | |
| 505 } | |
| 506 | |
| 507 (scalar(@sequence) and scalar(@ranges)) || $self->throw("Can't set sequence: missing data", | |
| 508 "Possibly unrecognized Blast format."); | |
| 509 | |
| 510 # Sensitive to member name changes. | |
| 511 $seqType = "_\L$seqType\E"; | |
| 512 $self->{$seqType.'Start'} = $ranges[0]; | |
| 513 $self->{$seqType.'Stop'} = $ranges[ $#ranges ]; | |
| 514 $self->{$seqType.'Seq'} = \@sequence; | |
| 515 | |
| 516 $self->{$seqType.'Length'} = abs($ranges[ $#ranges ] - $ranges[0]) + 1; | |
| 517 | |
| 518 # Adjust lengths for BLASTX, TBLASTN, TBLASTX sequences | |
| 519 # Converting nucl coords to amino acid coords. | |
| 520 | |
| 521 my $prog = $self->{'_prog'}; | |
| 522 if($prog eq 'TBLASTN' and $seqType eq '_sbjct') { | |
| 523 $self->{$seqType.'Length'} /= 3; | |
| 524 } elsif($prog eq 'BLASTX' and $seqType eq '_query') { | |
| 525 $self->{$seqType.'Length'} /= 3; | |
| 526 } elsif($prog eq 'TBLASTX') { | |
| 527 $self->{$seqType.'Length'} /= 3; | |
| 528 } | |
| 529 | |
| 530 $self->{$seqType.'Strand'} = 'Plus' if $prog =~ /BLAST[NX]/; | |
| 531 | |
| 532 # Normalize sequence endpoints so that start < end. | |
| 533 # Reverse complement or 'minus strand' HSPs get flipped here. | |
| 534 if($self->{$seqType.'Start'} > $self->{$seqType.'Stop'}) { | |
| 535 ($self->{$seqType.'Start'}, $self->{$seqType.'Stop'}) = | |
| 536 ($self->{$seqType.'Stop'}, $self->{$seqType.'Start'}); | |
| 537 $self->{$seqType.'Strand'} = 'Minus'; | |
| 538 } | |
| 539 | |
| 540 ## Count number of gaps in each seq. Only need to do this for gapped Blasts. | |
| 541 # if($self->{'_gapped'}) { | |
| 542 my $seqstr = join('', @sequence); | |
| 543 $seqstr =~ s/\s//g; | |
| 544 my $num_gaps = CORE::length($seqstr) - $self->{$seqType.'Length'}; | |
| 545 $self->{$seqType.'Gaps'} = $num_gaps if $num_gaps > 0; | |
| 546 # } | |
| 547 } | |
| 548 | |
| 549 | |
| 550 =head2 _set_residues | |
| 551 | |
| 552 Usage : n/a; called automatically when residue data is requested. | |
| 553 Purpose : Sets the residue numbers representing the identical and | |
| 554 : conserved positions. These data are obtained by analyzing the | |
| 555 : symbols between query and sbjct lines of the alignments. | |
| 556 Argument : n/a | |
| 557 Throws : Propagates any exception thrown by _set_seq_data() and _set_match_seq(). | |
| 558 Comments : These data are not always needed, so it is conditionally | |
| 559 : executed only upon demand by methods such as seq_inds(). | |
| 560 : Behavior is dependent on the type of BLAST analysis (TBLASTN, BLASTP, etc). | |
| 561 | |
| 562 See Also : L<_set_seq_data()|_set_seq_data>, L<_set_match_seq()|_set_match_seq>, L<seq_inds()|seq_inds> | |
| 563 | |
| 564 =cut | |
| 565 | |
| 566 #------------------ | |
| 567 sub _set_residues { | |
| 568 #------------------ | |
| 569 my $self = shift; | |
| 570 my @sequence = (); | |
| 571 | |
| 572 $self->_set_seq_data() unless $self->{'_set_seq_data'}; | |
| 573 | |
| 574 # Using hashes to avoid saving duplicate residue numbers. | |
| 575 my %identicalList_query = (); | |
| 576 my %identicalList_sbjct = (); | |
| 577 my %conservedList_query = (); | |
| 578 my %conservedList_sbjct = (); | |
| 579 | |
| 580 my $aref = $self->_set_match_seq() if not ref $self->{'_matchSeq'}; | |
| 581 $aref ||= $self->{'_matchSeq'}; | |
| 582 my $seqString = join('', @$aref ); | |
| 583 | |
| 584 my $qseq = join('',@{$self->{'_querySeq'}}); | |
| 585 my $sseq = join('',@{$self->{'_sbjctSeq'}}); | |
| 586 my $resCount_query = $self->{'_queryStop'} || 0; | |
| 587 my $resCount_sbjct = $self->{'_sbjctStop'} || 0; | |
| 588 | |
| 589 my $prog = $self->{'_prog'}; | |
| 590 if($prog !~ /^BLASTP|^BLASTN/) { | |
| 591 if($prog eq 'TBLASTN') { | |
| 592 $resCount_sbjct /= 3; | |
| 593 } elsif($prog eq 'BLASTX') { | |
| 594 $resCount_query /= 3; | |
| 595 } elsif($prog eq 'TBLASTX') { | |
| 596 $resCount_query /= 3; | |
| 597 $resCount_sbjct /= 3; | |
| 598 } | |
| 599 } | |
| 600 | |
| 601 my ($mchar, $schar, $qchar); | |
| 602 while( $mchar = chop($seqString) ) { | |
| 603 ($qchar, $schar) = (chop($qseq), chop($sseq)); | |
| 604 if( $mchar eq '+' ) { | |
| 605 $conservedList_query{ $resCount_query } = 1; | |
| 606 $conservedList_sbjct{ $resCount_sbjct } = 1; | |
| 607 } elsif( $mchar ne ' ' ) { | |
| 608 $identicalList_query{ $resCount_query } = 1; | |
| 609 $identicalList_sbjct{ $resCount_sbjct } = 1; | |
| 610 } | |
| 611 $resCount_query-- if $qchar ne $GAP_SYMBOL; | |
| 612 $resCount_sbjct-- if $schar ne $GAP_SYMBOL; | |
| 613 } | |
| 614 $self->{'_identicalRes_query'} = \%identicalList_query; | |
| 615 $self->{'_conservedRes_query'} = \%conservedList_query; | |
| 616 $self->{'_identicalRes_sbjct'} = \%identicalList_sbjct; | |
| 617 $self->{'_conservedRes_sbjct'} = \%conservedList_sbjct; | |
| 618 | |
| 619 } | |
| 620 | |
| 621 | |
| 622 | |
| 623 | |
| 624 =head2 _set_match_seq | |
| 625 | |
| 626 Usage : n/a. Internal method. | |
| 627 : $hsp_obj->_set_match_seq() | |
| 628 Purpose : Set the 'match' sequence for the current HSP (symbols in between | |
| 629 : the query and sbjct lines.) | |
| 630 Returns : Array reference holding the match sequences lines. | |
| 631 Argument : n/a | |
| 632 Throws : Exception if the _matchList field is not set. | |
| 633 Comments : The match information is not always necessary. This method | |
| 634 : allows it to be conditionally prepared. | |
| 635 : Called by _set_residues>() and seq_str(). | |
| 636 | |
| 637 See Also : L<_set_residues()|_set_residues>, L<seq_str()|seq_str> | |
| 638 | |
| 639 =cut | |
| 640 | |
| 641 #------------------- | |
| 642 sub _set_match_seq { | |
| 643 #------------------- | |
| 644 my $self = shift; | |
| 645 | |
| 646 ## DEBUGGING CODE: | |
| 647 # if($self->parent->name eq '1AK5_' and $self->parent->parent->name eq 'YAR073W') { | |
| 648 # print "\n_set_match_seq() called for HSP ", $self->name, " of hit ${\$self->parent->name} in query ${\$self->parent->parent->name}"; <STDIN>; | |
| 649 # } | |
| 650 | |
| 651 ref($self->{'_matchList'}) || $self->throw("Can't set HSP match sequence: No data"); | |
| 652 | |
| 653 my @data = @{$self->{'_matchList'}}; | |
| 654 | |
| 655 my(@sequence); | |
| 656 foreach( @data ) { | |
| 657 chomp($_); | |
| 658 ## Remove leading spaces; (note: aln may begin with a space | |
| 659 ## which is why we can't use s/^ +//). | |
| 660 s/^ {$self->{'_match_indent'}}//; | |
| 661 push @sequence, $_; | |
| 662 } | |
| 663 # Liberate some memory. | |
| 664 @{$self->{'_matchList'}} = undef; | |
| 665 $self->{'_matchList'} = undef; | |
| 666 | |
| 667 $self->{'_matchSeq'} = \@sequence; | |
| 668 | |
| 669 ## DEBUGGING CODE: | |
| 670 # if($self->parent->name eq '1AK5_' and $self->parent->parent->name eq 'YAR073W') { | |
| 671 # print "RETURNING: $self->{'_matchSeq'}:\n @{$self->{'_matchSeq'}}";<STDIN>; | |
| 672 # } | |
| 673 | |
| 674 $self->{'_matchSeq'}; | |
| 675 } | |
| 676 | |
| 677 | |
| 678 | |
| 679 =head2 score | |
| 680 | |
| 681 Usage : $hsp_obj->score() | |
| 682 Purpose : Get the Blast score for the HSP. | |
| 683 Returns : Integer | |
| 684 Argument : n/a | |
| 685 Throws : n/a | |
| 686 | |
| 687 See Also : L<bits()|bits> | |
| 688 | |
| 689 =cut | |
| 690 | |
| 691 #--------- | |
| 692 sub score { my $self = shift; $self->{'_score'}; } | |
| 693 #--------- | |
| 694 | |
| 695 | |
| 696 | |
| 697 =head2 bits | |
| 698 | |
| 699 Usage : $hsp_obj->bits() | |
| 700 Purpose : Get the Blast score in bits for the HSP. | |
| 701 Returns : Float | |
| 702 Argument : n/a | |
| 703 Throws : n/a | |
| 704 | |
| 705 | |
| 706 See Also : L<score()|score> | |
| 707 | |
| 708 =cut | |
| 709 | |
| 710 #-------- | |
| 711 sub bits { my $self = shift; $self->{'_bits'}; } | |
| 712 #-------- | |
| 713 | |
| 714 | |
| 715 | |
| 716 =head2 n | |
| 717 | |
| 718 Usage : $hsp_obj->n() | |
| 719 Purpose : Get the N value (num HSPs on which P/Expect is based). | |
| 720 : This value is not defined with NCBI Blast2 with gapping. | |
| 721 Returns : Integer or null string if not defined. | |
| 722 Argument : n/a | |
| 723 Throws : n/a | |
| 724 Comments : The 'N' value is listed in parenthesis with P/Expect value: | |
| 725 : e.g., P(3) = 1.2e-30 ---> (N = 3). | |
| 726 : Not defined in NCBI Blast2 with gaps. | |
| 727 : This typically is equal to the number of HSPs but not always. | |
| 728 : To obtain the number of HSPs, use Bio::Tools::Blast::Sbjct::num_hsps(). | |
| 729 | |
| 730 See Also : L<score()|score> | |
| 731 | |
| 732 =cut | |
| 733 | |
| 734 #----- | |
| 735 sub n { my $self = shift; $self->{'_n'} || ''; } | |
| 736 #----- | |
| 737 | |
| 738 | |
| 739 | |
| 740 =head2 frame | |
| 741 | |
| 742 Usage : $hsp_obj->frame() | |
| 743 Purpose : Get the reading frame number (-/+ 1, 2, 3) (TBLASTN/X only). | |
| 744 Returns : Integer or null string if not defined. | |
| 745 Argument : n/a | |
| 746 Throws : n/a | |
| 747 | |
| 748 =cut | |
| 749 | |
| 750 #--------- | |
| 751 sub frame { my $self = shift; $self->{'_frame'} || ''; } | |
| 752 #--------- | |
| 753 | |
| 754 | |
| 755 | |
| 756 =head2 signif() | |
| 757 | |
| 758 Usage : $hsp_obj->signif() | |
| 759 Purpose : Get the P-value or Expect value for the HSP. | |
| 760 Returns : Float (0.001 or 1.3e-43) | |
| 761 : Returns P-value if it is defined, otherwise, Expect value. | |
| 762 Argument : n/a | |
| 763 Throws : n/a | |
| 764 Comments : Provided for consistency with Sbjct::signif() | |
| 765 : Support for returning the significance data in different | |
| 766 : formats (e.g., exponent only), is not provided for HSP objects. | |
| 767 : This is only available for the Sbjct or Blast object. | |
| 768 | |
| 769 See Also : L<p()|p>, L<expect()|expect>, B<Bio::Tools::Blast::Sbjct::signif()> | |
| 770 | |
| 771 =cut | |
| 772 | |
| 773 #----------- | |
| 774 sub signif { | |
| 775 #----------- | |
| 776 my $self = shift; | |
| 777 my $val ||= defined($self->{'_p'}) ? $self->{'_p'} : $self->{'_expect'}; | |
| 778 $val; | |
| 779 } | |
| 780 | |
| 781 | |
| 782 | |
| 783 =head2 expect | |
| 784 | |
| 785 Usage : $hsp_obj->expect() | |
| 786 Purpose : Get the Expect value for the HSP. | |
| 787 Returns : Float (0.001 or 1.3e-43) | |
| 788 Argument : n/a | |
| 789 Throws : n/a | |
| 790 Comments : Support for returning the expectation data in different | |
| 791 : formats (e.g., exponent only), is not provided for HSP objects. | |
| 792 : This is only available for the Sbjct or Blast object. | |
| 793 | |
| 794 See Also : L<p()|p> | |
| 795 | |
| 796 =cut | |
| 797 | |
| 798 #---------- | |
| 799 sub expect { my $self = shift; $self->{'_expect'}; } | |
| 800 #---------- | |
| 801 | |
| 802 | |
| 803 | |
| 804 =head2 p | |
| 805 | |
| 806 Usage : $hsp_obj->p() | |
| 807 Purpose : Get the P-value for the HSP. | |
| 808 Returns : Float (0.001 or 1.3e-43) or undef if not defined. | |
| 809 Argument : n/a | |
| 810 Throws : n/a | |
| 811 Comments : P-value is not defined with NCBI Blast2 reports. | |
| 812 : Support for returning the expectation data in different | |
| 813 : formats (e.g., exponent only) is not provided for HSP objects. | |
| 814 : This is only available for the Sbjct or Blast object. | |
| 815 | |
| 816 See Also : L<expect()|expect> | |
| 817 | |
| 818 =cut | |
| 819 | |
| 820 #----- | |
| 821 sub p { my $self = shift; $self->{'_p'}; } | |
| 822 #----- | |
| 823 | |
| 824 | |
| 825 =head2 length | |
| 826 | |
| 827 Usage : $hsp->length( [seq_type] ) | |
| 828 Purpose : Get the length of the aligned portion of the query or sbjct. | |
| 829 Example : $hsp->length('query') | |
| 830 Returns : integer | |
| 831 Argument : seq_type: 'query' | 'sbjct' | 'total' (default = 'total') | |
| 832 Throws : n/a | |
| 833 Comments : 'total' length is the full length of the alignment | |
| 834 : as reported in the denominators in the alignment section: | |
| 835 : "Identical = 34/120 Positives = 67/120". | |
| 836 : Developer note: when using the built-in length function within | |
| 837 : this module, call it as CORE::length(). | |
| 838 | |
| 839 See Also : L<gaps()|gaps> | |
| 840 | |
| 841 =cut | |
| 842 | |
| 843 #----------- | |
| 844 sub length { | |
| 845 #----------- | |
| 846 my( $self, $type ) = @_; | |
| 847 $type ||= 'total'; | |
| 848 | |
| 849 $type ne 'total' and $self->_set_seq_data() unless $self->{'_set_seq_data'}; | |
| 850 | |
| 851 ## Sensitive to member name format. | |
| 852 $type = "_\L$type\E"; | |
| 853 $self->{$type.'Length'}; | |
| 854 } | |
| 855 | |
| 856 | |
| 857 | |
| 858 =head2 gaps | |
| 859 | |
| 860 Usage : $hsp->gaps( [seq_type] ) | |
| 861 Purpose : Get the number of gaps in the query, sbjct, or total alignment. | |
| 862 : Also can return query gaps and sbjct gaps as a two-element list | |
| 863 : when in array context. | |
| 864 Example : $total_gaps = $hsp->gaps(); | |
| 865 : ($qgaps, $sgaps) = $hsp->gaps(); | |
| 866 : $qgaps = $hsp->gaps('query'); | |
| 867 Returns : scalar context: integer | |
| 868 : array context without args: (int, int) = ('queryGaps', 'sbjctGaps') | |
| 869 Argument : seq_type: 'query' | 'sbjct' | 'total' | |
| 870 : (default = 'total', scalar context) | |
| 871 : Array context can be "induced" by providing an argument of 'list' or 'array'. | |
| 872 Throws : n/a | |
| 873 | |
| 874 See Also : L<length()|length>, L<matches()|matches> | |
| 875 | |
| 876 =cut | |
| 877 | |
| 878 #--------- | |
| 879 sub gaps { | |
| 880 #--------- | |
| 881 my( $self, $seqType ) = @_; | |
| 882 | |
| 883 $self->_set_seq_data() unless $self->{'_set_seq_data'}; | |
| 884 | |
| 885 $seqType ||= (wantarray ? 'list' : 'total'); | |
| 886 | |
| 887 if($seqType =~ /list|array/i) { | |
| 888 return (($self->{'_queryGaps'} || 0), ($self->{'_sbjctGaps'} || 0)); | |
| 889 } | |
| 890 | |
| 891 if($seqType eq 'total') { | |
| 892 return ($self->{'_queryGaps'} + $self->{'_sbjctGaps'}) || 0; | |
| 893 } else { | |
| 894 ## Sensitive to member name format. | |
| 895 $seqType = "_\L$seqType\E"; | |
| 896 return $self->{$seqType.'Gaps'} || 0; | |
| 897 } | |
| 898 } | |
| 899 | |
| 900 | |
| 901 | |
| 902 =head2 matches | |
| 903 | |
| 904 Usage : $hsp->matches([seq_type], [start], [stop]); | |
| 905 Purpose : Get the total number of identical and conservative matches | |
| 906 : in the query or sbjct sequence for the given HSP. Optionally can | |
| 907 : report data within a defined interval along the seq. | |
| 908 : (Note: 'conservative' matches are called 'positives' in the | |
| 909 : Blast report.) | |
| 910 Example : ($id,$cons) = $hsp_object->matches('sbjct'); | |
| 911 : ($id,$cons) = $hsp_object->matches('query',300,400); | |
| 912 Returns : 2-element array of integers | |
| 913 Argument : (1) seq_type = 'query' | 'sbjct' (default = query) | |
| 914 : (2) start = Starting coordinate (optional) | |
| 915 : (3) stop = Ending coordinate (optional) | |
| 916 Throws : Exception if the supplied coordinates are out of range. | |
| 917 Comments : Relies on seq_str('match') to get the string of alignment symbols | |
| 918 : between the query and sbjct lines which are used for determining | |
| 919 : the number of identical and conservative matches. | |
| 920 | |
| 921 See Also : L<length()|length>, L<gaps()|gaps>, L<seq_str()|seq_str>, B<Bio::Tools::Blast::Sbjct::_adjust_contigs()> | |
| 922 | |
| 923 =cut | |
| 924 | |
| 925 #----------- | |
| 926 sub matches { | |
| 927 #----------- | |
| 928 my( $self, %param ) = @_; | |
| 929 my(@data); | |
| 930 my($seqType, $beg, $end) = ($param{-SEQ}, $param{-START}, $param{-STOP}); | |
| 931 $seqType ||= 'query'; | |
| 932 | |
| 933 if(!defined $beg && !defined $end) { | |
| 934 ## Get data for the whole alignment. | |
| 935 push @data, ($self->{'_numIdentical'}, $self->{'_numConserved'}); | |
| 936 } else { | |
| 937 ## Get the substring representing the desired sub-section of aln. | |
| 938 $beg ||= 0; | |
| 939 $end ||= 0; | |
| 940 my($start,$stop) = $self->range($seqType); | |
| 941 if($beg == 0) { $beg = $start; $end = $beg+$end; } | |
| 942 elsif($end == 0) { $end = $stop; $beg = $end-$beg; } | |
| 943 | |
| 944 if($end >= $stop) { $end = $stop; } ##ML changed from if (end >stop) | |
| 945 else { $end += 1;} ##ML moved from commented position below, makes | |
| 946 ##more sense here | |
| 947 # if($end > $stop) { $end = $stop; } | |
| 948 if($beg < $start) { $beg = $start; } | |
| 949 # else { $end += 1;} | |
| 950 | |
| 951 # my $seq = substr($self->seq_str('match'), $beg-$start, ($end-$beg)); | |
| 952 | |
| 953 ## ML: START fix for substr out of range error ------------------ | |
| 954 my $seq = ""; | |
| 955 if (($self->{'_prog'} eq 'TBLASTN') and ($seqType eq 'sbjct')) | |
| 956 { | |
| 957 $seq = substr($self->seq_str('match'), | |
| 958 int(($beg-$start)/3), int(($end-$beg+1)/3)); | |
| 959 | |
| 960 } elsif (($self->{'_prog'} eq 'BLASTX') and ($seqType eq 'query')) | |
| 961 { | |
| 962 $seq = substr($self->seq_str('match'), | |
| 963 int(($beg-$start)/3), int(($end-$beg+1)/3)); | |
| 964 } else { | |
| 965 $seq = substr($self->seq_str('match'), | |
| 966 $beg-$start, ($end-$beg)); | |
| 967 } | |
| 968 ## ML: End of fix for substr out of range error ----------------- | |
| 969 | |
| 970 | |
| 971 ## ML: debugging code | |
| 972 ## This is where we get our exception. Try printing out the values going | |
| 973 ## into this: | |
| 974 ## | |
| 975 # print STDERR | |
| 976 # qq(*------------MY EXCEPTION --------------------\nSeq: ") , | |
| 977 # $self->seq_str("$seqType"), qq("\n),$self->name,",( index:"; | |
| 978 # print STDERR $beg-$start, ", len: ", $end-$beg," ), (HSPRealLen:", | |
| 979 # CORE::length $self->seq_str("$seqType"); | |
| 980 # print STDERR ", HSPCalcLen: ", $stop - $start +1 ," ), | |
| 981 # ( beg: $beg, end: $end ), ( start: $start, stop: stop )\n"; | |
| 982 ## ML: END DEBUGGING CODE---------- | |
| 983 | |
| 984 if(!CORE::length $seq) { | |
| 985 $self->throw("Undefined sub-sequence ($beg,$end). Valid range = $start - $stop"); | |
| 986 } | |
| 987 ## Get data for a substring. | |
| 988 # printf "Collecting HSP subsection data: beg,end = %d,%d; start,stop = %d,%d\n%s<---\n", $beg, $end, $start, $stop, $seq; | |
| 989 # printf "Original match seq:\n%s\n",$self->seq_str('match'); | |
| 990 $seq =~ s/ //g; # remove space (no info). | |
| 991 my $len_cons = CORE::length $seq; | |
| 992 $seq =~ s/\+//g; # remove '+' characters (conservative substitutions) | |
| 993 my $len_id = CORE::length $seq; | |
| 994 push @data, ($len_id, $len_cons); | |
| 995 # printf " HSP = %s\n id = %d; cons = %d\n", $self->name, $len_id, $len_cons; <STDIN>; | |
| 996 } | |
| 997 @data; | |
| 998 } | |
| 999 | |
| 1000 | |
| 1001 | |
| 1002 =head2 frac_identical | |
| 1003 | |
| 1004 Usage : $hsp_object->frac_identical( [seq_type] ); | |
| 1005 Purpose : Get the fraction of identical positions within the given HSP. | |
| 1006 Example : $frac_iden = $hsp_object->frac_identical('query'); | |
| 1007 Returns : Float (2-decimal precision, e.g., 0.75). | |
| 1008 Argument : seq_type: 'query' | 'sbjct' | 'total' | |
| 1009 : default = 'total' (but see comments below). | |
| 1010 Throws : n/a | |
| 1011 Comments : Different versions of Blast report different values for the total | |
| 1012 : length of the alignment. This is the number reported in the | |
| 1013 : denominators in the stats section: | |
| 1014 : "Identical = 34/120 Positives = 67/120". | |
| 1015 : BLAST-GP uses the total length of the alignment (with gaps) | |
| 1016 : WU-BLAST uses the length of the query sequence (without gaps). | |
| 1017 : Therefore, when called without an argument or an argument of 'total', | |
| 1018 : this method will report different values depending on the | |
| 1019 : version of BLAST used. | |
| 1020 : | |
| 1021 : To get the fraction identical among only the aligned residues, | |
| 1022 : ignoring the gaps, call this method with an argument of 'query' | |
| 1023 : or 'sbjct'. | |
| 1024 | |
| 1025 See Also : L<frac_conserved()|frac_conserved>, L<num_identical()|num_identical>, L<matches()|matches> | |
| 1026 | |
| 1027 =cut | |
| 1028 | |
| 1029 #------------------- | |
| 1030 sub frac_identical { | |
| 1031 #------------------- | |
| 1032 # The value is calculated as opposed to storing it from the parsed results. | |
| 1033 # This saves storage and also permits flexibility in determining for which | |
| 1034 # sequence (query or sbjct) the figure is to be calculated. | |
| 1035 | |
| 1036 my( $self, $seqType ) = @_; | |
| 1037 $seqType ||= 'total'; | |
| 1038 | |
| 1039 if($seqType ne 'total') { | |
| 1040 $self->_set_seq_data() unless $self->{'_set_seq_data'}; | |
| 1041 } | |
| 1042 ## Sensitive to member name format. | |
| 1043 $seqType = "_\L$seqType\E"; | |
| 1044 | |
| 1045 sprintf( "%.2f", $self->{'_numIdentical'}/$self->{$seqType.'Length'}); | |
| 1046 } | |
| 1047 | |
| 1048 | |
| 1049 =head2 frac_conserved | |
| 1050 | |
| 1051 Usage : $hsp_object->frac_conserved( [seq_type] ); | |
| 1052 Purpose : Get the fraction of conserved positions within the given HSP. | |
| 1053 : (Note: 'conservative' positions are called 'positives' in the | |
| 1054 : Blast report.) | |
| 1055 Example : $frac_cons = $hsp_object->frac_conserved('query'); | |
| 1056 Returns : Float (2-decimal precision, e.g., 0.75). | |
| 1057 Argument : seq_type: 'query' | 'sbjct' | |
| 1058 : default = 'total' (but see comments below). | |
| 1059 Throws : n/a | |
| 1060 Comments : Different versions of Blast report different values for the total | |
| 1061 : length of the alignment. This is the number reported in the | |
| 1062 : denominators in the stats section: | |
| 1063 : "Identical = 34/120 Positives = 67/120". | |
| 1064 : BLAST-GP uses the total length of the alignment (with gaps) | |
| 1065 : WU-BLAST uses the length of the query sequence (without gaps). | |
| 1066 : Therefore, when called without an argument or an argument of 'total', | |
| 1067 : this method will report different values depending on the | |
| 1068 : version of BLAST used. | |
| 1069 : | |
| 1070 : To get the fraction conserved among only the aligned residues, | |
| 1071 : ignoring the gaps, call this method with an argument of 'query' | |
| 1072 : or 'sbjct'. | |
| 1073 | |
| 1074 See Also : L<frac_conserved()|frac_conserved>, L<num_conserved()|num_conserved>, L<matches()|matches> | |
| 1075 | |
| 1076 =cut | |
| 1077 | |
| 1078 #-------------------- | |
| 1079 sub frac_conserved { | |
| 1080 #-------------------- | |
| 1081 # The value is calculated as opposed to storing it from the parsed results. | |
| 1082 # This saves storage and also permits flexibility in determining for which | |
| 1083 # sequence (query or sbjct) the figure is to be calculated. | |
| 1084 | |
| 1085 my( $self, $seqType ) = @_; | |
| 1086 $seqType ||= 'total'; | |
| 1087 | |
| 1088 if($seqType ne 'total') { | |
| 1089 $self->_set_seq_data() unless $self->{'_set_seq_data'}; | |
| 1090 } | |
| 1091 | |
| 1092 ## Sensitive to member name format. | |
| 1093 $seqType = "_\L$seqType\E"; | |
| 1094 | |
| 1095 sprintf( "%.2f", $self->{'_numConserved'}/$self->{$seqType.'Length'}); | |
| 1096 } | |
| 1097 | |
| 1098 | |
| 1099 =head2 num_identical | |
| 1100 | |
| 1101 Usage : $hsp_object->num_identical(); | |
| 1102 Purpose : Get the number of identical positions within the given HSP. | |
| 1103 Example : $num_iden = $hsp_object->num_identical(); | |
| 1104 Returns : integer | |
| 1105 Argument : n/a | |
| 1106 Throws : n/a | |
| 1107 | |
| 1108 See Also : L<num_conserved()|num_conserved>, L<frac_identical()|frac_identical> | |
| 1109 | |
| 1110 =cut | |
| 1111 | |
| 1112 #------------------- | |
| 1113 sub num_identical { | |
| 1114 #------------------- | |
| 1115 my( $self) = shift; | |
| 1116 | |
| 1117 $self->{'_numIdentical'}; | |
| 1118 } | |
| 1119 | |
| 1120 | |
| 1121 =head2 num_conserved | |
| 1122 | |
| 1123 Usage : $hsp_object->num_conserved(); | |
| 1124 Purpose : Get the number of conserved positions within the given HSP. | |
| 1125 Example : $num_iden = $hsp_object->num_conserved(); | |
| 1126 Returns : integer | |
| 1127 Argument : n/a | |
| 1128 Throws : n/a | |
| 1129 | |
| 1130 See Also : L<num_identical()|num_identical>, L<frac_conserved()|frac_conserved> | |
| 1131 | |
| 1132 =cut | |
| 1133 | |
| 1134 #------------------- | |
| 1135 sub num_conserved { | |
| 1136 #------------------- | |
| 1137 my( $self) = shift; | |
| 1138 | |
| 1139 $self->{'_numConserved'}; | |
| 1140 } | |
| 1141 | |
| 1142 | |
| 1143 | |
| 1144 =head2 range | |
| 1145 | |
| 1146 Usage : $hsp->range( [seq_type] ); | |
| 1147 Purpose : Gets the (start, end) coordinates for the query or sbjct sequence | |
| 1148 : in the HSP alignment. | |
| 1149 Example : ($qbeg, $qend) = $hsp->range('query'); | |
| 1150 : ($sbeg, $send) = $hsp->range('sbjct'); | |
| 1151 Returns : Two-element array of integers | |
| 1152 Argument : seq_type = string, 'query' or 'sbjct' (default = 'query') | |
| 1153 : (case insensitive). | |
| 1154 Throws : n/a | |
| 1155 | |
| 1156 See Also : L<start()|start>, L<end()|end> | |
| 1157 | |
| 1158 =cut | |
| 1159 | |
| 1160 #---------- | |
| 1161 sub range { | |
| 1162 #---------- | |
| 1163 my ($self, $seqType) = @_; | |
| 1164 | |
| 1165 $self->_set_seq_data() unless $self->{'_set_seq_data'}; | |
| 1166 | |
| 1167 $seqType ||= 'query'; | |
| 1168 ## Sensitive to member name changes. | |
| 1169 $seqType = "_\L$seqType\E"; | |
| 1170 | |
| 1171 return ($self->{$seqType.'Start'},$self->{$seqType.'Stop'}); | |
| 1172 } | |
| 1173 | |
| 1174 =head2 start | |
| 1175 | |
| 1176 Usage : $hsp->start( [seq_type] ); | |
| 1177 Purpose : Gets the start coordinate for the query, sbjct, or both sequences | |
| 1178 : in the HSP alignment. | |
| 1179 Example : $qbeg = $hsp->start('query'); | |
| 1180 : $sbeg = $hsp->start('sbjct'); | |
| 1181 : ($qbeg, $sbeg) = $hsp->start(); | |
| 1182 Returns : scalar context: integer | |
| 1183 : array context without args: list of two integers | |
| 1184 Argument : In scalar context: seq_type = 'query' or 'sbjct' | |
| 1185 : (case insensitive). If not supplied, 'query' is used. | |
| 1186 : Array context can be "induced" by providing an argument of 'list' or 'array'. | |
| 1187 Throws : n/a | |
| 1188 | |
| 1189 See Also : L<end()|end>, L<range()|range> | |
| 1190 | |
| 1191 =cut | |
| 1192 | |
| 1193 #---------- | |
| 1194 sub start { | |
| 1195 #---------- | |
| 1196 my ($self, $seqType) = @_; | |
| 1197 | |
| 1198 $seqType ||= (wantarray ? 'list' : 'query'); | |
| 1199 | |
| 1200 $self->_set_seq_data() unless $self->{'_set_seq_data'}; | |
| 1201 | |
| 1202 if($seqType =~ /list|array/i) { | |
| 1203 return ($self->{'_queryStart'}, $self->{'_sbjctStart'}); | |
| 1204 } else { | |
| 1205 ## Sensitive to member name changes. | |
| 1206 $seqType = "_\L$seqType\E"; | |
| 1207 return $self->{$seqType.'Start'}; | |
| 1208 } | |
| 1209 } | |
| 1210 | |
| 1211 =head2 end | |
| 1212 | |
| 1213 Usage : $hsp->end( [seq_type] ); | |
| 1214 Purpose : Gets the end coordinate for the query, sbjct, or both sequences | |
| 1215 : in the HSP alignment. | |
| 1216 Example : $qbeg = $hsp->end('query'); | |
| 1217 : $sbeg = $hsp->end('sbjct'); | |
| 1218 : ($qbeg, $sbeg) = $hsp->end(); | |
| 1219 Returns : scalar context: integer | |
| 1220 : array context without args: list of two integers | |
| 1221 Argument : In scalar context: seq_type = 'query' or 'sbjct' | |
| 1222 : (case insensitive). If not supplied, 'query' is used. | |
| 1223 : Array context can be "induced" by providing an argument of 'list' or 'array'. | |
| 1224 Throws : n/a | |
| 1225 | |
| 1226 See Also : L<start()|start>, L<range()|range> | |
| 1227 | |
| 1228 =cut | |
| 1229 | |
| 1230 #---------- | |
| 1231 sub end { | |
| 1232 #---------- | |
| 1233 my ($self, $seqType) = @_; | |
| 1234 | |
| 1235 $seqType ||= (wantarray ? 'list' : 'query'); | |
| 1236 | |
| 1237 $self->_set_seq_data() unless $self->{'_set_seq_data'}; | |
| 1238 | |
| 1239 if($seqType =~ /list|array/i) { | |
| 1240 return ($self->{'_queryStop'}, $self->{'_sbjctStop'}); | |
| 1241 } else { | |
| 1242 ## Sensitive to member name changes. | |
| 1243 $seqType = "_\L$seqType\E"; | |
| 1244 return $self->{$seqType.'Stop'}; | |
| 1245 } | |
| 1246 } | |
| 1247 | |
| 1248 | |
| 1249 | |
| 1250 =head2 strand | |
| 1251 | |
| 1252 Usage : $hsp_object->strand( [seq_type] ) | |
| 1253 Purpose : Get the strand of the query or sbjct sequence. | |
| 1254 Example : print $hsp->strand('query'); | |
| 1255 : ($qstrand, $sstrand) = $hsp->strand(); | |
| 1256 Returns : -1, 0, or 1 | |
| 1257 : -1 = Minus strand, +1 = Plus strand | |
| 1258 : Returns 0 if strand is not defined, which occurs | |
| 1259 : for non-TBLASTN/X reports. | |
| 1260 : In scalar context without arguments, returns queryStrand value. | |
| 1261 : In array context without arguments, returns a two-element list | |
| 1262 : of strings (queryStrand, sbjctStrand). | |
| 1263 : Array context can be "induced" by providing an argument of 'list' or 'array'. | |
| 1264 Argument : seq_type: 'query' | 'sbjct' or undef | |
| 1265 Throws : n/a | |
| 1266 | |
| 1267 See Also : L<_set_seq()|_set_seq>, L<_set_match_stats()|_set_match_stats> | |
| 1268 | |
| 1269 =cut | |
| 1270 | |
| 1271 #----------- | |
| 1272 sub strand { | |
| 1273 #----------- | |
| 1274 my( $self, $seqType ) = @_; | |
| 1275 $seqType ||= (wantarray ? 'list' : 'query'); | |
| 1276 | |
| 1277 return '' if $seqType eq 'query' and $self->{'_prog'} eq 'TBLASTN'; | |
| 1278 | |
| 1279 ## Sensitive to member name format. | |
| 1280 $seqType = "_\L$seqType\E"; | |
| 1281 | |
| 1282 # $seqType could be '_list'. | |
| 1283 $self->{'_queryStrand'} or $self->_set_seq_data() unless $self->{'_set_seq_data'}; | |
| 1284 | |
| 1285 if($seqType =~ /list|array/i) { | |
| 1286 return ('','') unless defined $self->{'_queryStrand'}; | |
| 1287 return ($self->{'_queryStrand'}, $self->{'_sbjctStrand'}); | |
| 1288 } | |
| 1289 local $^W = 0; | |
| 1290 $STRAND_SYMBOL{$self->{$seqType.'Strand'}} || 0; | |
| 1291 } | |
| 1292 | |
| 1293 | |
| 1294 ##################################################################################### | |
| 1295 ## INSTANCE METHODS ## | |
| 1296 ##################################################################################### | |
| 1297 | |
| 1298 | |
| 1299 =head2 seq | |
| 1300 | |
| 1301 Usage : $hsp->seq( [seq_type] ); | |
| 1302 Purpose : Get the query or sbjct sequence as a Bio::Seq.pm object. | |
| 1303 Example : $seqObj = $hsp->seq('query'); | |
| 1304 Returns : Object reference for a Bio::Seq.pm object. | |
| 1305 Argument : seq_type = 'query' or 'sbjct' (default = 'query'). | |
| 1306 Throws : Propagates any exception that occurs during construction | |
| 1307 : of the Bio::Seq.pm object. | |
| 1308 Comments : The sequence is returned in an array of strings corresponding | |
| 1309 : to the strings in the original format of the Blast alignment. | |
| 1310 : (i.e., same spacing). | |
| 1311 | |
| 1312 See Also : L<seq_str()|seq_str>, L<seq_inds()|seq_inds>, B<Bio::Seq.pm> | |
| 1313 | |
| 1314 =cut | |
| 1315 | |
| 1316 #------- | |
| 1317 sub seq { | |
| 1318 #------- | |
| 1319 my($self,$seqType) = @_; | |
| 1320 $seqType ||= 'query'; | |
| 1321 my $str = $self->seq_str($seqType); | |
| 1322 my $num = $self->name; | |
| 1323 my $name = $seqType =~ /query/i | |
| 1324 ? $self->parent->parent->name | |
| 1325 : $self->parent->name; | |
| 1326 | |
| 1327 require Bio::Seq; | |
| 1328 | |
| 1329 new Bio::Seq (-ID => $name, | |
| 1330 -SEQ => $str, | |
| 1331 -DESC => "Blast HSP #$num, $seqType sequence", | |
| 1332 ); | |
| 1333 } | |
| 1334 | |
| 1335 | |
| 1336 | |
| 1337 =head2 seq_str | |
| 1338 | |
| 1339 Usage : $hsp->seq_str( seq_type ); | |
| 1340 Purpose : Get the full query, sbjct, or 'match' sequence as a string. | |
| 1341 : The 'match' sequence is the string of symbols in between the | |
| 1342 : query and sbjct sequences. | |
| 1343 Example : $str = $hsp->seq_str('query'); | |
| 1344 Returns : String | |
| 1345 Argument : seq_Type = 'query' or 'sbjct' or 'match' | |
| 1346 Throws : Exception if the argument does not match an accepted seq_type. | |
| 1347 Comments : Calls _set_residues() to set the 'match' sequence if it has | |
| 1348 : not been set already. | |
| 1349 | |
| 1350 See Also : L<seq()|seq>, L<seq_inds()|seq_inds>, L<_set_match_seq()|_set_match_seq> | |
| 1351 | |
| 1352 =cut | |
| 1353 | |
| 1354 #------------ | |
| 1355 sub seq_str { | |
| 1356 #------------ | |
| 1357 my($self,$seqType) = @_; | |
| 1358 | |
| 1359 ## Sensitive to member name changes. | |
| 1360 $seqType = "_\L$seqType\E"; | |
| 1361 | |
| 1362 $self->_set_seq_data() unless $self->{'_set_seq_data'}; | |
| 1363 | |
| 1364 if($seqType =~ /sbjct|query/) { | |
| 1365 my $seq = join('',@{$self->{$seqType.'Seq'}}); | |
| 1366 $seq =~ s/\s+//g; | |
| 1367 return $seq; | |
| 1368 | |
| 1369 } elsif( $seqType =~ /match/i) { | |
| 1370 # Only need to call _set_match_seq() if the match seq is requested. | |
| 1371 my $aref = $self->_set_match_seq() unless ref $self->{'_matchSeq'}; | |
| 1372 $aref = $self->{'_matchSeq'}; | |
| 1373 | |
| 1374 ## DEBUGGING CODE: | |
| 1375 # if($self->parent->name eq '1AK5_' and $self->parent->parent->name eq 'YAR073W') { | |
| 1376 # print "seq_str():\n @$aref";<STDIN>; | |
| 1377 # } | |
| 1378 | |
| 1379 return join('',@$aref); | |
| 1380 | |
| 1381 } else { | |
| 1382 $self->throw("Invalid or undefined sequence type: $seqType", | |
| 1383 "Valid types: query, sbjct, match"); | |
| 1384 } | |
| 1385 } | |
| 1386 | |
| 1387 | |
| 1388 | |
| 1389 | |
| 1390 =head2 seq_inds | |
| 1391 | |
| 1392 Usage : $hsp->seq_inds( seq_type, class, collapse ); | |
| 1393 Purpose : Get a list of residue positions (indices) for all identical | |
| 1394 : or conserved residues in the query or sbjct sequence. | |
| 1395 Example : @ind = $hsp->seq_inds('query', 'identical'); | |
| 1396 : @ind = $hsp->seq_inds('sbjct', 'conserved'); | |
| 1397 : @ind = $hsp->seq_inds('sbjct', 'conserved', 1); | |
| 1398 Returns : List of integers | |
| 1399 : May include ranges if collapse is true. | |
| 1400 Argument : seq_type = 'query' or 'sbjct' (default = query) | |
| 1401 : class = 'identical' or 'conserved' (default = identical) | |
| 1402 : (can be shortened to 'id' or 'cons') | |
| 1403 : (actually, anything not 'id' will evaluate to 'conserved'). | |
| 1404 : collapse = boolean, if true, consecutive positions are merged | |
| 1405 : using a range notation, e.g., "1 2 3 4 5 7 9 10 11" | |
| 1406 : collapses to "1-5 7 9-11". This is useful for | |
| 1407 : consolidating long lists. Default = no collapse. | |
| 1408 Throws : n/a. | |
| 1409 Comments : Calls _set_residues() to set the 'match' sequence if it has | |
| 1410 : not been set already. | |
| 1411 | |
| 1412 See Also : L<seq()|seq>, L<_set_residues()|_set_residues>, L<collapse_nums()|collapse_nums>, B<Bio::Tools::Blast::Sbjct::seq_inds()> | |
| 1413 | |
| 1414 =cut | |
| 1415 | |
| 1416 #--------------- | |
| 1417 sub seq_inds { | |
| 1418 #--------------- | |
| 1419 my ($self, $seq, $class, $collapse) = @_; | |
| 1420 | |
| 1421 $seq ||= 'query'; | |
| 1422 $class ||= 'identical'; | |
| 1423 $collapse ||= 0; | |
| 1424 | |
| 1425 $self->_set_residues() unless defined $self->{'_identicalRes_query'}; | |
| 1426 | |
| 1427 $seq = ($seq !~ /^q/i ? 'sbjct' : 'query'); | |
| 1428 $class = ($class !~ /^id/i ? 'conserved' : 'identical'); | |
| 1429 | |
| 1430 ## Sensitive to member name changes. | |
| 1431 $seq = "_\L$seq\E"; | |
| 1432 $class = "_\L$class\E"; | |
| 1433 | |
| 1434 my @ary = sort { $a <=> $b } keys %{ $self->{"${class}Res$seq"}}; | |
| 1435 | |
| 1436 return $collapse ? &collapse_nums(@ary) : @ary; | |
| 1437 } | |
| 1438 | |
| 1439 | |
| 1440 | |
| 1441 | |
| 1442 =head2 get_aln | |
| 1443 | |
| 1444 Usage : $hsp->get_aln() | |
| 1445 Purpose : Get a Bio::UnivAln.pm object constructed from the query + sbjct | |
| 1446 : sequences of the present HSP object. | |
| 1447 Example : $aln_obj = $hsp->get_aln(); | |
| 1448 Returns : Object reference for a Bio::UnivAln.pm object. | |
| 1449 Argument : n/a. | |
| 1450 Throws : Propagates any exception ocurring during the construction of | |
| 1451 : the Bio::UnivAln object. | |
| 1452 Comments : Requires Bio::UnivAln.pm. | |
| 1453 : The Bio::UnivAln.pm object is constructed from the query + sbjct | |
| 1454 : sequence objects obtained by calling seq(). | |
| 1455 : Gap residues are included (see $GAP_SYMBOL). It is important that | |
| 1456 : Bio::UnivAln.pm recognizes the gaps correctly. A strategy for doing | |
| 1457 : this is being considered. Currently it is hard-wired. | |
| 1458 | |
| 1459 See Also : L<seq()|seq>, B<Bio::UnivAln.pm> | |
| 1460 | |
| 1461 =cut | |
| 1462 | |
| 1463 #------------ | |
| 1464 sub get_aln { | |
| 1465 #------------ | |
| 1466 my $self = shift; | |
| 1467 | |
| 1468 require Bio::UnivAln; | |
| 1469 | |
| 1470 my $qseq = $self->seq('query'); | |
| 1471 my $sseq = $self->seq('sbjct'); | |
| 1472 | |
| 1473 my $desc = sprintf "HSP #%s of query %s vs. sbjct %s", | |
| 1474 $self->name, $self->parent->parent->name, $self->parent->name; | |
| 1475 | |
| 1476 my $type = $self->{'_prog'} =~ /P$|^T/ ? 'amino' : 'dna'; | |
| 1477 | |
| 1478 Bio::UnivAln->new( -seqs => [$qseq, $sseq], | |
| 1479 -desc => $desc, | |
| 1480 -type => $type, | |
| 1481 ); | |
| 1482 } | |
| 1483 | |
| 1484 | |
| 1485 =head2 display | |
| 1486 | |
| 1487 Usage : $sbjct_object->display( %named_parameters ); | |
| 1488 Purpose : Display information about Bio::Tools::Blast::Sbjct.pm data members | |
| 1489 : including: length, gaps, score, significance value, | |
| 1490 : sequences and sequence indices. | |
| 1491 Example : $object->display(-SHOW=>'stats'); | |
| 1492 Argument : Named parameters: (TAGS CAN BE UPPER OR LOWER CASE) | |
| 1493 : -SHOW => 'hsp', | |
| 1494 : -WHERE => filehandle (default = STDOUT) | |
| 1495 Returns : n/a | |
| 1496 Status : Experimental | |
| 1497 Comments : For more control over the display of sequence data, | |
| 1498 : use seq(), seq_str(), seq_inds(). | |
| 1499 | |
| 1500 See Also : L<_display_seq()|_display_seq>, L<seq()|seq>, L<seq_str()|seq_str>, L<seq_inds()|seq_inds>, L<_display_matches()|_display_matches>, B<Bio::Root::Object::display()> | |
| 1501 | |
| 1502 =cut | |
| 1503 | |
| 1504 #----------- | |
| 1505 sub display { | |
| 1506 #----------- | |
| 1507 my( $self, %param ) = @_; | |
| 1508 | |
| 1509 my $sbjctName = $self->parent->name(); | |
| 1510 my $queryName = $self->parent->parent->name(); | |
| 1511 my $layout = $self->parent->parent->_layout(); | |
| 1512 | |
| 1513 my $OUT = $self->set_display(%param); | |
| 1514 | |
| 1515 printf( $OUT "%-15s: %d\n", "LENGTH TOTAL", $self->length('total') ); | |
| 1516 printf( $OUT "%-15s: %d\n", "LENGTH QUERY", $self->length('query') ); | |
| 1517 printf( $OUT "%-15s: %d\n", "LENGTH SBJCT", $self->length('sbjct') ); | |
| 1518 printf( $OUT "%-15s: %d\n", "GAPS QUERY", $self->gaps('query') ); | |
| 1519 printf( $OUT "%-15s: %d\n", "GAPS SBJCT", $self->gaps('sbjct') ); | |
| 1520 printf( $OUT "%-15s: %d\n", "SCORE", $self->{'_score'} ); | |
| 1521 printf( $OUT "%-15s: %0.1f\n", "BITS", $self->{'_bits'} ); | |
| 1522 if($layout == 1) { | |
| 1523 printf( $OUT "%-15s: %.1e\n", "P-VAL", $self->{'_p'} ); | |
| 1524 printf( $OUT "%-15s: %.1e\n", "EXPECT", $self->{'_expect'} ); | |
| 1525 } else { | |
| 1526 printf( $OUT "%-15s: %.1e\n", "EXPECT", $self->{'_expect'} ); | |
| 1527 } | |
| 1528 | |
| 1529 my $queryLength = $self->length('query'); | |
| 1530 | |
| 1531 printf( $OUT "%-15s: %d (%0.0f%%)\n", "IDENTICAL", $self->{'_numIdentical'}, | |
| 1532 $self->{'_numIdentical'}/$queryLength * 100 ); | |
| 1533 printf( $OUT "%-15s: %d (%0.0f%%) %s \n", "CONSERVED", $self->{'_numConserved'}, | |
| 1534 $self->{'_numConserved'}/$queryLength * 100, | |
| 1535 "includes identical" ); | |
| 1536 | |
| 1537 $self->_display_seq('query', $queryName, $OUT); | |
| 1538 $self->_display_seq('sbjct', $sbjctName, $OUT); | |
| 1539 $self->_display_matches($queryName, $sbjctName, $OUT); | |
| 1540 } | |
| 1541 | |
| 1542 | |
| 1543 | |
| 1544 | |
| 1545 =head2 _display_seq | |
| 1546 | |
| 1547 Usage : n/a; called automatically by display() | |
| 1548 Purpose : Display information about query and sbjct HSP sequences. | |
| 1549 : Prints the start, stop coordinates and the actual sequence. | |
| 1550 Example : n/a | |
| 1551 Argument : | |
| 1552 Returns : printf call. | |
| 1553 Status : Experimental | |
| 1554 Comments : For more control, use seq(), seq_str(), or seq_inds(). | |
| 1555 | |
| 1556 See Also : L<display()|display>, L<seq()|seq>, L<seq_str()|seq_str>, L<seq_inds()|seq_inds>, L<_display_matches()|_display_matches> | |
| 1557 | |
| 1558 =cut | |
| 1559 | |
| 1560 #------------------ | |
| 1561 sub _display_seq { | |
| 1562 #------------------ | |
| 1563 my( $self, $seqType, $name, $OUT ) = @_; | |
| 1564 | |
| 1565 $self->_set_seq_data() unless $self->{'_set_seq_data'}; | |
| 1566 | |
| 1567 # Sensitive to member name changes. | |
| 1568 my $mem = "_\L$seqType\E"; | |
| 1569 printf( $OUT "\n%10s: %s\n%10s %s\n", "\U$seqType\E", "$name", "-----", | |
| 1570 ('-'x ((CORE::length $name) + 2)) ); | |
| 1571 printf( $OUT "%13s: %d\n", "START", $self->{$mem.'Start'} ); | |
| 1572 printf( $OUT "%13s: %d\n", "STOP", $self->{$mem.'Stop'} ); | |
| 1573 printf( $OUT "%13s: \n", "SEQ" ); | |
| 1574 foreach( @{ $self->{$mem.'Seq'}} ) { | |
| 1575 printf( $OUT "%15s%s\n", "", $_); | |
| 1576 } | |
| 1577 } | |
| 1578 | |
| 1579 | |
| 1580 =head2 _display_matches | |
| 1581 | |
| 1582 Usage : n/a; called automatically by display() | |
| 1583 Purpose : Display information about identical and conserved positions | |
| 1584 : within both the query and sbjct sequences. | |
| 1585 Example : n/a | |
| 1586 Argument : | |
| 1587 Returns : printf call. | |
| 1588 Status : Experimental | |
| 1589 Comments : For more control, use seq_inds(). | |
| 1590 | |
| 1591 See Also : L<display()|display>, L<seq_inds()|seq_inds>, L<_display_seq()|_display_seq>, | |
| 1592 | |
| 1593 =cut | |
| 1594 | |
| 1595 #-------------------- | |
| 1596 sub _display_matches { | |
| 1597 #-------------------- | |
| 1598 my( $self, $queryName, $sbjctName, $OUT) = @_; | |
| 1599 my($resNum, $count); | |
| 1600 | |
| 1601 $self->_set_residues() unless defined $self->{'_identicalRes_query'}; | |
| 1602 | |
| 1603 printf( $OUT "\n%10s: \n%10s\n", "HITS", "-----" ); | |
| 1604 foreach( @{ $self->{'_matchSeq'}} ) { | |
| 1605 printf( $OUT "%15s%s\n", "", $_ ); | |
| 1606 } | |
| 1607 | |
| 1608 print $OUT "\n\U$queryName\E\n------------\n"; | |
| 1609 printf( $OUT "\n%5s%s:\n%5s%s\n\t", "", "IDENTICAL RESIDUES IN $queryName (n=$self->{'_numIdentical'})", | |
| 1610 "", "--------------------------------------------" ); | |
| 1611 $count = 0; | |
| 1612 foreach $resNum ( sort keys %{ $self->{'_identicalRes_query' }} ) { | |
| 1613 $count++; | |
| 1614 print $OUT "$resNum"; | |
| 1615 $count > 0 and print $OUT +( $count % 15 ? ", " : "\n\t"); | |
| 1616 } | |
| 1617 | |
| 1618 print $OUT "\n"; | |
| 1619 | |
| 1620 my $justConserved = ($self->{'_numConserved'})-($self->{'_numIdentical'}); | |
| 1621 printf( $OUT "\n%5s%s:\n%5s%s\n\t", "","CONSERVED RESIDUES IN $queryName (n=$justConserved)", | |
| 1622 "", "--------------------------------------------" ); | |
| 1623 $count = 0; | |
| 1624 foreach $resNum ( sort keys %{ $self->{'_conservedRes_query' }} ) { | |
| 1625 $count++; | |
| 1626 print $OUT "$resNum"; | |
| 1627 $count > 0 and print $OUT +( $count % 15 ? ", " : "\n\t"); | |
| 1628 } | |
| 1629 | |
| 1630 | |
| 1631 print $OUT "\n\n\U$sbjctName\E\n------------\n"; | |
| 1632 printf( $OUT "\n%5s%s:\n%5s%s\n\t", "", "IDENTICAL RESIDUES IN $sbjctName (n=$self->{'_numIdentical'})", | |
| 1633 "", "--------------------------------------------" ); | |
| 1634 $count = 0; | |
| 1635 foreach $resNum ( sort keys %{ $self->{'_identicalRes_sbjct' }} ) { | |
| 1636 $count++; | |
| 1637 print $OUT "$resNum"; | |
| 1638 $count > 0 and print $OUT +( $count % 15 ? ", " : "\n\t"); | |
| 1639 } | |
| 1640 | |
| 1641 print $OUT "\n"; | |
| 1642 $justConserved = ($self->{'_numConserved'})-($self->{'_numIdentical'}); | |
| 1643 printf( $OUT "\n%5s%s:\n%5s%s\n\t", "","CONSERVED RESIDUES IN $sbjctName (n=$justConserved)", | |
| 1644 "", "--------------------------------------------" ); | |
| 1645 $count = 0; | |
| 1646 foreach $resNum ( sort keys %{ $self->{'_conservedRes_sbjct' }} ) { | |
| 1647 $count++; | |
| 1648 print $OUT "$resNum"; | |
| 1649 $count > 0 and print $OUT +( $count % 15 ? ", " : "\n\t"); | |
| 1650 } | |
| 1651 } | |
| 1652 | |
| 1653 | |
| 1654 | |
| 1655 | |
| 1656 =head2 homol_data | |
| 1657 | |
| 1658 Usage : $data = $hsp_object->homo_data( %named_params ); | |
| 1659 Purpose : Gets similarity data for a single HSP. | |
| 1660 Returns : String: | |
| 1661 : "Homology data" for each HSP is in the format: | |
| 1662 : "<integer> <start> <stop>" | |
| 1663 : where integer is the value returned by homol_score(). | |
| 1664 Argument : Named params: (UPPER OR LOWERCASE TAGS) | |
| 1665 : currently just one param is used: | |
| 1666 : -SEQ =>'query' or 'sbjct' | |
| 1667 Throws : n/a | |
| 1668 Status : Experimental | |
| 1669 Comments : This is a very experimental method used for obtaining a | |
| 1670 : coarse indication of: | |
| 1671 : 1) how strong the similarity is between the sequences in the HSP, | |
| 1672 : 3) the endpoints of the alignment (sequence monomer numbers) | |
| 1673 | |
| 1674 See Also : L<homol_score()|homol_score>, B<Bio::Tools::Blast.::homol_data()>, B<Bio::Tools::Blast::Sbjct::homol_data()> | |
| 1675 | |
| 1676 =cut | |
| 1677 | |
| 1678 #--------------- | |
| 1679 sub homol_data { | |
| 1680 #--------------- | |
| 1681 my ($self, %param) = @_; | |
| 1682 my $seq = $param{-SEQ} || $param{'-seq'} || 'sbjct'; # 'query' or 'sbjct' | |
| 1683 my $homolScore = $self->homol_score(); | |
| 1684 # Sensitive to member name changes. | |
| 1685 $seq = "_\L$seq\E"; | |
| 1686 | |
| 1687 $self->_set_seq_data() unless $self->{'_set_seq_data'}; | |
| 1688 return ( $homolScore.' '.$self->{$seq.'Start'}.' '.$self->{$seq.'Stop'}); | |
| 1689 } | |
| 1690 | |
| 1691 | |
| 1692 =head2 homol_score | |
| 1693 | |
| 1694 Usage : $self->homol_score(); | |
| 1695 Purpose : Get a homology score (integer 1 - 3) as a coarse representation of | |
| 1696 : the strength of the similarity independent of sequence composition. | |
| 1697 : Based on the Blast bit score. | |
| 1698 Example : $hscore = $hsp->homol_score(); | |
| 1699 Returns : Integer | |
| 1700 Argument : n/a | |
| 1701 Throws : n/a | |
| 1702 Status : Experimental | |
| 1703 Comments : See @Bio::Tools::Blast::HSP::SCORE_CUTOFFS for the specific values. | |
| 1704 : Currently, BIT_SCORE HOMOL_SCORE | |
| 1705 : --------- ----------- | |
| 1706 : >=100 --> 3 | |
| 1707 : 30-100 --> 2 | |
| 1708 : < 30 --> 1 | |
| 1709 | |
| 1710 See Also : L<homol_data()|homol_data> | |
| 1711 | |
| 1712 =cut | |
| 1713 | |
| 1714 #---------------- | |
| 1715 sub homol_score { | |
| 1716 #---------------- | |
| 1717 my $self = shift; | |
| 1718 | |
| 1719 if( $self->{'_bits'} >= $SCORE_CUTOFFS[0] ) { 1 } | |
| 1720 elsif($self->{'_bits'} < $SCORE_CUTOFFS[0] and | |
| 1721 $self->{'_bits'} >= $SCORE_CUTOFFS[1] ) { 2 } | |
| 1722 else { 3 } | |
| 1723 } | |
| 1724 | |
| 1725 | |
| 1726 ##################################################################################### | |
| 1727 ## CLASS METHODS ## | |
| 1728 ##################################################################################### | |
| 1729 | |
| 1730 =head1 CLASS METHODS | |
| 1731 | |
| 1732 =head2 collapse_nums | |
| 1733 | |
| 1734 Usage : @cnums = collapse_nums( @numbers ); | |
| 1735 Purpose : Collapses a list of numbers into a set of ranges of consecutive terms: | |
| 1736 : Useful for condensing long lists of consecutive numbers. | |
| 1737 : EXPANDED: | |
| 1738 : 1 2 3 4 5 6 10 12 13 14 15 17 18 20 21 22 24 26 30 31 32 | |
| 1739 : COLLAPSED: | |
| 1740 : 1-6 10 12-15 17 18 20-22 24 26 30-32 | |
| 1741 Argument : List of numbers and sorted numerically. | |
| 1742 Returns : List of numbers mixed with ranges of numbers (see above). | |
| 1743 Throws : n/a | |
| 1744 Comments : Probably belongs in a more general utility class. | |
| 1745 | |
| 1746 See Also : L<seq_inds()|seq_inds> | |
| 1747 | |
| 1748 =cut | |
| 1749 | |
| 1750 #------------------ | |
| 1751 sub collapse_nums { | |
| 1752 #------------------ | |
| 1753 # This is not the slickest connectivity algorithm, but will do for now. | |
| 1754 my @a = @_; | |
| 1755 my ($from, $to, $i, @ca, $consec); | |
| 1756 | |
| 1757 $consec = 0; | |
| 1758 for($i=0; $i < @a; $i++) { | |
| 1759 not $from and do{ $from = $a[$i]; next; }; | |
| 1760 if($a[$i] == $a[$i-1]+1) { | |
| 1761 $to = $a[$i]; | |
| 1762 $consec++; | |
| 1763 } else { | |
| 1764 if($consec == 1) { $from .= ",$to"; } | |
| 1765 else { $from .= $consec>1 ? "\-$to" : ""; } | |
| 1766 push @ca, split(',', $from); | |
| 1767 $from = $a[$i]; | |
| 1768 $consec = 0; | |
| 1769 $to = undef; | |
| 1770 } | |
| 1771 } | |
| 1772 if(defined $to) { | |
| 1773 if($consec == 1) { $from .= ",$to"; } | |
| 1774 else { $from .= $consec>1 ? "\-$to" : ""; } | |
| 1775 } | |
| 1776 push @ca, split(',', $from) if $from; | |
| 1777 | |
| 1778 @ca; | |
| 1779 } | |
| 1780 | |
| 1781 | |
| 1782 1; | |
| 1783 __END__ | |
| 1784 | |
| 1785 ##################################################################################### | |
| 1786 # END OF CLASS | |
| 1787 ##################################################################################### | |
| 1788 | |
| 1789 =head1 FOR DEVELOPERS ONLY | |
| 1790 | |
| 1791 =head2 Data Members | |
| 1792 | |
| 1793 Information about the various data members of this module is provided for those | |
| 1794 wishing to modify or understand the code. Two things to bear in mind: | |
| 1795 | |
| 1796 =over 4 | |
| 1797 | |
| 1798 =item 1 Do NOT rely on these in any code outside of this module. | |
| 1799 | |
| 1800 All data members are prefixed with an underscore to signify that they are private. | |
| 1801 Always use accessor methods. If the accessor doesn't exist or is inadequate, | |
| 1802 create or modify an accessor (and let me know, too!). | |
| 1803 | |
| 1804 =item 2 This documentation may be incomplete and out of date. | |
| 1805 | |
| 1806 It is easy for these data member descriptions to become obsolete as | |
| 1807 this module is still evolving. Always double check this info and search | |
| 1808 for members not described here. | |
| 1809 | |
| 1810 =back | |
| 1811 | |
| 1812 An instance of Bio::Tools::Blast::HSP.pm is a blessed reference to a hash containing | |
| 1813 all or some of the following fields: | |
| 1814 | |
| 1815 FIELD VALUE | |
| 1816 -------------------------------------------------------------- | |
| 1817 (member names are mostly self-explanatory) | |
| 1818 | |
| 1819 _score : | |
| 1820 _bits : | |
| 1821 _p : | |
| 1822 _n : Integer. The 'N' value listed in parenthesis with P/Expect value: | |
| 1823 : e.g., P(3) = 1.2e-30 ---> (N = 3). | |
| 1824 : Not defined in NCBI Blast2 with gaps. | |
| 1825 : To obtain the number of HSPs, use Bio::Tools::Blast::Sbjct::num_hsps(). | |
| 1826 _expect : | |
| 1827 _queryLength : | |
| 1828 _queryGaps : | |
| 1829 _queryStart : | |
| 1830 _queryStop : | |
| 1831 _querySeq : | |
| 1832 _sbjctLength : | |
| 1833 _sbjctGaps : | |
| 1834 _sbjctStart : | |
| 1835 _sbjctStop : | |
| 1836 _sbjctSeq : | |
| 1837 _matchSeq : String. Contains the symbols between the query and sbjct lines | |
| 1838 which indicate identical (letter) and conserved ('+') matches | |
| 1839 or a mismatch (' '). | |
| 1840 _numIdentical : | |
| 1841 _numConserved : | |
| 1842 _identicalRes_query : | |
| 1843 _identicalRes_sbjct : | |
| 1844 _conservedRes_query : | |
| 1845 _conservedRes_sbjct : | |
| 1846 _match_indent : The number of leading space characters on each line containing | |
| 1847 the match symbols. _match_indent is 13 in this example: | |
| 1848 Query: 285 QNSAPWGLARISHRERLNLGSFNKYLYDDDAG | |
| 1849 Q +APWGLARIS G+ + Y YD+ AG | |
| 1850 ^^^^^^^^^^^^^ | |
| 1851 | |
| 1852 INHERITED DATA MEMBERS | |
| 1853 | |
| 1854 _name : From Bio::Root::Object.pm. | |
| 1855 : | |
| 1856 _parent : From Bio::Root::Object.pm. This member contains a reference to the | |
| 1857 : Bio::Tools::Blast::Sbjct.pm object to which this hit belongs. | |
| 1858 | |
| 1859 | |
| 1860 =cut | |
| 1861 | |
| 1862 1; | |
| 1863 |
