0
|
1 #-----------------------------------------------------------------
|
|
2 # $Id: BlastHSP.pm,v 1.20 2002/12/24 15:45:33 jason Exp $
|
|
3 #
|
|
4 # BioPerl module Bio::Search::HSP::BlastHSP
|
|
5 #
|
|
6 # (This module was originally called Bio::Tools::Blast::HSP)
|
|
7 #
|
|
8 # Cared for by Steve Chervitz <sac@bioperl.org>
|
|
9 #
|
|
10 # You may distribute this module under the same terms as perl itself
|
|
11 #-----------------------------------------------------------------
|
|
12
|
|
13 ## POD Documentation:
|
|
14
|
|
15 =head1 NAME
|
|
16
|
|
17 Bio::Search::HSP::BlastHSP - Bioperl BLAST High-Scoring Pair object
|
|
18
|
|
19 =head1 SYNOPSIS
|
|
20
|
|
21 The construction of BlastHSP objects is performed by
|
|
22 Bio::Factory::BlastHitFactory in a process that is
|
|
23 orchestrated by the Blast parser (B<Bio::SearchIO::psiblast>).
|
|
24 The resulting BlastHSPs are then accessed via
|
|
25 B<Bio::Search::Hit::BlastHit>). Therefore, you do not need to
|
|
26 use B<Bio::Search::HSP::BlastHSP>) directly. If you need to construct
|
|
27 BlastHSPs directly, see the new() function for details.
|
|
28
|
|
29 For B<Bio::SearchIO> BLAST parsing usage examples, see the
|
|
30 B<examples/search-blast> directory of the Bioperl distribution.
|
|
31
|
|
32 =head1 DESCRIPTION
|
|
33
|
|
34 A Bio::Search::HSP::BlastHSP object provides an interface to data
|
|
35 obtained in a single alignment section of a Blast report (known as a
|
|
36 "High-scoring Segment Pair"). This is essentially a pairwise
|
|
37 alignment with score information.
|
|
38
|
|
39 BlastHSP objects are accessed via B<Bio::Search::Hit::BlastHit>
|
|
40 objects after parsing a BLAST report using the B<Bio::SearchIO>
|
|
41 system.
|
|
42
|
|
43 =head2 Start and End coordinates
|
|
44
|
|
45 Sequence endpoints are swapped so that start is always less than
|
|
46 end. This affects For TBLASTN/X hits on the minus strand. Strand
|
|
47 information can be recovered using the strand() method. This
|
|
48 normalization step is standard Bioperl practice. It also facilitates
|
|
49 use of range information by methods such as match().
|
|
50
|
|
51 =over 1
|
|
52
|
|
53 =item * Supports BLAST versions 1.x and 2.x, gapped and ungapped.
|
|
54
|
|
55 =back
|
|
56
|
|
57 Bio::Search::HSP::BlastHSP.pm has the ability to extract a list of all
|
|
58 residue indices for identical and conservative matches along both
|
|
59 query and sbjct sequences. Since this degree of detail is not always
|
|
60 needed, this behavior does not occur during construction of the BlastHSP
|
|
61 object. These data will automatically be collected as necessary as
|
|
62 the BlastHSP.pm object is used.
|
|
63
|
|
64 =head1 DEPENDENCIES
|
|
65
|
|
66 Bio::Search::HSP::BlastHSP.pm is a concrete class that inherits from
|
|
67 B<Bio::SeqFeature::SimilarityPair> and B<Bio::Search::HSP::HSPI>.
|
|
68 B<Bio::Seq> and B<Bio::SimpleAlign> are employed for creating
|
|
69 sequence and alignment objects, respectively.
|
|
70
|
|
71 =head2 Relationship to SimpleAlign.pm & Seq.pm
|
|
72
|
|
73 BlastHSP.pm can provide the query or sbjct sequence as a B<Bio::Seq>
|
|
74 object via the L<seq()|seq> method. The BlastHSP.pm object can also create a
|
|
75 two-sequence B<Bio::SimpleAlign> alignment object using the the query
|
|
76 and sbjct sequences via the L<get_aln()|get_aln> method. Creation of alignment
|
|
77 objects is not automatic when constructing the BlastHSP.pm object since
|
|
78 this level of functionality is not always required and would generate
|
|
79 a lot of extra overhead when crunching many reports.
|
|
80
|
|
81
|
|
82 =head1 FEEDBACK
|
|
83
|
|
84 =head2 Mailing Lists
|
|
85
|
|
86 User feedback is an integral part of the evolution of this and other
|
|
87 Bioperl modules. Send your comments and suggestions preferably to one
|
|
88 of the Bioperl mailing lists. Your participation is much appreciated.
|
|
89
|
|
90 bioperl-l@bioperl.org - General discussion
|
|
91 http://bio.perl.org/MailList.html - About the mailing lists
|
|
92
|
|
93 =head2 Reporting Bugs
|
|
94
|
|
95 Report bugs to the Bioperl bug tracking system to help us keep track
|
|
96 the bugs and their resolution. Bug reports can be submitted via email
|
|
97 or the web:
|
|
98
|
|
99 bioperl-bugs@bio.perl.org
|
|
100 http://bugzilla.bioperl.org/
|
|
101
|
|
102 =head1 AUTHOR
|
|
103
|
|
104 Steve Chervitz E<lt>sac@bioperl.orgE<gt>
|
|
105
|
|
106 See L<the FEEDBACK section | FEEDBACK> for where to send bug reports and comments.
|
|
107
|
|
108 =head1 ACKNOWLEDGEMENTS
|
|
109
|
|
110 This software was originally developed in the Department of Genetics
|
|
111 at Stanford University. I would also like to acknowledge my
|
|
112 colleagues at Affymetrix for useful feedback.
|
|
113
|
|
114 =head1 SEE ALSO
|
|
115
|
|
116 Bio::Search::Hit::BlastHit.pm - Blast hit object.
|
|
117 Bio::Search::Result::BlastResult.pm - Blast Result object.
|
|
118 Bio::Seq.pm - Biosequence object
|
|
119
|
|
120 =head2 Links:
|
|
121
|
|
122 http://bio.perl.org/Core/POD/Tools/Blast/BlastHit.pm.html
|
|
123
|
|
124 http://bio.perl.org/Projects/modules.html - Online module documentation
|
|
125 http://bio.perl.org/Projects/Blast/ - Bioperl Blast Project
|
|
126 http://bio.perl.org/ - Bioperl Project Homepage
|
|
127
|
|
128 =head1 COPYRIGHT
|
|
129
|
|
130 Copyright (c) 1996-2001 Steve Chervitz. All Rights Reserved.
|
|
131
|
|
132 =head1 DISCLAIMER
|
|
133
|
|
134 This software is provided "as is" without warranty of any kind.
|
|
135
|
|
136 =cut
|
|
137
|
|
138
|
|
139 # END of main POD documentation.
|
|
140
|
|
141 =head1 APPENDIX
|
|
142
|
|
143 The rest of the documentation details each of the object methods.
|
|
144 Internal methods are usually preceded with a _
|
|
145
|
|
146 =cut
|
|
147
|
|
148 # Let the code begin...
|
|
149
|
|
150 package Bio::Search::HSP::BlastHSP;
|
|
151
|
|
152 use strict;
|
|
153 use Bio::SeqFeature::SimilarityPair;
|
|
154 use Bio::SeqFeature::Similarity;
|
|
155 use Bio::Search::HSP::HSPI;
|
|
156
|
|
157 use vars qw( @ISA $GAP_SYMBOL $Revision %STRAND_SYMBOL );
|
|
158
|
|
159 use overload
|
|
160 '""' => \&to_string;
|
|
161
|
|
162 $Revision = '$Id: BlastHSP.pm,v 1.20 2002/12/24 15:45:33 jason Exp $'; #'
|
|
163
|
|
164 @ISA = qw(Bio::SeqFeature::SimilarityPair Bio::Search::HSP::HSPI);
|
|
165
|
|
166 $GAP_SYMBOL = '-'; # Need a more general way to handle gap symbols.
|
|
167 %STRAND_SYMBOL = ('Plus' => 1, 'Minus' => -1 );
|
|
168
|
|
169
|
|
170 =head2 new
|
|
171
|
|
172 Usage : $hsp = Bio::Search::HSP::BlastHSP->new( %named_params );
|
|
173 : Bio::Search::HSP::BlastHSP.pm objects are constructed
|
|
174 : automatically by Bio::SearchIO::BlastHitFactory.pm,
|
|
175 : so there is no need for direct instantiation.
|
|
176 Purpose : Constructs a new BlastHSP object and Initializes key variables
|
|
177 : for the HSP.
|
|
178 Returns : A Bio::Search::HSP::BlastHSP object
|
|
179 Argument : Named parameters:
|
|
180 : Parameter keys are case-insensitive.
|
|
181 : -RAW_DATA => array ref containing raw BLAST report data for
|
|
182 : for a single HSP. This includes all lines
|
|
183 : of the HSP alignment from a traditional BLAST
|
|
184 or PSI-BLAST (non-XML) report,
|
|
185 : -RANK => integer (1..n).
|
|
186 : -PROGRAM => string ('TBLASTN', 'BLASTP', etc.).
|
|
187 : -QUERY_NAME => string, id of query sequence
|
|
188 : -HIT_NAME => string, id of hit sequence
|
|
189 :
|
|
190 Comments : Having the raw data allows this object to do lazy parsing of
|
|
191 : the raw HSP data (i.e., not parsed until needed).
|
|
192 :
|
|
193 : Note that there is a fair amount of basic parsing that is
|
|
194 : currently performed in this module that would be more appropriate
|
|
195 : to do within a separate factory object.
|
|
196 : This parsing code will likely be relocated and more initialization
|
|
197 : parameters will be added to new().
|
|
198 :
|
|
199 See Also : B<Bio::SeqFeature::SimilarityPair::new()>, B<Bio::SeqFeature::Similarity::new()>
|
|
200
|
|
201 =cut
|
|
202
|
|
203 #----------------
|
|
204 sub new {
|
|
205 #----------------
|
|
206 my ($class, @args ) = @_;
|
|
207
|
|
208 my $self = $class->SUPER::new( @args );
|
|
209 # Initialize placeholders
|
|
210 $self->{'_queryGaps'} = $self->{'_sbjctGaps'} = 0;
|
|
211 my ($raw_data, $qname, $hname, $qlen, $hlen);
|
|
212
|
|
213 ($self->{'_prog'}, $self->{'_rank'}, $raw_data,
|
|
214 $qname, $hname) =
|
|
215 $self->_rearrange([qw( PROGRAM
|
|
216 RANK
|
|
217 RAW_DATA
|
|
218 QUERY_NAME
|
|
219 HIT_NAME
|
|
220 )], @args );
|
|
221
|
|
222 # _set_data() does a fair amount of parsing.
|
|
223 # This will likely change (see comment above.)
|
|
224 $self->_set_data( @{$raw_data} );
|
|
225 # Store the aligned query as sequence feature
|
|
226 my ($qb, $hb) = ($self->start());
|
|
227 my ($qe, $he) = ($self->end());
|
|
228 my ($qs, $hs) = ($self->strand());
|
|
229 my ($qf,$hf) = ($self->query->frame(),
|
|
230 $self->hit->frame);
|
|
231
|
|
232 $self->query( Bio::SeqFeature::Similarity->new (-start =>$qb,
|
|
233 -end =>$qe,
|
|
234 -strand =>$qs,
|
|
235 -bits =>$self->bits,
|
|
236 -score =>$self->score,
|
|
237 -frame =>$qf,
|
|
238 -seq_id => $qname,
|
|
239 -source =>$self->{'_prog'} ));
|
|
240
|
|
241 $self->hit( Bio::SeqFeature::Similarity->new (-start =>$hb,
|
|
242 -end =>$he,
|
|
243 -strand =>$hs,
|
|
244 -bits =>$self->bits,
|
|
245 -score =>$self->score,
|
|
246 -frame =>$hf,
|
|
247 -seq_id => $hname,
|
|
248 -source =>$self->{'_prog'} ));
|
|
249
|
|
250 # set lengths
|
|
251 $self->query->seqlength($qlen); # query
|
|
252 $self->hit->seqlength($hlen); # subject
|
|
253
|
|
254 $self->query->frac_identical($self->frac_identical('query'));
|
|
255 $self->hit->frac_identical($self->frac_identical('hit'));
|
|
256 return $self;
|
|
257 }
|
|
258
|
|
259 #sub DESTROY {
|
|
260 # my $self = shift;
|
|
261 # #print STDERR "--->DESTROYING $self\n";
|
|
262 #}
|
|
263
|
|
264
|
|
265 # Title : _id_str;
|
|
266 # Purpose : Intended for internal use only to provide a string for use
|
|
267 # within exception messages to help users figure out which
|
|
268 # query/hit caused the problem.
|
|
269 # Returns : Short string with name of query and hit seq
|
|
270 sub _id_str {
|
|
271 my $self = shift;
|
|
272 if( not defined $self->{'_id_str'}) {
|
|
273 my $qname = $self->query->seqname;
|
|
274 my $hname = $self->hit->seqname;
|
|
275 $self->{'_id_str'} = "QUERY=\"$qname\" HIT=\"$hname\"";
|
|
276 }
|
|
277 return $self->{'_id_str'};
|
|
278 }
|
|
279
|
|
280 #=================================================
|
|
281 # Begin Bio::Search::HSP::HSPI implementation
|
|
282 #=================================================
|
|
283
|
|
284 =head2 algorithm
|
|
285
|
|
286 Title : algorithm
|
|
287 Usage : $alg = $hsp->algorithm();
|
|
288 Function: Gets the algorithm specification that was used to obtain the hsp
|
|
289 For BLAST, the algorithm denotes what type of sequence was aligned
|
|
290 against what (BLASTN: dna-dna, BLASTP prt-prt, BLASTX translated
|
|
291 dna-prt, TBLASTN prt-translated dna, TBLASTX translated
|
|
292 dna-translated dna).
|
|
293 Returns : a scalar string
|
|
294 Args : none
|
|
295
|
|
296 =cut
|
|
297
|
|
298 #----------------
|
|
299 sub algorithm {
|
|
300 #----------------
|
|
301 my ($self,@args) = @_;
|
|
302 return $self->{'_prog'};
|
|
303 }
|
|
304
|
|
305
|
|
306
|
|
307
|
|
308 =head2 signif()
|
|
309
|
|
310 Usage : $hsp_obj->signif()
|
|
311 Purpose : Get the P-value or Expect value for the HSP.
|
|
312 Returns : Float (0.001 or 1.3e-43)
|
|
313 : Returns P-value if it is defined, otherwise, Expect value.
|
|
314 Argument : n/a
|
|
315 Throws : n/a
|
|
316 Comments : Provided for consistency with BlastHit::signif()
|
|
317 : Support for returning the significance data in different
|
|
318 : formats (e.g., exponent only), is not provided for HSP objects.
|
|
319 : This is only available for the BlastHit or Blast object.
|
|
320
|
|
321 See Also : L<p()|p>, L<expect()|expect>, L<Bio::Search::Hit::BlastHit::signif()|Bio::Search::Hit::BlastHit>
|
|
322
|
|
323 =cut
|
|
324
|
|
325 #-----------
|
|
326 sub signif {
|
|
327 #-----------
|
|
328 my $self = shift;
|
|
329 my $val ||= defined($self->{'_p'}) ? $self->{'_p'} : $self->{'_expect'};
|
|
330 $val;
|
|
331 }
|
|
332
|
|
333
|
|
334
|
|
335 =head2 evalue
|
|
336
|
|
337 Usage : $hsp_obj->evalue()
|
|
338 Purpose : Get the Expect value for the HSP.
|
|
339 Returns : Float (0.001 or 1.3e-43)
|
|
340 Argument : n/a
|
|
341 Throws : n/a
|
|
342 Comments : Support for returning the expectation data in different
|
|
343 : formats (e.g., exponent only), is not provided for HSP objects.
|
|
344 : This is only available for the BlastHit or Blast object.
|
|
345
|
|
346 See Also : L<p()|p>
|
|
347
|
|
348 =cut
|
|
349
|
|
350 #----------
|
|
351 sub evalue { shift->{'_expect'} }
|
|
352 #----------
|
|
353
|
|
354
|
|
355 =head2 p
|
|
356
|
|
357 Usage : $hsp_obj->p()
|
|
358 Purpose : Get the P-value for the HSP.
|
|
359 Returns : Float (0.001 or 1.3e-43) or undef if not defined.
|
|
360 Argument : n/a
|
|
361 Throws : n/a
|
|
362 Comments : P-value is not defined with NCBI Blast2 reports.
|
|
363 : Support for returning the expectation data in different
|
|
364 : formats (e.g., exponent only) is not provided for HSP objects.
|
|
365 : This is only available for the BlastHit or Blast object.
|
|
366
|
|
367 See Also : L<expect()|expect>
|
|
368
|
|
369 =cut
|
|
370
|
|
371 #-----
|
|
372 sub p { my $self = shift; $self->{'_p'}; }
|
|
373 #-----
|
|
374
|
|
375 # alias
|
|
376 sub pvalue { shift->p(@_); }
|
|
377
|
|
378 =head2 length
|
|
379
|
|
380 Usage : $hsp->length( [seq_type] )
|
|
381 Purpose : Get the length of the aligned portion of the query or sbjct.
|
|
382 Example : $hsp->length('query')
|
|
383 Returns : integer
|
|
384 Argument : seq_type: 'query' | 'hit' or 'sbjct' | 'total' (default = 'total')
|
|
385 ('sbjct' is synonymous with 'hit')
|
|
386 Throws : n/a
|
|
387 Comments : 'total' length is the full length of the alignment
|
|
388 : as reported in the denominators in the alignment section:
|
|
389 : "Identical = 34/120 Positives = 67/120".
|
|
390
|
|
391 See Also : L<gaps()|gaps>
|
|
392
|
|
393 =cut
|
|
394
|
|
395 #-----------
|
|
396 sub length {
|
|
397 #-----------
|
|
398 ## Developer note: when using the built-in length function within
|
|
399 ## this module, call it as CORE::length().
|
|
400 my( $self, $seqType ) = @_;
|
|
401 $seqType ||= 'total';
|
|
402 $seqType = 'sbjct' if $seqType eq 'hit';
|
|
403
|
|
404 $seqType ne 'total' and $self->_set_seq_data() unless $self->{'_set_seq_data'};
|
|
405
|
|
406 ## Sensitive to member name format.
|
|
407 $seqType = "_\L$seqType\E";
|
|
408 $self->{$seqType.'Length'};
|
|
409 }
|
|
410
|
|
411
|
|
412
|
|
413 =head2 gaps
|
|
414
|
|
415 Usage : $hsp->gaps( [seq_type] )
|
|
416 Purpose : Get the number of gaps in the query, sbjct, or total alignment.
|
|
417 : Also can return query gaps and sbjct gaps as a two-element list
|
|
418 : when in array context.
|
|
419 Example : $total_gaps = $hsp->gaps();
|
|
420 : ($qgaps, $sgaps) = $hsp->gaps();
|
|
421 : $qgaps = $hsp->gaps('query');
|
|
422 Returns : scalar context: integer
|
|
423 : array context without args: (int, int) = ('queryGaps', 'sbjctGaps')
|
|
424 Argument : seq_type: 'query' or 'hit' or 'sbjct' or 'total'
|
|
425 : ('sbjct' is synonymous with 'hit')
|
|
426 : (default = 'total', scalar context)
|
|
427 : Array context can be "induced" by providing an argument of 'list' or 'array'.
|
|
428 Throws : n/a
|
|
429
|
|
430 See Also : L<length()|length>, L<matches()|matches>
|
|
431
|
|
432 =cut
|
|
433
|
|
434 #---------
|
|
435 sub gaps {
|
|
436 #---------
|
|
437 my( $self, $seqType ) = @_;
|
|
438
|
|
439 $self->_set_seq_data() unless $self->{'_set_seq_data'};
|
|
440
|
|
441 $seqType ||= (wantarray ? 'list' : 'total');
|
|
442 $seqType = 'sbjct' if $seqType eq 'hit';
|
|
443
|
|
444 if($seqType =~ /list|array/i) {
|
|
445 return (($self->{'_queryGaps'} || 0), ($self->{'_sbjctGaps'} || 0));
|
|
446 }
|
|
447
|
|
448 if($seqType eq 'total') {
|
|
449 return ($self->{'_queryGaps'} + $self->{'_sbjctGaps'}) || 0;
|
|
450 } else {
|
|
451 ## Sensitive to member name format.
|
|
452 $seqType = "_\L$seqType\E";
|
|
453 return $self->{$seqType.'Gaps'} || 0;
|
|
454 }
|
|
455 }
|
|
456
|
|
457
|
|
458 =head2 frac_identical
|
|
459
|
|
460 Usage : $hsp_object->frac_identical( [seq_type] );
|
|
461 Purpose : Get the fraction of identical positions within the given HSP.
|
|
462 Example : $frac_iden = $hsp_object->frac_identical('query');
|
|
463 Returns : Float (2-decimal precision, e.g., 0.75).
|
|
464 Argument : seq_type: 'query' or 'hit' or 'sbjct' or 'total'
|
|
465 : ('sbjct' is synonymous with 'hit')
|
|
466 : default = 'total' (but see comments below).
|
|
467 Throws : n/a
|
|
468 Comments : Different versions of Blast report different values for the total
|
|
469 : length of the alignment. This is the number reported in the
|
|
470 : denominators in the stats section:
|
|
471 : "Identical = 34/120 Positives = 67/120".
|
|
472 : NCBI-BLAST uses the total length of the alignment (with gaps)
|
|
473 : WU-BLAST uses the length of the query sequence (without gaps).
|
|
474 : Therefore, when called without an argument or an argument of 'total',
|
|
475 : this method will report different values depending on the
|
|
476 : version of BLAST used.
|
|
477 :
|
|
478 : To get the fraction identical among only the aligned residues,
|
|
479 : ignoring the gaps, call this method with an argument of 'query'
|
|
480 : or 'sbjct' ('sbjct' is synonymous with 'hit').
|
|
481
|
|
482 See Also : L<frac_conserved()|frac_conserved>, L<num_identical()|num_identical>, L<matches()|matches>
|
|
483
|
|
484 =cut
|
|
485
|
|
486 #-------------------
|
|
487 sub frac_identical {
|
|
488 #-------------------
|
|
489 # The value is calculated as opposed to storing it from the parsed results.
|
|
490 # This saves storage and also permits flexibility in determining for which
|
|
491 # sequence (query or sbjct) the figure is to be calculated.
|
|
492
|
|
493 my( $self, $seqType ) = @_;
|
|
494 $seqType ||= 'total';
|
|
495 $seqType = 'sbjct' if $seqType eq 'hit';
|
|
496
|
|
497 if($seqType ne 'total') {
|
|
498 $self->_set_seq_data() unless $self->{'_set_seq_data'};
|
|
499 }
|
|
500 ## Sensitive to member name format.
|
|
501 $seqType = "_\L$seqType\E";
|
|
502
|
|
503 sprintf( "%.2f", $self->{'_numIdentical'}/$self->{$seqType.'Length'});
|
|
504 }
|
|
505
|
|
506
|
|
507 =head2 frac_conserved
|
|
508
|
|
509 Usage : $hsp_object->frac_conserved( [seq_type] );
|
|
510 Purpose : Get the fraction of conserved positions within the given HSP.
|
|
511 : (Note: 'conservative' positions are called 'positives' in the
|
|
512 : Blast report.)
|
|
513 Example : $frac_cons = $hsp_object->frac_conserved('query');
|
|
514 Returns : Float (2-decimal precision, e.g., 0.75).
|
|
515 Argument : seq_type: 'query' or 'hit' or 'sbjct' or 'total'
|
|
516 : ('sbjct' is synonymous with 'hit')
|
|
517 : default = 'total' (but see comments below).
|
|
518 Throws : n/a
|
|
519 Comments : Different versions of Blast report different values for the total
|
|
520 : length of the alignment. This is the number reported in the
|
|
521 : denominators in the stats section:
|
|
522 : "Identical = 34/120 Positives = 67/120".
|
|
523 : NCBI-BLAST uses the total length of the alignment (with gaps)
|
|
524 : WU-BLAST uses the length of the query sequence (without gaps).
|
|
525 : Therefore, when called without an argument or an argument of 'total',
|
|
526 : this method will report different values depending on the
|
|
527 : version of BLAST used.
|
|
528 :
|
|
529 : To get the fraction conserved among only the aligned residues,
|
|
530 : ignoring the gaps, call this method with an argument of 'query'
|
|
531 : or 'sbjct'.
|
|
532
|
|
533 See Also : L<frac_conserved()|frac_conserved>, L<num_conserved()|num_conserved>, L<matches()|matches>
|
|
534
|
|
535 =cut
|
|
536
|
|
537 #--------------------
|
|
538 sub frac_conserved {
|
|
539 #--------------------
|
|
540 # The value is calculated as opposed to storing it from the parsed results.
|
|
541 # This saves storage and also permits flexibility in determining for which
|
|
542 # sequence (query or sbjct) the figure is to be calculated.
|
|
543
|
|
544 my( $self, $seqType ) = @_;
|
|
545 $seqType ||= 'total';
|
|
546 $seqType = 'sbjct' if $seqType eq 'hit';
|
|
547
|
|
548 if($seqType ne 'total') {
|
|
549 $self->_set_seq_data() unless $self->{'_set_seq_data'};
|
|
550 }
|
|
551
|
|
552 ## Sensitive to member name format.
|
|
553 $seqType = "_\L$seqType\E";
|
|
554
|
|
555 sprintf( "%.2f", $self->{'_numConserved'}/$self->{$seqType.'Length'});
|
|
556 }
|
|
557
|
|
558 =head2 query_string
|
|
559
|
|
560 Title : query_string
|
|
561 Usage : my $qseq = $hsp->query_string;
|
|
562 Function: Retrieves the query sequence of this HSP as a string
|
|
563 Returns : string
|
|
564 Args : none
|
|
565
|
|
566
|
|
567 =cut
|
|
568
|
|
569 #----------------
|
|
570 sub query_string{ shift->seq_str('query'); }
|
|
571 #----------------
|
|
572
|
|
573 =head2 hit_string
|
|
574
|
|
575 Title : hit_string
|
|
576 Usage : my $hseq = $hsp->hit_string;
|
|
577 Function: Retrieves the hit sequence of this HSP as a string
|
|
578 Returns : string
|
|
579 Args : none
|
|
580
|
|
581
|
|
582 =cut
|
|
583
|
|
584 #----------------
|
|
585 sub hit_string{ shift->seq_str('hit'); }
|
|
586 #----------------
|
|
587
|
|
588
|
|
589 =head2 homology_string
|
|
590
|
|
591 Title : homology_string
|
|
592 Usage : my $homo_string = $hsp->homology_string;
|
|
593 Function: Retrieves the homology sequence for this HSP as a string.
|
|
594 : The homology sequence is the string of symbols in between the
|
|
595 : query and hit sequences in the alignment indicating the degree
|
|
596 : of conservation (e.g., identical, similar, not similar).
|
|
597 Returns : string
|
|
598 Args : none
|
|
599
|
|
600 =cut
|
|
601
|
|
602 #----------------
|
|
603 sub homology_string{ shift->seq_str('match'); }
|
|
604 #----------------
|
|
605
|
|
606 #=================================================
|
|
607 # End Bio::Search::HSP::HSPI implementation
|
|
608 #=================================================
|
|
609
|
|
610 # Older method delegating to method defined in HSPI.
|
|
611
|
|
612 =head2 expect
|
|
613
|
|
614 See L<Bio::Search::HSP::HSPI::expect()|Bio::Search::HSP::HSPI>
|
|
615
|
|
616 =cut
|
|
617
|
|
618 #----------
|
|
619 sub expect { shift->evalue( @_ ); }
|
|
620 #----------
|
|
621
|
|
622
|
|
623 =head2 rank
|
|
624
|
|
625 Usage : $hsp->rank( [string] );
|
|
626 Purpose : Get the rank of the HSP within a given Blast hit.
|
|
627 Example : $rank = $hsp->rank;
|
|
628 Returns : Integer (1..n) corresponding to the order in which the HSP
|
|
629 appears in the BLAST report.
|
|
630
|
|
631 =cut
|
|
632
|
|
633 #'
|
|
634
|
|
635 #----------
|
|
636 sub rank { shift->{'_rank'} }
|
|
637 #----------
|
|
638
|
|
639 # For backward compatibility
|
|
640 #----------
|
|
641 sub name { shift->rank }
|
|
642 #----------
|
|
643
|
|
644 =head2 to_string
|
|
645
|
|
646 Title : to_string
|
|
647 Usage : print $hsp->to_string;
|
|
648 Function: Returns a string representation for the Blast HSP.
|
|
649 Primarily intended for debugging purposes.
|
|
650 Example : see usage
|
|
651 Returns : A string of the form:
|
|
652 [BlastHSP] <rank>
|
|
653 e.g.:
|
|
654 [BlastHit] 1
|
|
655 Args : None
|
|
656
|
|
657 =cut
|
|
658
|
|
659 #----------
|
|
660 sub to_string {
|
|
661 #----------
|
|
662 my $self = shift;
|
|
663 return "[BlastHSP] " . $self->rank();
|
|
664 }
|
|
665
|
|
666
|
|
667 #=head2 _set_data (Private method)
|
|
668 #
|
|
669 # Usage : called automatically during object construction.
|
|
670 # Purpose : Parses the raw HSP section from a flat BLAST report and
|
|
671 # sets the query sequence, sbjct sequence, and the "match" data
|
|
672 # : which consists of the symbols between the query and sbjct lines
|
|
673 # : in the alignment.
|
|
674 # Argument : Array (all lines for a single, complete HSP, from a raw,
|
|
675 # flat (i.e., non-XML) BLAST report)
|
|
676 # Throws : Propagates any exceptions from the methods called ("See Also")
|
|
677 #
|
|
678 #See Also : L<_set_seq()|_set_seq>, L<_set_score_stats()|_set_score_stats>, L<_set_match_stats()|_set_match_stats>, L<_initialize()|_initialize>
|
|
679 #
|
|
680 #=cut
|
|
681
|
|
682 #--------------
|
|
683 sub _set_data {
|
|
684 #--------------
|
|
685 my $self = shift;
|
|
686 my @data = @_;
|
|
687 my @queryList = (); # 'Query' = SEQUENCE USED TO QUERY THE DATABASE.
|
|
688 my @sbjctList = (); # 'Sbjct' = HOMOLOGOUS SEQUENCE FOUND IN THE DATABASE.
|
|
689 my @matchList = ();
|
|
690 my $matchLine = 0; # Alternating boolean: when true, load 'match' data.
|
|
691 my @linedat = ();
|
|
692
|
|
693 #print STDERR "BlastHSP: set_data()\n";
|
|
694
|
|
695 my($line, $aln_row_len, $length_diff);
|
|
696 $length_diff = 0;
|
|
697
|
|
698 # Collecting data for all lines in the alignment
|
|
699 # and then storing the collections for possible processing later.
|
|
700 #
|
|
701 # Note that "match" lines may not be properly padded with spaces.
|
|
702 # This loop now properly handles such cases:
|
|
703 # Query: 1141 PSLVELTIRDCPRLEVGPMIRSLPKFPMLKKLDLAVANIIEEDLDVIGSLEELVIXXXXX 1200
|
|
704 # PSLVELTIRDCPRLEVGPMIRSLPKFPMLKKLDLAVANIIEEDLDVIGSLEELVI
|
|
705 # Sbjct: 1141 PSLVELTIRDCPRLEVGPMIRSLPKFPMLKKLDLAVANIIEEDLDVIGSLEELVILSLKL 1200
|
|
706
|
|
707 foreach $line( @data ) {
|
|
708 next if $line =~ /^\s*$/;
|
|
709
|
|
710 if( $line =~ /^ ?Score/ ) {
|
|
711 $self->_set_score_stats( $line );
|
|
712 } elsif( $line =~ /^ ?(Identities|Positives|Strand)/ ) {
|
|
713 $self->_set_match_stats( $line );
|
|
714 } elsif( $line =~ /^ ?Frame = ([\d+-]+)/ ) {
|
|
715 # Version 2.0.8 has Frame information on a separate line.
|
|
716 # Storing frame according to SeqFeature::Generic::frame()
|
|
717 # which does not contain strand info (use strand()).
|
|
718 my $frame = abs($1) - 1;
|
|
719 $self->frame( $frame );
|
|
720 } elsif( $line =~ /^(Query:?[\s\d]+)([^\s\d]+)/ ) {
|
|
721 push @queryList, $line;
|
|
722 $self->{'_match_indent'} = CORE::length $1;
|
|
723 $aln_row_len = (CORE::length $1) + (CORE::length $2);
|
|
724 $matchLine = 1;
|
|
725 } elsif( $matchLine ) {
|
|
726 # Pad the match line with spaces if necessary.
|
|
727 $length_diff = $aln_row_len - CORE::length $line;
|
|
728 $length_diff and $line .= ' 'x $length_diff;
|
|
729 push @matchList, $line;
|
|
730 $matchLine = 0;
|
|
731 } elsif( $line =~ /^Sbjct/ ) {
|
|
732 push @sbjctList, $line;
|
|
733 }
|
|
734 }
|
|
735 # Storing the query and sbjct lists in case they are needed later.
|
|
736 # We could make this conditional to save memory.
|
|
737 $self->{'_queryList'} = \@queryList;
|
|
738 $self->{'_sbjctList'} = \@sbjctList;
|
|
739
|
|
740 # Storing the match list in case it is needed later.
|
|
741 $self->{'_matchList'} = \@matchList;
|
|
742
|
|
743 if(not defined ($self->{'_numIdentical'})) {
|
|
744 my $id_str = $self->_id_str;
|
|
745 $self->throw( -text => "Can't parse match statistics. Possibly a new or unrecognized Blast format. ($id_str)");
|
|
746 }
|
|
747
|
|
748 if(!scalar @queryList or !scalar @sbjctList) {
|
|
749 my $id_str = $self->_id_str;
|
|
750 $self->throw( "Can't find query or sbjct alignment lines. Possibly unrecognized Blast format. ($id_str)");
|
|
751 }
|
|
752 }
|
|
753
|
|
754
|
|
755 #=head2 _set_score_stats (Private method)
|
|
756 #
|
|
757 # Usage : called automatically by _set_data()
|
|
758 # Purpose : Sets various score statistics obtained from the HSP listing.
|
|
759 # Argument : String with any of the following formats:
|
|
760 # : blast2: Score = 30.1 bits (66), Expect = 9.2
|
|
761 # : blast2: Score = 158.2 bits (544), Expect(2) = e-110
|
|
762 # : blast1: Score = 410 (144.3 bits), Expect = 1.7e-40, P = 1.7e-40
|
|
763 # : blast1: Score = 55 (19.4 bits), Expect = 5.3, Sum P(3) = 0.99
|
|
764 # Throws : Exception if the stats cannot be parsed, probably due to a change
|
|
765 # : in the Blast report format.
|
|
766 #
|
|
767 #See Also : L<_set_data()|_set_data>
|
|
768 #
|
|
769 #=cut
|
|
770
|
|
771 #--------------------
|
|
772 sub _set_score_stats {
|
|
773 #--------------------
|
|
774 my ($self, $data) = @_;
|
|
775
|
|
776 my ($expect, $p);
|
|
777
|
|
778 if($data =~ /Score = +([\d.e+-]+) bits \(([\d.e+-]+)\), +Expect = +([\d.e+-]+)/) {
|
|
779 # blast2 format n = 1
|
|
780 $self->bits($1);
|
|
781 $self->score($2);
|
|
782 $expect = $3;
|
|
783 } elsif($data =~ /Score = +([\d.e+-]+) bits \(([\d.e+-]+)\), +Expect\((\d+)\) = +([\d.e+-]+)/) {
|
|
784 # blast2 format n > 1
|
|
785 $self->bits($1);
|
|
786 $self->score($2);
|
|
787 $self->{'_n'} = $3;
|
|
788 $expect = $4;
|
|
789
|
|
790 } elsif($data =~ /Score = +([\d.e+-]+) \(([\d.e+-]+) bits\), +Expect = +([\d.e+-]+), P = +([\d.e-]+)/) {
|
|
791 # blast1 format, n = 1
|
|
792 $self->score($1);
|
|
793 $self->bits($2);
|
|
794 $expect = $3;
|
|
795 $p = $4;
|
|
796
|
|
797 } elsif($data =~ /Score = +([\d.e+-]+) \(([\d.e+-]+) bits\), +Expect = +([\d.e+-]+), +Sum P\((\d+)\) = +([\d.e-]+)/) {
|
|
798 # blast1 format, n > 1
|
|
799 $self->score($1);
|
|
800 $self->bits($2);
|
|
801 $expect = $3;
|
|
802 $self->{'_n'} = $4;
|
|
803 $p = $5;
|
|
804
|
|
805 } else {
|
|
806 my $id_str = $self->_id_str;
|
|
807 $self->throw(-class => 'Bio::Root::Exception',
|
|
808 -text => "Can't parse score statistics: unrecognized format. ($id_str)",
|
|
809 -value => $data);
|
|
810 }
|
|
811 $expect = "1$expect" if $expect =~ /^e/i;
|
|
812 $p = "1$p" if defined $p and $p=~ /^e/i;
|
|
813
|
|
814 $self->{'_expect'} = $expect;
|
|
815 $self->{'_p'} = $p || undef;
|
|
816 $self->significance( $p || $expect );
|
|
817 }
|
|
818
|
|
819
|
|
820 #=head2 _set_match_stats (Private method)
|
|
821 #
|
|
822 # Usage : Private method; called automatically by _set_data()
|
|
823 # Purpose : Sets various matching statistics obtained from the HSP listing.
|
|
824 # Argument : blast2: Identities = 23/74 (31%), Positives = 29/74 (39%), Gaps = 17/74 (22%)
|
|
825 # : blast2: Identities = 57/98 (58%), Positives = 74/98 (75%)
|
|
826 # : blast1: Identities = 87/204 (42%), Positives = 126/204 (61%)
|
|
827 # : blast1: Identities = 87/204 (42%), Positives = 126/204 (61%), Frame = -3
|
|
828 # : WU-blast: Identities = 310/553 (56%), Positives = 310/553 (56%), Strand = Minus / Plus
|
|
829 # Throws : Exception if the stats cannot be parsed, probably due to a change
|
|
830 # : in the Blast report format.
|
|
831 # Comments : The "Gaps = " data in the HSP header has a different meaning depending
|
|
832 # : on the type of Blast: for BLASTP, this number is the total number of
|
|
833 # : gaps in query+sbjct; for TBLASTN, it is the number of gaps in the
|
|
834 # : query sequence only. Thus, it is safer to collect the data
|
|
835 # : separately by examining the actual sequence strings as is done
|
|
836 # : in _set_seq().
|
|
837 #
|
|
838 #See Also : L<_set_data()|_set_data>, L<_set_seq()|_set_seq>
|
|
839 #
|
|
840 #=cut
|
|
841
|
|
842 #--------------------
|
|
843 sub _set_match_stats {
|
|
844 #--------------------
|
|
845 my ($self, $data) = @_;
|
|
846
|
|
847 if($data =~ m!Identities = (\d+)/(\d+)!) {
|
|
848 # blast1 or 2 format
|
|
849 $self->{'_numIdentical'} = $1;
|
|
850 $self->{'_totalLength'} = $2;
|
|
851 }
|
|
852
|
|
853 if($data =~ m!Positives = (\d+)/(\d+)!) {
|
|
854 # blast1 or 2 format
|
|
855 $self->{'_numConserved'} = $1;
|
|
856 $self->{'_totalLength'} = $2;
|
|
857 }
|
|
858
|
|
859 if($data =~ m!Frame = ([\d+-]+)!) {
|
|
860 $self->frame($1);
|
|
861 }
|
|
862
|
|
863 # Strand data is not always present in this line.
|
|
864 # _set_seq() will also set strand information.
|
|
865 if($data =~ m!Strand = (\w+) / (\w+)!) {
|
|
866 $self->{'_queryStrand'} = $1;
|
|
867 $self->{'_sbjctStrand'} = $2;
|
|
868 }
|
|
869
|
|
870 # if($data =~ m!Gaps = (\d+)/(\d+)!) {
|
|
871 # $self->{'_totalGaps'} = $1;
|
|
872 # } else {
|
|
873 # $self->{'_totalGaps'} = 0;
|
|
874 # }
|
|
875 }
|
|
876
|
|
877
|
|
878
|
|
879 #=head2 _set_seq_data (Private method)
|
|
880 #
|
|
881 # Usage : called automatically when sequence data is requested.
|
|
882 # Purpose : Sets the HSP sequence data for both query and sbjct sequences.
|
|
883 # : Includes: start, stop, length, gaps, and raw sequence.
|
|
884 # Argument : n/a
|
|
885 # Throws : Propagates any exception thrown by _set_match_seq()
|
|
886 # Comments : Uses raw data stored by _set_data() during object construction.
|
|
887 # : These data are not always needed, so it is conditionally
|
|
888 # : executed only upon demand by methods such as gaps(), _set_residues(),
|
|
889 # : etc. _set_seq() does the dirty work.
|
|
890 #
|
|
891 #See Also : L<_set_seq()|_set_seq>
|
|
892 #
|
|
893 #=cut
|
|
894
|
|
895 #-----------------
|
|
896 sub _set_seq_data {
|
|
897 #-----------------
|
|
898 my $self = shift;
|
|
899
|
|
900 $self->_set_seq('query', @{$self->{'_queryList'}});
|
|
901 $self->_set_seq('sbjct', @{$self->{'_sbjctList'}});
|
|
902
|
|
903 # Liberate some memory.
|
|
904 @{$self->{'_queryList'}} = @{$self->{'_sbjctList'}} = ();
|
|
905 undef $self->{'_queryList'};
|
|
906 undef $self->{'_sbjctList'};
|
|
907
|
|
908 $self->{'_set_seq_data'} = 1;
|
|
909 }
|
|
910
|
|
911
|
|
912
|
|
913 #=head2 _set_seq (Private method)
|
|
914 #
|
|
915 # Usage : called automatically by _set_seq_data()
|
|
916 # : $hsp_obj->($seq_type, @data);
|
|
917 # Purpose : Sets sequence information for both the query and sbjct sequences.
|
|
918 # : Directly counts the number of gaps in each sequence (if gapped Blast).
|
|
919 # Argument : $seq_type = 'query' or 'sbjct'
|
|
920 # : @data = all seq lines with the form:
|
|
921 # : Query: 61 SPHNVKDRKEQNGSINNAISPTATANTSGSQQINIDSALRDRSSNVAAQPSLSDASSGSN 120
|
|
922 # Throws : Exception if data strings cannot be parsed, probably due to a change
|
|
923 # : in the Blast report format.
|
|
924 # Comments : Uses first argument to determine which data members to set
|
|
925 # : making this method sensitive data member name changes.
|
|
926 # : Behavior is dependent on the type of BLAST analysis (TBLASTN, BLASTP, etc).
|
|
927 # Warning : Sequence endpoints are normalized so that start < end. This affects HSPs
|
|
928 # : for TBLASTN/X hits on the minus strand. Normalization facilitates use
|
|
929 # : of range information by methods such as match().
|
|
930 #
|
|
931 #See Also : L<_set_seq_data()|_set_seq_data>, L<matches()|matches>, L<range()|range>, L<start()|start>, L<end()|end>
|
|
932 #
|
|
933 #=cut
|
|
934
|
|
935 #-------------
|
|
936 sub _set_seq {
|
|
937 #-------------
|
|
938 my $self = shift;
|
|
939 my $seqType = shift;
|
|
940 my @data = @_;
|
|
941 my @ranges = ();
|
|
942 my @sequence = ();
|
|
943 my $numGaps = 0;
|
|
944
|
|
945 foreach( @data ) {
|
|
946 if( m/(\d+) *([^\d\s]+) *(\d+)/) {
|
|
947 push @ranges, ( $1, $3 ) ;
|
|
948 push @sequence, $2;
|
|
949 #print STDERR "_set_seq found sequence \"$2\"\n";
|
|
950 } else {
|
|
951 $self->warn("Bad sequence data: $_");
|
|
952 }
|
|
953 }
|
|
954
|
|
955 if( !(scalar(@sequence) and scalar(@ranges))) {
|
|
956 my $id_str = $self->_id_str;
|
|
957 $self->throw("Can't set sequence: missing data. Possibly unrecognized Blast format. ($id_str)");
|
|
958 }
|
|
959
|
|
960 # Sensitive to member name changes.
|
|
961 $seqType = "_\L$seqType\E";
|
|
962 $self->{$seqType.'Start'} = $ranges[0];
|
|
963 $self->{$seqType.'Stop'} = $ranges[ $#ranges ];
|
|
964 $self->{$seqType.'Seq'} = \@sequence;
|
|
965
|
|
966 $self->{$seqType.'Length'} = abs($ranges[ $#ranges ] - $ranges[0]) + 1;
|
|
967
|
|
968 # Adjust lengths for BLASTX, TBLASTN, TBLASTX sequences
|
|
969 # Converting nucl coords to amino acid coords.
|
|
970
|
|
971 my $prog = $self->algorithm;
|
|
972 if($prog eq 'TBLASTN' and $seqType eq '_sbjct') {
|
|
973 $self->{$seqType.'Length'} /= 3;
|
|
974 } elsif($prog eq 'BLASTX' and $seqType eq '_query') {
|
|
975 $self->{$seqType.'Length'} /= 3;
|
|
976 } elsif($prog eq 'TBLASTX') {
|
|
977 $self->{$seqType.'Length'} /= 3;
|
|
978 }
|
|
979
|
|
980 if( $prog ne 'BLASTP' ) {
|
|
981 $self->{$seqType.'Strand'} = 'Plus' if $prog =~ /BLASTN/;
|
|
982 $self->{$seqType.'Strand'} = 'Plus' if ($prog =~ /BLASTX/ and $seqType eq '_query');
|
|
983 # Normalize sequence endpoints so that start < end.
|
|
984 # Reverse complement or 'minus strand' HSPs get flipped here.
|
|
985 if($self->{$seqType.'Start'} > $self->{$seqType.'Stop'}) {
|
|
986 ($self->{$seqType.'Start'}, $self->{$seqType.'Stop'}) =
|
|
987 ($self->{$seqType.'Stop'}, $self->{$seqType.'Start'});
|
|
988 $self->{$seqType.'Strand'} = 'Minus';
|
|
989 }
|
|
990 }
|
|
991
|
|
992 ## Count number of gaps in each seq. Only need to do this for gapped Blasts.
|
|
993 # if($self->{'_gapped'}) {
|
|
994 my $seqstr = join('', @sequence);
|
|
995 $seqstr =~ s/\s//g;
|
|
996 my $num_gaps = CORE::length($seqstr) - $self->{$seqType.'Length'};
|
|
997 $self->{$seqType.'Gaps'} = $num_gaps if $num_gaps > 0;
|
|
998 # }
|
|
999 }
|
|
1000
|
|
1001
|
|
1002 #=head2 _set_residues (Private method)
|
|
1003 #
|
|
1004 # Usage : called automatically when residue data is requested.
|
|
1005 # Purpose : Sets the residue numbers representing the identical and
|
|
1006 # : conserved positions. These data are obtained by analyzing the
|
|
1007 # : symbols between query and sbjct lines of the alignments.
|
|
1008 # Argument : n/a
|
|
1009 # Throws : Propagates any exception thrown by _set_seq_data() and _set_match_seq().
|
|
1010 # Comments : These data are not always needed, so it is conditionally
|
|
1011 # : executed only upon demand by methods such as seq_inds().
|
|
1012 # : Behavior is dependent on the type of BLAST analysis (TBLASTN, BLASTP, etc).
|
|
1013 #
|
|
1014 #See Also : L<_set_seq_data()|_set_seq_data>, L<_set_match_seq()|_set_match_seq>, seq_inds()
|
|
1015 #
|
|
1016 #=cut
|
|
1017
|
|
1018 #------------------
|
|
1019 sub _set_residues {
|
|
1020 #------------------
|
|
1021 my $self = shift;
|
|
1022 my @sequence = ();
|
|
1023
|
|
1024 $self->_set_seq_data() unless $self->{'_set_seq_data'};
|
|
1025
|
|
1026 # Using hashes to avoid saving duplicate residue numbers.
|
|
1027 my %identicalList_query = ();
|
|
1028 my %identicalList_sbjct = ();
|
|
1029 my %conservedList_query = ();
|
|
1030 my %conservedList_sbjct = ();
|
|
1031
|
|
1032 my $aref = $self->_set_match_seq() if not ref $self->{'_matchSeq'};
|
|
1033 $aref ||= $self->{'_matchSeq'};
|
|
1034 my $seqString = join('', @$aref );
|
|
1035
|
|
1036 my $qseq = join('',@{$self->{'_querySeq'}});
|
|
1037 my $sseq = join('',@{$self->{'_sbjctSeq'}});
|
|
1038 my $resCount_query = $self->{'_queryStop'} || 0;
|
|
1039 my $resCount_sbjct = $self->{'_sbjctStop'} || 0;
|
|
1040
|
|
1041 my $prog = $self->algorithm;
|
|
1042 if($prog !~ /^BLASTP|^BLASTN/) {
|
|
1043 if($prog eq 'TBLASTN') {
|
|
1044 $resCount_sbjct /= 3;
|
|
1045 } elsif($prog eq 'BLASTX') {
|
|
1046 $resCount_query /= 3;
|
|
1047 } elsif($prog eq 'TBLASTX') {
|
|
1048 $resCount_query /= 3;
|
|
1049 $resCount_sbjct /= 3;
|
|
1050 }
|
|
1051 }
|
|
1052
|
|
1053 my ($mchar, $schar, $qchar);
|
|
1054 while( $mchar = chop($seqString) ) {
|
|
1055 ($qchar, $schar) = (chop($qseq), chop($sseq));
|
|
1056 if( $mchar eq '+' ) {
|
|
1057 $conservedList_query{ $resCount_query } = 1;
|
|
1058 $conservedList_sbjct{ $resCount_sbjct } = 1;
|
|
1059 } elsif( $mchar ne ' ' ) {
|
|
1060 $identicalList_query{ $resCount_query } = 1;
|
|
1061 $identicalList_sbjct{ $resCount_sbjct } = 1;
|
|
1062 }
|
|
1063 $resCount_query-- if $qchar ne $GAP_SYMBOL;
|
|
1064 $resCount_sbjct-- if $schar ne $GAP_SYMBOL;
|
|
1065 }
|
|
1066 $self->{'_identicalRes_query'} = \%identicalList_query;
|
|
1067 $self->{'_conservedRes_query'} = \%conservedList_query;
|
|
1068 $self->{'_identicalRes_sbjct'} = \%identicalList_sbjct;
|
|
1069 $self->{'_conservedRes_sbjct'} = \%conservedList_sbjct;
|
|
1070
|
|
1071 }
|
|
1072
|
|
1073
|
|
1074
|
|
1075
|
|
1076 #=head2 _set_match_seq (Private method)
|
|
1077 #
|
|
1078 # Usage : $hsp_obj->_set_match_seq()
|
|
1079 # Purpose : Set the 'match' sequence for the current HSP (symbols in between
|
|
1080 # : the query and sbjct lines.)
|
|
1081 # Returns : Array reference holding the match sequences lines.
|
|
1082 # Argument : n/a
|
|
1083 # Throws : Exception if the _matchList field is not set.
|
|
1084 # Comments : The match information is not always necessary. This method
|
|
1085 # : allows it to be conditionally prepared.
|
|
1086 # : Called by _set_residues>() and seq_str().
|
|
1087 #
|
|
1088 #See Also : L<_set_residues()|_set_residues>, L<seq_str()|seq_str>
|
|
1089 #
|
|
1090 #=cut
|
|
1091
|
|
1092 #-------------------
|
|
1093 sub _set_match_seq {
|
|
1094 #-------------------
|
|
1095 my $self = shift;
|
|
1096
|
|
1097 if( ! ref($self->{'_matchList'}) ) {
|
|
1098 my $id_str = $self->_id_str;
|
|
1099 $self->throw("Can't set HSP match sequence: No data ($id_str)");
|
|
1100 }
|
|
1101
|
|
1102 my @data = @{$self->{'_matchList'}};
|
|
1103
|
|
1104 my(@sequence);
|
|
1105 foreach( @data ) {
|
|
1106 chomp($_);
|
|
1107 ## Remove leading spaces; (note: aln may begin with a space
|
|
1108 ## which is why we can't use s/^ +//).
|
|
1109 s/^ {$self->{'_match_indent'}}//;
|
|
1110 push @sequence, $_;
|
|
1111 }
|
|
1112 # Liberate some memory.
|
|
1113 @{$self->{'_matchList'}} = undef;
|
|
1114 $self->{'_matchList'} = undef;
|
|
1115
|
|
1116 $self->{'_matchSeq'} = \@sequence;
|
|
1117
|
|
1118 return $self->{'_matchSeq'};
|
|
1119 }
|
|
1120
|
|
1121
|
|
1122 =head2 n
|
|
1123
|
|
1124 Usage : $hsp_obj->n()
|
|
1125 Purpose : Get the N value (num HSPs on which P/Expect is based).
|
|
1126 : This value is not defined with NCBI Blast2 with gapping.
|
|
1127 Returns : Integer or null string if not defined.
|
|
1128 Argument : n/a
|
|
1129 Throws : n/a
|
|
1130 Comments : The 'N' value is listed in parenthesis with P/Expect value:
|
|
1131 : e.g., P(3) = 1.2e-30 ---> (N = 3).
|
|
1132 : Not defined in NCBI Blast2 with gaps.
|
|
1133 : This typically is equal to the number of HSPs but not always.
|
|
1134 : To obtain the number of HSPs, use Bio::Search::Hit::BlastHit::num_hsps().
|
|
1135
|
|
1136 See Also : L<Bio::SeqFeature::SimilarityPair::score()|Bio::SeqFeature::SimilarityPair>
|
|
1137
|
|
1138 =cut
|
|
1139
|
|
1140 #-----
|
|
1141 sub n { my $self = shift; $self->{'_n'} || ''; }
|
|
1142 #-----
|
|
1143
|
|
1144
|
|
1145 =head2 matches
|
|
1146
|
|
1147 Usage : $hsp->matches([seq_type], [start], [stop]);
|
|
1148 Purpose : Get the total number of identical and conservative matches
|
|
1149 : in the query or sbjct sequence for the given HSP. Optionally can
|
|
1150 : report data within a defined interval along the seq.
|
|
1151 : (Note: 'conservative' matches are called 'positives' in the
|
|
1152 : Blast report.)
|
|
1153 Example : ($id,$cons) = $hsp_object->matches('hit');
|
|
1154 : ($id,$cons) = $hsp_object->matches('query',300,400);
|
|
1155 Returns : 2-element array of integers
|
|
1156 Argument : (1) seq_type = 'query' or 'hit' or 'sbjct' (default = query)
|
|
1157 : ('sbjct' is synonymous with 'hit')
|
|
1158 : (2) start = Starting coordinate (optional)
|
|
1159 : (3) stop = Ending coordinate (optional)
|
|
1160 Throws : Exception if the supplied coordinates are out of range.
|
|
1161 Comments : Relies on seq_str('match') to get the string of alignment symbols
|
|
1162 : between the query and sbjct lines which are used for determining
|
|
1163 : the number of identical and conservative matches.
|
|
1164
|
|
1165 See Also : L<length()|length>, L<gaps()|gaps>, L<seq_str()|seq_str>, L<Bio::Search::Hit::BlastHit::_adjust_contigs()|Bio::Search::Hit::BlastHit>
|
|
1166
|
|
1167 =cut
|
|
1168
|
|
1169 #-----------
|
|
1170 sub matches {
|
|
1171 #-----------
|
|
1172 my( $self, %param ) = @_;
|
|
1173 my(@data);
|
|
1174 my($seqType, $beg, $end) = ($param{-SEQ}, $param{-START}, $param{-STOP});
|
|
1175 $seqType ||= 'query';
|
|
1176 $seqType = 'sbjct' if $seqType eq 'hit';
|
|
1177
|
|
1178 my($start,$stop);
|
|
1179
|
|
1180 if(!defined $beg && !defined $end) {
|
|
1181 ## Get data for the whole alignment.
|
|
1182 push @data, ($self->{'_numIdentical'}, $self->{'_numConserved'});
|
|
1183 } else {
|
|
1184 ## Get the substring representing the desired sub-section of aln.
|
|
1185 $beg ||= 0;
|
|
1186 $end ||= 0;
|
|
1187 ($start,$stop) = $self->range($seqType);
|
|
1188 if($beg == 0) { $beg = $start; $end = $beg+$end; }
|
|
1189 elsif($end == 0) { $end = $stop; $beg = $end-$beg; }
|
|
1190
|
|
1191 if($end >= $stop) { $end = $stop; } ##ML changed from if (end >stop)
|
|
1192 else { $end += 1;} ##ML moved from commented position below, makes
|
|
1193 ##more sense here
|
|
1194 # if($end > $stop) { $end = $stop; }
|
|
1195 if($beg < $start) { $beg = $start; }
|
|
1196 # else { $end += 1;}
|
|
1197
|
|
1198 # my $seq = substr($self->seq_str('match'), $beg-$start, ($end-$beg));
|
|
1199
|
|
1200 ## ML: START fix for substr out of range error ------------------
|
|
1201 my $seq = "";
|
|
1202 my $prog = $self->algorithm;
|
|
1203 if (($prog eq 'TBLASTN') and ($seqType eq 'sbjct'))
|
|
1204 {
|
|
1205 $seq = substr($self->seq_str('match'),
|
|
1206 int(($beg-$start)/3), int(($end-$beg+1)/3));
|
|
1207
|
|
1208 } elsif (($prog eq 'BLASTX') and ($seqType eq 'query'))
|
|
1209 {
|
|
1210 $seq = substr($self->seq_str('match'),
|
|
1211 int(($beg-$start)/3), int(($end-$beg+1)/3));
|
|
1212 } else {
|
|
1213 $seq = substr($self->seq_str('match'),
|
|
1214 $beg-$start, ($end-$beg));
|
|
1215 }
|
|
1216 ## ML: End of fix for substr out of range error -----------------
|
|
1217
|
|
1218
|
|
1219 ## ML: debugging code
|
|
1220 ## This is where we get our exception. Try printing out the values going
|
|
1221 ## into this:
|
|
1222 ##
|
|
1223 # print STDERR
|
|
1224 # qq(*------------MY EXCEPTION --------------------\nSeq: ") ,
|
|
1225 # $self->seq_str("$seqType"), qq("\n),$self->rank,",( index:";
|
|
1226 # print STDERR $beg-$start, ", len: ", $end-$beg," ), (HSPRealLen:",
|
|
1227 # CORE::length $self->seq_str("$seqType");
|
|
1228 # print STDERR ", HSPCalcLen: ", $stop - $start +1 ," ),
|
|
1229 # ( beg: $beg, end: $end ), ( start: $start, stop: stop )\n";
|
|
1230 ## ML: END DEBUGGING CODE----------
|
|
1231
|
|
1232 if(!CORE::length $seq) {
|
|
1233 my $id_str = $self->_id_str;
|
|
1234 $self->throw("Undefined $seqType sub-sequence ($beg,$end). Valid range = $start - $stop ($id_str)");
|
|
1235 }
|
|
1236 ## Get data for a substring.
|
|
1237 # printf "Collecting HSP subsection data: beg,end = %d,%d; start,stop = %d,%d\n%s<---\n", $beg, $end, $start, $stop, $seq;
|
|
1238 # printf "Original match seq:\n%s\n",$self->seq_str('match');
|
|
1239 $seq =~ s/ //g; # remove space (no info).
|
|
1240 my $len_cons = CORE::length $seq;
|
|
1241 $seq =~ s/\+//g; # remove '+' characters (conservative substitutions)
|
|
1242 my $len_id = CORE::length $seq;
|
|
1243 push @data, ($len_id, $len_cons);
|
|
1244 # printf " HSP = %s\n id = %d; cons = %d\n", $self->rank, $len_id, $len_cons; <STDIN>;
|
|
1245 }
|
|
1246 @data;
|
|
1247 }
|
|
1248
|
|
1249
|
|
1250 =head2 num_identical
|
|
1251
|
|
1252 Usage : $hsp_object->num_identical();
|
|
1253 Purpose : Get the number of identical positions within the given HSP.
|
|
1254 Example : $num_iden = $hsp_object->num_identical();
|
|
1255 Returns : integer
|
|
1256 Argument : n/a
|
|
1257 Throws : n/a
|
|
1258
|
|
1259 See Also : L<num_conserved()|num_conserved>, L<frac_identical()|frac_identical>
|
|
1260
|
|
1261 =cut
|
|
1262
|
|
1263 #-------------------
|
|
1264 sub num_identical {
|
|
1265 #-------------------
|
|
1266 my( $self) = shift;
|
|
1267
|
|
1268 $self->{'_numIdentical'};
|
|
1269 }
|
|
1270
|
|
1271
|
|
1272 =head2 num_conserved
|
|
1273
|
|
1274 Usage : $hsp_object->num_conserved();
|
|
1275 Purpose : Get the number of conserved positions within the given HSP.
|
|
1276 Example : $num_iden = $hsp_object->num_conserved();
|
|
1277 Returns : integer
|
|
1278 Argument : n/a
|
|
1279 Throws : n/a
|
|
1280
|
|
1281 See Also : L<num_identical()|num_identical>, L<frac_conserved()|frac_conserved>
|
|
1282
|
|
1283 =cut
|
|
1284
|
|
1285 #-------------------
|
|
1286 sub num_conserved {
|
|
1287 #-------------------
|
|
1288 my( $self) = shift;
|
|
1289
|
|
1290 $self->{'_numConserved'};
|
|
1291 }
|
|
1292
|
|
1293
|
|
1294
|
|
1295 =head2 range
|
|
1296
|
|
1297 Usage : $hsp->range( [seq_type] );
|
|
1298 Purpose : Gets the (start, end) coordinates for the query or sbjct sequence
|
|
1299 : in the HSP alignment.
|
|
1300 Example : ($query_beg, $query_end) = $hsp->range('query');
|
|
1301 : ($hit_beg, $hit_end) = $hsp->range('hit');
|
|
1302 Returns : Two-element array of integers
|
|
1303 Argument : seq_type = string, 'query' or 'hit' or 'sbjct' (default = 'query')
|
|
1304 : ('sbjct' is synonymous with 'hit')
|
|
1305 Throws : n/a
|
|
1306
|
|
1307 See Also : L<start()|start>, L<end()|end>
|
|
1308
|
|
1309 =cut
|
|
1310
|
|
1311 #----------
|
|
1312 sub range {
|
|
1313 #----------
|
|
1314 my ($self, $seqType) = @_;
|
|
1315
|
|
1316 $self->_set_seq_data() unless $self->{'_set_seq_data'};
|
|
1317
|
|
1318 $seqType ||= 'query';
|
|
1319 $seqType = 'sbjct' if $seqType eq 'hit';
|
|
1320
|
|
1321 ## Sensitive to member name changes.
|
|
1322 $seqType = "_\L$seqType\E";
|
|
1323
|
|
1324 return ($self->{$seqType.'Start'},$self->{$seqType.'Stop'});
|
|
1325 }
|
|
1326
|
|
1327 =head2 start
|
|
1328
|
|
1329 Usage : $hsp->start( [seq_type] );
|
|
1330 Purpose : Gets the start coordinate for the query, sbjct, or both sequences
|
|
1331 : in the HSP alignment.
|
|
1332 : NOTE: Start will always be less than end.
|
|
1333 : To determine strand, use $hsp->strand()
|
|
1334 Example : $query_beg = $hsp->start('query');
|
|
1335 : $hit_beg = $hsp->start('hit');
|
|
1336 : ($query_beg, $hit_beg) = $hsp->start();
|
|
1337 Returns : scalar context: integer
|
|
1338 : array context without args: list of two integers
|
|
1339 Argument : In scalar context: seq_type = 'query' or 'hit' or 'sbjct' (default= 'query')
|
|
1340 : ('sbjct' is synonymous with 'hit')
|
|
1341 : Array context can be "induced" by providing an argument of 'list' or 'array'.
|
|
1342 Throws : n/a
|
|
1343
|
|
1344 See Also : L<end()|end>, L<range()|range>
|
|
1345
|
|
1346 =cut
|
|
1347
|
|
1348 #----------
|
|
1349 sub start {
|
|
1350 #----------
|
|
1351 my ($self, $seqType) = @_;
|
|
1352
|
|
1353 $seqType ||= (wantarray ? 'list' : 'query');
|
|
1354 $seqType = 'sbjct' if $seqType eq 'hit';
|
|
1355
|
|
1356 $self->_set_seq_data() unless $self->{'_set_seq_data'};
|
|
1357
|
|
1358 if($seqType =~ /list|array/i) {
|
|
1359 return ($self->{'_queryStart'}, $self->{'_sbjctStart'});
|
|
1360 } else {
|
|
1361 ## Sensitive to member name changes.
|
|
1362 $seqType = "_\L$seqType\E";
|
|
1363 return $self->{$seqType.'Start'};
|
|
1364 }
|
|
1365 }
|
|
1366
|
|
1367 =head2 end
|
|
1368
|
|
1369 Usage : $hsp->end( [seq_type] );
|
|
1370 Purpose : Gets the end coordinate for the query, sbjct, or both sequences
|
|
1371 : in the HSP alignment.
|
|
1372 : NOTE: Start will always be less than end.
|
|
1373 : To determine strand, use $hsp->strand()
|
|
1374 Example : $query_end = $hsp->end('query');
|
|
1375 : $hit_end = $hsp->end('hit');
|
|
1376 : ($query_end, $hit_end) = $hsp->end();
|
|
1377 Returns : scalar context: integer
|
|
1378 : array context without args: list of two integers
|
|
1379 Argument : In scalar context: seq_type = 'query' or 'hit' or 'sbjct' (default= 'query')
|
|
1380 : ('sbjct' is synonymous with 'hit')
|
|
1381 : Array context can be "induced" by providing an argument of 'list' or 'array'.
|
|
1382 Throws : n/a
|
|
1383
|
|
1384 See Also : L<start()|start>, L<range()|range>, L<strand()|strand>
|
|
1385
|
|
1386 =cut
|
|
1387
|
|
1388 #----------
|
|
1389 sub end {
|
|
1390 #----------
|
|
1391 my ($self, $seqType) = @_;
|
|
1392
|
|
1393 $seqType ||= (wantarray ? 'list' : 'query');
|
|
1394 $seqType = 'sbjct' if $seqType eq 'hit';
|
|
1395
|
|
1396 $self->_set_seq_data() unless $self->{'_set_seq_data'};
|
|
1397
|
|
1398 if($seqType =~ /list|array/i) {
|
|
1399 return ($self->{'_queryStop'}, $self->{'_sbjctStop'});
|
|
1400 } else {
|
|
1401 ## Sensitive to member name changes.
|
|
1402 $seqType = "_\L$seqType\E";
|
|
1403 return $self->{$seqType.'Stop'};
|
|
1404 }
|
|
1405 }
|
|
1406
|
|
1407
|
|
1408
|
|
1409 =head2 strand
|
|
1410
|
|
1411 Usage : $hsp_object->strand( [seq_type] )
|
|
1412 Purpose : Get the strand of the query or sbjct sequence.
|
|
1413 Example : print $hsp->strand('query');
|
|
1414 : ($query_strand, $hit_strand) = $hsp->strand();
|
|
1415 Returns : -1, 0, or 1
|
|
1416 : -1 = Minus strand, +1 = Plus strand
|
|
1417 : Returns 0 if strand is not defined, which occurs
|
|
1418 : for BLASTP reports, and the query of TBLASTN
|
|
1419 : as well as the hit if BLASTX reports.
|
|
1420 : In scalar context without arguments, returns queryStrand value.
|
|
1421 : In array context without arguments, returns a two-element list
|
|
1422 : of strings (queryStrand, sbjctStrand).
|
|
1423 : Array context can be "induced" by providing an argument of 'list' or 'array'.
|
|
1424 Argument : seq_type: 'query' or 'hit' or 'sbjct' or undef
|
|
1425 : ('sbjct' is synonymous with 'hit')
|
|
1426 Throws : n/a
|
|
1427
|
|
1428 See Also : B<_set_seq()>, B<_set_match_stats()>
|
|
1429
|
|
1430 =cut
|
|
1431
|
|
1432 #-----------
|
|
1433 sub strand {
|
|
1434 #-----------
|
|
1435 my( $self, $seqType ) = @_;
|
|
1436
|
|
1437 # Hack to deal with the fact that SimilarityPair calls strand()
|
|
1438 # which will lead to an error because parsing hasn't yet occurred.
|
|
1439 # See SimilarityPair::new().
|
|
1440 return if $self->{'_initializing'};
|
|
1441
|
|
1442 $seqType ||= (wantarray ? 'list' : 'query');
|
|
1443 $seqType = 'sbjct' if $seqType eq 'hit';
|
|
1444
|
|
1445 ## Sensitive to member name format.
|
|
1446 $seqType = "_\L$seqType\E";
|
|
1447
|
|
1448 # $seqType could be '_list'.
|
|
1449 $self->{'_queryStrand'} or $self->_set_seq_data() unless $self->{'_set_seq_data'};
|
|
1450
|
|
1451 my $prog = $self->algorithm;
|
|
1452
|
|
1453 if($seqType =~ /list|array/i) {
|
|
1454 my ($qstr, $hstr);
|
|
1455 if( $prog eq 'BLASTP') {
|
|
1456 $qstr = 0;
|
|
1457 $hstr = 0;
|
|
1458 }
|
|
1459 elsif( $prog eq 'TBLASTN') {
|
|
1460 $qstr = 0;
|
|
1461 $hstr = $STRAND_SYMBOL{$self->{'_sbjctStrand'}};
|
|
1462 }
|
|
1463 elsif( $prog eq 'BLASTX') {
|
|
1464 $qstr = $STRAND_SYMBOL{$self->{'_queryStrand'}};
|
|
1465 $hstr = 0;
|
|
1466 }
|
|
1467 else {
|
|
1468 $qstr = $STRAND_SYMBOL{$self->{'_queryStrand'}} if defined $self->{'_queryStrand'};
|
|
1469 $hstr = $STRAND_SYMBOL{$self->{'_sbjctStrand'}} if defined $self->{'_sbjctStrand'};
|
|
1470 }
|
|
1471 $qstr ||= 0;
|
|
1472 $hstr ||= 0;
|
|
1473 return ($qstr, $hstr);
|
|
1474 }
|
|
1475 local $^W = 0;
|
|
1476 $STRAND_SYMBOL{$self->{$seqType.'Strand'}} || 0;
|
|
1477 }
|
|
1478
|
|
1479
|
|
1480 =head2 seq
|
|
1481
|
|
1482 Usage : $hsp->seq( [seq_type] );
|
|
1483 Purpose : Get the query or sbjct sequence as a Bio::Seq.pm object.
|
|
1484 Example : $seqObj = $hsp->seq('query');
|
|
1485 Returns : Object reference for a Bio::Seq.pm object.
|
|
1486 Argument : seq_type = 'query' or 'hit' or 'sbjct' (default = 'query').
|
|
1487 : ('sbjct' is synonymous with 'hit')
|
|
1488 Throws : Propagates any exception that occurs during construction
|
|
1489 : of the Bio::Seq.pm object.
|
|
1490 Comments : The sequence is returned in an array of strings corresponding
|
|
1491 : to the strings in the original format of the Blast alignment.
|
|
1492 : (i.e., same spacing).
|
|
1493
|
|
1494 See Also : L<seq_str()|seq_str>, L<seq_inds()|seq_inds>, B<Bio::Seq>
|
|
1495
|
|
1496 =cut
|
|
1497
|
|
1498 #-------
|
|
1499 sub seq {
|
|
1500 #-------
|
|
1501 my($self,$seqType) = @_;
|
|
1502 $seqType ||= 'query';
|
|
1503 $seqType = 'sbjct' if $seqType eq 'hit';
|
|
1504 my $str = $self->seq_str($seqType);
|
|
1505
|
|
1506 require Bio::Seq;
|
|
1507
|
|
1508 new Bio::Seq (-ID => $self->to_string,
|
|
1509 -SEQ => $str,
|
|
1510 -DESC => "$seqType sequence",
|
|
1511 );
|
|
1512 }
|
|
1513
|
|
1514 =head2 seq_str
|
|
1515
|
|
1516 Usage : $hsp->seq_str( seq_type );
|
|
1517 Purpose : Get the full query, sbjct, or 'match' sequence as a string.
|
|
1518 : The 'match' sequence is the string of symbols in between the
|
|
1519 : query and sbjct sequences.
|
|
1520 Example : $str = $hsp->seq_str('query');
|
|
1521 Returns : String
|
|
1522 Argument : seq_Type = 'query' or 'hit' or 'sbjct' or 'match'
|
|
1523 : ('sbjct' is synonymous with 'hit')
|
|
1524 Throws : Exception if the argument does not match an accepted seq_type.
|
|
1525 Comments : Calls _set_seq_data() to set the 'match' sequence if it has
|
|
1526 : not been set already.
|
|
1527
|
|
1528 See Also : L<seq()|seq>, L<seq_inds()|seq_inds>, B<_set_match_seq()>
|
|
1529
|
|
1530 =cut
|
|
1531
|
|
1532 #------------
|
|
1533 sub seq_str {
|
|
1534 #------------
|
|
1535 my($self,$seqType) = @_;
|
|
1536
|
|
1537 $seqType ||= 'query';
|
|
1538 $seqType = 'sbjct' if $seqType eq 'hit';
|
|
1539 ## Sensitive to member name changes.
|
|
1540 $seqType = "_\L$seqType\E";
|
|
1541
|
|
1542 $self->_set_seq_data() unless $self->{'_set_seq_data'};
|
|
1543
|
|
1544 if($seqType =~ /sbjct|query/) {
|
|
1545 my $seq = join('',@{$self->{$seqType.'Seq'}});
|
|
1546 $seq =~ s/\s+//g;
|
|
1547 return $seq;
|
|
1548
|
|
1549 } elsif( $seqType =~ /match/i) {
|
|
1550 # Only need to call _set_match_seq() if the match seq is requested.
|
|
1551 my $aref = $self->_set_match_seq() unless ref $self->{'_matchSeq'};
|
|
1552 $aref = $self->{'_matchSeq'};
|
|
1553
|
|
1554 return join('',@$aref);
|
|
1555
|
|
1556 } else {
|
|
1557 my $id_str = $self->_id_str;
|
|
1558 $self->throw(-class => 'Bio::Root::BadParameter',
|
|
1559 -text => "Invalid or undefined sequence type: $seqType ($id_str)\n" .
|
|
1560 "Valid types: query, sbjct, match",
|
|
1561 -value => $seqType);
|
|
1562 }
|
|
1563 }
|
|
1564
|
|
1565 =head2 seq_inds
|
|
1566
|
|
1567 Usage : $hsp->seq_inds( seq_type, class, collapse );
|
|
1568 Purpose : Get a list of residue positions (indices) for all identical
|
|
1569 : or conserved residues in the query or sbjct sequence.
|
|
1570 Example : @s_ind = $hsp->seq_inds('query', 'identical');
|
|
1571 : @h_ind = $hsp->seq_inds('hit', 'conserved');
|
|
1572 : @h_ind = $hsp->seq_inds('hit', 'conserved', 1);
|
|
1573 Returns : List of integers
|
|
1574 : May include ranges if collapse is true.
|
|
1575 Argument : seq_type = 'query' or 'hit' or 'sbjct' (default = query)
|
|
1576 : ('sbjct' is synonymous with 'hit')
|
|
1577 : class = 'identical' or 'conserved' (default = identical)
|
|
1578 : (can be shortened to 'id' or 'cons')
|
|
1579 : (actually, anything not 'id' will evaluate to 'conserved').
|
|
1580 : collapse = boolean, if true, consecutive positions are merged
|
|
1581 : using a range notation, e.g., "1 2 3 4 5 7 9 10 11"
|
|
1582 : collapses to "1-5 7 9-11". This is useful for
|
|
1583 : consolidating long lists. Default = no collapse.
|
|
1584 Throws : n/a.
|
|
1585 Comments : Calls _set_residues() to set the 'match' sequence if it has
|
|
1586 : not been set already.
|
|
1587
|
|
1588 See Also : L<seq()|seq>, B<_set_residues()>, L<Bio::Search::BlastUtils::collapse_nums()|Bio::Search::BlastUtils>, L<Bio::Search::Hit::BlastHit::seq_inds()|Bio::Search::Hit::BlastHit>
|
|
1589
|
|
1590 =cut
|
|
1591
|
|
1592 #---------------
|
|
1593 sub seq_inds {
|
|
1594 #---------------
|
|
1595 my ($self, $seqType, $class, $collapse) = @_;
|
|
1596
|
|
1597 $seqType ||= 'query';
|
|
1598 $class ||= 'identical';
|
|
1599 $collapse ||= 0;
|
|
1600 $seqType = 'sbjct' if $seqType eq 'hit';
|
|
1601
|
|
1602 $self->_set_residues() unless defined $self->{'_identicalRes_query'};
|
|
1603
|
|
1604 $seqType = ($seqType !~ /^q/i ? 'sbjct' : 'query');
|
|
1605 $class = ($class !~ /^id/i ? 'conserved' : 'identical');
|
|
1606
|
|
1607 ## Sensitive to member name changes.
|
|
1608 $seqType = "_\L$seqType\E";
|
|
1609 $class = "_\L$class\E";
|
|
1610
|
|
1611 my @ary = sort { $a <=> $b } keys %{ $self->{"${class}Res$seqType"}};
|
|
1612
|
|
1613 require Bio::Search::BlastUtils if $collapse;
|
|
1614
|
|
1615 return $collapse ? &Bio::Search::BlastUtils::collapse_nums(@ary) : @ary;
|
|
1616 }
|
|
1617
|
|
1618
|
|
1619 =head2 get_aln
|
|
1620
|
|
1621 Usage : $hsp->get_aln()
|
|
1622 Purpose : Get a Bio::SimpleAlign object constructed from the query + sbjct
|
|
1623 : sequences of the present HSP object.
|
|
1624 Example : $aln_obj = $hsp->get_aln();
|
|
1625 Returns : Object reference for a Bio::SimpleAlign.pm object.
|
|
1626 Argument : n/a.
|
|
1627 Throws : Propagates any exception ocurring during the construction of
|
|
1628 : the Bio::SimpleAlign object.
|
|
1629 Comments : Requires Bio::SimpleAlign.
|
|
1630 : The Bio::SimpleAlign object is constructed from the query + sbjct
|
|
1631 : sequence objects obtained by calling seq().
|
|
1632 : Gap residues are included (see $GAP_SYMBOL).
|
|
1633
|
|
1634 See Also : L<seq()|seq>, L<Bio::SimpleAlign>
|
|
1635
|
|
1636 =cut
|
|
1637
|
|
1638 #------------
|
|
1639 sub get_aln {
|
|
1640 #------------
|
|
1641 my $self = shift;
|
|
1642
|
|
1643 require Bio::SimpleAlign;
|
|
1644 require Bio::LocatableSeq;
|
|
1645 my $qseq = $self->seq('query');
|
|
1646 my $sseq = $self->seq('sbjct');
|
|
1647
|
|
1648 my $type = $self->algorithm =~ /P$|^T/ ? 'amino' : 'dna';
|
|
1649 my $aln = new Bio::SimpleAlign();
|
|
1650 $aln->add_seq(new Bio::LocatableSeq(-seq => $qseq->seq(),
|
|
1651 -id => 'query_'.$qseq->display_id(),
|
|
1652 -start => 1,
|
|
1653 -end => CORE::length($qseq)));
|
|
1654
|
|
1655 $aln->add_seq(new Bio::LocatableSeq(-seq => $sseq->seq(),
|
|
1656 -id => 'hit_'.$sseq->display_id(),
|
|
1657 -start => 1,
|
|
1658 -end => CORE::length($sseq)));
|
|
1659
|
|
1660 return $aln;
|
|
1661 }
|
|
1662
|
|
1663
|
|
1664 1;
|
|
1665 __END__
|
|
1666
|
|
1667
|
|
1668 =head1 FOR DEVELOPERS ONLY
|
|
1669
|
|
1670 =head2 Data Members
|
|
1671
|
|
1672 Information about the various data members of this module is provided for those
|
|
1673 wishing to modify or understand the code. Two things to bear in mind:
|
|
1674
|
|
1675 =over 4
|
|
1676
|
|
1677 =item 1 Do NOT rely on these in any code outside of this module.
|
|
1678
|
|
1679 All data members are prefixed with an underscore to signify that they are private.
|
|
1680 Always use accessor methods. If the accessor doesn't exist or is inadequate,
|
|
1681 create or modify an accessor (and let me know, too!).
|
|
1682
|
|
1683 =item 2 This documentation may be incomplete and out of date.
|
|
1684
|
|
1685 It is easy for these data member descriptions to become obsolete as
|
|
1686 this module is still evolving. Always double check this info and search
|
|
1687 for members not described here.
|
|
1688
|
|
1689 =back
|
|
1690
|
|
1691 An instance of Bio::Search::HSP::BlastHSP.pm is a blessed reference to a hash containing
|
|
1692 all or some of the following fields:
|
|
1693
|
|
1694 FIELD VALUE
|
|
1695 --------------------------------------------------------------
|
|
1696 (member names are mostly self-explanatory)
|
|
1697
|
|
1698 _score :
|
|
1699 _bits :
|
|
1700 _p :
|
|
1701 _n : Integer. The 'N' value listed in parenthesis with P/Expect value:
|
|
1702 : e.g., P(3) = 1.2e-30 ---> (N = 3).
|
|
1703 : Not defined in NCBI Blast2 with gaps.
|
|
1704 : To obtain the number of HSPs, use Bio::Search::Hit::BlastHit::num_hsps().
|
|
1705 _expect :
|
|
1706 _queryLength :
|
|
1707 _queryGaps :
|
|
1708 _queryStart :
|
|
1709 _queryStop :
|
|
1710 _querySeq :
|
|
1711 _sbjctLength :
|
|
1712 _sbjctGaps :
|
|
1713 _sbjctStart :
|
|
1714 _sbjctStop :
|
|
1715 _sbjctSeq :
|
|
1716 _matchSeq : String. Contains the symbols between the query and sbjct lines
|
|
1717 which indicate identical (letter) and conserved ('+') matches
|
|
1718 or a mismatch (' ').
|
|
1719 _numIdentical :
|
|
1720 _numConserved :
|
|
1721 _identicalRes_query :
|
|
1722 _identicalRes_sbjct :
|
|
1723 _conservedRes_query :
|
|
1724 _conservedRes_sbjct :
|
|
1725 _match_indent : The number of leading space characters on each line containing
|
|
1726 the match symbols. _match_indent is 13 in this example:
|
|
1727 Query: 285 QNSAPWGLARISHRERLNLGSFNKYLYDDDAG
|
|
1728 Q +APWGLARIS G+ + Y YD+ AG
|
|
1729 ^^^^^^^^^^^^^
|
|
1730
|
|
1731
|
|
1732 =cut
|
|
1733
|
|
1734 1;
|
|
1735
|