annotate variant_effect_predictor/Bio/Search/HSP/BlastHSP.pm @ 1:d6778b5d8382 draft default tip

Deleted selected files
author willmclaren
date Fri, 03 Aug 2012 10:05:43 -0400
parents 21066c0abaf5
children
#-----------------------------------------------------------------
# $Id: BlastHSP.pm,v 1.20 2002/12/24 15:45:33 jason Exp $
#
# BioPerl module Bio::Search::HSP::BlastHSP
#
# (This module was originally called Bio::Tools::Blast::HSP)
#
# Cared for by Steve Chervitz <sac@bioperl.org>
#
# You may distribute this module under the same terms as perl itself
#-----------------------------------------------------------------

## POD Documentation:

=head1 NAME

Bio::Search::HSP::BlastHSP - Bioperl BLAST High-Scoring Pair object

=head1 SYNOPSIS

The construction of BlastHSP objects is performed by
Bio::Factory::BlastHitFactory in a process that is
orchestrated by the Blast parser (B<Bio::SearchIO::psiblast>).
The resulting BlastHSPs are then accessed via
B<Bio::Search::Hit::BlastHit>. Therefore, you do not need to
use B<Bio::Search::HSP::BlastHSP> directly. If you need to construct
BlastHSPs directly, see the new() function for details.

For B<Bio::SearchIO> BLAST parsing usage examples, see the
B<examples/search-blast> directory of the Bioperl distribution.

=head1 DESCRIPTION

A Bio::Search::HSP::BlastHSP object provides an interface to data
obtained in a single alignment section of a Blast report (known as a
"High-scoring Segment Pair"). This is essentially a pairwise
alignment with score information.

BlastHSP objects are accessed via B<Bio::Search::Hit::BlastHit>
objects after parsing a BLAST report using the B<Bio::SearchIO>
system.

=head2 Start and End coordinates

Sequence endpoints are swapped so that start is always less than
end. This affects TBLASTN/X hits on the minus strand. Strand
information can be recovered using the strand() method. This
normalization step is standard Bioperl practice. It also facilitates
use of range information by methods such as match().
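
As a rough illustration of this normalization (a sketch only, not code
taken from this module), the helper below orders a pair of endpoints and
reports the orientation separately. The name normalize_range() and its
return convention are assumptions made for the example.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BlastHSP): order a pair of endpoints
# so that start <= end, and report the strand as a separate value.
sub normalize_range {
    my ($from, $to) = @_;
    return $from <= $to
        ? ( $from, $to,    1 )    # plus strand: endpoints already ordered
        : ( $to,   $from, -1 );   # minus strand: swap the endpoints
}

my ($start, $end, $strand) = normalize_range(1200, 900);
print "start=$start end=$end strand=$strand\n";
```

Keeping start no greater than end means that range comparisons need no
special case for minus-strand hits; the orientation is carried only by
the strand value.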

=over 1

=item * Supports BLAST versions 1.x and 2.x, gapped and ungapped.

=back

Bio::Search::HSP::BlastHSP.pm has the ability to extract a list of all
residue indices for identical and conservative matches along both
query and sbjct sequences. Since this degree of detail is not always
needed, this behavior does not occur during construction of the BlastHSP
object. These data will automatically be collected as necessary as
the BlastHSP.pm object is used.
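
The deferred collection described above can be sketched with a small
stand-alone class. Everything here (the LazyPair class, its field names,
and the rule that a letter on the match line marks an identical residue)
is a hypothetical simplification, not the parsing this module performs.

```perl
use strict;
use warnings;

package LazyPair;   # hypothetical class, for illustration only

sub new {
    my ($class, %args) = @_;
    # Keep only the raw aligned strings; nothing is scanned yet.
    return bless { query => $args{query}, match => $args{match} }, $class;
}

# Positions (1-based, counted along the ungapped query) of identical residues.
sub identical_indices {
    my $self = shift;
    $self->{_ident} ||= $self->_collect;   # computed on the first call only
    return @{ $self->{_ident} };
}

sub _collect {
    my $self = shift;
    $self->{_scans}++;                     # counts how often we really scan
    my @q = split //, $self->{query};
    my @m = split //, $self->{match};
    my $pos = 0;
    my @idx;
    for my $i (0 .. $#q) {
        next if $q[$i] eq '-';             # gap: no query residue here
        $pos++;
        push @idx, $pos if defined $m[$i] && $m[$i] =~ /[A-Za-z]/;
    }
    return \@idx;
}

package main;

my $pair = LazyPair->new( query => 'MK-LV', match => 'M  L+' );
print join(',', $pair->identical_indices), "\n";
```

Construction stays cheap because the alignment strings are only stored;
the scan runs once, on the first request, and later calls reuse the
cached list.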

=head1 DEPENDENCIES

Bio::Search::HSP::BlastHSP.pm is a concrete class that inherits from
B<Bio::SeqFeature::SimilarityPair> and B<Bio::Search::HSP::HSPI>.
B<Bio::Seq> and B<Bio::SimpleAlign> are employed for creating
sequence and alignment objects, respectively.

=head2 Relationship to SimpleAlign.pm & Seq.pm

BlastHSP.pm can provide the query or sbjct sequence as a B<Bio::Seq>
object via the L<seq()|seq> method. The BlastHSP.pm object can also create a
two-sequence B<Bio::SimpleAlign> alignment object using the query
and sbjct sequences via the L<get_aln()|get_aln> method. Creation of alignment
objects is not automatic when constructing the BlastHSP.pm object since
this level of functionality is not always required and would generate
a lot of extra overhead when crunching many reports.


=head1 FEEDBACK

=head2 Mailing Lists

User feedback is an integral part of the evolution of this and other
Bioperl modules. Send your comments and suggestions preferably to one
of the Bioperl mailing lists. Your participation is much appreciated.

    bioperl-l@bioperl.org              - General discussion
    http://bio.perl.org/MailList.html  - About the mailing lists

=head2 Reporting Bugs

Report bugs to the Bioperl bug tracking system to help us keep track
of the bugs and their resolution. Bug reports can be submitted via email
or the web:

    bioperl-bugs@bio.perl.org
    http://bugzilla.bioperl.org/

=head1 AUTHOR

Steve Chervitz E<lt>sac@bioperl.orgE<gt>

See L<the FEEDBACK section | FEEDBACK> for where to send bug reports and comments.

=head1 ACKNOWLEDGEMENTS

This software was originally developed in the Department of Genetics
at Stanford University. I would also like to acknowledge my
colleagues at Affymetrix for useful feedback.

=head1 SEE ALSO

    Bio::Search::Hit::BlastHit.pm        - Blast hit object.
    Bio::Search::Result::BlastResult.pm  - Blast Result object.
    Bio::Seq.pm                          - Biosequence object

=head2 Links:

    http://bio.perl.org/Core/POD/Tools/Blast/BlastHit.pm.html

    http://bio.perl.org/Projects/modules.html  - Online module documentation
    http://bio.perl.org/Projects/Blast/        - Bioperl Blast Project
    http://bio.perl.org/                       - Bioperl Project Homepage

=head1 COPYRIGHT

Copyright (c) 1996-2001 Steve Chervitz. All Rights Reserved.

=head1 DISCLAIMER

This software is provided "as is" without warranty of any kind.

=cut


# END of main POD documentation.

=head1 APPENDIX

The rest of the documentation details each of the object methods.
Internal methods are usually preceded with a _

=cut

# Let the code begin...

package Bio::Search::HSP::BlastHSP;

use strict;
use Bio::SeqFeature::SimilarityPair;
use Bio::SeqFeature::Similarity;
use Bio::Search::HSP::HSPI;

use vars qw( @ISA $GAP_SYMBOL $Revision %STRAND_SYMBOL );

use overload
    '""' => \&to_string;

$Revision = '$Id: BlastHSP.pm,v 1.20 2002/12/24 15:45:33 jason Exp $';  #'

@ISA = qw(Bio::SeqFeature::SimilarityPair Bio::Search::HSP::HSPI);

$GAP_SYMBOL    = '-';          # Need a more general way to handle gap symbols.
%STRAND_SYMBOL = ('Plus' => 1, 'Minus' => -1 );


=head2 new

 Usage     : $hsp = Bio::Search::HSP::BlastHSP->new( %named_params );
           : Bio::Search::HSP::BlastHSP.pm objects are constructed
           : automatically by Bio::Factory::BlastHitFactory.pm,
           : so there is no need for direct instantiation.
 Purpose   : Constructs a new BlastHSP object and initializes key variables
           : for the HSP.
 Returns   : A Bio::Search::HSP::BlastHSP object
 Argument  : Named parameters:
           : Parameter keys are case-insensitive.
           :      -RAW_DATA   => array ref containing raw BLAST report data
           :                     for a single HSP. This includes all lines
           :                     of the HSP alignment from a traditional BLAST
           :                     or PSI-BLAST (non-XML) report.
           :      -RANK       => integer (1..n).
           :      -PROGRAM    => string ('TBLASTN', 'BLASTP', etc.).
           :      -QUERY_NAME => string, id of query sequence
           :      -HIT_NAME   => string, id of hit sequence
           :
 Comments  : Having the raw data allows this object to do lazy parsing of
           : the raw HSP data (i.e., not parsed until needed).
           :
           : Note that there is a fair amount of basic parsing that is
           : currently performed in this module that would be more appropriate
           : to do within a separate factory object.
           : This parsing code will likely be relocated and more initialization
           : parameters will be added to new().
           :
 See Also  : B<Bio::SeqFeature::SimilarityPair::new()>, B<Bio::SeqFeature::Similarity::new()>
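
To illustrate how case-insensitive named parameters can be mapped onto
an ordered list of values, here is a simplified stand-in for the
_rearrange() call used by new(). The rearrange() function below is an
assumption-laden sketch written for this example; the inherited method
may handle more cases.

```perl
use strict;
use warnings;

# Simplified, hypothetical stand-in for _rearrange(): returns the values
# of the named parameters in the order requested by the caller.
sub rearrange {
    my ($order, %param) = @_;
    my %norm;
    while ( my ($key, $value) = each %param ) {
        (my $name = uc $key) =~ s/^-//;   # '-rank' and '-RANK' both become 'RANK'
        $norm{$name} = $value;
    }
    return map { $norm{$_} } @$order;     # undef for any key not supplied
}

my ($prog, $rank, $qname) = rearrange(
    [qw( PROGRAM RANK QUERY_NAME )],
    -rank => 1, -Program => 'TBLASTN', -QUERY_NAME => 'seq1',
);
print "$prog $rank $qname\n";
```

Because the keys are normalized before lookup, callers may pass the
parameters in any order and in any letter case.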

=cut

#----------------
sub new {
#----------------
    my ($class, @args ) = @_;

    my $self = $class->SUPER::new( @args );
    # Initialize placeholders
    $self->{'_queryGaps'} = $self->{'_sbjctGaps'} = 0;
    my ($raw_data, $qname, $hname, $qlen, $hlen);

    ($self->{'_prog'}, $self->{'_rank'}, $raw_data,
     $qname, $hname) =
      $self->_rearrange([qw( PROGRAM
                             RANK
                             RAW_DATA
                             QUERY_NAME
                             HIT_NAME
                           )], @args );

    # _set_data() does a fair amount of parsing.
    # This will likely change (see comment above.)
    $self->_set_data( @{$raw_data} );
    # Store the aligned query as sequence feature
    my ($qb, $hb) = ($self->start());
    my ($qe, $he) = ($self->end());
    my ($qs, $hs) = ($self->strand());
    my ($qf,$hf) = ($self->query->frame(),
                    $self->hit->frame);

    $self->query( Bio::SeqFeature::Similarity->new (-start  => $qb,
                                                    -end    => $qe,
                                                    -strand => $qs,
                                                    -bits   => $self->bits,
                                                    -score  => $self->score,
                                                    -frame  => $qf,
                                                    -seq_id => $qname,
                                                    -source => $self->{'_prog'} ));

    $self->hit( Bio::SeqFeature::Similarity->new (-start  => $hb,
                                                  -end    => $he,
                                                  -strand => $hs,
                                                  -bits   => $self->bits,
                                                  -score  => $self->score,
                                                  -frame  => $hf,
                                                  -seq_id => $hname,
                                                  -source => $self->{'_prog'} ));

    # set lengths
    # NOTE: $qlen and $hlen are declared above but never assigned in this
    # constructor, so both are undef at this point.
    $self->query->seqlength($qlen); # query
    $self->hit->seqlength($hlen);   # subject

    $self->query->frac_identical($self->frac_identical('query'));
    $self->hit->frac_identical($self->frac_identical('hit'));
    return $self;
}

#sub DESTROY {
#    my $self = shift;
#    #print STDERR "--->DESTROYING $self\n";
#}


# Title   : _id_str;
# Purpose : Intended for internal use only to provide a string for use
#           within exception messages to help users figure out which
#           query/hit caused the problem.
# Returns : Short string with name of query and hit seq
sub _id_str {
    my $self = shift;
    if( not defined $self->{'_id_str'}) {
        my $qname = $self->query->seqname;
        my $hname = $self->hit->seqname;
        $self->{'_id_str'} = "QUERY=\"$qname\" HIT=\"$hname\"";
    }
    return $self->{'_id_str'};
}

#=================================================
# Begin Bio::Search::HSP::HSPI implementation
#=================================================

=head2 algorithm

 Title   : algorithm
 Usage   : $alg = $hsp->algorithm();
 Function: Gets the algorithm specification that was used to obtain the hsp.
           For BLAST, the algorithm denotes what type of sequence was aligned
           against what (BLASTN: dna-dna, BLASTP: prt-prt, BLASTX: translated
           dna-prt, TBLASTN: prt-translated dna, TBLASTX: translated
           dna-translated dna).
 Returns : a scalar string
 Args    : none
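
The program-to-sequence-type mapping listed above could be captured in a
lookup table, as in the following sketch. The %SEQ_TYPES table and the
seq_types() helper are hypothetical and are not part of this module.

```perl
use strict;
use warnings;

# Query and hit sequence types per BLAST program, as listed in the text.
my %SEQ_TYPES = (
    BLASTN  => [ 'dna',            'dna'            ],
    BLASTP  => [ 'protein',        'protein'        ],
    BLASTX  => [ 'translated dna', 'protein'        ],
    TBLASTN => [ 'protein',        'translated dna' ],
    TBLASTX => [ 'translated dna', 'translated dna' ],
);

# Hypothetical helper: returns (query_type, hit_type), or an empty list
# when the program name is not recognized.
sub seq_types {
    my ($algorithm) = @_;
    my $types = $SEQ_TYPES{ uc $algorithm } or return;
    return @$types;
}

my ($q, $h) = seq_types('tblastn');
print "query=$q hit=$h\n";
```

A caller holding the string returned by algorithm() could pass it to
such a helper to decide, for example, whether hit coordinates refer to
nucleotides or to residues.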

=cut

#----------------
sub algorithm {
#----------------
    my ($self,@args) = @_;
    return $self->{'_prog'};
}


=head2 signif()

 Usage     : $hsp_obj->signif()
 Purpose   : Get the P-value or Expect value for the HSP.
 Returns   : Float (0.001 or 1.3e-43)
           : Returns the P-value if it is defined; otherwise, the Expect value.
 Argument  : n/a
 Throws    : n/a
 Comments  : Provided for consistency with BlastHit::signif()
           : Support for returning the significance data in different
           : formats (e.g., exponent only) is not provided for HSP objects.
           : This is only available for the BlastHit or Blast object.

 See Also  : L<p()|p>, L<expect()|expect>, L<Bio::Search::Hit::BlastHit::signif()|Bio::Search::Hit::BlastHit>
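
The fallback rule can be sketched with a plain hash standing in for an
HSP object. The signif_of() helper is hypothetical; the '_p' and
'_expect' keys mirror the fields that this module reads.

```perl
use strict;
use warnings;

# Sketch of the fallback rule: use the P-value when one is defined,
# otherwise the Expect value.
sub signif_of {
    my ($hsp) = @_;
    return defined $hsp->{'_p'} ? $hsp->{'_p'} : $hsp->{'_expect'};
}

my %with_p    = ( _p => 0, _expect => 2.1e-5 );   # a P-value of 0 is still defined
my %without_p = ( _expect => 1.3e-43 );           # report supplied no P-value

print signif_of(\%with_p), "\n";
print signif_of(\%without_p), "\n";
```

Testing for definedness rather than truth matters here, since a P-value
of 0 is a legitimate result and should not trigger the fallback.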

=cut

#-----------
sub signif {
#-----------
    my $self = shift;
    # Prefer the P-value when the report supplied one; otherwise fall back
    # to the Expect value.
    my $val = defined($self->{'_p'}) ? $self->{'_p'} : $self->{'_expect'};
    return $val;
}


=head2 evalue

 Usage     : $hsp_obj->evalue()
 Purpose   : Get the Expect value for the HSP.
 Returns   : Float (0.001 or 1.3e-43)
 Argument  : n/a
 Throws    : n/a
 Comments  : Support for returning the expectation data in different
           : formats (e.g., exponent only) is not provided for HSP objects.
           : This is only available for the BlastHit or Blast object.

See Also   : L<p()|p>

=cut

#----------
sub evalue { shift->{'_expect'} }
#----------

=head2 p

 Usage     : $hsp_obj->p()
 Purpose   : Get the P-value for the HSP.
 Returns   : Float (0.001 or 1.3e-43) or undef if not defined.
 Argument  : n/a
 Throws    : n/a
 Comments  : P-value is not defined with NCBI Blast2 reports.
           : Support for returning the P-value in different
           : formats (e.g., exponent only) is not provided for HSP objects.
           : This is only available for the BlastHit or Blast object.

See Also   : L<expect()|expect>

=cut

#-----
sub p { my $self = shift; $self->{'_p'}; }
#-----

# alias
sub pvalue { shift->p(@_); }

=head2 length

 Usage     : $hsp->length( [seq_type] )
 Purpose   : Get the length of the aligned portion of the query or sbjct.
 Example   : $hsp->length('query')
 Returns   : integer
 Argument  : seq_type: 'query' | 'hit' or 'sbjct' | 'total'  (default = 'total')
             ('sbjct' is synonymous with 'hit')
 Throws    : n/a
 Comments  : 'total' length is the full length of the alignment
           : as reported in the denominators in the alignment section:
           : "Identical = 34/120 Positives = 67/120".

See Also   : L<gaps()|gaps>

=cut

#-----------
sub length {
#-----------
## Developer note: when using the built-in length function within
##                 this module, call it as CORE::length().
    my( $self, $seqType ) = @_;
    $seqType ||= 'total';
    $seqType = 'sbjct' if $seqType eq 'hit';

    $seqType ne 'total' and $self->_set_seq_data() unless $self->{'_set_seq_data'};

    ## Sensitive to member name format.
    $seqType = "_\L$seqType\E";
    $self->{$seqType.'Length'};
}

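The "member name format" that length() depends on is the hash key built from the sequence type. A minimal sketch of that key construction follows; `length_key` is a hypothetical helper written for illustration, not part of the module.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the key construction used by length().
sub length_key {
    my ($seq_type) = @_;
    $seq_type ||= 'total';                      # same default as the module
    $seq_type = 'sbjct' if $seq_type eq 'hit';  # 'hit' is a synonym for 'sbjct'
    return "_\L$seq_type\E" . 'Length';         # \L...\E lowercases the type
}

for my $type ('query', 'hit', 'QUERY', undef) {
    my $key = length_key($type);
    print "$key\n";
}
```

Note that the `'hit'` comparison is case-sensitive and runs before the lowercasing, so an argument such as `'HIT'` would yield the key `_hitLength`, which none of the code shown here sets.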
=head2 gaps

 Usage     : $hsp->gaps( [seq_type] )
 Purpose   : Get the number of gaps in the query, sbjct, or total alignment.
           : Also can return query gaps and sbjct gaps as a two-element list
           : when in array context.
 Example   : $total_gaps      = $hsp->gaps();
           : ($qgaps, $sgaps) = $hsp->gaps();
           : $qgaps           = $hsp->gaps('query');
 Returns   : scalar context: integer
           : array context without args: (int, int) = ('queryGaps', 'sbjctGaps')
 Argument  : seq_type: 'query' or 'hit' or 'sbjct' or 'total'
           : ('sbjct' is synonymous with 'hit')
           : (default = 'total', scalar context)
           : Array context can be "induced" by providing an argument of 'list' or 'array'.
 Throws    : n/a

See Also   : L<length()|length>, L<matches()|matches>

=cut

#---------
sub gaps {
#---------
    my( $self, $seqType ) = @_;

    $self->_set_seq_data() unless $self->{'_set_seq_data'};

    $seqType ||= (wantarray ? 'list' : 'total');
    $seqType = 'sbjct' if $seqType eq 'hit';

    if($seqType =~ /list|array/i) {
        return (($self->{'_queryGaps'} || 0), ($self->{'_sbjctGaps'} || 0));
    }

    if($seqType eq 'total') {
        # Default each count to 0 first so an unset count does not
        # trigger an "uninitialized value" warning in the addition.
        return (($self->{'_queryGaps'} || 0) + ($self->{'_sbjctGaps'} || 0));
    } else {
        ## Sensitive to member name format.
        $seqType = "_\L$seqType\E";
        return $self->{$seqType.'Gaps'} || 0;
    }
}

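The context-dependent default of gaps() can be reproduced in isolation. The sketch below uses a hypothetical function, `gaps_like`, that takes the counts as a hash ref instead of reading them from an HSP object; otherwise it follows the same branching as the method above.

```perl
use strict;
use warnings;

# Hypothetical function mimicking how gaps() chooses its default mode.
sub gaps_like {
    my ($counts, $seq_type) = @_;
    # With no argument, the calling context decides: list or total.
    $seq_type ||= (wantarray ? 'list' : 'total');
    $seq_type = 'sbjct' if $seq_type eq 'hit';

    my $q = $counts->{query} || 0;
    my $s = $counts->{sbjct} || 0;

    return ($q, $s) if $seq_type =~ /list|array/i;
    return $q + $s  if $seq_type eq 'total';
    return $seq_type eq 'query' ? $q : $s;
}

my $counts = { query => 2, sbjct => 3 };

my $total     = gaps_like($counts);          # scalar context: sum of both
my ($qg, $sg) = gaps_like($counts);          # list context: (query, sbjct)
my $hit_only  = gaps_like($counts, 'hit');   # sbjct gaps only

print "total=$total query=$qg sbjct=$sg hit=$hit_only\n";
```

One consequence worth knowing: passing `'list'` while assigning to a scalar does not produce a count of elements. Perl evaluates the returned list in scalar context, so the caller receives only the last element (the sbjct gaps).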
=head2 frac_identical

 Usage     : $hsp_object->frac_identical( [seq_type] );
 Purpose   : Get the fraction of identical positions within the given HSP.
 Example   : $frac_iden = $hsp_object->frac_identical('query');
 Returns   : Float (2-decimal precision, e.g., 0.75).
 Argument  : seq_type: 'query' or 'hit' or 'sbjct' or 'total'
           : ('sbjct' is synonymous with 'hit')
           : default = 'total' (but see comments below).
 Throws    : n/a
 Comments  : Different versions of Blast report different values for the total
           : length of the alignment. This is the number reported in the
           : denominators in the stats section:
           : "Identical = 34/120 Positives = 67/120".
           : NCBI-BLAST uses the total length of the alignment (with gaps);
           : WU-BLAST uses the length of the query sequence (without gaps).
           : Therefore, when called without an argument or an argument of 'total',
           : this method will report different values depending on the
           : version of BLAST used.
           :
           : To get the fraction identical among only the aligned residues,
           : ignoring the gaps, call this method with an argument of 'query'
           : or 'sbjct' ('sbjct' is synonymous with 'hit').

See Also   : L<frac_conserved()|frac_conserved>, L<num_identical()|num_identical>, L<matches()|matches>

=cut

#-------------------
sub frac_identical {
#-------------------
# The value is calculated as opposed to storing it from the parsed results.
# This saves storage and also permits flexibility in determining for which
# sequence (query or sbjct) the figure is to be calculated.

    my( $self, $seqType ) = @_;
    $seqType ||= 'total';
    $seqType = 'sbjct' if $seqType eq 'hit';

    if($seqType ne 'total') {
        $self->_set_seq_data() unless $self->{'_set_seq_data'};
    }
    ## Sensitive to member name format.
    $seqType = "_\L$seqType\E";

    sprintf( "%.2f", $self->{'_numIdentical'}/$self->{$seqType.'Length'});
}

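The effect of the denominator on the reported fraction is easy to see with concrete numbers. The sketch below is illustrative only: the figures are made up (34 identities, a 120-column alignment, 110 query residues), `frac_identical_like` is a hypothetical function, and the guard against a missing length is an addition that the method above does not have.

```perl
use strict;
use warnings;

# Made-up figures: 34 identities over a 120-column alignment in which
# the query contributes 110 residues (10 columns are query gaps).
my %hsp = (
    _numIdentical => 34,
    _totalLength  => 120,
    _queryLength  => 110,
);

# Hypothetical function following the same arithmetic as frac_identical().
sub frac_identical_like {
    my ($hsp, $seq_type) = @_;
    $seq_type ||= 'total';
    my $len = $hsp->{"_\L$seq_type\E" . 'Length'};
    return unless $len;    # guard added for this sketch; not in the module
    return sprintf '%.2f', $hsp->{'_numIdentical'} / $len;
}

my $over_total = frac_identical_like(\%hsp, 'total');   # 34/120
my $over_query = frac_identical_like(\%hsp, 'query');   # 34/110

print "total: $over_total\n";   # prints "total: 0.28"
print "query: $over_query\n";   # prints "query: 0.31"
```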
=head2 frac_conserved

 Usage     : $hsp_object->frac_conserved( [seq_type] );
 Purpose   : Get the fraction of conserved positions within the given HSP.
           : (Note: 'conservative' positions are called 'positives' in the
           : Blast report.)
 Example   : $frac_cons = $hsp_object->frac_conserved('query');
 Returns   : Float (2-decimal precision, e.g., 0.75).
 Argument  : seq_type: 'query' or 'hit' or 'sbjct' or 'total'
           : ('sbjct' is synonymous with 'hit')
           : default = 'total' (but see comments below).
 Throws    : n/a
 Comments  : Different versions of Blast report different values for the total
           : length of the alignment. This is the number reported in the
           : denominators in the stats section:
           : "Identical = 34/120 Positives = 67/120".
           : NCBI-BLAST uses the total length of the alignment (with gaps);
           : WU-BLAST uses the length of the query sequence (without gaps).
           : Therefore, when called without an argument or an argument of 'total',
           : this method will report different values depending on the
           : version of BLAST used.
           :
           : To get the fraction conserved among only the aligned residues,
           : ignoring the gaps, call this method with an argument of 'query'
           : or 'sbjct'.

See Also   : L<frac_identical()|frac_identical>, L<num_conserved()|num_conserved>, L<matches()|matches>

=cut

#--------------------
sub frac_conserved {
#--------------------
# The value is calculated as opposed to storing it from the parsed results.
# This saves storage and also permits flexibility in determining for which
# sequence (query or sbjct) the figure is to be calculated.

    my( $self, $seqType ) = @_;
    $seqType ||= 'total';
    $seqType = 'sbjct' if $seqType eq 'hit';

    if($seqType ne 'total') {
        $self->_set_seq_data() unless $self->{'_set_seq_data'};
    }

    ## Sensitive to member name format.
    $seqType = "_\L$seqType\E";

    sprintf( "%.2f", $self->{'_numConserved'}/$self->{$seqType.'Length'});
}

=head2 query_string

 Title   : query_string
 Usage   : my $qseq = $hsp->query_string;
 Function: Retrieves the query sequence of this HSP as a string
 Returns : string
 Args    : none

=cut

#----------------
sub query_string{ shift->seq_str('query'); }
#----------------

=head2 hit_string

 Title   : hit_string
 Usage   : my $hseq = $hsp->hit_string;
 Function: Retrieves the hit sequence of this HSP as a string
 Returns : string
 Args    : none

=cut

#----------------
sub hit_string{ shift->seq_str('hit'); }
#----------------

=head2 homology_string

 Title   : homology_string
 Usage   : my $homo_string = $hsp->homology_string;
 Function: Retrieves the homology sequence for this HSP as a string.
         : The homology sequence is the string of symbols in between the
         : query and hit sequences in the alignment indicating the degree
         : of conservation (e.g., identical, similar, not similar).
 Returns : string
 Args    : none

=cut

#----------------
sub homology_string{ shift->seq_str('match'); }
#----------------

#=================================================
# End Bio::Search::HSP::HSPI implementation
#=================================================

# Older method delegating to method defined in HSPI.

=head2 expect

See L<Bio::Search::HSP::HSPI::expect()|Bio::Search::HSP::HSPI>

=cut

#----------
sub expect { shift->evalue( @_ ); }
#----------

=head2 rank

 Usage     : $hsp->rank( [string] );
 Purpose   : Get the rank of the HSP within a given Blast hit.
 Example   : $rank = $hsp->rank;
 Returns   : Integer (1..n) corresponding to the order in which the HSP
             appears in the BLAST report.

=cut

#'

#----------
sub rank { shift->{'_rank'} }
#----------

# For backward compatibility
#----------
sub name { shift->rank }
#----------

=head2 to_string

 Title   : to_string
 Usage   : print $hsp->to_string;
 Function: Returns a string representation for the Blast HSP.
           Primarily intended for debugging purposes.
 Example : see usage
 Returns : A string of the form:
           [BlastHSP] <rank>
           e.g.:
           [BlastHSP] 1
 Args    : None

=cut

#----------
sub to_string {
#----------
    my $self = shift;
    return "[BlastHSP] " . $self->rank();
}

#=head2 _set_data (Private method)
#
# Usage     : called automatically during object construction.
# Purpose   : Parses the raw HSP section from a flat BLAST report and
#             sets the query sequence, sbjct sequence, and the "match" data
#           : which consists of the symbols between the query and sbjct lines
#           : in the alignment.
# Argument  : Array (all lines for a single, complete HSP, from a raw,
#             flat (i.e., non-XML) BLAST report)
# Throws    : Propagates any exceptions from the methods called ("See Also")
#
#See Also   : L<_set_seq()|_set_seq>, L<_set_score_stats()|_set_score_stats>, L<_set_match_stats()|_set_match_stats>, L<_initialize()|_initialize>
#
#=cut

#--------------
sub _set_data {
#--------------
    my $self = shift;
    my @data = @_;
    my @queryList  = ();  # 'Query' = SEQUENCE USED TO QUERY THE DATABASE.
    my @sbjctList  = ();  # 'Sbjct' = HOMOLOGOUS SEQUENCE FOUND IN THE DATABASE.
    my @matchList  = ();
    my $matchLine  = 0;   # Alternating boolean: when true, load 'match' data.
    my @linedat = ();

    #print STDERR "BlastHSP: set_data()\n";

    my($line, $aln_row_len, $length_diff);
    $length_diff = 0;

    # Collecting data for all lines in the alignment
    # and then storing the collections for possible processing later.
    #
    # Note that "match" lines may not be properly padded with spaces.
    # This loop now properly handles such cases:
    # Query: 1141 PSLVELTIRDCPRLEVGPMIRSLPKFPMLKKLDLAVANIIEEDLDVIGSLEELVIXXXXX 1200
    #             PSLVELTIRDCPRLEVGPMIRSLPKFPMLKKLDLAVANIIEEDLDVIGSLEELVI
    # Sbjct: 1141 PSLVELTIRDCPRLEVGPMIRSLPKFPMLKKLDLAVANIIEEDLDVIGSLEELVILSLKL 1200

    foreach $line( @data ) {
        next if $line =~ /^\s*$/;

        if( $line =~ /^ ?Score/ ) {
            $self->_set_score_stats( $line );
        } elsif( $line =~ /^ ?(Identities|Positives|Strand)/ ) {
            $self->_set_match_stats( $line );
        } elsif( $line =~ /^ ?Frame = ([\d+-]+)/ ) {
            # Version 2.0.8 has Frame information on a separate line.
            # Storing frame according to SeqFeature::Generic::frame()
            # which does not contain strand info (use strand()).
            my $frame = abs($1) - 1;
            $self->frame( $frame );
        } elsif( $line =~ /^(Query:?[\s\d]+)([^\s\d]+)/ ) {
            push @queryList, $line;
            $self->{'_match_indent'} = CORE::length $1;
            $aln_row_len = (CORE::length $1) + (CORE::length $2);
            $matchLine = 1;
        } elsif( $matchLine ) {
            # Pad the match line with spaces if necessary.
parents:
diff changeset
727 $length_diff = $aln_row_len - CORE::length $line;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
728 $length_diff and $line .= ' 'x $length_diff;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
729 push @matchList, $line;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
730 $matchLine = 0;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
731 } elsif( $line =~ /^Sbjct/ ) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
732 push @sbjctList, $line;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
733 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
734 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
735 # Storing the query and sbjct lists in case they are needed later.
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
736 # We could make this conditional to save memory.
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
737 $self->{'_queryList'} = \@queryList;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
738 $self->{'_sbjctList'} = \@sbjctList;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
739
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
740 # Storing the match list in case it is needed later.
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
741 $self->{'_matchList'} = \@matchList;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
742
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
743 if(not defined ($self->{'_numIdentical'})) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
744 my $id_str = $self->_id_str;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
745 $self->throw( -text => "Can't parse match statistics. Possibly a new or unrecognized Blast format. ($id_str)");
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
746 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
747
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
748 if(!scalar @queryList or !scalar @sbjctList) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
749 my $id_str = $self->_id_str;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
750 $self->throw( "Can't find query or sbjct alignment lines. Possibly unrecognized Blast format. ($id_str)");
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
751 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
752 }
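
# Illustration only (not part of this module): a minimal stand-alone sketch of
# the row-collection loop above, showing how an unpadded "match" row is padded
# to the width of the preceding Query row. The function name
# collect_alignment_rows() and the toy alignment are assumptions made for this
# example; the two regular expressions are the ones used in the loop above.

```perl
use strict;
use warnings;

# Hypothetical helper: sorts alignment lines into query/match/sbjct rows.
sub collect_alignment_rows {
    my @data = @_;    # copy, so padding never touches the caller's strings
    my (@query, @match, @sbjct);
    my ($aln_row_len, $expect_match) = (0, 0);

    foreach my $line (@data) {
        next if $line =~ /^\s*$/;
        if ($line =~ /^(Query:?[\s\d]+)([^\s\d]+)/) {
            push @query, $line;
            # Width of "Query: <start> " plus the residue string.
            $aln_row_len  = length($1) + length($2);
            $expect_match = 1;
        } elsif ($expect_match) {
            # The row right after a Query row is the match row.
            my $diff = $aln_row_len - length $line;
            $line .= ' ' x $diff if $diff > 0;
            push @match, $line;
            $expect_match = 0;
        } elsif ($line =~ /^Sbjct/) {
            push @sbjct, $line;
        }
    }
    return (\@query, \@match, \@sbjct);
}

my ($q, $m, $s) = collect_alignment_rows(
    'Query: 1 ACDEFXXX 8',
    '         ACDEF',
    'Sbjct: 1 ACDEFGHI 8',
);
printf "match row is %d characters wide\n", length $m->[0];
```

# The match row is padded only up to the end of the residue string, not up to
# the trailing coordinate, which mirrors how $aln_row_len is computed above.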


#=head2 _set_score_stats (Private method)
#
# Usage     : called automatically by _set_data()
# Purpose   : Sets various score statistics obtained from the HSP listing.
# Argument  : String with any of the following formats:
#           : blast2:  Score = 30.1 bits (66), Expect = 9.2
#           : blast2:  Score = 158.2 bits (544), Expect(2) = e-110
#           : blast1:  Score = 410 (144.3 bits), Expect = 1.7e-40, P = 1.7e-40
#           : blast1:  Score = 55 (19.4 bits), Expect = 5.3, Sum P(3) = 0.99
# Throws    : Exception if the stats cannot be parsed, probably due to a change
#           : in the Blast report format.
#
#See Also   : L<_set_data()|_set_data>
#
#=cut

#--------------------
sub _set_score_stats {
#--------------------
    my ($self, $data) = @_;

    my ($expect, $p);

    if($data =~ /Score = +([\d.e+-]+) bits \(([\d.e+-]+)\), +Expect = +([\d.e+-]+)/) {
        # blast2 format n = 1
        $self->bits($1);
        $self->score($2);
        $expect = $3;
    } elsif($data =~ /Score = +([\d.e+-]+) bits \(([\d.e+-]+)\), +Expect\((\d+)\) = +([\d.e+-]+)/) {
        # blast2 format n > 1
        $self->bits($1);
        $self->score($2);
        $self->{'_n'} = $3;
        $expect = $4;

    } elsif($data =~ /Score = +([\d.e+-]+) \(([\d.e+-]+) bits\), +Expect = +([\d.e+-]+), P = +([\d.e-]+)/) {
        # blast1 format, n = 1
        $self->score($1);
        $self->bits($2);
        $expect = $3;
        $p = $4;

    } elsif($data =~ /Score = +([\d.e+-]+) \(([\d.e+-]+) bits\), +Expect = +([\d.e+-]+), +Sum P\((\d+)\) = +([\d.e-]+)/) {
        # blast1 format, n > 1
        $self->score($1);
        $self->bits($2);
        $expect = $3;
        $self->{'_n'} = $4;
        $p = $5;

    } else {
        my $id_str = $self->_id_str;
        $self->throw(-class => 'Bio::Root::Exception',
                     -text  => "Can't parse score statistics: unrecognized format. ($id_str)",
                     -value => $data);
    }
    # BLAST may print exponents without a mantissa (e.g. "e-110");
    # prepend "1" so the value is a valid number.
    $expect = "1$expect" if $expect =~ /^e/i;
    $p = "1$p" if defined $p and $p =~ /^e/i;

    $self->{'_expect'} = $expect;
    $self->{'_p'} = $p || undef;
    $self->significance( $p || $expect );
}
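
# Illustration only (not part of this module): a compact sketch of the score
# line parsing above. It folds the four branches into two patterns (one per
# BLAST generation) by making the "(n)" and "Sum " parts optional, and returns
# a plain hash instead of setting object fields. The function name
# parse_score_line() and the hash keys are assumptions made for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a hash ref with bits, score, expect, n, p.
sub parse_score_line {
    my ($data) = @_;
    my %s;
    if ($data =~ /Score = +([\d.e+-]+) bits \(([\d.e+-]+)\), +Expect(?:\((\d+)\))? = +([\d.e+-]+)/) {
        # blast2 layout: bit score first, raw score in parentheses.
        @s{qw(bits score n expect)} = ($1, $2, $3, $4);
    } elsif ($data =~ /Score = +([\d.e+-]+) \(([\d.e+-]+) bits\), +Expect = +([\d.e+-]+), +(?:Sum )?P(?:\((\d+)\))? = +([\d.e-]+)/) {
        # blast1 layout: raw score first, bit score in parentheses.
        @s{qw(score bits expect n p)} = ($1, $2, $3, $4, $5);
    } else {
        die "Can't parse score statistics: $data\n";
    }
    # Repair exponents printed without a mantissa, e.g. "e-110" -> "1e-110".
    for my $key (qw(expect p)) {
        $s{$key} = "1$s{$key}" if defined $s{$key} and $s{$key} =~ /^e/i;
    }
    return \%s;
}

my $blast2 = parse_score_line(' Score = 158.2 bits (544), Expect(2) = e-110');
my $blast1 = parse_score_line(' Score = 55 (19.4 bits), Expect = 5.3, Sum P(3) = 0.99');
print "blast2 expect: $blast2->{expect}\n";
print "blast1 p:      $blast1->{p}\n";
```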


#=head2 _set_match_stats (Private method)
#
# Usage     : Private method; called automatically by _set_data()
# Purpose   : Sets various matching statistics obtained from the HSP listing.
# Argument  : blast2:   Identities = 23/74 (31%), Positives = 29/74 (39%), Gaps = 17/74 (22%)
#           : blast2:   Identities = 57/98 (58%), Positives = 74/98 (75%)
#           : blast1:   Identities = 87/204 (42%), Positives = 126/204 (61%)
#           : blast1:   Identities = 87/204 (42%), Positives = 126/204 (61%), Frame = -3
#           : WU-blast: Identities = 310/553 (56%), Positives = 310/553 (56%), Strand = Minus / Plus
# Throws    : Exception if the stats cannot be parsed, probably due to a change
#           : in the Blast report format.
# Comments  : The "Gaps = " data in the HSP header has a different meaning depending
#           : on the type of Blast: for BLASTP, this number is the total number of
#           : gaps in query+sbjct; for TBLASTN, it is the number of gaps in the
#           : query sequence only. Thus, it is safer to collect the data
#           : separately by examining the actual sequence strings as is done
#           : in _set_seq().
#
#See Also   : L<_set_data()|_set_data>, L<_set_seq()|_set_seq>
#
#=cut

#--------------------
sub _set_match_stats {
#--------------------
    my ($self, $data) = @_;

    if($data =~ m!Identities = (\d+)/(\d+)!) {
        # blast1 or 2 format
        $self->{'_numIdentical'} = $1;
        $self->{'_totalLength'}  = $2;
    }

    if($data =~ m!Positives = (\d+)/(\d+)!) {
        # blast1 or 2 format
        $self->{'_numConserved'} = $1;
        $self->{'_totalLength'}  = $2;
    }

    if($data =~ m!Frame = ([\d+-]+)!) {
        $self->frame($1);
    }

    # Strand data is not always present in this line.
    # _set_seq() will also set strand information.
    if($data =~ m!Strand = (\w+) / (\w+)!) {
        $self->{'_queryStrand'} = $1;
        $self->{'_sbjctStrand'} = $2;
    }

#    if($data =~ m!Gaps = (\d+)/(\d+)!) {
#        $self->{'_totalGaps'} = $1;
#    } else {
#        $self->{'_totalGaps'} = 0;
#    }
}
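
# Illustration only (not part of this module): a small sketch that applies the
# same four patterns used above to one statistics line and returns a plain
# hash. The function name parse_match_line() and the hash keys are assumptions
# made for this example. As in the method above, fields that are absent from
# the line are simply left unset.

```perl
use strict;
use warnings;

# Hypothetical helper: extracts identity/positive counts, frame and strand.
sub parse_match_line {
    my ($data) = @_;
    my %m;
    @m{qw(identical length)} = ($1, $2) if $data =~ m!Identities = (\d+)/(\d+)!;
    @m{qw(conserved length)} = ($1, $2) if $data =~ m!Positives = (\d+)/(\d+)!;
    $m{frame} = $1                      if $data =~ m!Frame = ([\d+-]+)!;
    @m{qw(query_strand sbjct_strand)} = ($1, $2)
        if $data =~ m!Strand = (\w+) / (\w+)!;
    return \%m;
}

my $wu = parse_match_line(
    ' Identities = 310/553 (56%), Positives = 310/553 (56%), Strand = Minus / Plus');
my $b1 = parse_match_line(
    ' Identities = 87/204 (42%), Positives = 126/204 (61%), Frame = -3');

print "WU-blast: $wu->{identical}/$wu->{length} identical, ",
      "strand $wu->{query_strand}/$wu->{sbjct_strand}\n";
print "blast1:   $b1->{conserved}/$b1->{length} conserved, frame $b1->{frame}\n";
```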



#=head2 _set_seq_data (Private method)
#
# Usage     : called automatically when sequence data is requested.
# Purpose   : Sets the HSP sequence data for both query and sbjct sequences.
#           : Includes: start, stop, length, gaps, and raw sequence.
# Argument  : n/a
# Throws    : Propagates any exception thrown by _set_seq()
# Comments  : Uses raw data stored by _set_data() during object construction.
#           : These data are not always needed, so it is conditionally
#           : executed only upon demand by methods such as gaps(), _set_residues(),
#           : etc. _set_seq() does the dirty work.
#
#See Also   : L<_set_seq()|_set_seq>
#
#=cut

#-----------------
sub _set_seq_data {
#-----------------
    my $self = shift;

    $self->_set_seq('query', @{$self->{'_queryList'}});
    $self->_set_seq('sbjct', @{$self->{'_sbjctList'}});

    # Liberate some memory.
    @{$self->{'_queryList'}} = @{$self->{'_sbjctList'}} = ();
    undef $self->{'_queryList'};
    undef $self->{'_sbjctList'};

    $self->{'_set_seq_data'} = 1;
}
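
# Illustration only (not part of this module): a minimal sketch of the
# parse-on-demand pattern used by _set_seq_data(). Raw rows are kept until a
# derived value is first requested, parsed once, then released. The package
# LazyRows, its fields, and the _parse_calls counter are all hypothetical and
# exist only to make the behaviour observable.

```perl
use strict;
use warnings;

package LazyRows;    # hypothetical class for illustration only

sub new {
    my ($class, @rows) = @_;
    return bless { _rows => [@rows], _parsed => 0, _parse_calls => 0 }, $class;
}

# Does the expensive work once and frees the raw input afterwards.
sub _parse {
    my $self = shift;
    $self->{_count} = scalar @{ $self->{_rows} };
    delete $self->{_rows};       # raw rows are no longer needed
    $self->{_parsed} = 1;        # plays the role of the '_set_seq_data' flag
    $self->{_parse_calls}++;
    return;
}

# Public accessor: triggers parsing only if it has not happened yet.
sub count {
    my $self = shift;
    $self->_parse unless $self->{_parsed};
    return $self->{_count};
}

package main;

my $obj = LazyRows->new('row 1', 'row 2');
print $obj->count, "\n";    # first call parses
print $obj->count, "\n";    # second call reuses the stored result
```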



#=head2 _set_seq (Private method)
#
# Usage     : called automatically by _set_seq_data()
#           : $hsp_obj->_set_seq($seq_type, @data);
# Purpose   : Sets sequence information for both the query and sbjct sequences.
#           : Directly counts the number of gaps in each sequence (if gapped Blast).
# Argument  : $seq_type = 'query' or 'sbjct'
#           : @data = all seq lines with the form:
#           : Query: 61  SPHNVKDRKEQNGSINNAISPTATANTSGSQQINIDSALRDRSSNVAAQPSLSDASSGSN 120
# Throws    : Exception if data strings cannot be parsed, probably due to a change
#           : in the Blast report format.
# Comments  : Uses first argument to determine which data members to set,
#           : making this method sensitive to data member name changes.
#           : Behavior is dependent on the type of BLAST analysis (TBLASTN, BLASTP, etc).
# Warning   : Sequence endpoints are normalized so that start < end. This affects HSPs
#           : for TBLASTN/X hits on the minus strand. Normalization facilitates use
#           : of range information by methods such as matches().
#
#See Also   : L<_set_seq_data()|_set_seq_data>, L<matches()|matches>, L<range()|range>, L<start()|start>, L<end()|end>
#
#=cut

#-------------
sub _set_seq {
#-------------
    my $self    = shift;
    my $seqType = shift;
    my @data    = @_;
    my @ranges  = ();
    my @sequence = ();
    my $numGaps = 0;

    foreach( @data ) {
        if( m/(\d+) *([^\d\s]+) *(\d+)/) {
            push @ranges, ( $1, $3 ) ;
            push @sequence, $2;
            #print STDERR "_set_seq found sequence \"$2\"\n";
        } else {
            $self->warn("Bad sequence data: $_");
        }
    }

    if( !(scalar(@sequence) and scalar(@ranges))) {
        my $id_str = $self->_id_str;
        $self->throw("Can't set sequence: missing data. Possibly unrecognized Blast format. ($id_str)");
    }

    # Sensitive to member name changes.
    $seqType = "_\L$seqType\E";
    $self->{$seqType.'Start'} = $ranges[0];
    $self->{$seqType.'Stop'}  = $ranges[ $#ranges ];
    $self->{$seqType.'Seq'}   = \@sequence;

    $self->{$seqType.'Length'} = abs($ranges[ $#ranges ] - $ranges[0]) + 1;

    # Adjust lengths for BLASTX, TBLASTN, TBLASTX sequences
    # Converting nucl coords to amino acid coords.

    my $prog = $self->algorithm;
    if($prog eq 'TBLASTN' and $seqType eq '_sbjct') {
        $self->{$seqType.'Length'} /= 3;
    } elsif($prog eq 'BLASTX' and $seqType eq '_query') {
        $self->{$seqType.'Length'} /= 3;
    } elsif($prog eq 'TBLASTX') {
        $self->{$seqType.'Length'} /= 3;
    }

    if( $prog ne 'BLASTP' ) {
        $self->{$seqType.'Strand'} = 'Plus' if $prog =~ /BLASTN/;
        $self->{$seqType.'Strand'} = 'Plus' if ($prog =~ /BLASTX/ and $seqType eq '_query');
        # Normalize sequence endpoints so that start < end.
        # Reverse complement or 'minus strand' HSPs get flipped here.
        if($self->{$seqType.'Start'} > $self->{$seqType.'Stop'}) {
            ($self->{$seqType.'Start'}, $self->{$seqType.'Stop'}) =
                ($self->{$seqType.'Stop'}, $self->{$seqType.'Start'});
            $self->{$seqType.'Strand'} = 'Minus';
        }
    }

    ## Count number of gaps in each seq. Only need to do this for gapped Blasts.
#    if($self->{'_gapped'}) {
        my $seqstr = join('', @sequence);
        $seqstr =~ s/\s//g;
        my $num_gaps = CORE::length($seqstr) - $self->{$seqType.'Length'};
        $self->{$seqType.'Gaps'} = $num_gaps if $num_gaps > 0;
#    }
}
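
# Illustration only (not part of this module): a stand-alone sketch of the
# coordinate and gap bookkeeping done by _set_seq(), restricted to the simple
# case where no nucleotide-to-amino-acid division by 3 applies (as for BLASTN
# or BLASTP). The function name summarize_rows(), the returned hash keys and
# the toy rows are assumptions made for this example; the row pattern is the
# one used in the method above.

```perl
use strict;
use warnings;

# Hypothetical helper: derives start, stop, strand, length and gap count
# from alignment rows of the form "Query: 61 SPHNVK... 120".
sub summarize_rows {
    my @rows = @_;
    my (@ranges, @seq);
    foreach my $row (@rows) {
        if ($row =~ m/(\d+) *([^\d\s]+) *(\d+)/) {
            push @ranges, $1, $3;
            push @seq, $2;
        }
    }
    die "no parsable rows\n" unless @seq;

    # First start and last end span the whole HSP.
    my ($start, $stop) = @ranges[0, -1];
    my $strand = 'Plus';
    if ($start > $stop) {
        # Minus-strand rows count downwards; normalize so start < stop.
        ($start, $stop) = ($stop, $start);
        $strand = 'Minus';
    }

    # Gaps are the aligned characters that do not consume a coordinate.
    my $length = $stop - $start + 1;
    my $gaps   = length(join '', @seq) - $length;
    $gaps = 0 if $gaps < 0;

    return { start => $start, stop => $stop, strand => $strand,
             length => $length, gaps => $gaps };
}

my $plus  = summarize_rows('Query: 1 ACD-EFG 6', 'Query: 7 HIK--LM 11');
my $minus = summarize_rows('Sbjct: 30 ACGT 27');
print "plus:  $plus->{start}..$plus->{stop}, gaps $plus->{gaps}\n";
print "minus: $minus->{start}..$minus->{stop}, strand $minus->{strand}\n";
```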


#=head2 _set_residues (Private method)
#
# Usage     : called automatically when residue data is requested.
# Purpose   : Sets the residue numbers representing the identical and
#           : conserved positions. These data are obtained by analyzing the
#           : symbols between query and sbjct lines of the alignments.
# Argument  : n/a
# Throws    : Propagates any exception thrown by _set_seq_data() and _set_match_seq().
# Comments  : These data are not always needed, so it is conditionally
#           : executed only upon demand by methods such as seq_inds().
#           : Behavior is dependent on the type of BLAST analysis (TBLASTN, BLASTP, etc).
#
#See Also   : L<_set_seq_data()|_set_seq_data>, L<_set_match_seq()|_set_match_seq>, seq_inds()
#
#=cut

#------------------
sub _set_residues {
#------------------
    my $self = shift;
    my @sequence = ();

    $self->_set_seq_data() unless $self->{'_set_seq_data'};

    # Using hashes to avoid saving duplicate residue numbers.
    my %identicalList_query = ();
    my %identicalList_sbjct = ();
    my %conservedList_query = ();
    my %conservedList_sbjct = ();

    # Declare first, then assign conditionally: "my $x = ... if COND"
    # has undefined behavior in Perl.
    my $aref;
    $aref = $self->_set_match_seq() if not ref $self->{'_matchSeq'};
    $aref ||= $self->{'_matchSeq'};
    my $seqString = join('', @$aref );

    my $qseq = join('',@{$self->{'_querySeq'}});
    my $sseq = join('',@{$self->{'_sbjctSeq'}});
    my $resCount_query = $self->{'_queryStop'} || 0;
    my $resCount_sbjct = $self->{'_sbjctStop'} || 0;

    my $prog = $self->algorithm;
    if($prog !~ /^BLASTP|^BLASTN/) {
        if($prog eq 'TBLASTN') {
            $resCount_sbjct /= 3;
        } elsif($prog eq 'BLASTX') {
            $resCount_query /= 3;
        } elsif($prog eq 'TBLASTX') {
            $resCount_query /= 3;
            $resCount_sbjct /= 3;
        }
    }

    # Walk the alignment from its last column to its first, counting
    # residue numbers down from the stop coordinates.
    my ($mchar, $schar, $qchar);
    while( $mchar = chop($seqString) ) {
        ($qchar, $schar) = (chop($qseq), chop($sseq));
        if( $mchar eq '+' ) {
            $conservedList_query{ $resCount_query } = 1;
            $conservedList_sbjct{ $resCount_sbjct } = 1;
        } elsif( $mchar ne ' ' ) {
            $identicalList_query{ $resCount_query } = 1;
            $identicalList_sbjct{ $resCount_sbjct } = 1;
        }
        $resCount_query-- if $qchar ne $GAP_SYMBOL;
        $resCount_sbjct-- if $schar ne $GAP_SYMBOL;
    }
    $self->{'_identicalRes_query'} = \%identicalList_query;
    $self->{'_conservedRes_query'} = \%conservedList_query;
    $self->{'_identicalRes_sbjct'} = \%identicalList_sbjct;
parents:
diff changeset
1069 $self->{'_conservedRes_sbjct'} = \%conservedList_sbjct;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
1070
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
1071 }
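The right-to-left walk above can be sketched in isolation. This is a minimal, self-contained illustration and not the module's code: the function name `index_residues`, its argument order, and the `$GAP` constant (standing in for `$GAP_SYMBOL`) are assumptions made for the example. It also uses an indexed loop instead of `chop` in a `while` condition, and it ignores the divide-by-3 adjustment for translated searches.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the module's $GAP_SYMBOL.
my $GAP = '-';

# Walk three equal-length alignment strings from the last column to the
# first and record which residue numbers are identical or conserved ('+').
sub index_residues {
    my ($match, $qseq, $sseq, $q_stop, $s_stop) = @_;
    my (%ident_q, %ident_s, %cons_q, %cons_s);
    my ($q_pos, $s_pos) = ($q_stop, $s_stop);

    for (my $i = length($match) - 1; $i >= 0; $i--) {
        my $m = substr($match, $i, 1);
        my $q = substr($qseq,  $i, 1);
        my $s = substr($sseq,  $i, 1);
        if ($m eq '+') {
            $cons_q{$q_pos} = 1;
            $cons_s{$s_pos} = 1;
        } elsif ($m ne ' ') {
            $ident_q{$q_pos} = 1;
            $ident_s{$s_pos} = 1;
        }
        # A gap column does not consume a residue number in that sequence.
        $q_pos-- if $q ne $GAP;
        $s_pos-- if $s ne $GAP;
    }
    return (\%ident_q, \%cons_q, \%ident_s, \%cons_s);
}

# Query residues 1..5, sbjct residues 1..4 (one gap in the sbjct line).
my ($iq, $cq, $is, $cs) = index_residues('AC +G', 'ACDEG', 'AC-DG', 5, 4);
print join(',', sort { $a <=> $b } keys %$iq), "\n";   # prints "1,2,5"
print join(',', sort { $a <=> $b } keys %$cs), "\n";   # prints "3"
```

Keying the hashes by residue number means a position is stored once however often it is visited, which is the point of the "avoid saving duplicate residue numbers" comment.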


#=head2 _set_match_seq (Private method)
#
# Usage     : $hsp_obj->_set_match_seq()
# Purpose   : Set the 'match' sequence for the current HSP (symbols in between
#           : the query and sbjct lines.)
# Returns   : Array reference holding the match sequence lines.
# Argument  : n/a
# Throws    : Exception if the _matchList field is not set.
# Comments  : The match information is not always necessary. This method
#           : allows it to be conditionally prepared.
#           : Called by _set_residues() and seq_str().
#
#See Also   : L<_set_residues()|_set_residues>, L<seq_str()|seq_str>
#
#=cut

#-------------------
sub _set_match_seq {
#-------------------
    my $self = shift;

    if( ! ref($self->{'_matchList'}) ) {
        my $id_str = $self->_id_str;
        $self->throw("Can't set HSP match sequence: No data ($id_str)");
    }

    my @data = @{$self->{'_matchList'}};

    my(@sequence);
    foreach( @data ) {
        chomp($_);
        ## Remove leading spaces; (note: aln may begin with a space
        ## which is why we can't use s/^ +//).
        s/^ {$self->{'_match_indent'}}//;
        push @sequence, $_;
    }
    # Liberate some memory: empty the raw list, then drop the reference.
    @{$self->{'_matchList'}} = ();
    $self->{'_matchList'} = undef;

    $self->{'_matchSeq'} = \@sequence;

    return $self->{'_matchSeq'};
}
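The comment about not using `s/^ +//` is easier to see with a small example. The sketch below is illustrative only: `strip_match_indent` and its arguments are hypothetical names, and the indent width is passed in directly rather than read from `_match_indent`.

```perl
use strict;
use warnings;

# Remove exactly $indent leading spaces from each match line, so that any
# spaces belonging to the alignment itself (mismatch columns) are preserved.
sub strip_match_indent {
    my ($indent, @lines) = @_;
    my @out;
    for my $line (@lines) {
        chomp(my $copy = $line);       # work on a copy; leave the input intact
        $copy =~ s/^ {$indent}//;      # fixed-width strip, not a greedy s/^ +//
        push @out, $copy;
    }
    return \@out;
}

# Three columns of indent; the first alignment column of line one is a mismatch.
my $stripped = strip_match_indent(3, "    +AC\n", "   G  \n");
print "[$_]\n" for @$stripped;         # prints "[ +AC]" then "[G  ]"
```

A greedy `s/^ +//` would turn the first line into `+AC` and shift every later column by one, so the match string would no longer line up with the query and sbjct strings.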


=head2 n

 Usage     : $hsp_obj->n()
 Purpose   : Get the N value (num HSPs on which P/Expect is based).
           : This value is not defined with NCBI Blast2 with gapping.
 Returns   : Integer or null string if not defined.
 Argument  : n/a
 Throws    : n/a
 Comments  : The 'N' value is listed in parentheses with the P/Expect value:
           : e.g., P(3) = 1.2e-30  --->  (N = 3).
           : Not defined in NCBI Blast2 with gaps.
           : This typically is equal to the number of HSPs but not always.
           : To obtain the number of HSPs, use Bio::Search::Hit::BlastHit::num_hsps().

See Also   : L<Bio::SeqFeature::SimilarityPair::score()|Bio::SeqFeature::SimilarityPair>

=cut

#-----
sub n { my $self = shift; $self->{'_n'} || ''; }
#-----


=head2 matches

 Usage     : $hsp->matches(-SEQ => seq_type, -START => start, -STOP => stop);
 Purpose   : Get the total number of identical and conservative matches
           : in the query or sbjct sequence for the given HSP. Optionally can
           : report data within a defined interval along the seq.
           : (Note: 'conservative' matches are called 'positives' in the
           : Blast report.)
 Example   : ($id,$cons) = $hsp_object->matches(-SEQ => 'hit');
           : ($id,$cons) = $hsp_object->matches(-SEQ => 'query',
           :                                    -START => 300, -STOP => 400);
 Returns   : 2-element array of integers
 Argument  : Named parameters (the method reads these exact upper-case keys):
           : -SEQ   = 'query' or 'hit' or 'sbjct' (default = query)
           :          ('sbjct' is synonymous with 'hit')
           : -START = Starting coordinate (optional)
           : -STOP  = Ending coordinate (optional)
 Throws    : Exception if the supplied coordinates are out of range.
 Comments  : Relies on seq_str('match') to get the string of alignment symbols
           : between the query and sbjct lines which are used for determining
           : the number of identical and conservative matches.

See Also   : L<length()|length>, L<gaps()|gaps>, L<seq_str()|seq_str>, L<Bio::Search::Hit::BlastHit::_adjust_contigs()|Bio::Search::Hit::BlastHit>

=cut

#-----------
sub matches {
#-----------
    my( $self, %param ) = @_;
    my(@data);
    my($seqType, $beg, $end) = ($param{-SEQ}, $param{-START}, $param{-STOP});
    $seqType ||= 'query';
    $seqType = 'sbjct' if $seqType eq 'hit';

    my($start,$stop);

    if(!defined $beg && !defined $end) {
        ## Get data for the whole alignment.
        push @data, ($self->{'_numIdentical'}, $self->{'_numConserved'});
    } else {
        ## Get the substring representing the desired sub-section of aln.
        $beg ||= 0;
        $end ||= 0;
        ($start,$stop) = $self->range($seqType);
        if($beg == 0) { $beg = $start; $end = $beg+$end; }
        elsif($end == 0) { $end = $stop; $beg = $end-$beg; }

        if($end >= $stop) { $end = $stop; } ##ML changed from if (end >stop)
        else { $end += 1;}   ##ML moved from commented position below, makes
                             ##more sense here
#        if($end > $stop) { $end = $stop; }
        if($beg < $start) { $beg = $start; }
#        else { $end += 1;}

#        my $seq = substr($self->seq_str('match'), $beg-$start, ($end-$beg));

        ## ML: START fix for substr out of range error ------------------
        my $seq = "";
        my $prog = $self->algorithm;
        if (($prog eq 'TBLASTN') and ($seqType eq 'sbjct'))
        {
            $seq = substr($self->seq_str('match'),
                          int(($beg-$start)/3), int(($end-$beg+1)/3));

        } elsif (($prog eq 'BLASTX') and ($seqType eq 'query'))
        {
            $seq = substr($self->seq_str('match'),
                          int(($beg-$start)/3), int(($end-$beg+1)/3));
        } else {
            $seq = substr($self->seq_str('match'),
                          $beg-$start, ($end-$beg));
        }
        ## ML: End of fix for substr out of range error -----------------


        ## ML: debugging code
        ## This is where we get our exception. Try printing out the values going
        ## into this:
        ##
#        print STDERR
#            qq(*------------MY EXCEPTION --------------------\nSeq: ") ,
#            $self->seq_str("$seqType"), qq("\n),$self->rank,",( index:";
#        print STDERR $beg-$start, ", len: ", $end-$beg," ), (HSPRealLen:",
#            CORE::length $self->seq_str("$seqType");
#        print STDERR ", HSPCalcLen: ", $stop - $start +1 ," ),
#            ( beg: $beg, end: $end ), ( start: $start, stop: $stop )\n";
        ## ML: END DEBUGGING CODE----------

        if(!CORE::length $seq) {
            my $id_str = $self->_id_str;
            $self->throw("Undefined $seqType sub-sequence ($beg,$end). Valid range = $start - $stop ($id_str)");
        }
        ## Get data for a substring.
#        printf "Collecting HSP subsection data: beg,end = %d,%d; start,stop = %d,%d\n%s<---\n", $beg, $end, $start, $stop, $seq;
#        printf "Original match seq:\n%s\n",$self->seq_str('match');
        $seq =~ s/ //g;   # remove space (no info).
        my $len_cons = CORE::length $seq;
        $seq =~ s/\+//g;  # remove '+' characters (conservative substitutions)
        my $len_id = CORE::length $seq;
        push @data, ($len_id, $len_cons);
#        printf "  HSP = %s\n  id = %d; cons = %d\n", $self->rank, $len_id, $len_cons; <STDIN>;
    }
    @data;
}
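The counting step at the end of `matches` can be tried on a plain string. The following is a minimal sketch under stated assumptions: `count_matches` is a hypothetical helper, and it takes a zero-based offset and a length directly instead of deriving them from HSP coordinates as the method does.

```perl
use strict;
use warnings;

# Count identical and conserved columns in a slice of a BLAST match line.
# Spaces carry no information, '+' marks a conservative substitution, and
# any other character marks an identity.
sub count_matches {
    my ($match, $offset, $len) = @_;
    my $seg = substr($match, $offset, $len);

    (my $no_space = $seg) =~ s/ //g;        # drop mismatch and gap columns
    my $n_cons = length $no_space;          # identities plus '+' columns

    (my $no_plus = $no_space) =~ s/\+//g;   # drop conservative substitutions
    my $n_ident = length $no_plus;          # identities only

    return ($n_ident, $n_cons);
}

my $match = 'AC +G L+';
my ($id_all, $cons_all) = count_matches($match, 0, length $match);
my ($id_sub, $cons_sub) = count_matches($match, 2, 4);   # slice is " +G "
print "$id_all $cons_all $id_sub $cons_sub\n";           # prints "4 6 1 2"
```

Note that the second value counts identities as well as `+` columns, matching the `$len_cons` computation in the method, so it is never smaller than the first.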


=head2 num_identical

 Usage     : $hsp_object->num_identical();
 Purpose   : Get the number of identical positions within the given HSP.
 Example   : $num_iden = $hsp_object->num_identical();
 Returns   : integer
 Argument  : n/a
 Throws    : n/a

See Also   : L<num_conserved()|num_conserved>, L<frac_identical()|frac_identical>

=cut

#-------------------
sub num_identical {
#-------------------
    my( $self) = shift;

    $self->{'_numIdentical'};
}


=head2 num_conserved

 Usage     : $hsp_object->num_conserved();
 Purpose   : Get the number of conserved positions within the given HSP.
 Example   : $num_cons = $hsp_object->num_conserved();
 Returns   : integer
 Argument  : n/a
 Throws    : n/a

See Also   : L<num_identical()|num_identical>, L<frac_conserved()|frac_conserved>

=cut

#-------------------
sub num_conserved {
#-------------------
    my( $self) = shift;

    $self->{'_numConserved'};
}



=head2 range

 Usage     : $hsp->range( [seq_type] );
 Purpose   : Gets the (start, end) coordinates for the query or sbjct sequence
           : in the HSP alignment.
 Example   : ($query_beg, $query_end) = $hsp->range('query');
           : ($hit_beg, $hit_end) = $hsp->range('hit');
 Returns   : Two-element array of integers
 Argument  : seq_type = string, 'query' or 'hit' or 'sbjct' (default = 'query')
           : ('sbjct' is synonymous with 'hit')
 Throws    : n/a

See Also   : L<start()|start>, L<end()|end>

=cut

#----------
sub range {
#----------
    my ($self, $seqType) = @_;

    $self->_set_seq_data() unless $self->{'_set_seq_data'};

    $seqType ||= 'query';
    $seqType = 'sbjct' if $seqType eq 'hit';

    ## Sensitive to member name changes.
    $seqType = "_\L$seqType\E";

    return ($self->{$seqType.'Start'},$self->{$seqType.'Stop'});
}

=head2 start

 Usage     : $hsp->start( [seq_type] );
 Purpose   : Gets the start coordinate for the query, sbjct, or both sequences
           : in the HSP alignment.
           : NOTE: Start will always be less than end.
           : To determine strand, use $hsp->strand()
 Example   : $query_beg = $hsp->start('query');
           : $hit_beg = $hsp->start('hit');
           : ($query_beg, $hit_beg) = $hsp->start();
 Returns   : scalar context: integer
           : array context without args: list of two integers
 Argument  : In scalar context: seq_type = 'query' or 'hit' or 'sbjct' (default= 'query')
           : ('sbjct' is synonymous with 'hit')
           : Array context can be "induced" by providing an argument of 'list' or 'array'.
 Throws    : n/a

See Also   : L<end()|end>, L<range()|range>

=cut

#----------
sub start {
#----------
    my ($self, $seqType) = @_;

    $seqType ||= (wantarray ? 'list' : 'query');
    $seqType = 'sbjct' if $seqType eq 'hit';

    $self->_set_seq_data() unless $self->{'_set_seq_data'};

    if($seqType =~ /list|array/i) {
        return ($self->{'_queryStart'}, $self->{'_sbjctStart'});
    } else {
        ## Sensitive to member name changes.
        $seqType = "_\L$seqType\E";
        return $self->{$seqType.'Start'};
    }
}

=head2 end

 Usage     : $hsp->end( [seq_type] );
 Purpose   : Gets the end coordinate for the query, sbjct, or both sequences
           : in the HSP alignment.
           : NOTE: Start will always be less than end.
           : To determine strand, use $hsp->strand()
 Example   : $query_end = $hsp->end('query');
           : $hit_end = $hsp->end('hit');
           : ($query_end, $hit_end) = $hsp->end();
 Returns   : scalar context: integer
           : array context without args: list of two integers
 Argument  : In scalar context: seq_type = 'query' or 'hit' or 'sbjct' (default= 'query')
           : ('sbjct' is synonymous with 'hit')
           : Array context can be "induced" by providing an argument of 'list' or 'array'.
 Throws    : n/a

See Also   : L<start()|start>, L<range()|range>, L<strand()|strand>

=cut

#----------
sub end {
#----------
    my ($self, $seqType) = @_;

    $seqType ||= (wantarray ? 'list' : 'query');
    $seqType = 'sbjct' if $seqType eq 'hit';

    $self->_set_seq_data() unless $self->{'_set_seq_data'};

    if($seqType =~ /list|array/i) {
        return ($self->{'_queryStop'}, $self->{'_sbjctStop'});
    } else {
        ## Sensitive to member name changes.
        $seqType = "_\L$seqType\E";
        return $self->{$seqType.'Stop'};
    }
}
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
1406
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
1407
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
1408
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
=head2 strand

 Usage     : $hsp_object->strand( [seq_type] )
 Purpose   : Get the strand of the query or sbjct sequence.
 Example   : print $hsp->strand('query');
           : ($query_strand, $hit_strand) = $hsp->strand();
 Returns   : -1, 0, or 1
           : -1 = Minus strand, +1 = Plus strand
           : Returns 0 if strand is not defined, which occurs
           : for BLASTP reports, and the query of TBLASTN
           : as well as the hit of BLASTX reports.
           : In scalar context without arguments, returns queryStrand value.
           : In array context without arguments, returns a two-element list
           : of integers (queryStrand, sbjctStrand).
           : Array context can be "induced" by providing an argument of 'list' or 'array'.
 Argument  : seq_type: 'query' or 'hit' or 'sbjct' or undef
           : ('sbjct' is synonymous with 'hit')
 Throws    : n/a

See Also   : B<_set_seq()>, B<_set_match_stats()>

=cut

#-----------
sub strand {
#-----------
    my( $self, $seqType ) = @_;

    # Hack to deal with the fact that SimilarityPair calls strand()
    # which will lead to an error because parsing hasn't yet occurred.
    # See SimilarityPair::new().
    return if $self->{'_initializing'};

    $seqType ||= (wantarray ? 'list' : 'query');
    $seqType = 'sbjct' if $seqType eq 'hit';

    ## Sensitive to member name format.
    $seqType = "_\L$seqType\E";

    # $seqType could be '_list'.
    $self->{'_queryStrand'} or $self->_set_seq_data() unless $self->{'_set_seq_data'};

    my $prog = $self->algorithm;

    if($seqType =~ /list|array/i) {
        my ($qstr, $hstr);
        if( $prog eq 'BLASTP') {
            $qstr = 0;
            $hstr = 0;
        }
        elsif( $prog eq 'TBLASTN') {
            $qstr = 0;
            $hstr = $STRAND_SYMBOL{$self->{'_sbjctStrand'}};
        }
        elsif( $prog eq 'BLASTX') {
            $qstr = $STRAND_SYMBOL{$self->{'_queryStrand'}};
            $hstr = 0;
        }
        else {
            $qstr = $STRAND_SYMBOL{$self->{'_queryStrand'}} if defined $self->{'_queryStrand'};
            $hstr = $STRAND_SYMBOL{$self->{'_sbjctStrand'}} if defined $self->{'_sbjctStrand'};
        }
        $qstr ||= 0;
        $hstr ||= 0;
        return ($qstr, $hstr);
    }
    local $^W = 0;
    $STRAND_SYMBOL{$self->{$seqType.'Strand'}} || 0;
}


=head2 seq

 Usage     : $hsp->seq( [seq_type] );
 Purpose   : Get the query or sbjct sequence as a Bio::Seq.pm object.
 Example   : $seqObj = $hsp->seq('query');
 Returns   : Object reference for a Bio::Seq.pm object.
 Argument  : seq_type = 'query' or 'hit' or 'sbjct' (default = 'query').
           : ('sbjct' is synonymous with 'hit')
 Throws    : Propagates any exception that occurs during construction
           : of the Bio::Seq.pm object.
 Comments  : The Bio::Seq.pm object is built from the string returned
           : by seq_str() for the requested seq_type.

See Also   : L<seq_str()|seq_str>, L<seq_inds()|seq_inds>, B<Bio::Seq>

=cut

#-------
sub seq {
#-------
    my($self,$seqType) = @_;
    $seqType ||= 'query';
    $seqType = 'sbjct' if $seqType eq 'hit';
    my $str = $self->seq_str($seqType);

    require Bio::Seq;

    return Bio::Seq->new(-ID   => $self->to_string,
                         -SEQ  => $str,
                         -DESC => "$seqType sequence",
                        );
}

=head2 seq_str

 Usage     : $hsp->seq_str( seq_type );
 Purpose   : Get the full query, sbjct, or 'match' sequence as a string.
           : The 'match' sequence is the string of symbols in between the
           : query and sbjct sequences.
 Example   : $str = $hsp->seq_str('query');
 Returns   : String
 Argument  : seq_type = 'query' or 'hit' or 'sbjct' or 'match' (default = 'query')
           : ('sbjct' is synonymous with 'hit')
 Throws    : Exception if the argument does not match an accepted seq_type.
 Comments  : Calls _set_seq_data() if the sequence data has not been set
           : already, and _set_match_seq() when the 'match' sequence is
           : requested and has not been set already.

See Also   : L<seq()|seq>, L<seq_inds()|seq_inds>, B<_set_match_seq()>

=cut

#------------
sub seq_str {
#------------
    my($self,$seqType) = @_;

    $seqType ||= 'query';
    $seqType = 'sbjct' if $seqType eq 'hit';
    ## Sensitive to member name changes.
    $seqType = "_\L$seqType\E";

    $self->_set_seq_data() unless $self->{'_set_seq_data'};

    if($seqType =~ /sbjct|query/) {
        my $seq = join('',@{$self->{$seqType.'Seq'}});
        $seq =~ s/\s+//g;
        return $seq;

    } elsif( $seqType =~ /match/i) {
        # Only need to call _set_match_seq() if the match seq is requested.
        # (Avoids "my ... unless", whose behavior is undefined in Perl.)
        $self->_set_match_seq() unless ref $self->{'_matchSeq'};
        my $aref = $self->{'_matchSeq'};

        return join('',@$aref);

    } else {
        my $id_str = $self->_id_str;
        $self->throw(-class => 'Bio::Root::BadParameter',
                     -text  => "Invalid or undefined sequence type: $seqType ($id_str)\n" .
                               "Valid types: query, sbjct, match",
                     -value => $seqType);
    }
}

=head2 seq_inds

 Usage     : $hsp->seq_inds( seq_type, class, collapse );
 Purpose   : Get a list of residue positions (indices) for all identical
           : or conserved residues in the query or sbjct sequence.
 Example   : @s_ind = $hsp->seq_inds('query', 'identical');
           : @h_ind = $hsp->seq_inds('hit', 'conserved');
           : @h_ind = $hsp->seq_inds('hit', 'conserved', 1);
 Returns   : List of integers
           : May include ranges if collapse is true.
 Argument  : seq_type = 'query' or 'hit' or 'sbjct' (default = query)
           : ('sbjct' is synonymous with 'hit')
           : class = 'identical' or 'conserved' (default = identical)
           : (can be shortened to 'id' or 'cons')
           : (actually, anything not starting with 'id' will evaluate
           : to 'conserved').
           : collapse = boolean, if true, consecutive positions are merged
           : using a range notation, e.g., "1 2 3 4 5 7 9 10 11"
           : collapses to "1-5 7 9-11". This is useful for
           : consolidating long lists. Default = no collapse.
 Throws    : n/a.
 Comments  : Calls _set_residues() to set the identical and conserved
           : residue positions if they have not been set already.

See Also   : L<seq()|seq>, B<_set_residues()>, L<Bio::Search::BlastUtils::collapse_nums()|Bio::Search::BlastUtils>, L<Bio::Search::Hit::BlastHit::seq_inds()|Bio::Search::Hit::BlastHit>

=cut

#---------------
sub seq_inds {
#---------------
    my ($self, $seqType, $class, $collapse) = @_;

    $seqType  ||= 'query';
    $class    ||= 'identical';
    $collapse ||= 0;
    $seqType = 'sbjct' if $seqType eq 'hit';

    $self->_set_residues() unless defined $self->{'_identicalRes_query'};

    $seqType = ($seqType !~ /^q/i ? 'sbjct' : 'query');
    $class   = ($class !~ /^id/i ? 'conserved' : 'identical');

    ## Sensitive to member name changes.
    $seqType = "_\L$seqType\E";
    $class   = "_\L$class\E";

    my @ary = sort { $a <=> $b } keys %{ $self->{"${class}Res$seqType"}};

    require Bio::Search::BlastUtils if $collapse;

    return $collapse ? &Bio::Search::BlastUtils::collapse_nums(@ary) : @ary;
}


=head2 get_aln

 Usage     : $hsp->get_aln()
 Purpose   : Get a Bio::SimpleAlign object constructed from the query + sbjct
           : sequences of the present HSP object.
 Example   : $aln_obj = $hsp->get_aln();
 Returns   : Object reference for a Bio::SimpleAlign.pm object.
 Argument  : n/a.
 Throws    : Propagates any exception occurring during the construction of
           : the Bio::SimpleAlign object.
 Comments  : Requires Bio::SimpleAlign.
           : The Bio::SimpleAlign object is constructed from the query + sbjct
           : sequence objects obtained by calling seq().
           : Gap residues are included (see $GAP_SYMBOL).

See Also   : L<seq()|seq>, L<Bio::SimpleAlign>

=cut

#------------
sub get_aln {
#------------
    my $self = shift;

    require Bio::SimpleAlign;
    require Bio::LocatableSeq;
    my $qseq = $self->seq('query');
    my $sseq = $self->seq('sbjct');

    # Note: $type is computed here but not used below.
    my $type = $self->algorithm =~ /P$|^T/ ? 'amino' : 'dna';
    my $aln = new Bio::SimpleAlign();
    # -end is the length of the aligned sequence string (gaps included),
    # not the length of the stringified object reference.
    $aln->add_seq(new Bio::LocatableSeq(-seq   => $qseq->seq(),
                                        -id    => 'query_'.$qseq->display_id(),
                                        -start => 1,
                                        -end   => CORE::length($qseq->seq())));

    $aln->add_seq(new Bio::LocatableSeq(-seq   => $sseq->seq(),
                                        -id    => 'hit_'.$sseq->display_id(),
                                        -start => 1,
                                        -end   => CORE::length($sseq->seq())));

    return $aln;
}


1;
__END__


=head1 FOR DEVELOPERS ONLY

=head2 Data Members

Information about the various data members of this module is provided for those
wishing to modify or understand the code. Two things to bear in mind:

=over 4

=item 1 Do NOT rely on these in any code outside of this module.

All data members are prefixed with an underscore to signify that they are private.
Always use accessor methods. If the accessor doesn't exist or is inadequate,
create or modify an accessor (and let me know, too!).

=item 2 This documentation may be incomplete and out of date.

It is easy for these data member descriptions to become obsolete as
this module is still evolving. Always double check this info and search
for members not described here.

=back

An instance of Bio::Search::HSP::BlastHSP.pm is a blessed reference to a hash containing
all or some of the following fields:

 FIELD               VALUE
 --------------------------------------------------------------
 (member names are mostly self-explanatory)

 _score              :
 _bits               :
 _p                  :
 _n                  : Integer. The 'N' value listed in parentheses with P/Expect value:
                     : e.g., P(3) = 1.2e-30  ---> (N = 3).
                     : Not defined in NCBI Blast2 with gaps.
                     : To obtain the number of HSPs, use Bio::Search::Hit::BlastHit::num_hsps().
 _expect             :
 _queryLength        :
 _queryGaps          :
 _queryStart         :
 _queryStop          :
 _querySeq           :
 _sbjctLength        :
 _sbjctGaps          :
 _sbjctStart         :
 _sbjctStop          :
 _sbjctSeq           :
 _matchSeq           : String. Contains the symbols between the query and sbjct lines
                       which indicate identical (letter) and conserved ('+') matches
                       or a mismatch (' ').
 _numIdentical       :
 _numConserved       :
 _identicalRes_query :
 _identicalRes_sbjct :
 _conservedRes_query :
 _conservedRes_sbjct :
 _match_indent       : The number of leading space characters on each line containing
                       the match symbols. _match_indent is 13 in this example:
                       Query:   285 QNSAPWGLARISHRERLNLGSFNKYLYDDDAG
                                    Q +APWGLARIS       G+ + Y YD+ AG
                       ^^^^^^^^^^^^^


=cut

1;
