annotate variant_effect_predictor/Bio/SearchIO/blastxml.pm @ 0:2bc9b66ada89 draft default tip

Uploaded
author mahtabm
date Thu, 11 Apr 2013 06:29:17 -0400
parents
children
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
rev   line source
0
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
1 # $Id: blastxml.pm,v 1.24 2002/10/26 09:32:16 sac Exp $
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
2 #
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
3 # BioPerl module for Bio::SearchIO::blastxml
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
4 #
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
5 # Cared for by Jason Stajich <jason@bioperl.org>
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
6 #
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
7 # Copyright Jason Stajich
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
8 #
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
9 # You may distribute this module under the same terms as perl itself
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
10
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
11 # POD documentation - main docs before the code
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
12
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
13 =head1 NAME
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
14
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
15 Bio::SearchIO::blastxml - A SearchIO implementation of NCBI Blast XML parsing.
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
16
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
17 =head1 SYNOPSIS
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
18
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
19 use Bio::SearchIO;
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
20 my $searchin = new Bio::SearchIO(-format => 'blastxml',
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
21 -file => 't/data/plague_yeast.bls.xml');
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
22 while( my $result = $searchin->next_result ) {
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
23 }
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
24
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
25 # one can also request that the parser NOT keep the XML data in memory
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
26 # by using the tempfile initialization flag.
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
27 my $searchin = new Bio::SearchIO(-tempfile => 1,
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
28 -format => 'blastxml',
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
29 -file => 't/data/plague_yeast.bls.xml');
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
30 while( my $result = $searchin->next_result ) {
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
31 }
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
32
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
33 =head1 DESCRIPTION
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
34
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
35 This object implements a NCBI Blast XML parser.
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
36
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
37 There is one additional initialization flag from the SearchIO defaults
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
38 - that is the -tempfile flag. If specified as true, then the parser
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
39 will write out each report to a temporary filehandle rather than
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
40 holding the entire report as a string in memory. The reason this is
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
41 done in the first place is NCBI reports have an uncessary E<lt>?xml
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
42 version="1.0"?E<gt> at the beginning of each report and RPS-BLAST reports
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
43 have an additional unecessary RPS-BLAST tag at the top of each report.
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
44 So we currently have implemented the work around by preparsing the
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
45 file (yes it makes the process slower, but it works).
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
46
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
47
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
48 =head1 FEEDBACK
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
49
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
50 =head2 Mailing Lists
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
51
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
52 User feedback is an integral part of the evolution of this and other
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
53 Bioperl modules. Send your comments and suggestions preferably to
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
54 the Bioperl mailing list. Your participation is much appreciated.
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
55
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
56 bioperl-l@bioperl.org - General discussion
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
57 http://bioperl.org/MailList.shtml - About the mailing lists
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
58
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
59 =head2 Reporting Bugs
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
60
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
61 Report bugs to the Bioperl bug tracking system to help us keep track
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
62 of the bugs and their resolution. Bug reports can be submitted via
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
63 email or the web:
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
64
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
65 bioperl-bugs@bioperl.org
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
66 http://bugzilla.bioperl.org/
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
67
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
68 =head1 AUTHOR - Jason Stajich
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
69
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
70 Email jason@bioperl.org
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
71
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
72 Describe contact details here
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
73
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
74 =head1 CONTRIBUTORS
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
75
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
76 Additional contributors names and emails here
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
77
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
78 =head1 APPENDIX
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
79
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
80 The rest of the documentation details each of the object methods.
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
81 Internal methods are usually preceded with a _
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
82
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
83 =cut
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
84
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
85 # Let the code begin...
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
86
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
87 package Bio::SearchIO::blastxml;
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
88 use vars qw(@ISA $DTD %MAPPING %MODEMAP $DEBUG);
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
89 use strict;
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
90
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
91 $DTD = 'ftp://ftp.ncbi.nlm.nih.gov/blast/documents/NCBI_BlastOutput.dtd';
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
92 # Object preamble - inherits from Bio::Root::Root
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
93
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
94 use Bio::Root::Root;
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
95 use Bio::SearchIO;
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
96 use XML::Parser::PerlSAX;
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
97 use XML::Handler::Subs;
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
98 use HTML::Entities;
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
99 use IO::File;
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
100
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
101
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
102 BEGIN {
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
103 # mapping of NCBI Blast terms to Bioperl hash keys
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
104 %MODEMAP = ('BlastOutput' => 'result',
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
105 'Hit' => 'hit',
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
106 'Hsp' => 'hsp'
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
107 );
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
108
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
109 %MAPPING = (
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
110 # HSP specific fields
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
111 'Hsp_bit-score' => 'HSP-bits',
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
112 'Hsp_score' => 'HSP-score',
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
113 'Hsp_evalue' => 'HSP-evalue',
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
114 'Hsp_query-from' => 'HSP-query_start',
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
115 'Hsp_query-to' => 'HSP-query_end',
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
116 'Hsp_hit-from' => 'HSP-hit_start',
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
117 'Hsp_hit-to' => 'HSP-hit_end',
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
118 'Hsp_positive' => 'HSP-conserved',
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
119 'Hsp_identity' => 'HSP-identical',
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
120 'Hsp_gaps' => 'HSP-gaps',
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
121 'Hsp_hitgaps' => 'HSP-hit_gaps',
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
122 'Hsp_querygaps' => 'HSP-query_gaps',
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
123 'Hsp_qseq' => 'HSP-query_seq',
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
124 'Hsp_hseq' => 'HSP-hit_seq',
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
125 'Hsp_midline' => 'HSP-homology_seq',
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
126 'Hsp_align-len' => 'HSP-hsp_length',
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
127 'Hsp_query-frame'=> 'HSP-query_frame',
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
128 'Hsp_hit-frame' => 'HSP-hit_frame',
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
129
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
130 # these are ignored for now
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
131 'Hsp_num' => 'HSP-order',
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
132 'Hsp_pattern-from' => 'patternend',
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
133 'Hsp_pattern-to' => 'patternstart',
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
134 'Hsp_density' => 'hspdensity',
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
135
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
136 # Hit specific fields
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
137 'Hit_id' => 'HIT-name',
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
138 'Hit_len' => 'HIT-length',
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
139 'Hit_accession' => 'HIT-accession',
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
140 'Hit_def' => 'HIT-description',
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
141 'Hit_num' => 'HIT-order',
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
142 'Iteration_iter-num' => 'HIT-iteration',
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
143 'Iteration_stat' => 'HIT-iteration_statistic',
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
144
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
145 'BlastOutput_program' => 'RESULT-algorithm_name',
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
146 'BlastOutput_version' => 'RESULT-algorithm_version',
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
147 'BlastOutput_query-def' => 'RESULT-query_description',
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
148 'BlastOutput_query-len' => 'RESULT-query_length',
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
149 'BlastOutput_db' => 'RESULT-database_name',
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
150 'BlastOutput_reference' => 'RESULT-program_reference',
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
151 'BlastOutput_query-ID' => 'runid',
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
152
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
153 'Parameters_matrix' => { 'RESULT-parameters' => 'matrix'},
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
154 'Parameters_expect' => { 'RESULT-parameters' => 'expect'},
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
155 'Parameters_include' => { 'RESULT-parameters' => 'include'},
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
156 'Parameters_sc-match' => { 'RESULT-parameters' => 'match'},
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
157 'Parameters_sc-mismatch' => { 'RESULT-parameters' => 'mismatch'},
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
158 'Parameters_gap-open' => { 'RESULT-parameters' => 'gapopen'},
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
159 'Parameters_gap-extend'=> { 'RESULT-parameters' => 'gapext'},
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
160 'Parameters_filter' => {'RESULT-parameters' => 'filter'},
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
161 'Statistics_db-num' => 'RESULT-database_entries',
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
162 'Statistics_db-len' => 'RESULT-database_letters',
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
163 'Statistics_hsp-len' => { 'RESULT-statistics' => 'hsplength'},
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
164 'Statistics_eff-space' => { 'RESULT-statistics' => 'effectivespace'},
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
165 'Statistics_kappa' => { 'RESULT-statistics' => 'kappa' },
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
166 'Statistics_lambda' => { 'RESULT-statistics' => 'lambda' },
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
167 'Statistics_entropy' => { 'RESULT-statistics' => 'entropy'},
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
168 );
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
169 eval { require Time::HiRes };
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
170 if( $@ ) { $DEBUG = 0; }
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
171 }
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
172
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
173
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
174 @ISA = qw(Bio::SearchIO );
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
175
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
176 =head2 new
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
177
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
178 Title : new
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
179 Usage : my $searchio = new Bio::SearchIO(-format => 'blastxml',
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
180 -file => 'filename',
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
181 -tempfile => 1);
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
182 Function: Initializes the object - this is chained through new in SearchIO
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
183 Returns : Bio::SearchIO::blastxml object
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
184 Args : One additional argument from the format and file/fh parameters.
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
185 -tempfile => boolean. Defaults to false. Write out XML data
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
186 to a temporary filehandle to send to
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
187 PerlSAX parser.
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
188 =cut
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
189
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
190 =head2 _initialize
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
191
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
192 Title : _initialize
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
193 Usage : private
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
194 Function: Initializes the object - this is chained through new in SearchIO
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
195
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
196 =cut
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
197
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
198 sub _initialize{
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
199 my ($self,@args) = @_;
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
200 $self->SUPER::_initialize(@args);
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
201 my ($usetempfile) = $self->_rearrange([qw(TEMPFILE)],@args);
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
202 defined $usetempfile && $self->use_tempfile($usetempfile);
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
203 $self->{'_xmlparser'} = new XML::Parser::PerlSAX();
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
204 $DEBUG = 1 if( ! defined $DEBUG && $self->verbose > 0);
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
205 }
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
206
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
207 =head2 next_result
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
208
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
209 Title : next_result
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
210 Usage : my $hit = $searchio->next_result;
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
211 Function: Returns the next Result from a search
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
212 Returns : Bio::Search::Result::ResultI object
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
213 Args : none
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
214
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
215 =cut
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
216
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
217 sub next_result {
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
218 my ($self) = @_;
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
219
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
220 my $data = '';
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
221 my $firstline = 1;
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
222 my ($tfh);
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
223 if( $self->use_tempfile ) {
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
224 $tfh = IO::File->new_tmpfile or $self->throw("Unable to open temp file: $!");
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
225 $tfh->autoflush(1);
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
226 }
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
227 my $okaytoprocess;
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
228 while( defined( $_ = $self->_readline) ) {
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
229 if( /^RPS-BLAST/i ) {
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
230 $self->{'_type'} = 'RPSBLAST';
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
231 next;
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
232 }
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
233 if( /^<\?xml version/ && ! $firstline) {
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
234 $self->_pushback($_);
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
235 last;
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
236 }
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
237 $_ = decode_entities($_);
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
238 # s/\&apos;/\`/g;
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
239 # s/\&gt;/\>/g;
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
240 # s/\&lt;/\</g;
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
241 $okaytoprocess = 1;
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
242 if( defined $tfh ) {
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
243 print $tfh $_;
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
244 } else {
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
245 $data .= $_;
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
246 }
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
247 $firstline = 0;
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
248 }
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
249
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
250 return undef unless( $okaytoprocess);
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
251
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
252 my %parser_args;
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
253 if( defined $tfh ) {
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
254 seek($tfh,0,0);
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
255 %parser_args = ('Source' => { 'ByteStream' => $tfh },
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
256 'Handler' => $self);
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
257 } else {
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
258 %parser_args = ('Source' => { 'String' => $data },
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
259 'Handler' => $self);
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
260 }
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
261 my $result;
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
262 my $starttime;
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
263 if( $DEBUG ) { $starttime = [ Time::HiRes::gettimeofday() ]; }
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
264
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
265 eval {
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
266 $result = $self->{'_xmlparser'}->parse(%parser_args);
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
267 $self->{'_result_count'}++;
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
268 };
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
269 if( $@ ) {
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
270 $self->warn("error in parsing a report:\n $@");
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
271 $result = undef;
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
272 }
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
273 if( $DEBUG ) {
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
274 $self->debug( sprintf("parsing took %f seconds\n", Time::HiRes::tv_interval($starttime)));
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
275 }
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
276 # parsing magic here - but we call event handlers rather than
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
277 # instantiating things
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
278 return $result;
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
279 }
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
280
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
281 =head2 SAX methods
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
282
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
283 =cut
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
284
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
285 =head2 start_document
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
286
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
287 Title : start_document
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
288 Usage : $parser->start_document;
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
289 Function: SAX method to indicate starting to parse a new document
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
290 Returns : none
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
291 Args : none
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
292
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
293
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
294 =cut
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
295
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
296 sub start_document{
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
297 my ($self) = @_;
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
298 $self->{'_lasttype'} = '';
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
299 $self->{'_values'} = {};
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
300 $self->{'_result'}= undef;
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
301 }
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
302
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
303 =head2 end_document
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
304
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
305 Title : end_document
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
306 Usage : $parser->end_document;
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
307 Function: SAX method to indicate finishing parsing a new document
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
308 Returns : Bio::Search::Result::ResultI object
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
309 Args : none
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
310
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
311 =cut
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
312
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
313 sub end_document{
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
314 my ($self,@args) = @_;
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
315 return $self->{'_result'};
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
316 }
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
317
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
318 =head2 start_element
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
319
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
320 Title : start_element
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
321 Usage : $parser->start_element($data)
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
322 Function: SAX method to indicate starting a new element
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
323 Returns : none
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
324 Args : hash ref for data
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
325
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
326 =cut
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
327
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
328 sub start_element{
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
329 my ($self,$data) = @_;
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
330 # we currently don't care about attributes
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
331 my $nm = $data->{'Name'};
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
332
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
333 if( my $type = $MODEMAP{$nm} ) {
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
334 if( $self->_eventHandler->will_handle($type) ) {
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
335 my $func = sprintf("start_%s",lc $type);
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
336 $self->_eventHandler->$func($data->{'Attributes'});
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
337 }
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
338 }
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
339
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
340 if($nm eq 'BlastOutput') {
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
341 $self->{'_values'} = {};
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
342 $self->{'_result'}= undef;
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
343 }
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
344 }
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
345
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
346 =head2 end_element
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
347
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
348 Title : end_element
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
349 Usage : $parser->end_element($data)
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
350 Function: Signals finishing an element
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
351 Returns : Bio::Search object dpending on what type of element
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
352 Args : hash ref for data
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
353
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
354 =cut
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
355
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
356 sub end_element{
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
357 my ($self,$data) = @_;
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
358
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
359 my $nm = $data->{'Name'};
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
360 my $rc;
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
361 if($nm eq 'BlastOutput_program' &&
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
362 $self->{'_last_data'} =~ /(t?blast[npx])/i ) {
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
363 $self->{'_type'} = uc $1;
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
364 }
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
365
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
366 if( my $type = $MODEMAP{$nm} ) {
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
367 if( $self->_eventHandler->will_handle($type) ) {
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
368 my $func = sprintf("end_%s",lc $type);
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
369 $rc = $self->_eventHandler->$func($self->{'_type'},
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
370 $self->{'_values'});
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
371 }
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
372 } elsif( $MAPPING{$nm} ) {
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
373 if ( ref($MAPPING{$nm}) =~ /hash/i ) {
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
374 my $key = (keys %{$MAPPING{$nm}})[0];
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
375 $self->{'_values'}->{$key}->{$MAPPING{$nm}->{$key}} = $self->{'_last_data'};
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
376 } else {
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
377 $self->{'_values'}->{$MAPPING{$nm}} = $self->{'_last_data'};
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
378 }
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
379 } elsif( $nm eq 'Iteration' || $nm eq 'Hit_hsps' || $nm eq 'Parameters' ||
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
380 $nm eq 'BlastOutput_param' || $nm eq 'Iteration_hits' ||
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
381 $nm eq 'Statistics' || $nm eq 'BlastOutput_iterations' ){
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
382
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
383 } else {
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
384
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
385 $self->debug("ignoring unrecognized element type $nm\n");
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
386 }
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
387 $self->{'_last_data'} = ''; # remove read data if we are at
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
388 # end of an element
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
389 $self->{'_result'} = $rc if( $nm eq 'BlastOutput' );
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
390 return $rc;
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
391 }
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
392
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
393 =head2 characters
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
394
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
395 Title : characters
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
396 Usage : $parser->characters($data)
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
397 Function: Signals new characters to be processed
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
398 Returns : characters read
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
399 Args : hash ref with the key 'Data'
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
400
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
401
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
402 =cut
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
403
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
404 sub characters{
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
405 my ($self,$data) = @_;
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
406 return unless ( defined $data->{'Data'} && $data->{'Data'} !~ /^\s+$/ );
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
407
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
408 $self->{'_last_data'} = $data->{'Data'};
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
409 }
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
410
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
411 =head2 use_tempfile
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
412
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
413 Title : use_tempfile
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
414 Usage : $obj->use_tempfile($newval)
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
415 Function: Get/Set boolean flag on whether or not use a tempfile
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
416 Example :
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
417 Returns : value of use_tempfile
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
418 Args : newvalue (optional)
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
419
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
420
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
421 =cut
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
422
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
423 sub use_tempfile{
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
424 my ($self,$value) = @_;
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
425 if( defined $value) {
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
426 $self->{'_use_tempfile'} = $value;
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
427 }
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
428 return $self->{'_use_tempfile'};
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
429 }
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
430
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
431 sub result_count {
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
432 my $self = shift;
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
433 return $self->{'_result_count'};
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
434 }
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
435
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
436 1;