annotate variant_effect_predictor/Bio/SearchIO/blastxml.pm @ 1:d6778b5d8382 draft default tip

Deleted selected files
author willmclaren
date Fri, 03 Aug 2012 10:05:43 -0400
parents 21066c0abaf5
children
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
rev   line source
0
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
1 # $Id: blastxml.pm,v 1.24 2002/10/26 09:32:16 sac Exp $
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
2 #
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
3 # BioPerl module for Bio::SearchIO::blastxml
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
4 #
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
5 # Cared for by Jason Stajich <jason@bioperl.org>
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
6 #
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
7 # Copyright Jason Stajich
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
8 #
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
9 # You may distribute this module under the same terms as perl itself
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
10
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
11 # POD documentation - main docs before the code
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
12
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
13 =head1 NAME
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
14
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
15 Bio::SearchIO::blastxml - A SearchIO implementation of NCBI Blast XML parsing.
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
16
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
17 =head1 SYNOPSIS
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
18
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
19 use Bio::SearchIO;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
20 my $searchin = new Bio::SearchIO(-format => 'blastxml',
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
21 -file => 't/data/plague_yeast.bls.xml');
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
22 while( my $result = $searchin->next_result ) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
23 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
24
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
25 # one can also request that the parser NOT keep the XML data in memory
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
26 # by using the tempfile initialization flag.
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
27 my $searchin = new Bio::SearchIO(-tempfile => 1,
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
28 -format => 'blastxml',
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
29 -file => 't/data/plague_yeast.bls.xml');
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
30 while( my $result = $searchin->next_result ) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
31 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
32
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
33 =head1 DESCRIPTION
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
34
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
35 This object implements a NCBI Blast XML parser.
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
36
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
37 There is one additional initialization flag from the SearchIO defaults
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
38 - that is the -tempfile flag. If specified as true, then the parser
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
39 will write out each report to a temporary filehandle rather than
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
40 holding the entire report as a string in memory. The reason this is
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
41 done in the first place is NCBI reports have an uncessary E<lt>?xml
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
42 version="1.0"?E<gt> at the beginning of each report and RPS-BLAST reports
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
43 have an additional unecessary RPS-BLAST tag at the top of each report.
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
44 So we currently have implemented the work around by preparsing the
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
45 file (yes it makes the process slower, but it works).
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
46
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
47
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
48 =head1 FEEDBACK
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
49
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
50 =head2 Mailing Lists
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
51
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
52 User feedback is an integral part of the evolution of this and other
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
53 Bioperl modules. Send your comments and suggestions preferably to
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
54 the Bioperl mailing list. Your participation is much appreciated.
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
55
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
56 bioperl-l@bioperl.org - General discussion
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
57 http://bioperl.org/MailList.shtml - About the mailing lists
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
58
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
59 =head2 Reporting Bugs
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
60
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
61 Report bugs to the Bioperl bug tracking system to help us keep track
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
62 of the bugs and their resolution. Bug reports can be submitted via
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
63 email or the web:
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
64
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
65 bioperl-bugs@bioperl.org
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
66 http://bugzilla.bioperl.org/
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
67
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
68 =head1 AUTHOR - Jason Stajich
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
69
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
70 Email jason@bioperl.org
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
71
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
72 Describe contact details here
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
73
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
74 =head1 CONTRIBUTORS
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
75
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
76 Additional contributors names and emails here
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
77
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
78 =head1 APPENDIX
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
79
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
80 The rest of the documentation details each of the object methods.
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
81 Internal methods are usually preceded with a _
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
82
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
83 =cut
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
84
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
85 # Let the code begin...
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
86
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
87 package Bio::SearchIO::blastxml;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
88 use vars qw(@ISA $DTD %MAPPING %MODEMAP $DEBUG);
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
89 use strict;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
90
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
91 $DTD = 'ftp://ftp.ncbi.nlm.nih.gov/blast/documents/NCBI_BlastOutput.dtd';
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
92 # Object preamble - inherits from Bio::Root::Root
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
93
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
94 use Bio::Root::Root;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
95 use Bio::SearchIO;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
96 use XML::Parser::PerlSAX;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
97 use XML::Handler::Subs;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
98 use HTML::Entities;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
99 use IO::File;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
100
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
101
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
102 BEGIN {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
103 # mapping of NCBI Blast terms to Bioperl hash keys
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
104 %MODEMAP = ('BlastOutput' => 'result',
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
105 'Hit' => 'hit',
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
106 'Hsp' => 'hsp'
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
107 );
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
108
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
109 %MAPPING = (
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
110 # HSP specific fields
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
111 'Hsp_bit-score' => 'HSP-bits',
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
112 'Hsp_score' => 'HSP-score',
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
113 'Hsp_evalue' => 'HSP-evalue',
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
114 'Hsp_query-from' => 'HSP-query_start',
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
115 'Hsp_query-to' => 'HSP-query_end',
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
116 'Hsp_hit-from' => 'HSP-hit_start',
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
117 'Hsp_hit-to' => 'HSP-hit_end',
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
118 'Hsp_positive' => 'HSP-conserved',
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
119 'Hsp_identity' => 'HSP-identical',
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
120 'Hsp_gaps' => 'HSP-gaps',
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
121 'Hsp_hitgaps' => 'HSP-hit_gaps',
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
122 'Hsp_querygaps' => 'HSP-query_gaps',
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
123 'Hsp_qseq' => 'HSP-query_seq',
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
124 'Hsp_hseq' => 'HSP-hit_seq',
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
125 'Hsp_midline' => 'HSP-homology_seq',
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
126 'Hsp_align-len' => 'HSP-hsp_length',
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
127 'Hsp_query-frame'=> 'HSP-query_frame',
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
128 'Hsp_hit-frame' => 'HSP-hit_frame',
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
129
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
130 # these are ignored for now
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
131 'Hsp_num' => 'HSP-order',
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
132 'Hsp_pattern-from' => 'patternend',
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
133 'Hsp_pattern-to' => 'patternstart',
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
134 'Hsp_density' => 'hspdensity',
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
135
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
136 # Hit specific fields
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
137 'Hit_id' => 'HIT-name',
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
138 'Hit_len' => 'HIT-length',
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
139 'Hit_accession' => 'HIT-accession',
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
140 'Hit_def' => 'HIT-description',
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
141 'Hit_num' => 'HIT-order',
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
142 'Iteration_iter-num' => 'HIT-iteration',
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
143 'Iteration_stat' => 'HIT-iteration_statistic',
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
144
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
145 'BlastOutput_program' => 'RESULT-algorithm_name',
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
146 'BlastOutput_version' => 'RESULT-algorithm_version',
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
147 'BlastOutput_query-def' => 'RESULT-query_description',
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
148 'BlastOutput_query-len' => 'RESULT-query_length',
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
149 'BlastOutput_db' => 'RESULT-database_name',
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
150 'BlastOutput_reference' => 'RESULT-program_reference',
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
151 'BlastOutput_query-ID' => 'runid',
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
152
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
153 'Parameters_matrix' => { 'RESULT-parameters' => 'matrix'},
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
154 'Parameters_expect' => { 'RESULT-parameters' => 'expect'},
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
155 'Parameters_include' => { 'RESULT-parameters' => 'include'},
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
156 'Parameters_sc-match' => { 'RESULT-parameters' => 'match'},
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
157 'Parameters_sc-mismatch' => { 'RESULT-parameters' => 'mismatch'},
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
158 'Parameters_gap-open' => { 'RESULT-parameters' => 'gapopen'},
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
159 'Parameters_gap-extend'=> { 'RESULT-parameters' => 'gapext'},
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
160 'Parameters_filter' => {'RESULT-parameters' => 'filter'},
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
161 'Statistics_db-num' => 'RESULT-database_entries',
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
162 'Statistics_db-len' => 'RESULT-database_letters',
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
163 'Statistics_hsp-len' => { 'RESULT-statistics' => 'hsplength'},
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
164 'Statistics_eff-space' => { 'RESULT-statistics' => 'effectivespace'},
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
165 'Statistics_kappa' => { 'RESULT-statistics' => 'kappa' },
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
166 'Statistics_lambda' => { 'RESULT-statistics' => 'lambda' },
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
167 'Statistics_entropy' => { 'RESULT-statistics' => 'entropy'},
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
168 );
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
169 eval { require Time::HiRes };
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
170 if( $@ ) { $DEBUG = 0; }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
171 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
172
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
173
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
174 @ISA = qw(Bio::SearchIO );
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
175
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
176 =head2 new
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
177
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
178 Title : new
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
179 Usage : my $searchio = new Bio::SearchIO(-format => 'blastxml',
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
180 -file => 'filename',
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
181 -tempfile => 1);
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
182 Function: Initializes the object - this is chained through new in SearchIO
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
183 Returns : Bio::SearchIO::blastxml object
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
184 Args : One additional argument from the format and file/fh parameters.
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
185 -tempfile => boolean. Defaults to false. Write out XML data
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
186 to a temporary filehandle to send to
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
187 PerlSAX parser.
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
188 =cut
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
189
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
190 =head2 _initialize
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
191
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
192 Title : _initialize
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
193 Usage : private
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
194 Function: Initializes the object - this is chained through new in SearchIO
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
195
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
196 =cut
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
197
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
198 sub _initialize{
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
199 my ($self,@args) = @_;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
200 $self->SUPER::_initialize(@args);
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
201 my ($usetempfile) = $self->_rearrange([qw(TEMPFILE)],@args);
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
202 defined $usetempfile && $self->use_tempfile($usetempfile);
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
203 $self->{'_xmlparser'} = new XML::Parser::PerlSAX();
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
204 $DEBUG = 1 if( ! defined $DEBUG && $self->verbose > 0);
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
205 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
206
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
207 =head2 next_result
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
208
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
209 Title : next_result
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
210 Usage : my $hit = $searchio->next_result;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
211 Function: Returns the next Result from a search
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
212 Returns : Bio::Search::Result::ResultI object
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
213 Args : none
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
214
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
215 =cut
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
216
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
217 sub next_result {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
218 my ($self) = @_;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
219
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
220 my $data = '';
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
221 my $firstline = 1;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
222 my ($tfh);
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
223 if( $self->use_tempfile ) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
224 $tfh = IO::File->new_tmpfile or $self->throw("Unable to open temp file: $!");
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
225 $tfh->autoflush(1);
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
226 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
227 my $okaytoprocess;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
228 while( defined( $_ = $self->_readline) ) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
229 if( /^RPS-BLAST/i ) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
230 $self->{'_type'} = 'RPSBLAST';
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
231 next;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
232 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
233 if( /^<\?xml version/ && ! $firstline) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
234 $self->_pushback($_);
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
235 last;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
236 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
237 $_ = decode_entities($_);
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
238 # s/\&apos;/\`/g;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
239 # s/\&gt;/\>/g;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
240 # s/\&lt;/\</g;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
241 $okaytoprocess = 1;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
242 if( defined $tfh ) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
243 print $tfh $_;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
244 } else {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
245 $data .= $_;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
246 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
247 $firstline = 0;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
248 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
249
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
250 return undef unless( $okaytoprocess);
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
251
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
252 my %parser_args;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
253 if( defined $tfh ) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
254 seek($tfh,0,0);
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
255 %parser_args = ('Source' => { 'ByteStream' => $tfh },
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
256 'Handler' => $self);
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
257 } else {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
258 %parser_args = ('Source' => { 'String' => $data },
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
259 'Handler' => $self);
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
260 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
261 my $result;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
262 my $starttime;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
263 if( $DEBUG ) { $starttime = [ Time::HiRes::gettimeofday() ]; }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
264
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
265 eval {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
266 $result = $self->{'_xmlparser'}->parse(%parser_args);
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
267 $self->{'_result_count'}++;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
268 };
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
269 if( $@ ) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
270 $self->warn("error in parsing a report:\n $@");
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
271 $result = undef;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
272 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
273 if( $DEBUG ) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
274 $self->debug( sprintf("parsing took %f seconds\n", Time::HiRes::tv_interval($starttime)));
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
275 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
276 # parsing magic here - but we call event handlers rather than
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
277 # instantiating things
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
278 return $result;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
279 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
280
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
281 =head2 SAX methods
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
282
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
283 =cut
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
284
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
285 =head2 start_document
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
286
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
287 Title : start_document
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
288 Usage : $parser->start_document;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
289 Function: SAX method to indicate starting to parse a new document
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
290 Returns : none
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
291 Args : none
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
292
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
293
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
294 =cut
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
295
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
296 sub start_document{
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
297 my ($self) = @_;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
298 $self->{'_lasttype'} = '';
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
299 $self->{'_values'} = {};
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
300 $self->{'_result'}= undef;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
301 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
302
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
303 =head2 end_document
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
304
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
305 Title : end_document
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
306 Usage : $parser->end_document;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
307 Function: SAX method to indicate finishing parsing a new document
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
308 Returns : Bio::Search::Result::ResultI object
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
309 Args : none
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
310
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
311 =cut
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
312
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
313 sub end_document{
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
314 my ($self,@args) = @_;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
315 return $self->{'_result'};
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
316 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
317
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
318 =head2 start_element
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
319
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
320 Title : start_element
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
321 Usage : $parser->start_element($data)
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
322 Function: SAX method to indicate starting a new element
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
323 Returns : none
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
324 Args : hash ref for data
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
325
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
326 =cut
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
327
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
328 sub start_element{
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
329 my ($self,$data) = @_;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
330 # we currently don't care about attributes
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
331 my $nm = $data->{'Name'};
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
332
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
333 if( my $type = $MODEMAP{$nm} ) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
334 if( $self->_eventHandler->will_handle($type) ) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
335 my $func = sprintf("start_%s",lc $type);
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
336 $self->_eventHandler->$func($data->{'Attributes'});
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
337 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
338 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
339
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
340 if($nm eq 'BlastOutput') {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
341 $self->{'_values'} = {};
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
342 $self->{'_result'}= undef;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
343 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
344 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
345
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
346 =head2 end_element
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
347
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
348 Title : end_element
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
349 Usage : $parser->end_element($data)
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
350 Function: Signals finishing an element
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
351 Returns : Bio::Search object dpending on what type of element
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
352 Args : hash ref for data
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
353
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
354 =cut
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
355
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
356 sub end_element{
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
357 my ($self,$data) = @_;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
358
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
359 my $nm = $data->{'Name'};
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
360 my $rc;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
361 if($nm eq 'BlastOutput_program' &&
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
362 $self->{'_last_data'} =~ /(t?blast[npx])/i ) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
363 $self->{'_type'} = uc $1;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
364 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
365
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
366 if( my $type = $MODEMAP{$nm} ) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
367 if( $self->_eventHandler->will_handle($type) ) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
368 my $func = sprintf("end_%s",lc $type);
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
369 $rc = $self->_eventHandler->$func($self->{'_type'},
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
370 $self->{'_values'});
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
371 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
372 } elsif( $MAPPING{$nm} ) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
373 if ( ref($MAPPING{$nm}) =~ /hash/i ) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
374 my $key = (keys %{$MAPPING{$nm}})[0];
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
375 $self->{'_values'}->{$key}->{$MAPPING{$nm}->{$key}} = $self->{'_last_data'};
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
376 } else {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
377 $self->{'_values'}->{$MAPPING{$nm}} = $self->{'_last_data'};
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
378 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
379 } elsif( $nm eq 'Iteration' || $nm eq 'Hit_hsps' || $nm eq 'Parameters' ||
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
380 $nm eq 'BlastOutput_param' || $nm eq 'Iteration_hits' ||
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
381 $nm eq 'Statistics' || $nm eq 'BlastOutput_iterations' ){
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
382
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
383 } else {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
384
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
385 $self->debug("ignoring unrecognized element type $nm\n");
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
386 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
387 $self->{'_last_data'} = ''; # remove read data if we are at
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
388 # end of an element
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
389 $self->{'_result'} = $rc if( $nm eq 'BlastOutput' );
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
390 return $rc;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
391 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
392
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
393 =head2 characters
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
394
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
395 Title : characters
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
396 Usage : $parser->characters($data)
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
397 Function: Signals new characters to be processed
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
398 Returns : characters read
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
399 Args : hash ref with the key 'Data'
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
400
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
401
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
402 =cut
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
403
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
404 sub characters{
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
405 my ($self,$data) = @_;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
406 return unless ( defined $data->{'Data'} && $data->{'Data'} !~ /^\s+$/ );
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
407
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
408 $self->{'_last_data'} = $data->{'Data'};
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
409 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
410
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
411 =head2 use_tempfile
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
412
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
413 Title : use_tempfile
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
414 Usage : $obj->use_tempfile($newval)
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
415 Function: Get/Set boolean flag on whether or not use a tempfile
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
416 Example :
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
417 Returns : value of use_tempfile
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
418 Args : newvalue (optional)
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
419
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
420
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
421 =cut
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
422
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
423 sub use_tempfile{
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
424 my ($self,$value) = @_;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
425 if( defined $value) {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
426 $self->{'_use_tempfile'} = $value;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
427 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
428 return $self->{'_use_tempfile'};
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
429 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
430
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
431 sub result_count {
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
432 my $self = shift;
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
433 return $self->{'_result_count'};
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
434 }
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
435
21066c0abaf5 Uploaded
willmclaren
parents:
diff changeset
436 1;