annotate variant_effect_predictor/Bio/Tools/ESTScan.pm @ 0:2bc9b66ada89 draft default tip

Uploaded
author mahtabm
date Thu, 11 Apr 2013 06:29:17 -0400
parents
children
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
rev   line source
0
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
1 # $Id: ESTScan.pm,v 1.10 2002/10/22 07:38:45 lapp Exp $
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
2 #
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
3 # BioPerl module for Bio::Tools::ESTScan
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
4 #
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
5 # Cared for by Hilmar Lapp <hlapp@gmx.net>
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
6 #
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
7 # Copyright Hilmar Lapp
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
8 #
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
9 # You may distribute this module under the same terms as perl itself
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
10
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
11 # POD documentation - main docs before the code
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
12
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
13 =head1 NAME
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
14
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
15 Bio::Tools::ESTScan - Results of one ESTScan run
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
16
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
17 =head1 SYNOPSIS
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
18
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
19 $estscan = Bio::Tools::ESTScan->new(-file => 'result.estscan');
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
20 # filehandle:
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
21 $estscan = Bio::Tools::ESTScan->new( -fh => \*INPUT );
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
22
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
23 # parse the results
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
24 # note: this class is-a Bio::Tools::AnalysisResult which implements
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
25 # Bio::SeqAnalysisParserI, i.e., $genscan->next_feature() is the same
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
26 while($gene = $estscan->next_prediction()) {
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
27 # $gene is an instance of Bio::Tools::Prediction::Gene
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
28 foreach my $orf ($gene->exons()) {
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
29 # $orf is an instance of Bio::Tools::Prediction::Exon
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
30 $cds_str = $orf->predicted_cds();
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
31 }
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
32 }
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
33
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
34 # essential if you gave a filename at initialization (otherwise the file
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
35 # will stay open)
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
36 $estscan->close();
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
37
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
38 =head1 DESCRIPTION
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
39
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
40 The ESTScan module provides a parser for ESTScan coding region prediction
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
41 output.
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
42
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
43 This module inherits off L<Bio::Tools::AnalysisResult> and therefore
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
44 implements the L<Bio::SeqAnalysisParserI> interface.
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
45 See L<Bio::SeqAnalysisParserI>.
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
46
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
47 =head1 FEEDBACK
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
48
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
49 =head2 Mailing Lists
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
50
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
51 User feedback is an integral part of the evolution of this and other
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
52 Bioperl modules. Send your comments and suggestions preferably to one
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
53 of the Bioperl mailing lists. Your participation is much appreciated.
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
54
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
55 bioperl-l@bioperl.org - General discussion
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
56 http://bio.perl.org/MailList.html - About the mailing lists
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
57
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
58 =head2 Reporting Bugs
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
59
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
60 Report bugs to the Bioperl bug tracking system to help us keep track
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
61 the bugs and their resolution. Bug reports can be submitted via email
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
62 or the web:
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
63
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
64 bioperl-bugs@bio.perl.org
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
65 http://bugzilla.bioperl.org/
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
66
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
67 =head1 AUTHOR - Hilmar Lapp
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
68
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
69 Email hlapp@gmx.net (or hilmar.lapp@pharma.novartis.com)
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
70
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
71 Describe contact details here
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
72
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
73 =head1 APPENDIX
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
74
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
75 The rest of the documentation details each of the object methods. Internal methods are usually preceded with a _
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
76
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
77 =cut
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
78
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
79 # Let the code begin...
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
80
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
81 package Bio::Tools::ESTScan;
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
82 use vars qw(@ISA);
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
83 use strict;
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
84 use Symbol;
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
85
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
86 use Bio::Root::Root;
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
87 use Bio::Tools::AnalysisResult;
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
88 use Bio::Tools::Prediction::Exon;
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
89
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
90 @ISA = qw(Bio::Tools::AnalysisResult);
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
91
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
92 sub _initialize_state {
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
93 my ($self,@args) = @_;
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
94
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
95 # first call the inherited method!
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
96 my $make = $self->SUPER::_initialize_state(@args);
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
97
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
98 if(! $self->analysis_method()) {
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
99 $self->analysis_method('ESTScan');
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
100 }
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
101 }
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
102
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
103 =head2 analysis_method
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
104
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
105 Usage : $estscan->analysis_method();
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
106 Purpose : Inherited method. Overridden to ensure that the name matches
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
107 /estscan/i.
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
108 Returns : String
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
109 Argument : n/a
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
110
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
111 =cut
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
112
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
113 #-------------
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
114 sub analysis_method {
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
115 #-------------
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
116 my ($self, $method) = @_;
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
117 if($method && ($method !~ /estscan/i)) {
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
118 $self->throw("method $method not supported in " . ref($self));
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
119 }
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
120 return $self->SUPER::analysis_method($method);
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
121 }
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
122
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
123 =head2 next_feature
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
124
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
125 Title : next_feature
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
126 Usage : while($orf = $estscan->next_feature()) {
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
127 # do something
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
128 }
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
129 Function: Returns the next gene structure prediction of the ESTScan result
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
130 file. Call this method repeatedly until FALSE is returned.
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
131
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
132 The returned object is actually a SeqFeatureI implementing object.
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
133 This method is required for classes implementing the
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
134 SeqAnalysisParserI interface, and is merely an alias for
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
135 next_prediction() at present.
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
136
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
137 Example :
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
138 Returns : A Bio::Tools::Prediction::Gene object.
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
139 Args :
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
140
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
141 =cut
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
142
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
143 sub next_feature {
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
144 my ($self,@args) = @_;
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
145 # even though next_prediction doesn't expect any args (and this method
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
146 # does neither), we pass on args in order to be prepared if this changes
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
147 # ever
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
148 return $self->next_prediction(@args);
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
149 }
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
150
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
151 =head2 next_prediction
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
152
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
153 Title : next_prediction
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
154 Usage : while($gene = $estscan->next_prediction()) {
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
155 # do something
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
156 }
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
157 Function: Returns the next gene structure prediction of the ESTScan result
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
158 file. Call this method repeatedly until FALSE is returned.
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
159
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
160 So far, this method DOES NOT work for reverse strand predictions,
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
161 even though the code looks like.
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
162 Example :
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
163 Returns : A Bio::Tools::Prediction::Gene object.
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
164 Args :
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
165
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
166 =cut
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
167
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
168 sub next_prediction {
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
169 my ($self) = @_;
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
170 my ($gene, $seq, $cds, $predobj);
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
171 my $numins = 0;
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
172
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
173 # predictions are in the format of FASTA sequences and can be parsed one
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
174 # at a time
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
175 $seq = $self->_fasta_stream()->next_seq();
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
176 return unless $seq;
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
177 # there is a new prediction
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
178 $gene = Bio::Tools::Prediction::Gene->new('-primary' => "ORFprediction",
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
179 '-source' => "ESTScan");
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
180 # score starts the description
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
181 $seq->desc() =~ /^([\d.]+)\s*(.*)/ or
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
182 $self->throw("unexpected format of description: no score in " .
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
183 $seq->desc());
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
184 $gene->score($1);
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
185 $seq->desc($2);
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
186 # strand may end the description
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
187 if($seq->desc() =~ /(.*)minus strand$/) {
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
188 my $desc = $1;
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
189 $desc =~ s/;\s+$//;
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
190 $seq->desc($desc);
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
191 $gene->strand(-1);
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
192 } else {
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
193 $gene->strand(1);
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
194 }
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
195 # check for the format: default or 'all-in-one' (option -a)
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
196 if($seq->desc() =~ /^(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s*(.*)/) {
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
197 # default format
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
198 $seq->desc($5);
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
199 $predobj = Bio::Tools::Prediction::Exon->new('-source' => "ESTScan",
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
200 '-start' => $3,
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
201 '-end' => $4);
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
202 $predobj->strand($gene->strand());
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
203 $predobj->score($gene->score()); # FIXME or $1, or $2 ?
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
204 $predobj->primary_tag("InternalExon");
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
205 $predobj->seq_id($seq->display_id());
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
206 # add to gene structure object
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
207 $gene->add_exon($predobj);
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
208 # add predicted CDS
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
209 $cds = $seq->seq();
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
210 $cds =~ s/[a-z]//g; # remove the deletions, but keep the insertions
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
211 $cds = Bio::PrimarySeq->new('-seq' => $cds,
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
212 '-display_id' => $seq->display_id(),
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
213 '-desc' => $seq->desc(),
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
214 '-alphabet' => "dna");
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
215 $gene->predicted_cds($cds);
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
216 $predobj->predicted_cds($cds);
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
217 if($gene->strand() == -1) {
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
218 $self->warn("reverse strand ORF, but unable to reverse coordinates!");
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
219 }
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
220 } else {
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
221 #
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
222 # All-in-one format (hopefully). This encodes the following information
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
223 # into the sequence:
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
224 # 1) untranslated regions: stretches of lower-case letters
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
225 # 2) translated regions: stretches of upper-case letters
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
226 # 3) insertions in the translated regions: capital X
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
227 # 4) deletions in the translated regions: a single lower-case letter
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
228 #
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
229 # if reverse strand ORF, save a lot of hassle by reversing the sequence
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
230 if($gene->strand() == -1) {
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
231 $seq = $seq->revcom();
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
232 }
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
233 my $seqstr = $seq->seq();
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
234 while($seqstr =~ /^([a-z]*)([A-Z].*)$/) {
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
235 # leading 5'UTR
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
236 my $utr5 = $1;
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
237 # exon + 3'UTR
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
238 my $exonseq = $2;
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
239 # strip 3'UTR and following exons
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
240 if($exonseq =~ s/([a-z]{2,}.*)$//) {
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
241 $seqstr = $1;
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
242 } else {
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
243 $seqstr = "";
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
244 }
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
245 # start: take care of yielding the absolute coordinate
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
246 my $start = CORE::length($utr5) + 1;
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
247 if($predobj) {
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
248 $start += $predobj->end() + $numins;
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
249 }
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
250 # for the end coordinate, we need to subtract the insertions
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
251 $cds = $exonseq;
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
252 $cds =~ s/[X]//g;
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
253 my $end = $start + CORE::length($cds) - 1;
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
254 # construct next exon object
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
255 $predobj = Bio::Tools::Prediction::Exon->new('-start' => $start,
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
256 '-end' => $end);
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
257 $predobj->source_tag("ESTScan");
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
258 $predobj->primary_tag("InternalExon");
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
259 $predobj->seq_id($seq->display_id());
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
260 $predobj->strand($gene->strand());
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
261 $predobj->score($gene->score());
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
262 # add the exon to the gene structure object
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
263 $gene->add_exon($predobj);
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
264 # add the predicted CDS
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
265 $cds = $exonseq;
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
266 $cds =~ s/[a-z]//g; # remove the deletions, but keep the insertions
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
267 $cds = Bio::PrimarySeq->new('-seq' => $cds,
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
268 '-display_id' => $seq->display_id(),
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
269 '-desc' => $seq->desc(),
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
270 '-alphabet' => "dna");
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
271 # only store the first one in the overall prediction
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
272 $gene->predicted_cds($cds) unless $gene->predicted_cds();
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
273 $predobj->predicted_cds($cds);
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
274 # add the predicted insertions and deletions as subfeatures
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
275 # of the exon
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
276 my $fea = undef;
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
277 while($exonseq =~ /([a-zX])/g) {
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
278 my $indel = $1;
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
279 # start and end: start looking at the position after the
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
280 # previous feature
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
281 if($fea) {
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
282 $start = $fea->start()+$numins;
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
283 $start -= 1 if($fea->primary_tag() eq 'insertion');
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
284 } else {
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
285 $start = $predobj->start()+$numins-1;
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
286 }
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
287 #print "# numins = $numins, indel = $indel, start = $start\n";
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
288 $start = index($seq->seq(), $indel, $start) + 1 - $numins;
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
289 $fea = Bio::SeqFeature::Generic->new('-start' => $start,
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
290 '-end' => $start);
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
291 $fea->source_tag("ESTScan");
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
292 $fea->seq_id($seq->display_id());
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
293 $fea->strand($predobj->strand());
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
294 if($indel eq 'X') {
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
295 # an insertion (depends on viewpoint: to get the 'real'
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
296 # CDS, a base has to be inserted, i.e., the HMMER model
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
297 # inserted a base; however, the sequencing process deleted
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
298 # a base that was there).
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
299 $fea->primary_tag("insertion");
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
300 # we need to count insertions because these are left out
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
301 # of any coordinates saved in the objects (which is correct
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
302 # because insertions change the original sequence, so
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
303 # coordinates wouldn't match)
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
304 $numins++;
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
305 } else {
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
306 # a deletion (depends on viewpoint: to get the 'real'
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
307 # CDS, a base has to be deleted, i.e., the HMMER model
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
308 # deleted a base; however, the sequencing process inserted
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
309 # a base that wasn't there).
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
310 $fea->primary_tag("deletion");
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
311 $fea->add_tag_value('base', $indel);
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
312 }
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
313 $predobj->add_sub_SeqFeature($fea);
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
314 }
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
315 }
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
316 }
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
317
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
318 return $gene;
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
319 }
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
320
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
321 =head2 close
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
322
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
323 Title : close
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
324 Usage : $result->close()
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
325 Function: Closes the file handle associated with this result file.
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
326 Inherited method, overridden.
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
327 Example :
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
328 Returns :
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
329 Args :
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
330
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
331 =cut
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
332
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
333 sub close {
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
334 my ($self, @args) = @_;
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
335
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
336 delete($self->{'_fastastream'});
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
337 $self->SUPER::close(@args);
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
338 }
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
339
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
340 =head2 _fasta_stream
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
341
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
342 Title : _fasta_stream
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
343 Usage : $result->_fasta_stream()
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
344 Function: Gets/Sets the FASTA sequence IO stream for reading the contents of
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
345 the file associated with this MZEF result object.
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
346
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
347 If called for the first time, creates the stream from the filehandle
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
348 if necessary.
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
349 Example :
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
350 Returns :
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
351 Args :
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
352
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
353 =cut
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
354
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
355 sub _fasta_stream {
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
356 my ($self, $stream) = @_;
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
357
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
358 if($stream || (! exists($self->{'_fastastream'}))) {
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
359 if(! $stream) {
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
360 $stream = Bio::SeqIO->new('-fh' => $self->_fh(),
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
361 '-format' => "fasta");
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
362 }
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
363 $self->{'_fastastream'} = $stream;
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
364 }
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
365 return $self->{'_fastastream'};
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
366 }
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
367
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
368 1;
2bc9b66ada89 Uploaded
mahtabm
parents:
diff changeset
369