annotate variant_effect_predictor/Bio/Tools/ESTScan.pm @ 3:d30fa12e4cc5 default tip

Merge heads 2:a5976b2dce6f and 1:09613ce8151e which were created as a result of a recently fixed bug.
author devteam <devteam@galaxyproject.org>
date Mon, 13 Jan 2014 10:38:30 -0500
parents 1f6dce3d34e0
children
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
rev   line source
0
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1 # $Id: ESTScan.pm,v 1.10 2002/10/22 07:38:45 lapp Exp $
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
2 #
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
3 # BioPerl module for Bio::Tools::ESTScan
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
4 #
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
5 # Cared for by Hilmar Lapp <hlapp@gmx.net>
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
6 #
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
7 # Copyright Hilmar Lapp
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
8 #
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
9 # You may distribute this module under the same terms as perl itself
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
10
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
11 # POD documentation - main docs before the code
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
12
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
13 =head1 NAME
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
14
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
15 Bio::Tools::ESTScan - Results of one ESTScan run
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
16
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
17 =head1 SYNOPSIS
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
18
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
19 $estscan = Bio::Tools::ESTScan->new(-file => 'result.estscan');
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
20 # filehandle:
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
21 $estscan = Bio::Tools::ESTScan->new( -fh => \*INPUT );
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
22
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
23 # parse the results
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
24 # note: this class is-a Bio::Tools::AnalysisResult which implements
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
25 # Bio::SeqAnalysisParserI, i.e., $genscan->next_feature() is the same
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
26 while($gene = $estscan->next_prediction()) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
27 # $gene is an instance of Bio::Tools::Prediction::Gene
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
28 foreach my $orf ($gene->exons()) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
29 # $orf is an instance of Bio::Tools::Prediction::Exon
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
30 $cds_str = $orf->predicted_cds();
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
31 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
32 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
33
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
34 # essential if you gave a filename at initialization (otherwise the file
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
35 # will stay open)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
36 $estscan->close();
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
37
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
38 =head1 DESCRIPTION
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
39
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
40 The ESTScan module provides a parser for ESTScan coding region prediction
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
41 output.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
42
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
43 This module inherits off L<Bio::Tools::AnalysisResult> and therefore
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
44 implements the L<Bio::SeqAnalysisParserI> interface.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
45 See L<Bio::SeqAnalysisParserI>.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
46
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
47 =head1 FEEDBACK
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
48
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
49 =head2 Mailing Lists
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
50
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
51 User feedback is an integral part of the evolution of this and other
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
52 Bioperl modules. Send your comments and suggestions preferably to one
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
53 of the Bioperl mailing lists. Your participation is much appreciated.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
54
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
55 bioperl-l@bioperl.org - General discussion
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
56 http://bio.perl.org/MailList.html - About the mailing lists
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
57
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
58 =head2 Reporting Bugs
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
59
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
60 Report bugs to the Bioperl bug tracking system to help us keep track
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
61 the bugs and their resolution. Bug reports can be submitted via email
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
62 or the web:
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
63
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
64 bioperl-bugs@bio.perl.org
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
65 http://bugzilla.bioperl.org/
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
66
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
67 =head1 AUTHOR - Hilmar Lapp
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
68
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
69 Email hlapp@gmx.net (or hilmar.lapp@pharma.novartis.com)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
70
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
71 Describe contact details here
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
72
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
73 =head1 APPENDIX
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
74
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
75 The rest of the documentation details each of the object methods. Internal methods are usually preceded with a _
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
76
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
77 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
78
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
79 # Let the code begin...
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
80
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
81 package Bio::Tools::ESTScan;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
82 use vars qw(@ISA);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
83 use strict;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
84 use Symbol;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
85
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
86 use Bio::Root::Root;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
87 use Bio::Tools::AnalysisResult;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
88 use Bio::Tools::Prediction::Exon;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
89
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
90 @ISA = qw(Bio::Tools::AnalysisResult);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
91
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
92 sub _initialize_state {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
93 my ($self,@args) = @_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
94
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
95 # first call the inherited method!
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
96 my $make = $self->SUPER::_initialize_state(@args);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
97
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
98 if(! $self->analysis_method()) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
99 $self->analysis_method('ESTScan');
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
100 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
101 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
102
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
103 =head2 analysis_method
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
104
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
105 Usage : $estscan->analysis_method();
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
106 Purpose : Inherited method. Overridden to ensure that the name matches
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
107 /estscan/i.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
108 Returns : String
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
109 Argument : n/a
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
110
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
111 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
112
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
113 #-------------
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
114 sub analysis_method {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
115 #-------------
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
116 my ($self, $method) = @_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
117 if($method && ($method !~ /estscan/i)) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
118 $self->throw("method $method not supported in " . ref($self));
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
119 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
120 return $self->SUPER::analysis_method($method);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
121 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
122
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
123 =head2 next_feature
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
124
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
125 Title : next_feature
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
126 Usage : while($orf = $estscan->next_feature()) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
127 # do something
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
128 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
129 Function: Returns the next gene structure prediction of the ESTScan result
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
130 file. Call this method repeatedly until FALSE is returned.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
131
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
132 The returned object is actually a SeqFeatureI implementing object.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
133 This method is required for classes implementing the
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
134 SeqAnalysisParserI interface, and is merely an alias for
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
135 next_prediction() at present.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
136
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
137 Example :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
138 Returns : A Bio::Tools::Prediction::Gene object.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
139 Args :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
140
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
141 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
142
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
143 sub next_feature {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
144 my ($self,@args) = @_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
145 # even though next_prediction doesn't expect any args (and this method
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
146 # does neither), we pass on args in order to be prepared if this changes
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
147 # ever
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
148 return $self->next_prediction(@args);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
149 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
150
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
151 =head2 next_prediction
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
152
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
153 Title : next_prediction
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
154 Usage : while($gene = $estscan->next_prediction()) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
155 # do something
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
156 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
157 Function: Returns the next gene structure prediction of the ESTScan result
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
158 file. Call this method repeatedly until FALSE is returned.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
159
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
160 So far, this method DOES NOT work for reverse strand predictions,
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
161 even though the code looks like.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
162 Example :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
163 Returns : A Bio::Tools::Prediction::Gene object.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
164 Args :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
165
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
166 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
167
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
168 sub next_prediction {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
169 my ($self) = @_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
170 my ($gene, $seq, $cds, $predobj);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
171 my $numins = 0;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
172
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
173 # predictions are in the format of FASTA sequences and can be parsed one
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
174 # at a time
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
175 $seq = $self->_fasta_stream()->next_seq();
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
176 return unless $seq;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
177 # there is a new prediction
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
178 $gene = Bio::Tools::Prediction::Gene->new('-primary' => "ORFprediction",
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
179 '-source' => "ESTScan");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
180 # score starts the description
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
181 $seq->desc() =~ /^([\d.]+)\s*(.*)/ or
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
182 $self->throw("unexpected format of description: no score in " .
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
183 $seq->desc());
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
184 $gene->score($1);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
185 $seq->desc($2);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
186 # strand may end the description
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
187 if($seq->desc() =~ /(.*)minus strand$/) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
188 my $desc = $1;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
189 $desc =~ s/;\s+$//;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
190 $seq->desc($desc);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
191 $gene->strand(-1);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
192 } else {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
193 $gene->strand(1);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
194 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
195 # check for the format: default or 'all-in-one' (option -a)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
196 if($seq->desc() =~ /^(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s*(.*)/) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
197 # default format
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
198 $seq->desc($5);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
199 $predobj = Bio::Tools::Prediction::Exon->new('-source' => "ESTScan",
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
200 '-start' => $3,
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
201 '-end' => $4);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
202 $predobj->strand($gene->strand());
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
203 $predobj->score($gene->score()); # FIXME or $1, or $2 ?
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
204 $predobj->primary_tag("InternalExon");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
205 $predobj->seq_id($seq->display_id());
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
206 # add to gene structure object
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
207 $gene->add_exon($predobj);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
208 # add predicted CDS
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
209 $cds = $seq->seq();
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
210 $cds =~ s/[a-z]//g; # remove the deletions, but keep the insertions
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
211 $cds = Bio::PrimarySeq->new('-seq' => $cds,
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
212 '-display_id' => $seq->display_id(),
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
213 '-desc' => $seq->desc(),
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
214 '-alphabet' => "dna");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
215 $gene->predicted_cds($cds);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
216 $predobj->predicted_cds($cds);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
217 if($gene->strand() == -1) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
218 $self->warn("reverse strand ORF, but unable to reverse coordinates!");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
219 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
220 } else {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
221 #
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
222 # All-in-one format (hopefully). This encodes the following information
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
223 # into the sequence:
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
224 # 1) untranslated regions: stretches of lower-case letters
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
225 # 2) translated regions: stretches of upper-case letters
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
226 # 3) insertions in the translated regions: capital X
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
227 # 4) deletions in the translated regions: a single lower-case letter
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
228 #
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
229 # if reverse strand ORF, save a lot of hassle by reversing the sequence
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
230 if($gene->strand() == -1) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
231 $seq = $seq->revcom();
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
232 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
233 my $seqstr = $seq->seq();
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
234 while($seqstr =~ /^([a-z]*)([A-Z].*)$/) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
235 # leading 5'UTR
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
236 my $utr5 = $1;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
237 # exon + 3'UTR
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
238 my $exonseq = $2;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
239 # strip 3'UTR and following exons
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
240 if($exonseq =~ s/([a-z]{2,}.*)$//) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
241 $seqstr = $1;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
242 } else {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
243 $seqstr = "";
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
244 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
245 # start: take care of yielding the absolute coordinate
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
246 my $start = CORE::length($utr5) + 1;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
247 if($predobj) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
248 $start += $predobj->end() + $numins;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
249 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
250 # for the end coordinate, we need to subtract the insertions
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
251 $cds = $exonseq;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
252 $cds =~ s/[X]//g;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
253 my $end = $start + CORE::length($cds) - 1;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
254 # construct next exon object
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
255 $predobj = Bio::Tools::Prediction::Exon->new('-start' => $start,
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
256 '-end' => $end);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
257 $predobj->source_tag("ESTScan");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
258 $predobj->primary_tag("InternalExon");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
259 $predobj->seq_id($seq->display_id());
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
260 $predobj->strand($gene->strand());
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
261 $predobj->score($gene->score());
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
262 # add the exon to the gene structure object
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
263 $gene->add_exon($predobj);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
264 # add the predicted CDS
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
265 $cds = $exonseq;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
266 $cds =~ s/[a-z]//g; # remove the deletions, but keep the insertions
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
267 $cds = Bio::PrimarySeq->new('-seq' => $cds,
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
268 '-display_id' => $seq->display_id(),
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
269 '-desc' => $seq->desc(),
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
270 '-alphabet' => "dna");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
271 # only store the first one in the overall prediction
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
272 $gene->predicted_cds($cds) unless $gene->predicted_cds();
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
273 $predobj->predicted_cds($cds);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
274 # add the predicted insertions and deletions as subfeatures
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
275 # of the exon
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
276 my $fea = undef;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
277 while($exonseq =~ /([a-zX])/g) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
278 my $indel = $1;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
279 # start and end: start looking at the position after the
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
280 # previous feature
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
281 if($fea) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
282 $start = $fea->start()+$numins;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
283 $start -= 1 if($fea->primary_tag() eq 'insertion');
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
284 } else {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
285 $start = $predobj->start()+$numins-1;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
286 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
287 #print "# numins = $numins, indel = $indel, start = $start\n";
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
288 $start = index($seq->seq(), $indel, $start) + 1 - $numins;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
289 $fea = Bio::SeqFeature::Generic->new('-start' => $start,
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
290 '-end' => $start);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
291 $fea->source_tag("ESTScan");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
292 $fea->seq_id($seq->display_id());
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
293 $fea->strand($predobj->strand());
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
294 if($indel eq 'X') {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
295 # an insertion (depends on viewpoint: to get the 'real'
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
296 # CDS, a base has to be inserted, i.e., the HMMER model
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
297 # inserted a base; however, the sequencing process deleted
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
298 # a base that was there).
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
299 $fea->primary_tag("insertion");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
300 # we need to count insertions because these are left out
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
301 # of any coordinates saved in the objects (which is correct
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
302 # because insertions change the original sequence, so
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
303 # coordinates wouldn't match)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
304 $numins++;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
305 } else {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
306 # a deletion (depends on viewpoint: to get the 'real'
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
307 # CDS, a base has to be deleted, i.e., the HMMER model
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
308 # deleted a base; however, the sequencing process inserted
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
309 # a base that wasn't there).
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
310 $fea->primary_tag("deletion");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
311 $fea->add_tag_value('base', $indel);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
312 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
313 $predobj->add_sub_SeqFeature($fea);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
314 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
315 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
316 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
317
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
318 return $gene;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
319 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
320
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
321 =head2 close
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
322
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
323 Title : close
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
324 Usage : $result->close()
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
325 Function: Closes the file handle associated with this result file.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
326 Inherited method, overridden.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
327 Example :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
328 Returns :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
329 Args :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
330
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
331 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
332
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
333 sub close {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
334 my ($self, @args) = @_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
335
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
336 delete($self->{'_fastastream'});
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
337 $self->SUPER::close(@args);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
338 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
339
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
340 =head2 _fasta_stream
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
341
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
342 Title : _fasta_stream
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
343 Usage : $result->_fasta_stream()
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
344 Function: Gets/Sets the FASTA sequence IO stream for reading the contents of
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
345 the file associated with this MZEF result object.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
346
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
347 If called for the first time, creates the stream from the filehandle
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
348 if necessary.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
349 Example :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
350 Returns :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
351 Args :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
352
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
353 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
354
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
355 sub _fasta_stream {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
356 my ($self, $stream) = @_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
357
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
358 if($stream || (! exists($self->{'_fastastream'}))) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
359 if(! $stream) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
360 $stream = Bio::SeqIO->new('-fh' => $self->_fh(),
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
361 '-format' => "fasta");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
362 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
363 $self->{'_fastastream'} = $stream;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
364 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
365 return $self->{'_fastastream'};
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
366 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
367
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
368 1;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
369