annotate variant_effect_predictor/Bio/SeqIO/genbank.pm @ 3:d30fa12e4cc5 default tip

Merge heads 2:a5976b2dce6f and 1:09613ce8151e which were created as a result of a recently fixed bug.
author devteam <devteam@galaxyproject.org>
date Mon, 13 Jan 2014 10:38:30 -0500
parents 1f6dce3d34e0
children
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
rev   line source
0
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1 # $Id: genbank.pm,v 1.76.2.12 2003/09/13 23:33:04 jason Exp $
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
2 #
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
3 # BioPerl module for Bio::SeqIO::GenBank
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
4 #
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
5 # Cared for by Elia Stupka <elia@tll.org.sg>
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
6 #
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
7 # Copyright Elia Stupka
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
8 #
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
9 # You may distribute this module under the same terms as perl itself
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
10
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
11 # POD documentation - main docs before the code
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
12
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
13 =head1 NAME
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
14
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
15 Bio::SeqIO::GenBank - GenBank sequence input/output stream
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
16
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
17 =head1 SYNOPSIS
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
18
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
19 It is probably best not to use this object directly, but
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
20 rather go through the SeqIO handler system. Go:
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
21
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
22 $stream = Bio::SeqIO->new(-file => $filename, -format => 'GenBank');
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
23
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
24 while ( my $seq = $stream->next_seq() ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
25 # do something with $seq
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
26 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
27
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
28
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
29 =head1 DESCRIPTION
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
30
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
31 This object can transform Bio::Seq objects to and from GenBank flat
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
32 file databases.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
33
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
34 There is alot of flexibility here about how to dump things which I need
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
35 to document fully.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
36
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
37 =head2 Mapping of record properties to object properties
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
38
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
39 This section is supposed to document which sections and properties of
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
40 a GenBank databank record end up where in the Bioperl object model. It
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
41 is far from complete and presently focuses only on those mappings
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
42 which may be non-obvious. $seq in the text refers to the
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
43 Bio::Seq::RichSeqI implementing object returned by the parser for each
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
44 record.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
45
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
46 =over 4
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
47
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
48 =item GI number
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
49
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
50 $seq-E<gt>primary_id
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
51
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
52 =back
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
53
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
54 =head2 Optional functions
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
55
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
56 =over 3
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
57
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
58 =item _show_dna()
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
59
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
60 (output only) shows the dna or not
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
61
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
62 =item _post_sort()
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
63
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
64 (output only) provides a sorting func which is applied to the FTHelpers
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
65 before printing
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
66
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
67 =item _id_generation_func()
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
68
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
69 This is function which is called as
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
70
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
71 print "ID ", $func($seq), "\n";
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
72
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
73 To generate the ID line. If it is not there, it generates a sensible ID
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
74 line using a number of tools.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
75
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
76
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
77 If you want to output annotations in genbank format they need to be
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
78 stored in a Bio::Annotation::Collection object which is accessible
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
79 through the Bio::SeqI interface method L<annotation()|annotation>.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
80
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
81 The following are the names of the keys which are polled from a
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
82 L<Bio::Annotation::Collection> object.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
83
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
84 reference - Should contain Bio::Annotation::Reference objects
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
85 comment - Should contain Bio::Annotation::Comment objects
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
86
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
87 segment - Should contain a Bio::Annotation::SimpleValue object
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
88 origin - Should contain a Bio::Annotation::SimpleValue object
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
89
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
90 =back
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
91
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
92 =head1 Where does the data go?
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
93
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
94 Data parsed in Bio::SeqIO::genbank is stored in a variety of data
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
95 fields in the sequence object that is returned. More information in
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
96 the HOWTOs about exactly what each field means and where it goes.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
97 Here is a partial list of fields.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
98
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
99 Items listed as RichSeq or Seq or PrimarySeq and then NAME() tell you
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
100 the top level object which defines a function called NAME() which
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
101 stores this information.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
102
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
103 Items listed as Annotation 'NAME' tell you the data is stored the
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
104 associated Bio::Annotation::Colection object which is associated with
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
105 Bio::Seq objects. If it is explictly requested that no annotations
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
106 should be stored when parsing a record of course they won't be
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
107 available when you try and get them. If you are having this problem
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
108 look at the type of SeqBuilder that is being used to contruct your
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
109 sequence object.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
110
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
111 Comments Annotation 'comment'
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
112 References Annotation 'reference'
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
113 Segment Annotation 'segment'
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
114 Origin Annotation 'origin'
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
115
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
116 Accessions PrimarySeq accession_number()
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
117 Secondary accessions RichSeq get_secondary_accessions()
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
118 Keywords RichSeq keywords()
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
119 Dates RichSeq get_dates()
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
120 Molecule RichSeq molecule()
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
121 Seq Version RichSeq seq_version()
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
122 PID RichSeq pid()
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
123 Division RichSeq division()
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
124 Features Seq get_SeqFeatures()
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
125 Alphabet PrimarySeq alphabet()
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
126 Definition PrimarySeq description() or desc()
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
127 Version PrimarySeq version()
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
128
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
129 Sequence PrimarySeq seq()
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
130
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
131 =head1 FEEDBACK
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
132
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
133 =head2 Mailing Lists
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
134
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
135 User feedback is an integral part of the evolution of this
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
136 and other Bioperl modules. Send your comments and suggestions preferably
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
137 to one of the Bioperl mailing lists.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
138 Your participation is much appreciated.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
139
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
140 bioperl-l@bioperl.org - General discussion
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
141 http://www.bioperl.org/MailList.shtml - About the mailing lists
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
142
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
143 =head2 Reporting Bugs
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
144
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
145 Report bugs to the Bioperl bug tracking system to help us keep track
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
146 the bugs and their resolution.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
147 Bug reports can be submitted via email or the web:
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
148
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
149 bioperl-bugs@bio.perl.org
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
150 http://bugzilla.bioperl.org/
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
151
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
152 =head1 AUTHOR - Elia Stupka
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
153
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
154 Email elia@tll.org.sg
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
155
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
156 =head1 CONTRIBUTORS
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
157
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
158 Ewan Birney birney@ebi.ac.uk
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
159 Jason Stajich jason@bioperl.org
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
160 Chris Mungall cjm@fruitfly.bdgp.berkeley.edu
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
161 Lincoln Stein lstein@cshl.org
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
162 Heikki Lehvaslaiho, heikki@ebi.ac.uk
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
163 Hilmar Lapp, hlapp@gmx.net
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
164 Donald G. Jackson, donald.jackson@bms.com
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
165
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
166 =head1 APPENDIX
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
167
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
168 The rest of the documentation details each of the object
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
169 methods. Internal methods are usually preceded with a _
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
170
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
171 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
172
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
173 # Let the code begin...
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
174
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
175 package Bio::SeqIO::genbank;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
176 use vars qw(@ISA);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
177 use strict;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
178
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
179 use Bio::SeqIO;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
180 use Bio::SeqIO::FTHelper;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
181 use Bio::SeqFeature::Generic;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
182 use Bio::Species;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
183 use Bio::Seq::SeqFactory;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
184 use Bio::Annotation::Collection;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
185 use Bio::Annotation::Comment;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
186 use Bio::Annotation::Reference;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
187 use Bio::Annotation::DBLink;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
188
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
189 @ISA = qw(Bio::SeqIO);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
190
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
191 sub _initialize {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
192 my($self,@args) = @_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
193
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
194 $self->SUPER::_initialize(@args);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
195 # hash for functions for decoding keys.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
196 $self->{'_func_ftunit_hash'} = {};
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
197 $self->_show_dna(1); # sets this to one by default. People can change it
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
198 if( ! defined $self->sequence_factory ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
199 $self->sequence_factory(new Bio::Seq::SeqFactory
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
200 (-verbose => $self->verbose(),
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
201 -type => 'Bio::Seq::RichSeq'));
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
202 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
203 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
204
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
205 =head2 next_seq
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
206
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
207 Title : next_seq
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
208 Usage : $seq = $stream->next_seq()
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
209 Function: returns the next sequence in the stream
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
210 Returns : Bio::Seq object
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
211 Args :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
212
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
213 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
214
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
215 sub next_seq {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
216 my ($self,@args) = @_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
217 my $builder = $self->sequence_builder();
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
218 my $seq;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
219 my %params;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
220
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
221 RECORDSTART: while (1) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
222 my $buffer;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
223 my (@acc, @features);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
224 my ($display_id, $annotation);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
225 my $species;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
226
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
227 # initialize; we may come here because of starting over
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
228 @features = ();
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
229 $annotation = undef;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
230 @acc = ();
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
231 $species = undef;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
232 %params = (-verbose => $self->verbose); # reset hash
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
233 local($/) = "\n";
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
234 while(defined($buffer = $self->_readline())) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
235 last if index($buffer,'LOCUS ') == 0;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
236 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
237 return undef if( !defined $buffer ); # end of file
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
238 $buffer =~ /^LOCUS\s+(\S.*)$/ ||
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
239 $self->throw("GenBank stream with bad LOCUS line. Not GenBank in my book. Got '$buffer'");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
240
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
241 my @tokens = split(' ', $1);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
242
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
243 # this is important to have the id for display in e.g. FTHelper,
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
244 # otherwise you won't know which entry caused an error
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
245 $display_id = shift(@tokens);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
246 $params{'-display_id'} = $display_id;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
247 # may still be useful if we don't want the seq
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
248 $params{'-length'} = shift(@tokens);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
249 # the alphabet of the entry
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
250 $params{'-alphabet'} = (lc(shift @tokens) eq 'bp') ? 'dna' : 'protein';
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
251 # for aa there is usually no 'molecule' (mRNA etc)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
252 if (($params{'-alphabet'} eq 'dna') || (@tokens > 2)) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
253 $params{'-molecule'} = shift(@tokens);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
254 my $circ = shift(@tokens);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
255 if ($circ eq 'circular') {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
256 $params{'-is_circular'} = 1;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
257 $params{'-division'} = shift(@tokens);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
258 } else {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
259 # 'linear' or 'circular' may actually be omitted altogether
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
260 $params{'-division'} =
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
261 (CORE::length($circ) == 3 ) ? $circ : shift(@tokens);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
262 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
263 } else {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
264 $params{'-molecule'} = 'PRT' if($params{'-alphabet'} eq 'aa');
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
265 $params{'-division'} = shift(@tokens);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
266 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
267 my $date = join(' ', @tokens); # we lump together the rest
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
268 # this is per request bug #1513
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
269 # we can handle
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
270 # 9-10-2003
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
271 # 9-10-03
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
272 #09-10-2003
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
273 #09-10-03
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
274 if($date =~ s/\s*((\d{1,2})-(\w{3})-(\d{2,4})).*/$1/) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
275 if( length($date) < 11 ) { # improperly formatted date
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
276 # But we'll be nice and fix it for them
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
277 my ($d,$m,$y) = ($2,$3,$4);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
278 if( length($d) == 1 ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
279 $d = "0$d";
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
280 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
281 # guess the century here
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
282 if( length($y) == 2 ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
283 if( $y > 60 ) { # arbitrarily guess that '60' means 1960
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
284 $y = "19$y";
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
285 } else {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
286 $y = "20$y";
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
287 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
288 $self->warn("Date was malformed, guessing the century for $date to be $y\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
289 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
290 $params{'-dates'} = [join('-',$d,$m,$y)];
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
291 } else {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
292 $params{'-dates'} = [$date];
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
293 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
294 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
295
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
296 # set them all at once
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
297 $builder->add_slot_value(%params);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
298 %params = ();
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
299
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
300 # parse the rest if desired, otherwise start over
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
301 if(! $builder->want_object()) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
302 $builder->make_object();
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
303 next RECORDSTART;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
304 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
305
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
306 # set up annotation depending on what the builder wants
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
307 if($builder->want_slot('annotation')) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
308 $annotation = new Bio::Annotation::Collection;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
309 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
310 $buffer = $self->_readline();
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
311 until( !defined ($buffer) ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
312 $_ = $buffer;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
313
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
314 # Description line(s)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
315 if (/^DEFINITION\s+(\S.*\S)/) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
316 my @desc = ($1);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
317 while ( defined($_ = $self->_readline) ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
318 if( /^\s+(.*)/ ) { push (@desc, $1); next };
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
319 last;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
320 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
321 $builder->add_slot_value(-desc => join(' ', @desc));
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
322 # we'll continue right here because DEFINITION always comes
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
323 # at the top of the entry
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
324 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
325 # accession number (there can be multiple accessions)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
326 if( /^ACCESSION\s+(\S.*\S)/ ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
327 push(@acc, split(/\s+/,$1));
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
328 while( defined($_ = $self->_readline) ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
329 /^\s+(.*)/ && do { push (@acc, split(/\s+/,$1)); next };
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
330 last;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
331 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
332 $buffer = $_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
333 next;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
334 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
335 # PID
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
336 elsif( /^PID\s+(\S+)/ ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
337 $params{'-pid'} = $1;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
338 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
339 #Version number
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
340 elsif( /^VERSION\s+(.+)$/ ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
341 my ($acc,$gi) = split(' ',$1);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
342 if($acc =~ /^\w+\.(\d+)/) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
343 $params{'-version'} = $1;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
344 $params{'-seq_version'} = $1;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
345 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
346 if($gi && (index($gi,"GI:") == 0)) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
347 $params{'-primary_id'} = substr($gi,3);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
348 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
349 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
350 #Keywords
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
351 elsif( /^KEYWORDS\s+(.*)/ ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
352 my @kw = split(/\s*\;\s*/,$1);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
353 while( defined($_ = $self->_readline) ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
354 chomp;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
355 /^\s+(.*)/ && do { push (@kw, split(/\s*\;\s*/,$1)); next };
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
356 last;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
357 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
358
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
359 @kw && $kw[-1] =~ s/\.$//;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
360 $params{'-keywords'} = \@kw;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
361 $buffer = $_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
362 next;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
363 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
364 # Organism name and phylogenetic information
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
365 elsif (/^SOURCE/) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
366 if($builder->want_slot('species')) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
367 $species = $self->_read_GenBank_Species(\$buffer);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
368 $builder->add_slot_value(-species => $species);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
369 } else {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
370 while(defined($buffer = $self->_readline())) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
371 last if substr($buffer,0,1) ne ' ';
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
372 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
373 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
374 next;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
375 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
376 #References
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
377 elsif (/^REFERENCE/) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
378 if($annotation) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
379 my @refs = $self->_read_GenBank_References(\$buffer);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
380 foreach my $ref ( @refs ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
381 $annotation->add_Annotation('reference',$ref);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
382 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
383 } else {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
384 while(defined($buffer = $self->_readline())) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
385 last if substr($buffer,0,1) ne ' ';
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
386 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
387 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
388 next;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
389 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
390 #Comments
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
391 elsif (/^COMMENT\s+(.*)/) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
392 if($annotation) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
393 my $comment = $1;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
394 while (defined($_ = $self->_readline)) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
395 last if (/^\S/);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
396 $comment .= $_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
397 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
398 $comment =~ s/\n/ /g;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
399 $comment =~ s/ +/ /g;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
400 $annotation->add_Annotation(
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
401 'comment',
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
402 Bio::Annotation::Comment->new(-text => $comment));
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
403 $buffer = $_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
404 } else {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
405 while(defined($buffer = $self->_readline())) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
406 last if substr($buffer,0,1) ne ' ';
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
407 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
408 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
409 next;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
410 } elsif( /^SEGMENT\s+(.+)/ ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
411 if($annotation) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
412 my $segment = $1;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
413 while (defined($_ = $self->_readline)) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
414 last if (/^\S/);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
415 $segment .= $_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
416 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
417 $segment =~ s/\n/ /g;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
418 $segment =~ s/ +/ /g;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
419 $annotation->add_Annotation(
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
420 'segment',
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
421 Bio::Annotation::SimpleValue->new(-value => $segment));
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
422 $buffer = $_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
423 } else {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
424 while(defined($buffer = $self->_readline())) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
425 last if substr($buffer,0,1) ne ' ';
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
426 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
427 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
428 next;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
429 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
430
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
431 # Exit at start of Feature table, or start of sequence
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
432 last if( /^(FEATURES|ORIGIN)/ );
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
433 # Get next line and loop again
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
434 $buffer = $self->_readline;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
435 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
436 return undef if(! defined($buffer));
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
437
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
438 # add them all at once for efficiency
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
439 $builder->add_slot_value(-accession_number => shift(@acc),
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
440 -secondary_accessions => \@acc,
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
441 %params);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
442 $builder->add_slot_value(-annotation => $annotation) if $annotation;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
443 %params = (); # reset before possible re-use to avoid setting twice
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
444
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
445 # start over if we don't want to continue with this entry
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
446 if(! $builder->want_object()) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
447 $builder->make_object();
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
448 next RECORDSTART;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
449 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
450
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
451 # some "minimal" formats may not necessarily have a feature table
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
452 if($builder->want_slot('features') && defined($_) && /^FEATURES/) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
453 # need to read the first line of the feature table
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
454 $buffer = $self->_readline;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
455
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
456 # DO NOT read lines in the while condition -- this is done as a side
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
457 # effect in _read_FTHelper_GenBank!
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
458 while( defined($buffer) ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
459 # check immediately -- not at the end of the loop
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
460 # note: GenPept entries obviously do not have a BASE line
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
461 last if(($buffer =~ /^BASE/) || ($buffer =~ /^ORIGIN/) ||
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
462 ($buffer =~ /^CONTIG/) );
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
463
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
464 # slurp in one feature at a time -- at return, the start of
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
465 # the next feature will have been read already, so we need
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
466 # to pass a reference, and the called method must set this
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
467 # to the last line read before returning
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
468
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
469 my $ftunit = $self->_read_FTHelper_GenBank(\$buffer);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
470
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
471 # fix suggested by James Diggans
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
472
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
473 if( !defined $ftunit ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
474 # GRRRR. We have fallen over. Try to recover
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
475 $self->warn("Unexpected error in feature table for ".$params{'-display_id'}." Skipping feature, attempting to recover");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
476 unless( ($buffer =~ /^\s{5,5}\S+/) or ($buffer =~ /^\S+/)) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
477 $buffer = $self->_readline();
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
478 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
479 next; # back to reading FTHelpers
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
480 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
481
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
482 # process ftunit
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
483 my $feat =
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
484 $ftunit->_generic_seqfeature($self->location_factory(),
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
485 $display_id);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
486 # add taxon_id from source if available
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
487 if($species && ($feat->primary_tag eq 'source') &&
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
488 $feat->has_tag('db_xref') && (! $species->ncbi_taxid())) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
489 foreach my $tagval ($feat->get_tag_values('db_xref')) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
490 if(index($tagval,"taxon:") == 0) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
491 $species->ncbi_taxid(substr($tagval,6));
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
492 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
493 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
494 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
495 # add feature to list of features
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
496 push(@features, $feat);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
497 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
498 $builder->add_slot_value(-features => \@features);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
499 $_ = $buffer;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
500 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
501 if( defined ($_) ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
502 if( /^CONTIG/ && $builder->want_slot('features')) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
503 $b = " $_"; # need 5 spaces to treat it like a feature
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
504 my $ftunit = $self->_read_FTHelper_GenBank(\$b);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
505 if( ! defined $ftunit ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
506 $self->warn("unable to parse the CONTIG feature\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
507 } else {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
508 push(@features,
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
509 $ftunit->_generic_seqfeature($self->location_factory(),
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
510 $display_id));
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
511 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
512 } elsif(! /^(ORIGIN|\/\/)/ ) { # advance to the sequence, if any
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
513 while (defined( $_ = $self->_readline) ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
514 last if /^(ORIGIN|\/\/)/;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
515 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
516 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
517 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
518 if(! $builder->want_object()) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
519 $builder->make_object(); # implicit end-of-object
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
520 next RECORDSTART;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
521 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
522 if($builder->want_slot('seq')) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
523 # the fact that we want a sequence does not necessarily mean that
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
524 # there also is a sequence ...
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
525 if(defined($_) && s/^ORIGIN//) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
526 chomp;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
527 if( $annotation && length($_) > 0 ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
528 $annotation->add_Annotation('origin',
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
529 Bio::Annotation::SimpleValue->new(-value => $_));
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
530 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
531 my $seqc = '';
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
532 while( defined($_ = $self->_readline) ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
533 /^\/\// && last;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
534 $_ = uc($_);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
535 s/[^A-Za-z]//g;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
536 $seqc .= $_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
537 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
538 $self->debug("sequence length is ". length($seqc) ."\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
539 $builder->add_slot_value(-seq => $seqc);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
540 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
541 } elsif ( defined($_) && (substr($_,0,2) ne '//')) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
542 # advance to the end of the record
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
543 while( defined($_ = $self->_readline) ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
544 last if substr($_,0,2) eq '//';
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
545 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
546 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
547 # Unlikely, but maybe the sequence is so weird that we don't want it
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
548 # anymore. We don't want to return undef if the stream's not exhausted
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
549 # yet.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
550 $seq = $builder->make_object();
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
551 next RECORDSTART unless $seq;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
552 last RECORDSTART;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
553 } # end while RECORDSTART
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
554
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
555 return $seq;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
556 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
557
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
558 =head2 write_seq
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
559
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
560 Title : write_seq
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
561 Usage : $stream->write_seq($seq)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
562 Function: writes the $seq object (must be seq) to the stream
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
563 Returns : 1 for success and 0 for error
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
564 Args : array of 1 to n Bio::SeqI objects
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
565
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
566
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
567 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
568
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
569 sub write_seq {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
570 my ($self,@seqs) = @_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
571
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
572 foreach my $seq ( @seqs ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
573 $self->throw("Attempting to write with no seq!") unless defined $seq;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
574
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
575 if( ! ref $seq || ! $seq->isa('Bio::SeqI') ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
576 $self->warn(" $seq is not a SeqI compliant module. Attempting to dump, but may fail!");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
577 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
578
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
579 my $str = $seq->seq;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
580
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
581 my ($div, $mol);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
582 my $len = $seq->length();
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
583
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
584 if ( $seq->can('division') ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
585 $div=$seq->division;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
586 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
587 if( !defined $div || ! $div ) { $div = 'UNK'; }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
588 my $alpha = $seq->alphabet;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
589 if( !$seq->can('molecule') || ! defined ($mol = $seq->molecule()) ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
590 $mol = $alpha || 'DNA';
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
591 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
592
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
593 my $circular = 'linear ';
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
594 $circular = 'circular' if $seq->is_circular;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
595
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
596 local($^W) = 0; # supressing warnings about uninitialized fields.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
597
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
598 my $temp_line;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
599 if( $self->_id_generation_func ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
600 $temp_line = &{$self->_id_generation_func}($seq);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
601 } else {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
602 my $date = '';
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
603 if( $seq->can('get_dates') ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
604 ($date) = $seq->get_dates();
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
605 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
606 $temp_line = sprintf ("%-12s%-15s%13s %s%4s%-8s%-8s %3s %-s",
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
607 'LOCUS', $seq->id(),$len,
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
608 (lc($alpha) eq 'protein') ? ('aa','', '') :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
609 ('bp', '',$mol),$circular,
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
610 $div,$date);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
611 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
612
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
613 $self->_print("$temp_line\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
614 $self->_write_line_GenBank_regex("DEFINITION ", " ",
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
615 $seq->desc(),"\\s\+\|\$",80);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
616
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
617 # if there, write the accession line
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
618
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
619 if( $self->_ac_generation_func ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
620 $temp_line = &{$self->_ac_generation_func}($seq);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
621 $self->_print("ACCESSION $temp_line\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
622 } else {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
623 my @acc = ();
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
624 push(@acc, $seq->accession_number());
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
625 if( $seq->isa('Bio::Seq::RichSeqI') ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
626 push(@acc, $seq->get_secondary_accessions());
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
627 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
628 $self->_print("ACCESSION ", join(" ", @acc), "\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
629 # otherwise - cannot print <sigh>
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
630 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
631
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
632 # if PID defined, print it
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
633 if($seq->isa('Bio::Seq::RichSeqI') && $seq->pid()) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
634 $self->_print("PID ", $seq->pid(), "\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
635 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
636
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
637 # if there, write the version line
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
638
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
639 if( defined $self->_sv_generation_func() ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
640 $temp_line = &{$self->_sv_generation_func}($seq);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
641 if( $temp_line ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
642 $self->_print("VERSION $temp_line\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
643 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
644 } else {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
645 if($seq->isa('Bio::Seq::RichSeqI') && defined($seq->seq_version)) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
646 my $id = $seq->primary_id(); # this may be a GI number
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
647 $self->_print("VERSION ",
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
648 $seq->accession_number(), ".", $seq->seq_version,
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
649 ($id && ($id =~ /^\d+$/) ? " GI:".$id : ""),
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
650 "\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
651 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
652 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
653
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
654 # if there, write the keywords line
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
655
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
656 if( defined $self->_kw_generation_func() ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
657 $temp_line = &{$self->_kw_generation_func}($seq);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
658 $self->_print("KEYWORDS $temp_line\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
659 } else {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
660 if( $seq->can('keywords') ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
661 my $kw = $seq->keywords;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
662 if( ref($kw) =~ /ARRAY/i ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
663 $kw = join("; ", @$kw);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
664 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
665 $kw .= '.' if( $kw !~ /\.$/ );
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
666 $self->_print("KEYWORDS $kw\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
667 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
668 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
669
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
670 # SEGMENT if it exists
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
671 foreach my $ref ( $seq->annotation->get_Annotations('segment') ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
672 $self->_print(sprintf ("%-11s %s\n",'SEGMENT',
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
673 $ref->value));
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
674 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
675
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
676 # Organism lines
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
677 if (my $spec = $seq->species) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
678 my ($species, $genus, @class) = $spec->classification();
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
679 my $OS;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
680 if( $spec->common_name ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
681 $OS = $spec->common_name;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
682 } else {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
683 $OS = "$genus $species";
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
684 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
685 if (my $ssp = $spec->sub_species) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
686 $OS .= " $ssp";
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
687 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
688 $self->_print("SOURCE $OS\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
689 $self->_print(" ORGANISM ",
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
690 ($spec->organelle() ? $spec->organelle()." " : ""),
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
691 "$genus $species", "\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
692 my $OC = join('; ', (reverse(@class), $genus)) .'.';
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
693 $self->_write_line_GenBank_regex(' 'x12,' 'x12,
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
694 $OC,"\\s\+\|\$",80);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
695 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
696
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
697 # Reference lines
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
698 my $count = 1;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
699 foreach my $ref ( $seq->annotation->get_Annotations('reference') ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
700 $temp_line = sprintf ("REFERENCE $count (%s %d to %d)",
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
701 ($seq->alphabet() eq "protein" ?
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
702 "residues" : "bases"),
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
703 $ref->start,$ref->end);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
704 $self->_print("$temp_line\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
705 $self->_write_line_GenBank_regex(" AUTHORS ",' 'x12,
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
706 $ref->authors,"\\s\+\|\$",80);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
707 $self->_write_line_GenBank_regex(" TITLE "," "x12,
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
708 $ref->title,"\\s\+\|\$",80);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
709 $self->_write_line_GenBank_regex(" JOURNAL "," "x12,
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
710 $ref->location,"\\s\+\|\$",80);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
711 if ($ref->comment) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
712 $self->_write_line_GenBank_regex(" REMARK "," "x12,
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
713 $ref->comment,"\\s\+\|\$",80);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
714 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
715 if( $ref->medline) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
716 $self->_write_line_GenBank_regex(" MEDLINE "," "x12,
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
717 $ref->medline, "\\s\+\|\$",80);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
718 # I am assuming that pubmed entries only exist when there
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
719 # are also MEDLINE entries due to the indentation
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
720 # This could be a wrong assumption
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
721 if( $ref->pubmed ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
722 $self->_write_line_GenBank_regex(" PUBMED "," "x12,
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
723 $ref->pubmed, "\\s\+\|\$",
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
724 80);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
725 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
726 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
727 $count++;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
728 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
729 # Comment lines
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
730
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
731 foreach my $comment ( $seq->annotation->get_Annotations('comment') ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
732 $self->_write_line_GenBank_regex("COMMENT "," "x12,
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
733 $comment->text,"\\s\+\|\$",80);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
734 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
735 $self->_print("FEATURES Location/Qualifiers\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
736
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
737 my $contig;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
738 if( defined $self->_post_sort ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
739 # we need to read things into an array. Process. Sort them. Print 'em
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
740
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
741 my $post_sort_func = $self->_post_sort();
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
742 my @fth;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
743
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
744 foreach my $sf ( $seq->top_SeqFeatures ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
745 push(@fth,Bio::SeqIO::FTHelper::from_SeqFeature($sf,$seq));
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
746 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
747
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
748 @fth = sort { &$post_sort_func($a,$b) } @fth;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
749
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
750 foreach my $fth ( @fth ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
751 $self->_print_GenBank_FTHelper($fth);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
752 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
753 } else {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
754 # not post sorted. And so we can print as we get them.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
755 # lower memory load...
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
756
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
757 foreach my $sf ( $seq->top_SeqFeatures ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
758 my @fth = Bio::SeqIO::FTHelper::from_SeqFeature($sf,$seq);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
759 foreach my $fth ( @fth ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
760 if( ! $fth->isa('Bio::SeqIO::FTHelper') ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
761 $sf->throw("Cannot process FTHelper... $fth");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
762 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
763 $self->_print_GenBank_FTHelper($fth);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
764 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
765 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
766 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
767 if( $seq->length == 0 ) { $self->_show_dna(0) }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
768
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
769 if( $self->_show_dna() == 0 ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
770 $self->_print("\n//\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
771 return;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
772 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
773
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
774 # finished printing features.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
775
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
776 $str =~ tr/A-Z/a-z/;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
777
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
778 # Count each nucleotide
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
779 unless( $mol eq 'protein' ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
780 my $alen = $str =~ tr/a/a/;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
781 my $clen = $str =~ tr/c/c/;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
782 my $glen = $str =~ tr/g/g/;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
783 my $tlen = $str =~ tr/t/t/;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
784
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
785 my $olen = $len - ($alen + $tlen + $clen + $glen);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
786 if( $olen < 0 ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
787 $self->warn("Weird. More atgc than bases. Problem!");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
788 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
789
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
790 my $base_count = sprintf("BASE COUNT %8s a %6s c %6s g %6s t%s\n",
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
791 $alen,$clen,$glen,$tlen,
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
792 ( $olen > 0 ) ? sprintf("%6s others",$olen) : '');
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
793 $self->_print($base_count);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
794 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
795 my ($o) = $seq->annotation->get_Annotations('origin');
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
796 $self->_print(sprintf("%-6s%s\n",'ORIGIN',$o ? $o->value : ''));
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
797
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
798 # print out the sequence
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
799 my $nuc = 60; # Number of nucleotides per line
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
800 my $whole_pat = 'a10' x 6; # Pattern for unpacking a whole line
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
801 my $out_pat = 'A11' x 6; # Pattern for packing a line
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
802 my $length = length($str);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
803
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
804 # Calculate the number of nucleotides which fit on whole lines
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
805 my $whole = int($length / $nuc) * $nuc;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
806
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
807 # Print the whole lines
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
808 my $i;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
809 for ($i = 0; $i < $whole; $i += $nuc) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
810 my $blocks = pack $out_pat,
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
811 unpack $whole_pat,
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
812 substr($str, $i, $nuc);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
813 chop $blocks;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
814 $self->_print(sprintf("%9d $blocks\n", $i + $nuc - 59));
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
815 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
816
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
817 # Print the last line
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
818 if (my $last = substr($str, $i)) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
819 my $last_len = length($last);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
820 my $last_pat = 'a10' x int($last_len / 10) .'a'. $last_len % 10;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
821 my $blocks = pack $out_pat,
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
822 unpack($last_pat, $last);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
823 $blocks =~ s/ +$//;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
824 $self->_print(sprintf("%9d $blocks\n", $length - $last_len + 1));
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
825 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
826
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
827 $self->_print("//\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
828
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
829 $self->flush if $self->_flush_on_write && defined $self->_fh;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
830 return 1;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
831 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
832 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
833
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
834 =head2 _print_GenBank_FTHelper
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
835
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
836 Title : _print_GenBank_FTHelper
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
837 Usage :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
838 Function:
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
839 Example :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
840 Returns :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
841 Args :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
842
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
843
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
844 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
845
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
846 sub _print_GenBank_FTHelper {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
847 my ($self,$fth,$always_quote) = @_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
848
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
849 if( ! ref $fth || ! $fth->isa('Bio::SeqIO::FTHelper') ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
850 $fth->warn("$fth is not a FTHelper class. Attempting to print, but there could be tears!");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
851 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
852 if( defined $fth->key &&
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
853 $fth->key eq 'CONTIG' ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
854 $self->_write_line_GenBank_regex(sprintf("%-12s",$fth->key),
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
855 ' 'x12,$fth->loc,"\,\|\$",80);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
856 } else {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
857 $self->_write_line_GenBank_regex(sprintf(" %-16s",$fth->key),
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
858 " "x21,
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
859 $fth->loc,"\,\|\$",80);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
860 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
861
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
862 if( !defined $always_quote) { $always_quote = 0; }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
863
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
864 foreach my $tag ( keys %{$fth->field} ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
865 foreach my $value ( @{$fth->field->{$tag}} ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
866 $value =~ s/\"/\"\"/g;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
867 if ($value eq "_no_value") {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
868 $self->_write_line_GenBank_regex(" "x21,
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
869 " "x21,
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
870 "/$tag","\.\|\$",80);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
871 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
872 elsif( $always_quote == 1 || $value !~ /^\d+$/ ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
873 my ($pat) = ($value =~ /\s/ ? '\s|$' : '.|$');
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
874 $self->_write_line_GenBank_regex(" "x21,
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
875 " "x21,
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
876 "/$tag=\"$value\"",$pat,80);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
877 } else {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
878 $self->_write_line_GenBank_regex(" "x21,
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
879 " "x21,
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
880 "/$tag=$value","\.\|\$",80);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
881 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
882 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
883 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
884
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
885 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
886
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
887
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
888 =head2 _read_GenBank_References
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
889
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
890 Title : _read_GenBank_References
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
891 Usage :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
892 Function: Reads references from GenBank format. Internal function really
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
893 Returns :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
894 Args :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
895
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
896
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
897 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
898
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
899 sub _read_GenBank_References{
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
900 my ($self,$buffer) = @_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
901 my (@refs);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
902 my $ref;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
903
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
904 # assumme things are starting with RN
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
905
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
906 if( $$buffer !~ /^REFERENCE/ ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
907 warn("Not parsing line '$$buffer' which maybe important");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
908 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
909
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
910 $_ = $$buffer;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
911
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
912 my (@title,@loc,@authors,@com,@medline,@pubmed);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
913
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
914 REFLOOP: while( defined($_) || defined($_ = $self->_readline) ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
915 if (/^ AUTHORS\s+(.*)/) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
916 push (@authors, $1);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
917 while ( defined($_ = $self->_readline) ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
918 /^\s{3,}(.*)/ && do { push (@authors, $1);next;};
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
919 last;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
920 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
921 $ref->authors(join(' ', @authors));
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
922 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
923 if (/^ TITLE\s+(.*)/) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
924 push (@title, $1);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
925 while ( defined($_ = $self->_readline) ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
926 /^\s{3,}(.*)/ && do { push (@title, $1);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
927 next;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
928 };
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
929 last;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
930 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
931 $ref->title(join(' ', @title));
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
932 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
933 if (/^ JOURNAL\s+(.*)/) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
934 push(@loc, $1);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
935 while ( defined($_ = $self->_readline) ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
936 /^\s{3,}(.*)/ && do { push(@loc, $1);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
937 next;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
938 };
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
939 last;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
940 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
941 $ref->location(join(' ', @loc));
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
942 redo REFLOOP;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
943 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
944 if (/^ REMARK\s+(.*)/) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
945 push (@com, $1);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
946 while ( defined($_ = $self->_readline) ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
947 /^\s{3,}(.*)/ && do { push(@com, $1);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
948 next;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
949 };
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
950 last;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
951 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
952 $ref->comment(join(' ', @com));
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
953 redo REFLOOP;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
954 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
955 if( /^ MEDLINE\s+(.*)/ ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
956 push(@medline,$1);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
957 while ( defined($_ = $self->_readline) ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
958 /^\s{4,}(.*)/ && do { push(@medline, $1);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
959 next;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
960 };
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
961 last;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
962 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
963 $ref->medline(join(' ', @medline));
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
964 redo REFLOOP;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
965 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
966 if( /^ PUBMED\s+(.*)/ ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
967 push(@pubmed,$1);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
968 while ( defined($_ = $self->_readline) ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
969 /^\s{5,}(.*)/ && do { push(@pubmed, $1);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
970 next;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
971 };
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
972 last;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
973 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
974 $ref->pubmed(join(' ', @pubmed));
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
975 redo REFLOOP;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
976 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
977
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
978 /^REFERENCE/ && do {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
979 # store current reference
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
980 $self->_add_ref_to_array(\@refs,$ref) if $ref;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
981 # reset
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
982 @authors = ();
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
983 @title = ();
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
984 @loc = ();
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
985 @com = ();
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
986 @pubmed = ();
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
987 @medline = ();
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
988 # create the new reference object
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
989 $ref = Bio::Annotation::Reference->new();
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
990 # check whether start and end base is given
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
991 if (/^REFERENCE\s+\d+\s+\([a-z]+ (\d+) to (\d+)/){
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
992 $ref->start($1);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
993 $ref->end($2);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
994 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
995 };
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
996
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
997 /^(FEATURES)|(COMMENT)/ && last;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
998
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
999 $_ = undef; # Empty $_ to trigger read of next line
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1000 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1001
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1002 # store last reference
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1003 $self->_add_ref_to_array(\@refs,$ref) if $ref;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1004
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1005 $$buffer = $_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1006
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1007 #print "\nnumber of references found: ", $#refs+1,"\n";
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1008
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1009 return @refs;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1010 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1011
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1012 #
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1013 # This is undocumented as it shouldn't be called by anywhere else as
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1014 # read_GenBank_References. For those who still want to know:
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1015 #
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1016 # Purpose: adds a Reference object to an array of Reference objects, takes
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1017 # care of possible cleanups to be done (currently, only author and title
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1018 # will be chopped of trailing semicolons).
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1019 # Parameters:
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1020 # a reference to an array of Reference objects
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1021 # the Reference object to be added
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1022 # Returns: nothing
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1023 #
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1024 sub _add_ref_to_array {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1025 my ($self, $refs, $ref) = @_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1026
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1027 # first, polish author and title by removing possible trailing semicolons
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1028 my $au = $ref->authors();
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1029 my $title = $ref->title();
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1030 $au =~ s/;\s*$//g if $au;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1031 $title =~ s/;\s*$//g if $title;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1032 $ref->authors($au);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1033 $ref->title($title);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1034 # the rest should be clean already, so go ahead and add it
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1035 push(@{$refs}, $ref);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1036 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1037
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1038 =head2 _read_GenBank_Species
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1039
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1040 Title : _read_GenBank_Species
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1041 Usage :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1042 Function: Reads the GenBank Organism species and classification
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1043 lines.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1044 Example :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1045 Returns : A Bio::Species object
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1046 Args : a reference to the current line buffer
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1047
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1048 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1049
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1050 sub _read_GenBank_Species {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1051 my( $self,$buffer) = @_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1052 my @organell_names = ("chloroplast", "mitochondr");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1053 # only those carrying DNA, apart from the nucleus
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1054
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1055 $_ = $$buffer;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1056
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1057 my( $sub_species, $species, $genus, $common, $organelle, @class );
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1058 # upon first entering the loop, we must not read a new line -- the SOURCE
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1059 # line is already in the buffer (HL 05/10/2000)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1060 while (defined($_) || defined($_ = $self->_readline())) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1061 # de-HTMLify (links that may be encountered here don't contain
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1062 # escaped '>', so a simple-minded approach suffices)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1063 s/<[^>]+>//g;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1064 if (/^SOURCE\s+(.*)/) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1065 # FIXME this is probably mostly wrong (e.g., it yields things like
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1066 # Homo sapiens adult placenta cDNA to mRNA
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1067 # which is certainly not what you want)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1068 $common = $1;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1069 $common =~ s/\.$//; # remove trailing dot
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1070 } elsif (/^\s+ORGANISM/) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1071 my @spflds = split(' ', $_);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1072 shift(@spflds); # ORGANISM
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1073 if(grep { $_ =~ /^$spflds[0]/i; } @organell_names) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1074 $organelle = shift(@spflds);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1075 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1076 $genus = shift(@spflds);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1077 if(@spflds) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1078 $species = shift(@spflds);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1079 } else {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1080 $species = "sp.";
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1081 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1082 $sub_species = shift(@spflds) if(@spflds);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1083 } elsif (/^\s+(.+)/) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1084 # only split on ';' or '.' so that
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1085 # classification that is 2 words will
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1086 # still get matched
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1087 # use map to remove trailing/leading spaces
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1088 push(@class, map { s/^\s+//; s/\s+$//; $_; } split /[;\.]+/, $1);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1089 } else {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1090 last;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1091 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1092
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1093 $_ = undef; # Empty $_ to trigger read of next line
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1094 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1095
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1096 $$buffer = $_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1097
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1098 # Don't make a species object if it's empty or "Unknown" or "None"
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1099 return unless $genus and $genus !~ /^(Unknown|None)$/i;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1100
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1101 # Bio::Species array needs array in Species -> Kingdom direction
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1102 if ($class[$#class] eq $genus) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1103 push( @class, $species );
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1104 } else {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1105 push( @class, $genus, $species );
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1106 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1107 @class = reverse @class;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1108
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1109 my $make = Bio::Species->new();
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1110 $make->classification( \@class, "FORCE" ); # no name validation please
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1111 $make->common_name( $common ) if $common;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1112 $make->sub_species( $sub_species ) if $sub_species;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1113 $make->organelle($organelle) if $organelle;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1114 return $make;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1115 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1116
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1117 =head2 _read_FTHelper_GenBank
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1118
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1119 Title : _read_FTHelper_GenBank
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1120 Usage : _read_FTHelper_GenBank($buffer)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1121 Function: reads the next FT key line
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1122 Example :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1123 Returns : Bio::SeqIO::FTHelper object
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1124 Args : filehandle and reference to a scalar
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1125
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1126
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1127 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1128
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1129 sub _read_FTHelper_GenBank {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1130 my ($self,$buffer) = @_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1131
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1132 my ($key, # The key of the feature
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1133 $loc # The location line from the feature
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1134 );
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1135 my @qual = (); # An arrray of lines making up the qualifiers
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1136
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1137 if ($$buffer =~ /^ (\S+)\s+(.+?)\s*$/o) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1138 $key = $1;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1139 $loc = $2;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1140 # Read all the lines up to the next feature
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1141 while ( defined($_ = $self->_readline) ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1142 if (/^(\s+)(.+?)\s*$/o) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1143 # Lines inside features are preceded by 21 spaces
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1144 # A new feature is preceded by 5 spaces
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1145 if (length($1) > 6) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1146 # Add to qualifiers if we're in the qualifiers, or if it's
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1147 # the first qualifier
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1148 if (@qual || (index($2,'/') == 0)) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1149 push(@qual, $2);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1150 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1151 # We're still in the location line, so append to location
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1152 else {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1153 $loc .= $2;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1154 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1155 } else {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1156 # We've reached the start of the next feature
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1157 last;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1158 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1159 } else {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1160 # We're at the end of the feature table
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1161 last;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1162 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1163 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1164 } else {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1165 # No feature key
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1166 $self->debug("no feature key!\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1167 # change suggested by JDiggans to avoid infinite loop-
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1168 # see bugreport 1062.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1169 # reset buffer to prevent infinite loop
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1170 $$buffer = $self->_readline();
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1171 return;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1172 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1173
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1174 # Put the first line of the next feature into the buffer
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1175 $$buffer = $_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1176
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1177 # Make the new FTHelper object
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1178 my $out = new Bio::SeqIO::FTHelper();
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1179 $out->verbose($self->verbose());
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1180 $out->key($key);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1181 $out->loc($loc);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1182
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1183 # Now parse and add any qualifiers. (@qual is kept
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1184 # intact to provide informative error messages.)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1185 QUAL: for (my $i = 0; $i < @qual; $i++) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1186 $_ = $qual[$i];
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1187 my( $qualifier, $value ) = (m{^/([^=]+)(?:=(.+))?})
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1188 or $self->warn("cannot see new qualifier in feature $key: ".
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1189 $qual[$i]);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1190 #or $self->throw("Can't see new qualifier in: $_\nfrom:\n"
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1191 # . join('', map "$_\n", @qual));
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1192 $qualifier = '' unless( defined $qualifier);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1193 if (defined $value) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1194 # Do we have a quoted value?
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1195 if (substr($value, 0, 1) eq '"') {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1196 # Keep adding to value until we find the trailing quote
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1197 # and the quotes are balanced
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1198 while ($value !~ /\"$/ or $value =~ tr/"/"/ % 2) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1199 if($i >= $#qual) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1200 $self->warn("Unbalanced quote in:\n" .
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1201 join('', map("$_\n", @qual)) .
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1202 "No further qualifiers will " .
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1203 "be added for this feature");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1204 last QUAL;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1205 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1206 $i++; # modifying a for-loop variable inside of the loop
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1207 # is not the best programming style ...
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1208 my $next = $qual[$i];
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1209
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1210 # add to value with a space unless the value appears
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1211 # to be a sequence (translation for example)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1212 if(($value.$next) =~ /[^A-Za-z"-]/) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1213 $value .= " ";
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1214 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1215 $value .= $next;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1216 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1217 # Trim leading and trailing quotes
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1218 $value =~ s/^"|"$//g;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1219 # Undouble internal quotes
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1220 $value =~ s/""/\"/g;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1221 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1222 } else {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1223 $value = '_no_value';
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1224 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1225 # Store the qualifier
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1226 $out->field->{$qualifier} ||= [];
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1227 push(@{$out->field->{$qualifier}},$value);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1228 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1229 return $out;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1230 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1231
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1232 =head2 _write_line_GenBank
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1233
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1234 Title : _write_line_GenBank
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1235 Usage :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1236 Function: internal function
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1237 Example :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1238 Returns :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1239 Args :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1240
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1241
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1242 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1243
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1244 sub _write_line_GenBank{
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1245 my ($self,$pre1,$pre2,$line,$length) = @_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1246
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1247 $length || $self->throw("Miscalled write_line_GenBank without length. Programming error!");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1248 my $subl = $length - length $pre2;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1249 my $linel = length $line;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1250 my $i;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1251
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1252 my $sub = substr($line,0,$length - length $pre1);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1253
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1254 $self->_print("$pre1$sub\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1255
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1256 for($i= ($length - length $pre1);$i < $linel;) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1257 $sub = substr($line,$i,($subl));
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1258 $self->_print("$pre2$sub\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1259 $i += $subl;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1260 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1261
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1262 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1263
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1264 =head2 _write_line_GenBank_regex
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1265
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1266 Title : _write_line_GenBank_regex
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1267 Usage :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1268 Function: internal function for writing lines of specified
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1269 length, with different first and the next line
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1270 left hand headers and split at specific points in the
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1271 text
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1272 Example :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1273 Returns : nothing
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1274 Args : file handle, first header, second header, text-line, regex for line breaks, total line length
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1275
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1276
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1277 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1278
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1279 sub _write_line_GenBank_regex {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1280 my ($self,$pre1,$pre2,$line,$regex,$length) = @_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1281
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1282 #print STDOUT "Going to print with $line!\n";
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1283
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1284 $length || $self->throw( "Miscalled write_line_GenBank without length. Programming error!");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1285
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1286 # if( length $pre1 != length $pre2 ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1287 # $self->throw( "Programming error - cannot called write_line_GenBank_regex with different length pre1 and pre2 tags!");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1288 # }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1289
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1290 my $subl = $length - (length $pre1) - 2;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1291 my @lines = ();
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1292
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1293 CHUNK: while($line) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1294 foreach my $pat ($regex, '[,;\.\/-]\s|'.$regex, '[,;\.\/-]|'.$regex) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1295 if($line =~ m/^(.{1,$subl})($pat)(.*)/) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1296 $line = $3;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1297 # be strict about not padding spaces according to
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1298 # genbank format
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1299 my $l = $1.$2;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1300 $l =~ s/\s+$//;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1301 push(@lines, $l);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1302 next CHUNK;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1303 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1304 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1305 # if we get here none of the patterns matched $subl or less chars
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1306 $self->warn("trouble dissecting \"$line\" into chunks ".
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1307 "of $subl chars or less - this tag won't print right");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1308 # insert a space char to prevent infinite loops
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1309 $line = substr($line,0,$subl) . " " . substr($line,$subl);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1310 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1311
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1312 my $s = shift @lines;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1313 $self->_print("$pre1$s\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1314 foreach my $s ( @lines ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1315 $self->_print("$pre2$s\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1316 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1317 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1318
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1319 =head2 _post_sort
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1320
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1321 Title : _post_sort
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1322 Usage : $obj->_post_sort($newval)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1323 Function:
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1324 Returns : value of _post_sort
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1325 Args : newvalue (optional)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1326
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1327
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1328 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1329
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1330 sub _post_sort{
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1331 my ($obj,$value) = @_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1332 if( defined $value) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1333 $obj->{'_post_sort'} = $value;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1334 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1335 return $obj->{'_post_sort'};
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1336 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1337
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1338 =head2 _show_dna
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1339
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1340 Title : _show_dna
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1341 Usage : $obj->_show_dna($newval)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1342 Function:
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1343 Returns : value of _show_dna
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1344 Args : newvalue (optional)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1345
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1346
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1347 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1348
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1349 sub _show_dna{
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1350 my ($obj,$value) = @_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1351 if( defined $value) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1352 $obj->{'_show_dna'} = $value;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1353 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1354 return $obj->{'_show_dna'};
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1355 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1356
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1357 =head2 _id_generation_func
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1358
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1359 Title : _id_generation_func
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1360 Usage : $obj->_id_generation_func($newval)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1361 Function:
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1362 Returns : value of _id_generation_func
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1363 Args : newvalue (optional)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1364
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1365
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1366 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1367
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1368 sub _id_generation_func{
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1369 my ($obj,$value) = @_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1370 if( defined $value ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1371 $obj->{'_id_generation_func'} = $value;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1372 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1373 return $obj->{'_id_generation_func'};
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1374 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1375
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1376 =head2 _ac_generation_func
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1377
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1378 Title : _ac_generation_func
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1379 Usage : $obj->_ac_generation_func($newval)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1380 Function:
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1381 Returns : value of _ac_generation_func
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1382 Args : newvalue (optional)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1383
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1384
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1385 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1386
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1387 sub _ac_generation_func{
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1388 my ($obj,$value) = @_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1389 if( defined $value ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1390 $obj->{'_ac_generation_func'} = $value;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1391 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1392 return $obj->{'_ac_generation_func'};
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1393 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1394
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1395 =head2 _sv_generation_func
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1396
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1397 Title : _sv_generation_func
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1398 Usage : $obj->_sv_generation_func($newval)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1399 Function:
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1400 Returns : value of _sv_generation_func
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1401 Args : newvalue (optional)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1402
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1403
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1404 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1405
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1406 sub _sv_generation_func{
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1407 my ($obj,$value) = @_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1408 if( defined $value ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1409 $obj->{'_sv_generation_func'} = $value;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1410 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1411 return $obj->{'_sv_generation_func'};
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1412
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1413 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1414
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1415 =head2 _kw_generation_func
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1416
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1417 Title : _kw_generation_func
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1418 Usage : $obj->_kw_generation_func($newval)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1419 Function:
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1420 Returns : value of _kw_generation_func
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1421 Args : newvalue (optional)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1422
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1423
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1424 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1425
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1426 sub _kw_generation_func{
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1427 my ($obj,$value) = @_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1428 if( defined $value ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1429 $obj->{'_kw_generation_func'} = $value;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1430 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1431 return $obj->{'_kw_generation_func'};
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1432 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1433
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1434 1;