annotate variant_effect_predictor/Bio/SeqIO/swiss.pm @ 0:1f6dce3d34e0

Uploaded
author mahtabm
date Thu, 11 Apr 2013 02:01:53 -0400
parents
children
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
rev   line source
0
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1 # $Id: swiss.pm,v 1.66.2.4 2003/09/13 22:16:43 jason Exp $
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
2 #
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
3 # BioPerl module for Bio::SeqIO::swiss
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
4 #
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
5 # Cared for by Elia Stupka <elia@tll.org.sg>
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
6 #
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
7 # Copyright Elia Stupka
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
8 #
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
9 # You may distribute this module under the same terms as perl itself
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
10
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
11 # POD documentation - main docs before the code
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
12
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
13 =head1 NAME
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
14
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
15 Bio::SeqIO::swiss - Swissprot sequence input/output stream
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
16
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
17 =head1 SYNOPSIS
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
18
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
19 It is probably best not to use this object directly, but
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
20 rather go through the SeqIO handler system. Go:
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
21
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
22 $stream = Bio::SeqIO->new(-file => $filename, -format => 'swiss');
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
23
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
24 while ( my $seq = $stream->next_seq() ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
25 # do something with $seq
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
26 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
27
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
28 =head1 DESCRIPTION
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
29
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
30 This object can transform Bio::Seq objects to and from swissprot flat
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
31 file databases.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
32
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
33 There is a lot of flexibility here about how to dump things which I need
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
34 to document fully.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
35
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
36
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
37 =head2 Optional functions
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
38
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
39 =over 3
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
40
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
41 =item _show_dna()
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
42
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
43 (output only) shows the dna or not
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
44
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
45 =item _post_sort()
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
46
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
47 (output only) provides a sorting func which is applied to the FTHelpers
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
48 before printing
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
49
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
50 =item _id_generation_func()
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
51
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
52 This is function which is called as
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
53
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
54 print "ID ", $func($seq), "\n";
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
55
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
56 To generate the ID line. If it is not there, it generates a sensible ID
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
57 line using a number of tools.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
58
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
59 If you want to output annotations in swissprot format they need to be
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
60 stored in a Bio::Annotation::Collection object which is accessible
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
61 through the Bio::SeqI interface method L<annotation()|annotation>.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
62
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
63 The following are the names of the keys which are polled from a
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
64 L<Bio::Annotation::Collection> object.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
65
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
66 reference - Should contain Bio::Annotation::Reference objects
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
67 comment - Should contain Bio::Annotation::Comment objects
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
68 dblink - Should contain Bio::Annotation::DBLink objects
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
69 gene_name - Should contain Bio::Annotation::SimpleValue object
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
70
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
71 =back
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
72
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
73 =head1 FEEDBACK
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
74
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
75 =head2 Mailing Lists
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
76
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
77 User feedback is an integral part of the evolution of this
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
78 and other Bioperl modules. Send your comments and suggestions preferably
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
79 to one of the Bioperl mailing lists.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
80 Your participation is much appreciated.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
81
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
82 bioperl-l@bioperl.org - General discussion
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
83 http://bio.perl.org/MailList.html - About the mailing lists
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
84
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
85 =head2 Reporting Bugs
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
86
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
87 Report bugs to the Bioperl bug tracking system to help us keep track
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
88 the bugs and their resolution.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
89 Bug reports can be submitted via email or the web:
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
90
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
91 bioperl-bugs@bio.perl.org
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
92 http://bugzilla.bioperl.org/
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
93
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
94 =head1 AUTHOR - Elia Stupka
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
95
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
96 Email elia@tll.org.sg
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
97
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
98 Describe contact details here
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
99
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
100 =head1 APPENDIX
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
101
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
102 The rest of the documentation details each of the object
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
103 methods. Internal methods are usually preceded with a _
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
104
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
105 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
106
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
107
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
108 # Let the code begin...
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
109
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
110
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
111 package Bio::SeqIO::swiss;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
112 use vars qw(@ISA);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
113 use strict;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
114 use Bio::SeqIO;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
115 use Bio::SeqIO::FTHelper;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
116 use Bio::SeqFeature::Generic;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
117 use Bio::Species;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
118 use Bio::Tools::SeqStats;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
119 use Bio::Seq::SeqFactory;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
120 use Bio::Annotation::Collection;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
121 use Bio::Annotation::Comment;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
122 use Bio::Annotation::Reference;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
123 use Bio::Annotation::DBLink;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
124 use Bio::Annotation::SimpleValue;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
125 use Bio::Annotation::StructuredValue;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
126
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
127 @ISA = qw(Bio::SeqIO);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
128
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
129
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
130 sub _initialize {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
131 my($self,@args) = @_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
132 $self->SUPER::_initialize(@args);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
133
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
134 # hash for functions for decoding keys.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
135 $self->{'_func_ftunit_hash'} = {};
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
136 $self->_show_dna(1); # sets this to one by default. People can change it
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
137 if( ! defined $self->sequence_factory ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
138 $self->sequence_factory(new Bio::Seq::SeqFactory
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
139 (-verbose => $self->verbose(),
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
140 -type => 'Bio::Seq::RichSeq'));
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
141 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
142 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
143
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
144 =head2 next_seq
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
145
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
146 Title : next_seq
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
147 Usage : $seq = $stream->next_seq()
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
148 Function: returns the next sequence in the stream
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
149 Returns : Bio::Seq object
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
150 Args :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
151
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
152
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
153 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
154
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
155 sub next_seq {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
156 my ($self,@args) = @_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
157 my ($pseq,$c,$line,$name,$desc,$acc,$seqc,$mol,$div,
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
158 $date,$comment,@date_arr);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
159
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
160 my $genename = "";
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
161 my ($annotation, %params, @features) = ( new Bio::Annotation::Collection);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
162
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
163 $line = $self->_readline;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
164
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
165 if( !defined $line) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
166 return undef; # no throws - end of file
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
167 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
168
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
169 if( $line =~ /^\s+$/ ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
170 while( defined ($line = $self->_readline) ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
171 $line =~ /\S/ && last;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
172 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
173 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
174 if( !defined $line ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
175 return undef; # end of file
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
176 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
177
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
178 # fixed to allow _DIVISION to be optional for bug #946
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
179 # see bug report for more information
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
180 $line =~ /^ID\s+([^\s_]+)(_([^\s_]+))?\s+([^\s;]+);\s+([^\s;]+);/
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
181 || $self->throw("swissprot stream with no ID. Not swissprot in my book");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
182
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
183 if( $3 ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
184 $name = "$1$2";
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
185 $params{'-division'} = $3;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
186 } else {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
187 $name = $1;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
188 $params{'-division'} = 'UNK';
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
189 $params{'-primary_id'} = $1;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
190 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
191 $params{'-alphabet'} = 'protein';
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
192 # this is important to have the id for display in e.g. FTHelper, otherwise
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
193 # you won't know which entry caused an error
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
194 $params{'-display_id'} = $name;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
195
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
196 my $buffer = $line;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
197
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
198 BEFORE_FEATURE_TABLE :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
199 until( !defined ($buffer) ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
200 $_ = $buffer;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
201
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
202 # Exit at start of Feature table
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
203 last if /^FT/;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
204 # and at the sequence at the latest HL 05/11/2000
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
205 last if /^SQ/;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
206
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
207 # Description line(s)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
208 if (/^DE\s+(\S.*\S)/) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
209 $desc .= $desc ? " $1" : $1;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
210 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
211 #Gene name
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
212 elsif(/^GN\s+(.*)/) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
213 $genename .= " " if $genename;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
214 $genename .= $1;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
215 # has GN terminated yet?
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
216 if($genename =~ s/[\. ]+$//) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
217 my $gn = Bio::Annotation::StructuredValue->new();
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
218 foreach my $gene (split(/ AND /, $genename)) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
219 $gene =~ s/^\(//;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
220 $gene =~ s/\)$//;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
221 $gn->add_value([-1,-1], split(/ OR /, $gene));
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
222 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
223 $annotation->add_Annotation('gene_name',$gn,
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
224 "Bio::Annotation::SimpleValue");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
225 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
226 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
227 #accession number(s)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
228 elsif( /^AC\s+(.+)/) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
229 my @accs = split(/[; ]+/, $1); # allow space in addition
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
230 $params{'-accession_number'} = shift @accs
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
231 unless defined $params{'-accession_number'};
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
232 push @{$params{'-secondary_accessions'}}, @accs;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
233 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
234 #version number
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
235 elsif( /^SV\s+(\S+);?/ ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
236 my $sv = $1;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
237 $sv =~ s/\;//;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
238 $params{'-seq_version'} = $sv;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
239 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
240 #date
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
241 elsif( /^DT\s+(.*)/ ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
242 my $date = $1;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
243 $date =~ s/\;//;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
244 $date =~ s/\s+$//;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
245 push @{$params{'-dates'}}, $date;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
246 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
247 # Organism name and phylogenetic information
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
248 elsif (/^O[SCG]/) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
249 my $species = $self->_read_swissprot_Species(\$buffer);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
250 $params{'-species'}= $species;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
251 # now we are one line ahead -- so continue without reading the next
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
252 # line HL 05/11/2000
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
253 next;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
254 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
255 # References
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
256 elsif (/^R/) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
257 my $refs = $self->_read_swissprot_References(\$buffer);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
258
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
259 foreach my $r (@$refs) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
260 $annotation->add_Annotation('reference',$r);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
261 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
262 # now we are one line ahead -- so continue without reading the next
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
263 # line HL 05/11/2000
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
264 next;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
265 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
266 #Comments
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
267 elsif (/^CC\s{3}(.*)/) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
268 $comment .= $1;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
269 $comment .= "\n";
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
270 while (defined ($buffer = $self->_readline)) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
271 if ($buffer =~ /^CC\s{3}(.*)/) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
272 $comment .= $1;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
273 $comment .= "\n";
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
274 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
275 else {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
276 last;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
277 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
278 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
279 my $commobj = Bio::Annotation::Comment->new();
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
280 # note: don't try to process comments here -- they may contain
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
281 # structure. LP 07/30/2000
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
282 $commobj->text($comment);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
283 $annotation->add_Annotation('comment',$commobj);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
284 $comment = "";
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
285 # now we are one line ahead -- so continue without reading the next
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
286 # line HL 05/11/2000
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
287 next;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
288 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
289 #DBLinks
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
290 elsif (/^DR\s+(\S+)\;\s+(\S+)\;\s+(\S+)[\;\.](.*)$/) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
291 my $dblinkobj = Bio::Annotation::DBLink->new();
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
292 $dblinkobj->database($1);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
293 $dblinkobj->primary_id($2);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
294 $dblinkobj->optional_id($3);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
295 my $comment = $4;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
296 if(length($comment) > 0) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
297 # edit comment to get rid of leading space and trailing dot
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
298 if( $comment =~ /^\s*(\S+)\./ ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
299 $dblinkobj->comment($1);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
300 } else {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
301 $dblinkobj->comment($comment);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
302 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
303 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
304 $annotation->add_Annotation('dblink',$dblinkobj);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
305 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
306 #keywords
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
307 elsif( /^KW\s+(.*)$/ ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
308 my @kw = split(/\s*\;\s*/,$1);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
309 defined $kw[-1] && $kw[-1] =~ s/\.$//;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
310 push @{$params{'-keywords'}}, @kw;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
311 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
312
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
313
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
314 # Get next line. Getting here assumes that we indeed need to read the
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
315 # line.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
316 $buffer = $self->_readline;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
317 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
318
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
319 $buffer = $_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
320
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
321 FEATURE_TABLE :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
322 # if there is no feature table, or if we've got beyond, exit loop or don't
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
323 # even enter HL 05/11/2000
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
324 while (defined ($buffer) && ($buffer =~ /^FT/)) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
325 my $ftunit = $self->_read_FTHelper_swissprot(\$buffer);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
326
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
327 # process ftunit
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
328 # when parsing of the line fails we get undef returned
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
329 if($ftunit) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
330 push(@features,
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
331 $ftunit->_generic_seqfeature($self->location_factory(),
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
332 $params{'-seqid'}, "SwissProt"));
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
333 } else {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
334 $self->warn("failed to parse feature table line for seq " .
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
335 $params{'-display_id'});
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
336 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
337 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
338 if( $buffer !~ /^SQ/ ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
339 while( defined($_ = $self->_readline) ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
340 /^SQ/ && last;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
341 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
342 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
343 $seqc = "";
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
344 while( defined ($_ = $self->_readline) ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
345 /^\/\// && last;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
346 $_ = uc($_);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
347 s/[^A-Za-z]//g;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
348 $seqc .= $_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
349 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
350
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
351 my $seq= $self->sequence_factory->create
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
352 (-verbose => $self->verbose,
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
353 %params,
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
354 -seq => $seqc,
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
355 -desc => $desc,
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
356 -features => \@features,
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
357 -annotation => $annotation,
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
358 );
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
359
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
360 # The annotation doesn't get added by the contructor
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
361 $seq->annotation($annotation);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
362
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
363 return $seq;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
364 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
365
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
366 =head2 write_seq
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
367
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
368 Title : write_seq
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
369 Usage : $stream->write_seq($seq)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
370 Function: writes the $seq object (must be seq) to the stream
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
371 Returns : 1 for success and 0 for error
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
372 Args : array of 1 to n Bio::SeqI objects
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
373
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
374
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
375 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
376
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
377 sub write_seq {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
378 my ($self,@seqs) = @_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
379 foreach my $seq ( @seqs ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
380 $self->throw("Attempting to write with no seq!") unless defined $seq;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
381
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
382 if( ! ref $seq || ! $seq->isa('Bio::SeqI') ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
383 $self->warn(" $seq is not a SeqI compliant module. Attempting to dump, but may fail!");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
384 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
385
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
386 my $i;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
387 my $str = $seq->seq;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
388
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
389 my $mol;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
390 my $div;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
391 my $len = $seq->length();
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
392
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
393 if ( !$seq->can('division') || ! defined ($div = $seq->division()) ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
394 $div = 'UNK';
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
395 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
396
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
397 if( ! $seq->can('alphabet') || ! defined ($mol = $seq->alphabet) ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
398 $mol = 'XXX';
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
399 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
400
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
401 my $temp_line;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
402 if( $self->_id_generation_func ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
403 $temp_line = &{$self->_id_generation_func}($seq);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
404 } else {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
405 #$temp_line = sprintf ("%10s STANDARD; %3s; %d AA.",
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
406 # $seq->primary_id()."_".$div,$mol,$len);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
407 # Reconstructing the ID relies heavily upon the input source having
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
408 # been in a format that is parsed as this routine expects it -- that is,
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
409 # by this module itself. This is bad, I think, and immediately breaks
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
410 # if e.g. the Bio::DB::GenPept module is used as input.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
411 # Hence, switch to display_id(); _every_ sequence is supposed to have
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
412 # this. HL 2000/09/03
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
413 $mol =~ s/protein/PRT/;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
414 $temp_line = sprintf ("%10s STANDARD; %3s; %d AA.",
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
415 $seq->display_id(), $mol, $len);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
416 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
417
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
418 $self->_print( "ID $temp_line\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
419
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
420 # if there, write the accession line
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
421 local($^W) = 0; # supressing warnings about uninitialized fields
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
422
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
423 if( $self->_ac_generation_func ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
424 $temp_line = &{$self->_ac_generation_func}($seq);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
425 $self->_print( "AC $temp_line\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
426 } else {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
427 if ($seq->can('accession_number') ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
428 $self->_print("AC ",$seq->accession_number,";");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
429 if ($seq->can('get_secondary_accessions') ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
430 foreach my $sacc ($seq->get_secondary_accessions) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
431 $self->_print(" ",$sacc,";");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
432 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
433 $self->_print("\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
434 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
435 else {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
436 $self->_print("\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
437 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
438 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
439 # otherwise - cannot print <sigh>
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
440 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
441
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
442 # Date lines
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
443
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
444 if( $seq->can('get_dates') ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
445 foreach my $dt ( $seq->get_dates() ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
446 $self->_write_line_swissprot_regex("DT ","DT ",
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
447 $dt,"\\s\+\|\$",80);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
448 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
449 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
450
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
451 #Definition lines
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
452 $self->_write_line_swissprot_regex("DE ","DE ",$seq->desc(),"\\s\+\|\$",80);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
453
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
454 #Gene name
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
455 if ((my @genes = $seq->annotation->get_Annotations('gene_name') ) ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
456 $self->_print("GN ",
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
457 join(' OR ',
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
458 map {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
459 $_->isa("Bio::Annotation::StructuredValue") ?
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
460 $_->value(-joins => [" AND ", " OR "]) :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
461 $_->value();
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
462 } @genes),
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
463 ".\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
464 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
465
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
466 # Organism lines
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
467 if ($seq->can('species') && (my $spec = $seq->species)) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
468 my($species, @class) = $spec->classification();
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
469 my $genus = $class[0];
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
470 my $OS = "$genus $species";
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
471 if ($class[$#class] =~ /viruses/i) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
472 # different OS / OC syntax for viruses LP 09/16/2000
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
473 shift @class;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
474 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
475 if (my $ssp = $spec->sub_species) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
476 $OS .= " $ssp";
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
477 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
478 foreach (($spec->variant, $spec->common_name)) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
479 $OS .= " ($_)" if $_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
480 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
481 $self->_print( "OS $OS.\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
482 my $OC = join('; ', reverse(@class)) .'.';
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
483 $self->_write_line_swissprot_regex("OC ","OC ",$OC,"\; \|\$",80);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
484 if ($spec->organelle) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
485 $self->_write_line_swissprot_regex("OG ","OG ",$spec->organelle,"\; \|\$",80);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
486 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
487 if ($spec->ncbi_taxid) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
488 $self->_print("OX NCBI_TaxID=".$spec->ncbi_taxid.";\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
489 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
490 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
491
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
492 # Reference lines
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
493 my $t = 1;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
494 foreach my $ref ( $seq->annotation->get_Annotations('reference') ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
495 $self->_print( "RN [$t]\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
496 # changed by lorenz 08/03/00
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
497 # j.gilbert and h.lapp agreed that the rp line in swissprot seems
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
498 # more like a comment than a parseable value, so print it as is
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
499 if ($ref->rp) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
500 $self->_write_line_swissprot_regex("RP ","RP ",$ref->rp,
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
501 "\\s\+\|\$",80);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
502 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
503 if ($ref->comment) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
504 $self->_write_line_swissprot_regex("RC ","RC ",$ref->comment,
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
505 "\\s\+\|\$",80);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
506 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
507 if ($ref->medline) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
508 # new RX format in swissprot LP 09/17/00
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
509 if ($ref->pubmed) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
510 $self->_write_line_swissprot_regex("RX ","RX ",
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
511 "MEDLINE=".$ref->medline.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
512 "; PubMed=".$ref->pubmed.";",
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
513 "\\s\+\|\$",80);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
514 } else {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
515 $self->_write_line_swissprot_regex("RX MEDLINE; ","RX MEDLINE; ",
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
516 $ref->medline.".","\\s\+\|\$",80);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
517 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
518 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
519 my $author = $ref->authors .';' if($ref->authors);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
520 my $title = $ref->title .';' if( $ref->title);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
521
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
522 $self->_write_line_swissprot_regex("RA ","RA ",$author,"\\s\+\|\$",80);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
523 $self->_write_line_swissprot_regex("RT ","RT ",$title,"\\s\+\|\$",80);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
524 $self->_write_line_swissprot_regex("RL ","RL ",$ref->location,"\\s\+\|\$",80);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
525 $t++;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
526 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
527
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
528 # Comment lines
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
529
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
530 foreach my $comment ( $seq->annotation->get_Annotations('comment') ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
531 foreach my $cline (split ("\n", $comment->text)) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
532 while (length $cline > 74) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
533 $self->_print("CC ",(substr $cline,0,74),"\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
534 $cline = substr $cline,74;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
535 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
536 $self->_print("CC ",$cline,"\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
537 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
538 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
539
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
540 foreach my $dblink ( $seq->annotation->get_Annotations('dblink') )
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
541 {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
542 if (defined($dblink->comment)&&($dblink->comment)) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
543 $self->_print("DR ",$dblink->database,"; ",$dblink->primary_id,"; ",
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
544 $dblink->optional_id,"; ",$dblink->comment,".\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
545 } elsif($dblink->optional_id) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
546 $self->_print("DR ",$dblink->database,"; ",
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
547 $dblink->primary_id,"; ",
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
548 $dblink->optional_id,".\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
549 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
550 else {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
551 $self->_print("DR ",$dblink->database,
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
552 "; ",$dblink->primary_id,"; ","-.\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
553 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
554 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
555
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
556 # if there, write the kw line
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
557 {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
558 my( $kw );
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
559 if( my $func = $self->_kw_generation_func ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
560 $kw = &{$func}($seq);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
561 } elsif( $seq->can('keywords') ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
562 $kw = $seq->keywords;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
563 if( ref($kw) =~ /ARRAY/i ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
564 $kw = join("; ", @$kw);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
565 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
566 $kw .= '.' if( $kw !~ /\.$/ );
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
567 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
568 $self->_write_line_swissprot_regex("KW ","KW ",
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
569 $kw, "\\s\+\|\$",80);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
570 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
571
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
572 #Check if there are seqfeatures before printing the FT line
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
573 my @feats = $seq->can('top_SeqFeatures') ? $seq->top_SeqFeatures : ();
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
574 if ($feats[0]) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
575 if( defined $self->_post_sort ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
576
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
577 # we need to read things into an array. Process. Sort them. Print 'em
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
578
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
579 my $post_sort_func = $self->_post_sort();
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
580 my @fth;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
581
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
582 foreach my $sf ( @feats ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
583 push(@fth,Bio::SeqIO::FTHelper::from_SeqFeature($sf,$seq));
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
584 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
585 @fth = sort { &$post_sort_func($a,$b) } @fth;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
586
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
587 foreach my $fth ( @fth ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
588 $self->_print_swissprot_FTHelper($fth);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
589 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
590 } else {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
591 # not post sorted. And so we can print as we get them.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
592 # lower memory load...
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
593
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
594 foreach my $sf ( @feats ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
595 my @fth = Bio::SeqIO::FTHelper::from_SeqFeature($sf,$seq);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
596 foreach my $fth ( @fth ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
597 if( ! $fth->isa('Bio::SeqIO::FTHelper') ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
598 $sf->throw("Cannot process FTHelper... $fth");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
599 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
600
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
601 $self->_print_swissprot_FTHelper($fth);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
602 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
603 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
604 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
605
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
606 if( $self->_show_dna() == 0 ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
607 return;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
608 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
609 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
610 # finished printing features.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
611
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
612 # molecular weight
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
613 my $mw = ${Bio::Tools::SeqStats->get_mol_wt($seq->primary_seq)}[0];
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
614 # checksum
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
615 # was crc32 checksum, changed it to crc64
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
616 my $crc64 = $self->_crc64(\$str);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
617 $self->_print( sprintf("SQ SEQUENCE %4d AA; %d MW; %16s CRC64;\n",
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
618 $len,$mw,$crc64));
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
619 $self->_print( " ");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
620 my $linepos;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
621 for ($i = 0; $i < length($str); $i += 10) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
622 $self->_print( substr($str,$i,10), " ");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
623 $linepos += 11;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
624 if( ($i+10)%60 == 0 && (($i+10) < length($str))) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
625 $self->_print( "\n ");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
626 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
627 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
628 $self->_print( "\n//\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
629
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
630 $self->flush if $self->_flush_on_write && defined $self->_fh;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
631 return 1;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
632 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
633 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
634
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
635 # Thanks to James Gilbert for the following two. LP 08/01/2000
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
636
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
637 =head2 _generateCRCTable
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
638
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
639 Title : _generateCRCTable
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
640 Usage :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
641 Function:
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
642 Example :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
643 Returns :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
644 Args :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
645
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
646
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
647 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
648
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
649 sub _generateCRCTable {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
650 # 10001000001010010010001110000100
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
651 # 32
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
652 my $poly = 0xEDB88320;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
653 my ($self) = shift;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
654
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
655 $self->{'_crcTable'} = [];
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
656 foreach my $i (0..255) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
657 my $crc = $i;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
658 for (my $j=8; $j > 0; $j--) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
659 if ($crc & 1) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
660 $crc = ($crc >> 1) ^ $poly;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
661 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
662 else {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
663 $crc >>= 1;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
664 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
665 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
666 ${$self->{'_crcTable'}}[$i] = $crc;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
667 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
668 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
669
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
670
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
671 =head2 _crc32
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
672
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
673 Title : _crc32
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
674 Usage :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
675 Function:
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
676 Example :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
677 Returns :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
678 Args :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
679
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
680
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
681 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
682
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
683 sub _crc32 {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
684 my( $self, $str ) = @_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
685
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
686 $self->throw("Argument to crc32() must be ref to scalar")
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
687 unless ref($str) eq 'SCALAR';
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
688
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
689 $self->_generateCRCTable() unless exists $self->{'_crcTable'};
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
690
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
691 my $len = length($$str);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
692
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
693 my $crc = 0xFFFFFFFF;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
694 for (my $i = 0; $i < $len; $i++) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
695 # Get upper case value of each letter
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
696 my $int = ord uc substr $$str, $i, 1;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
697 $crc = (($crc >> 8) & 0x00FFFFFF) ^
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
698 ${$self->{'_crcTable'}}[ ($crc ^ $int) & 0xFF ];
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
699 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
700 return $crc;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
701 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
702
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
703 =head2 _crc64
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
704
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
705 Title : _crc64
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
706 Usage :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
707 Function:
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
708 Example :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
709 Returns :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
710 Args :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
711
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
712
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
713 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
714
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
715 sub _crc64{
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
716 my ($self, $sequence) = @_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
717 my $POLY64REVh = 0xd8000000;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
718 my @CRCTableh = 256;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
719 my @CRCTablel = 256;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
720 my $initialized;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
721
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
722
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
723 my $seq = $$sequence;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
724
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
725 my $crcl = 0;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
726 my $crch = 0;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
727 if (!$initialized) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
728 $initialized = 1;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
729 for (my $i=0; $i<256; $i++) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
730 my $partl = $i;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
731 my $parth = 0;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
732 for (my $j=0; $j<8; $j++) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
733 my $rflag = $partl & 1;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
734 $partl >>= 1;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
735 $partl |= (1 << 31) if $parth & 1;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
736 $parth >>= 1;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
737 $parth ^= $POLY64REVh if $rflag;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
738 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
739 $CRCTableh[$i] = $parth;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
740 $CRCTablel[$i] = $partl;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
741 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
742 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
743
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
744 foreach (split '', $seq) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
745 my $shr = ($crch & 0xFF) << 24;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
746 my $temp1h = $crch >> 8;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
747 my $temp1l = ($crcl >> 8) | $shr;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
748 my $tableindex = ($crcl ^ (unpack "C", $_)) & 0xFF;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
749 $crch = $temp1h ^ $CRCTableh[$tableindex];
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
750 $crcl = $temp1l ^ $CRCTablel[$tableindex];
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
751 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
752 my $crc64 = sprintf("%08X%08X", $crch, $crcl);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
753
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
754 return $crc64;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
755
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
756 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
757
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
758 =head2 _print_swissprot_FTHelper
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
759
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
760 Title : _print_swissprot_FTHelper
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
761 Usage :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
762 Function:
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
763 Example :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
764 Returns :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
765 Args :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
766
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
767
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
768 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
769
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
770 sub _print_swissprot_FTHelper {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
771 my ($self,$fth,$always_quote) = @_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
772 $always_quote ||= 0;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
773 my ($start,$end) = ('?', '?');
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
774
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
775 if( ! ref $fth || ! $fth->isa('Bio::SeqIO::FTHelper') ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
776 $fth->warn("$fth is not a FTHelper class. ".
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
777 "Attempting to print, but there could be tears!");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
778 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
779
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
780 if( $fth->loc =~ /(\?|\d+|\>\d+|<\d+)?\.\.(\?|\d+|<\d+|>\d+)?/ ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
781 $start = $1 if defined $1;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
782 $end = $2 if defined $2;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
783
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
784 # to_FTString only returns one value when start == end, #JB955
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
785 # so if no match is found, assume it is both start and end #JB955
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
786 } else {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
787 $start = $end = $fth->loc;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
788 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
789
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
790 my $desc = "";
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
791 $desc = @{$fth->field->{"description"}}[0]."."
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
792 if exists $fth->field->{"description"};
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
793 $self->_write_line_swissprot_regex(sprintf("FT %-8s %6s %6s ",
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
794 substr($fth->key,0,8),
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
795 $start,$end),
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
796 "FT ",
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
797 $desc.'.','\s+|$',80);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
798 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
799 #'
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
800
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
801 =head2 _read_swissprot_References
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
802
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
803 Title : _read_swissprot_References
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
804 Usage :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
805 Function: Reads references from swissprot format. Internal function really
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
806 Example :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
807 Returns :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
808 Args :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
809
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
810
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
811 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
812
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
813 sub _read_swissprot_References{
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
814 my ($self,$buffer) = @_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
815 my (@refs);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
816 my ($b1, $b2, $rp, $title, $loc, $au, $med, $com, $pubmed);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
817
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
818 if ($$buffer !~ /^RP/) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
819 $$buffer = $self->_readline;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
820 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
821 if( !defined $$buffer ) { return undef; }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
822 if( $$buffer =~ /^RP/ ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
823 if ($$buffer =~ /^RP (SEQUENCE OF (\d+)-(\d+).*)/) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
824 $rp=$1;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
825 $b1=$2;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
826 $b2=$3;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
827 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
828 elsif ($$buffer =~ /^RP (.*)/) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
829 $rp=$1;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
830 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
831
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
832 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
833 while( defined ($_ = $self->_readline) ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
834 #/^CC/ && last;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
835 /^RN/ && last; # separator between references ! LP 07/25/2000
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
836 #/^SQ/ && last; # there may be sequences without CC lines! HL 05/11/2000
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
837 /^[^R]/ && last; # may be the safest exit point HL 05/11/2000
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
838 /^RX MEDLINE;\s+(\d+)/ && do {$med=$1};
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
839 /^RX MEDLINE=(\d+);\s+PubMed=(\d+);/ && do {$med=$1;$pubmed=$2};
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
840 /^RA (.*)/ && do { $au .= $au ? " $1" : $1; next;};
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
841 /^RT (.*)/ && do { $title .= $title ? " $1" : $1; next;};
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
842 /^RL (.*)/ && do { $loc .= $loc ? " $1" : $1; next;};
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
843 /^RC (.*)/ && do { $com .= $com ? " $1" : $1; next;};
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
844 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
845
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
846 my $ref = new Bio::Annotation::Reference;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
847 $au =~ s/;\s*$//g;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
848 if( defined $title ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
849 $title =~ s/;\s*$//g;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
850 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
851
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
852 $ref->start($b1);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
853 $ref->end($b2);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
854 $ref->authors($au);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
855 $ref->title($title);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
856 $ref->location($loc);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
857 $ref->medline($med);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
858 $ref->pubmed($pubmed) if (defined $pubmed);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
859 $ref->comment($com);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
860 $ref->rp($rp);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
861
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
862 push(@refs,$ref);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
863 $$buffer = $_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
864 return \@refs;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
865 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
866
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
867
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
868 =head2 _read_swissprot_Species
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
869
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
870 Title : _read_swissprot_Species
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
871 Usage :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
872 Function: Reads the swissprot Organism species and classification
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
873 lines.
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
874 Example :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
875 Returns : A Bio::Species object
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
876 Args :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
877
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
878 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
879
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
880 sub _read_swissprot_Species {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
881 my( $self, $buffer ) = @_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
882 my $org;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
883
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
884 $_ = $$buffer;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
885 my( $subspecies, $species, $genus, $common, $variant, $ncbi_taxid );
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
886 my @class;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
887 my ($binomial, $descr);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
888 my $osline = "";
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
889
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
890 while (defined( $_ ||= $self->_readline )) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
891 last unless /^O[SCGX]/;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
892 # believe it or not, but OS may come multiple times -- at this time
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
893 # we can't capture multiple species
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
894 if(/^OS\s+(\S.+)/ && (! defined($binomial))) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
895 $osline .= " " if $osline;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
896 $osline .= $1;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
897 if($osline =~ s/(,|, and|\.)$//) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
898 ($binomial, $descr) = $osline =~ /(\S[^\(]+)(.*)/;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
899 ($genus, $species, $subspecies) = split(/\s+/, $binomial);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
900 $species = "sp." unless $species;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
901 while($descr =~ /\(([^\)]+)\)/g) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
902 my $item = $1;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
903 # strain etc may not necessarily come first (yes, swissprot
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
904 # is messy)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
905 if((! defined($variant)) &&
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
906 (($item =~ /(^|[^\(\w])([Ss]train|isolate|serogroup|serotype|subtype|clone)\b/) ||
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
907 ($item =~ /^(biovar|pv\.|type\s+)/))) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
908 $variant = $item;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
909 } elsif($item =~ s/^subsp\.\s+//) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
910 if(! $subspecies) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
911 $subspecies = $item;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
912 } elsif(! $variant) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
913 $variant = $item;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
914 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
915 } elsif(! defined($common)) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
916 # we're only interested in the first common name
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
917 $common = $item;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
918 if((index($common, '(') >= 0) &&
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
919 (index($common, ')') < 0)) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
920 $common .= ')';
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
921 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
922 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
923 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
924 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
925 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
926 elsif (s/^OC\s+//) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
927 push(@class, split /[\;\.]\s*/);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
928 if($class[0] =~ /viruses/i) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
929 # viruses have different OS/OC syntax
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
930 my @virusnames = split(/\s+/, $binomial);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
931 $species = (@virusnames > 1) ? pop(@virusnames) : '';
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
932 $genus = join(" ", @virusnames);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
933 $subspecies = undef;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
934 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
935 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
936 elsif (/^OG\s+(.*)/) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
937 $org = $1;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
938 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
939 elsif (/^OX\s+(.*)/ && (! defined($ncbi_taxid))) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
940 my $taxstring = $1;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
941 # we only keep the first one and ignore all others
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
942 if ($taxstring =~ /NCBI_TaxID=([\w\d]+)/) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
943 $ncbi_taxid = $1;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
944 } else {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
945 $self->throw("$taxstring doesn't look like NCBI_TaxID");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
946 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
947 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
948
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
949 $_ = undef; # Empty $_ to trigger read of next line
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
950 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
951
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
952 $$buffer = $_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
953
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
954 # Don't make a species object if it is "Unknown" or "None"
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
955 return if $genus =~ /^(Unknown|None)$/i;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
956
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
957 if ($class[$#class] eq $genus) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
958 push( @class, $species );
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
959 } else {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
960 push( @class, $genus, $species );
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
961 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
962
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
963 @class = reverse @class;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
964
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
965 my $taxon = Bio::Species->new();
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
966 $taxon->classification( \@class, "FORCE" ); # no name validation please
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
967 $taxon->common_name( $common ) if $common;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
968 $taxon->sub_species( $subspecies ) if $subspecies;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
969 $taxon->organelle ( $org ) if $org;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
970 $taxon->ncbi_taxid ( $ncbi_taxid ) if $ncbi_taxid;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
971 $taxon->variant($variant) if $variant;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
972
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
973 # done
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
974 return $taxon;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
975 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
976
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
977 =head2 _filehandle
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
978
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
979 Title : _filehandle
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
980 Usage : $obj->_filehandle($newval)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
981 Function:
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
982 Example :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
983 Returns : value of _filehandle
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
984 Args : newvalue (optional)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
985
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
986
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
987 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
988
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
989 # inherited from SeqIO.pm ! HL 05/11/2000
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
990
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
991 =head2 _read_FTHelper_swissprot
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
992
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
993 Title : _read_FTHelper_swissprot
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
994 Usage : _read_FTHelper_swissprot(\$buffer)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
995 Function: reads the next FT key line
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
996 Example :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
997 Returns : Bio::SeqIO::FTHelper object
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
998 Args : filehandle and reference to a scalar
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
999
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1000
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1001 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1002
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1003 sub _read_FTHelper_swissprot {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1004 # initial version implemented by HL 05/10/2000
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1005 # FIXME this may not be perfect, so please review
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1006 my ($self,$buffer) = @_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1007 my ($key, # The key of the feature
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1008 $loc, # The location line from the feature
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1009 $desc, # The descriptive text
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1010 );
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1011
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1012 if ($$buffer =~ /^FT (\w+)\s+([\d\?\<]+)\s+([\d\?\>]+)\s*(.*)$/) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1013 $key = $1;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1014 my $loc1 = $2;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1015 my $loc2 = $3;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1016 $loc = "$loc1..$loc2";
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1017 if($4 && (length($4) > 0)) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1018 $desc = $4;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1019 chomp($desc);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1020 } else {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1021 $desc = "";
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1022 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1023 # Read all the continuation lines up to the next feature
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1024 while (defined($_ = $self->_readline) && /^FT\s{20,}(\S.*)$/) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1025 $desc .= $1;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1026 chomp($desc);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1027 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1028 $desc =~ s/\.$//;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1029 } else {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1030 # No feature key. What's this?
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1031 $self->warn("No feature key in putative feature table line: $_");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1032 return;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1033 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1034
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1035 # Put the first line of the next feature into the buffer
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1036 $$buffer = $_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1037
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1038 # Make the new FTHelper object
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1039 my $out = new Bio::SeqIO::FTHelper(-verbose => $self->verbose());
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1040 $out->key($key);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1041 $out->loc($loc);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1042
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1043 # store the description if there is one
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1044 if($desc && (length($desc) > 0)) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1045 $out->field->{"description"} ||= [];
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1046 push(@{$out->field->{"description"}}, $desc);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1047 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1048 return $out;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1049 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1050
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1051
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1052 =head2 _write_line_swissprot
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1053
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1054 Title : _write_line_swissprot
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1055 Usage :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1056 Function: internal function
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1057 Example :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1058 Returns :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1059 Args :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1060
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1061
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1062 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1063
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1064 sub _write_line_swissprot{
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1065 my ($self,$pre1,$pre2,$line,$length) = @_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1066
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1067 $length || die "Miscalled write_line_swissprot without length. Programming error!";
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1068 my $subl = $length - length $pre2;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1069 my $linel = length $line;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1070 my $i;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1071
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1072 my $sub = substr($line,0,$length - length $pre1);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1073
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1074 $self->_print( "$pre1$sub\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1075
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1076 for($i= ($length - length $pre1);$i < $linel;) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1077 $sub = substr($line,$i,($subl));
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1078 $self->_print( "$pre2$sub\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1079 $i += $subl;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1080 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1081
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1082 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1083
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1084 =head2 _write_line_swissprot_regex
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1085
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1086 Title : _write_line_swissprot_regex
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1087 Usage :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1088 Function: internal function for writing lines of specified
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1089 length, with different first and the next line
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1090 left hand headers and split at specific points in the
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1091 text
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1092 Example :
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1093 Returns : nothing
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1094 Args : file handle, first header, second header, text-line, regex for line breaks, total line length
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1095
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1096
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1097 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1098
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1099 sub _write_line_swissprot_regex {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1100 my ($self,$pre1,$pre2,$line,$regex,$length) = @_;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1101
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1102 #print STDOUT "Going to print with $line!\n";
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1103
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1104 $length || die "Miscalled write_line_swissprot without length. Programming error!";
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1105
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1106 if( length $pre1 != length $pre2 ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1107 print STDERR "len 1 is ", length $pre1, " len 2 is ", length $pre2, "\n";
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1108 die "Programming error - cannot called write_line_swissprot_regex with different length \npre1 ($pre1) and \npre2 ($pre2) tags!";
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1109 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1110
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1111 my $subl = $length - (length $pre1) -1 ;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1112 my @lines;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1113
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1114 while($line =~ m/(.{1,$subl})($regex)/g) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1115 push(@lines, $1.$2);
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1116 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1117
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1118 my $s = shift @lines;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1119 $self->_print( "$pre1$s\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1120 foreach my $s ( @lines ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1121 $self->_print( "$pre2$s\n");
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1122 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1123 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1124
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1125 =head2 _post_sort
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1126
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1127 Title : _post_sort
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1128 Usage : $obj->_post_sort($newval)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1129 Function:
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1130 Returns : value of _post_sort
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1131 Args : newvalue (optional)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1132
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1133
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1134 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1135
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1136 sub _post_sort{
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1137 my $obj = shift;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1138 if( @_ ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1139 my $value = shift;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1140 $obj->{'_post_sort'} = $value;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1141 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1142 return $obj->{'_post_sort'};
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1143
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1144 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1145
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1146 =head2 _show_dna
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1147
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1148 Title : _show_dna
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1149 Usage : $obj->_show_dna($newval)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1150 Function:
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1151 Returns : value of _show_dna
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1152 Args : newvalue (optional)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1153
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1154
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1155 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1156
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1157 sub _show_dna{
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1158 my $obj = shift;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1159 if( @_ ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1160 my $value = shift;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1161 $obj->{'_show_dna'} = $value;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1162 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1163 return $obj->{'_show_dna'};
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1164
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1165 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1166
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1167 =head2 _id_generation_func
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1168
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1169 Title : _id_generation_func
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1170 Usage : $obj->_id_generation_func($newval)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1171 Function:
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1172 Returns : value of _id_generation_func
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1173 Args : newvalue (optional)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1174
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1175
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1176 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1177
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1178 sub _id_generation_func{
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1179 my $obj = shift;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1180 if( @_ ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1181 my $value = shift;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1182 $obj->{'_id_generation_func'} = $value;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1183 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1184 return $obj->{'_id_generation_func'};
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1185
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1186 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1187
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1188 =head2 _ac_generation_func
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1189
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1190 Title : _ac_generation_func
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1191 Usage : $obj->_ac_generation_func($newval)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1192 Function:
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1193 Returns : value of _ac_generation_func
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1194 Args : newvalue (optional)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1195
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1196
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1197 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1198
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1199 sub _ac_generation_func{
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1200 my $obj = shift;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1201 if( @_ ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1202 my $value = shift;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1203 $obj->{'_ac_generation_func'} = $value;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1204 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1205 return $obj->{'_ac_generation_func'};
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1206
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1207 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1208
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1209 =head2 _sv_generation_func
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1210
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1211 Title : _sv_generation_func
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1212 Usage : $obj->_sv_generation_func($newval)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1213 Function:
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1214 Returns : value of _sv_generation_func
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1215 Args : newvalue (optional)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1216
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1217
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1218 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1219
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1220 sub _sv_generation_func{
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1221 my $obj = shift;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1222 if( @_ ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1223 my $value = shift;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1224 $obj->{'_sv_generation_func'} = $value;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1225 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1226 return $obj->{'_sv_generation_func'};
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1227
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1228 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1229
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1230 =head2 _kw_generation_func
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1231
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1232 Title : _kw_generation_func
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1233 Usage : $obj->_kw_generation_func($newval)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1234 Function:
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1235 Returns : value of _kw_generation_func
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1236 Args : newvalue (optional)
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1237
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1238
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1239 =cut
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1240
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1241 sub _kw_generation_func{
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1242 my $obj = shift;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1243 if( @_ ) {
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1244 my $value = shift;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1245 $obj->{'_kw_generation_func'} = $value;
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1246 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1247 return $obj->{'_kw_generation_func'};
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1248
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1249 }
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1250
1f6dce3d34e0 Uploaded
mahtabm
parents:
diff changeset
1251 1;