comparison variant_effect_predictor/Bio/DasI.pm @ 0:2bc9b66ada89 draft default tip

Uploaded
author mahtabm
date Thu, 11 Apr 2013 06:29:17 -0400
parents
children
comparison
equal deleted inserted replaced
-1:000000000000 0:2bc9b66ada89
1 # $Id: DasI.pm,v 1.15 2002/11/11 18:16:29 lapp Exp $
2 #
3 # BioPerl module for Bio::DasI
4 #
5 # Cared for by Lincoln Stein <lstein@cshl.org>
6 #
7 # Copyright Lincoln Stein
8 #
9 # You may distribute this module under the same terms as perl itself
10
11 # POD documentation - main docs before the code
12
13 =head1 NAME
14
15 Bio::DasI - DAS-style access to a feature database
16
17 =head1 SYNOPSIS
18
19 # Open up a feature database somehow...
20 $db = Bio::DasI->new(@args);
21
22 @segments = $db->segment(-name => 'NT_29921.4',
23 -start => 1,
24 -end => 1000000);
25
26 # segments are Bio::Das::SegmentI - compliant objects
27
28 # fetch a list of features
29 @features = $db->features(-type=>['type1','type2','type3']);
30
31 # invoke a callback over features
32 $db->features(-type=>['type1','type2','type3'],
33 -callback => sub { ... }
34 );
35
36 $stream = $db->get_seq_stream(-type=>['type1','type2','type3']);
37 while (my $feature = $stream->next_seq) {
38 # each feature is a Bio::SeqFeatureI-compliant object
39 }
40
41 # get all feature types
42 @types = $db->types;
43
44 # count types
45 %types = $db->types(-enumerate=>1);
46
47 @feature = $db->get_feature_by_name($class=>$name);
48 @feature = $db->get_feature_by_target($target_name);
49 @feature = $db->get_feature_by_attribute($att1=>$value1,$att2=>$value2);
50 $feature = $db->get_feature_by_id($id);
51
52 $error = $db->error;
53
54 =head1 DESCRIPTION
55
56 Bio::DasI is a simplified alternative interface to sequence annotation
57 databases used by the distributed annotation system (see
58 L<Bio::Das>). In this scheme, the genome is represented as a series of
59 features, a subset of which are named. Named features can be used as
60 reference points for retrieving "segments" (see L<Bio::Das::SegmentI>),
61 and these can, in turn, be used as the basis for exploring the genome
62 further.
63
64 In addition to a name, each feature has a "class", which is
65 essentially a namespace qualifier and a "type", which describes what
66 type of feature it is. Das uses the GO consortium's ontology of
67 feature types, and so the type is actually an object of class
68 Bio::Das::FeatureTypeI (see L<Bio::Das::FeatureTypeI>). Bio::DasI
69 provides methods for interrogating the database for the types it
70 contains and the counts of each type.
71
72 =head1 FEEDBACK
73
74 =head2 Mailing Lists
75
76 User feedback is an integral part of the evolution of this and other
77 Bioperl modules. Send your comments and suggestions preferably to one
78 of the Bioperl mailing lists. Your participation is much appreciated.
79
80 bioperl-l@bio.perl.org
81
82 =head2 Reporting Bugs
83
84 Report bugs to the Bioperl bug tracking system to help us keep track
85 the bugs and their resolution. Bug reports can be submitted via email
86 or the web:
87
88 bioperl-bugs@bio.perl.org
89 http://bugzilla.bioperl.org/
90
91 =head1 AUTHOR - Lincoln Stein
92
93 Email lstein@cshl.org
94
95 =head1 APPENDIX
96
97 The rest of the documentation details each of the object
98 methods. Internal methods are usually preceded with a _
99
100 =cut
101
102 #'
103 # Let the code begin...
104
105 package Bio::DasI;
106 use strict;
107
108 use vars qw(@ISA);
109 use Bio::Root::RootI;
110 use Bio::Das::SegmentI;
111 use Bio::SeqFeature::CollectionI;
112 # Object preamble - inherits from Bio::Root::Root;
113 @ISA = qw(Bio::Root::RootI Bio::SeqFeature::CollectionI);
114
115 =head2 new
116
117 Title : new
118 Usage : Bio::DasI->new(@args)
119 Function: Create new Bio::DasI object
120 Returns : a Bio::DasI object
121 Args : see below
122
123 The new() method creates a new object. The argument list is either a
124 single argument consisting of a connection string, or the following
125 list of -name=E<gt>value arguments:
126
127 Argument Description
128 -------- -----------
129
130 -dsn Connection string for database
131 -adaptor Name of an adaptor class to use when connecting
132 -aggregator Array ref containing list of aggregators
133 "semantic mappers" to apply to database
134 -user Authentication username
135 -pass Authentication password
136
137 Implementors of DasI may add other arguments.
138
139 =cut
140
141 sub new {shift->throw_not_implemented}
142
143 =head2 types
144
145 Title : types
146 Usage : $db->types(@args)
147 Function: return list of feature types in database
148 Returns : a list of Bio::Das::FeatureTypeI objects
149 Args : see below
150
151 This routine returns a list of feature types known to the database. It
152 is also possible to find out how many times each feature occurs.
153
154 Arguments are -option=E<gt>value pairs as follows:
155
156 -enumerate if true, count the features
157
158 The returned value will be a list of Bio::Das::FeatureTypeI objects
159 (see L<Bio::Das::FeatureTypeI>.
160
161 If -enumerate is true, then the function returns a hash (not a hash
162 reference) in which the keys are the stringified versions of
163 Bio::Das::FeatureTypeI and the values are the number of times each
164 feature appears in the database.
165
166 =cut
167
168 sub types { shift->throw_not_implemented; }
169
170 =head2 segment
171
172 Title : segment
173 Usage : $db->segment(@args);
174 Function: create a segment object
175 Returns : segment object(s)
176 Args : see below
177
178 This method generates a Bio::Das::SegmentI object (see
179 L<Bio::Das::SegmentI>). The segment can be used to find overlapping
180 features and the raw sequence.
181
182 When making the segment() call, you specify the ID of a sequence
183 landmark (e.g. an accession number, a clone or contig), and a
184 positional range relative to the landmark. If no range is specified,
185 then the entire region spanned by the landmark is used to generate the
186 segment.
187
188 Arguments are -option=E<gt>value pairs as follows:
189
190 -name ID of the landmark sequence.
191
192 -class A namespace qualifier. It is not necessary for the
193 database to honor namespace qualifiers, but if it
194 does, this is where the qualifier is indicated.
195
196 -version Version number of the landmark. It is not necessary for
197 the database to honor versions, but if it does, this is
198 where the version is indicated.
199
200 -start Start of the segment relative to landmark. Positions
201 follow standard 1-based sequence rules. If not specified,
202 defaults to the beginning of the landmark.
203
204 -end End of the segment relative to the landmark. If not specified,
205 defaults to the end of the landmark.
206
207 The return value is a list of Bio::Das::SegmentI objects. If the method
208 is called in a scalar context and there are no more than one segments
209 that satisfy the request, then it is allowed to return the segment.
210 Otherwise, the method must throw a "multiple segment exception".
211
212 =cut
213
214 #'
215
216 sub segment { shift->throw_not_implemented }
217
218 =head2 features
219
220 Title : features
221 Usage : $db->features(@args)
222 Function: get all features, possibly filtered by type
223 Returns : a list of Bio::SeqFeatureI objects
224 Args : see below
225 Status : public
226
227 This routine will retrieve features in the database regardless of
228 position. It can be used to return all features, or a subset based on
229 their type
230
231 Arguments are -option=E<gt>value pairs as follows:
232
233 -types List of feature types to return. Argument is an array
234 of Bio::Das::FeatureTypeI objects or a set of strings
235 that can be converted into FeatureTypeI objects.
236
237 -callback A callback to invoke on each feature. The subroutine
238 will be passed each Bio::SeqFeatureI object in turn.
239
240 -attributes A hash reference containing attributes to match.
241
242 The -attributes argument is a hashref containing one or more attributes
243 to match against:
244
245 -attributes => { Gene => 'abc-1',
246 Note => 'confirmed' }
247
248 Attribute matching is simple exact string matching, and multiple
249 attributes are ANDed together. See L<Bio::DB::ConstraintsI> for a
250 more sophisticated take on this.
251
252 If one provides a callback, it will be invoked on each feature in
253 turn. If the callback returns a false value, iteration will be
254 interrupted. When a callback is provided, the method returns undef.
255
256 =cut
257
258 sub features { shift->throw_not_implemented }
259
260 =head2 get_feature_by_name
261
262 Title : get_feature_by_name
263 Usage : $db->get_feature_by_name(-class=>$class,-name=>$name)
264 Function: fetch features by their name
265 Returns : a list of Bio::SeqFeatureI objects
266 Args : the class and name of the desired feature
267 Status : public
268
269 This method can be used to fetch named feature(s) from the database.
270 The -class and -name arguments have the same meaning as in segment(),
271 and the method also accepts the following short-cut forms:
272
273 1) one argument: the argument is treated as the feature name
274 2) two arguments: the arguments are treated as the class and name
275 (note: this uses _rearrange() so the first argument must not
276 begin with a hyphen or it will be interpreted as a named
277 argument).
278
279 This method may return zero, one, or several Bio::SeqFeatureI objects.
280 The implementor may allow the name to contain wildcards, in which case
281 standard C-shell glob semantics are expected.
282
283 =cut
284
285 sub get_feature_by_name {
286 shift->throw_not_implemented();
287 }
288
289 =head2 get_feature_by_target
290
291 Title : get_feature_by_target
292 Usage : $db->get_feature_by_target($class => $name)
293 Function: fetch features by their similarity target
294 Returns : a list of Bio::SeqFeatureI objects
295 Args : the class and name of the desired feature
296 Status : public
297
298 This method can be used to fetch a named feature from the database
299 based on its similarity hit. The arguments are the same as
300 get_feature_by_name(). If this is not implemented, the interface
301 defaults to using get_feature_by_name().
302
303 =cut
304
305 sub get_feature_by_target {
306 shift->get_feature_by_name(@_);
307 }
308
309 =head2 get_feature_by_id
310
311 Title : get_feature_by_id
312 Usage : $db->get_feature_by_target($id)
313 Function: fetch a feature by its ID
314 Returns : a Bio::SeqFeatureI objects
315 Args : the ID of the feature
316 Status : public
317
318 If the database provides unique feature IDs, this can be used to
319 retrieve a single feature from the database. If not overridden, this
320 interface calls get_feature_by_name() and returns the first element.
321
322 =cut
323
324 sub get_feature_by_id {
325 (shift->get_feature_by_name(@_))[0];
326 }
327
328 =head2 get_feature_by_attribute
329
330 Title : get_feature_by_attribute
331 Usage : $db->get_feature_by_attribute(attribute1=>value1,attribute2=>value2)
332 Function: fetch features by combinations of attribute values
333 Returns : a list of Bio::SeqFeatureI objects
334 Args : the class and name of the desired feature
335 Status : public
336
337 This method can be used to fetch a set of features from the database.
338 Attributes are a list of name=E<gt>value pairs. They will be
339 logically ANDed together. If an attribute value is an array
340 reference, the list of values in the array is treated as an
341 alternative set of values to be ORed together.
342
343 =cut
344
345 sub get_feature_by_attribute {
346 shift->throw_not_implemented();
347 }
348
349
350 =head2 search_notes
351
352 Title : search_notes
353 Usage : $db->search_notes($search_term,$max_results)
354 Function: full-text search on features, ENSEMBL-style
355 Returns : an array of [$name,$description,$score]
356 Args : see below
357 Status : public
358
359 This routine performs a full-text search on feature attributes (which
360 attributes depend on implementation) and returns a list of
361 [$name,$description,$score], where $name is the feature ID,
362 $description is a human-readable description such as a locus line, and
363 $score is the match strength.
364
365 Since this is a decidedly non-standard thing to do (but the generic
366 genome browser uses it), the default method returns an empty list.
367 You do not have to implement it.
368
369 =cut
370
371 sub search_notes { return }
372
373 =head2 get_seq_stream
374
375 Title : get_seq_stream
376 Usage : $seqio = $db->get_seq_stream(@args)
377 Function: Performs a query and returns an iterator over it
378 Returns : a Bio::SeqIO stream capable of returning Bio::SeqFeatureI objects
379 Args : As in features()
380 Status : public
381
382 This routine takes the same arguments as features(), but returns a
383 Bio::SeqIO::Stream-compliant object. Use it like this:
384
385 $stream = $db->get_seq_stream('exon');
386 while (my $exon = $stream->next_seq) {
387 print $exon,"\n";
388 }
389
390 NOTE: In the interface this method is aliased to get_feature_stream(),
391 as the name is more descriptive.
392
393 =cut
394
395 sub get_seq_stream { shift->throw_not_implemented }
396 sub get_feature_stream {shift->get_seq_stream(@_) }
397
398 =head2 refclass
399
400 Title : refclass
401 Usage : $class = $db->refclass
402 Function: returns the default class to use for segment() calls
403 Returns : a string
404 Args : none
405 Status : public
406
407 For data sources which use namespaces to distinguish reference
408 sequence accessions, this returns the default namespace (or "class")
409 to use. This interface defines a default of "Accession".
410
411 =cut
412
413 sub refclass { "Accession" }
414
415 1;