comparison variant_effect_predictor/Bio/AnalysisI.pm @ 0:2bc9b66ada89 draft default tip

Uploaded
author mahtabm
date Thu, 11 Apr 2013 06:29:17 -0400
-1:000000000000 0:2bc9b66ada89
1 # $Id: AnalysisI.pm,v 1.5.2.1 2003/07/04 02:40:29 shawnh Exp $
2 #
3 # BioPerl module for Bio::AnalysisI
4 #
5 # Cared for by Martin Senger <senger@ebi.ac.uk>
6 # For copyright and disclaimer see below.
7 #
8
9 # POD documentation - main docs before the code
10
11 =head1 NAME
12
13 Bio::AnalysisI - An interface to any (local or remote) analysis tool
14
15 =head1 SYNOPSIS
16
17 This is an interface module - you do not instantiate it.
18 Use C<Bio::Tools::Run::Analysis> module:
19
20 use Bio::Tools::Run::Analysis;
21 my $tool = new Bio::Tools::Run::Analysis (@args);
22
23 =head1 DESCRIPTION
24
25 This interface contains all public methods for accessing and
26 controlling local and remote analysis tools. It is meant to be used on
27 the client side.
28
29 =head1 FEEDBACK
30
31 =head2 Mailing Lists
32
33 User feedback is an integral part of the evolution of this and other
34 Bioperl modules. Send your comments and suggestions preferably to
35 the Bioperl mailing list. Your participation is much appreciated.
36
37 bioperl-l@bioperl.org - General discussion
38 http://bioperl.org/MailList.shtml - About the mailing lists
39
40 =head2 Reporting Bugs
41
42 Report bugs to the Bioperl bug tracking system to help us keep track
43 of the bugs and their resolution. Bug reports can be submitted via
44 email or the web:
45
46 bioperl-bugs@bioperl.org
47 http://bioperl.org/bioperl-bugs/
48
49 =head1 AUTHOR
50
51 Martin Senger (senger@ebi.ac.uk)
52
53 =head1 COPYRIGHT
54
55 Copyright (c) 2003, Martin Senger and EMBL-EBI.
56 All Rights Reserved.
57
58 This module is free software; you can redistribute it and/or modify
59 it under the same terms as Perl itself.
60
61 =head1 DISCLAIMER
62
63 This software is provided "as is" without warranty of any kind.
64
65 =head1 SEE ALSO
66
67 =over
68
69 =item *
70
71 http://industry.ebi.ac.uk/soaplab/Perl_Client.html
72
73 =back
74
75 =head1 APPENDIX
76
77 This is actually the main documentation...
78
79 If you try to call any of these methods directly on this
80 C<Bio::AnalysisI> object you will get a I<not implemented> error
81 message. You need to call them on a C<Bio::Tools::Run::Analysis> object instead.
82
83 =cut
84
85
86 # Let the code begin...
87
88 package Bio::AnalysisI;
89 use vars qw(@ISA $Revision);
90 use strict;
91 use Bio::Root::RootI;
92
93 @ISA = qw(Bio::Root::RootI);
94
95 BEGIN {
96 $Revision = q$Id: AnalysisI.pm,v 1.5.2.1 2003/07/04 02:40:29 shawnh Exp $;
97 }
98
99 # -----------------------------------------------------------------------------
100
101 =head2 analysis_name
102
103 Usage : $tool->analysis_name;
104 Returns : the name of this analysis
105 Args : none
106
107 =cut
108
109 sub analysis_name { shift->throw_not_implemented(); }
110
111 # -----------------------------------------------------------------------------
112
113 =head2 analysis_spec
114
115 Usage : $tool->analysis_spec;
116 Returns : a hash reference describing this analysis
117 Args : none
118
119 The returned hash reference uses the following keys (not all of them always
120 present, perhaps others present as well): C<name>, C<type>, C<version>,
121 C<supplier>, C<installation>, C<description>.
122
123 Here is an example output:
124
125 Analysis 'edit::seqret':
126 installation => EMBL-EBI
127 description => Reads and writes (returns) sequences
128 supplier => EMBOSS
129 version => 2.6.0
130 type => edit
131 name => seqret
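
A minimal sketch of how a client might print such a hash reference. The
C<$spec> literal is a stand-in copied from the example output above, not
data fetched from a live service, and C<format_spec> is a hypothetical
helper that is not part of this interface.

```perl
use strict;
use warnings;

# Stand-in for the value returned by $tool->analysis_spec
# (values copied from the example output above).
my $spec = {
    name         => 'seqret',
    type         => 'edit',
    version      => '2.6.0',
    supplier     => 'EMBOSS',
    installation => 'EMBL-EBI',
    description  => 'Reads and writes (returns) sequences',
};

# Format one "key => value" line per key, sorted so the order is stable.
# Not every key is guaranteed to be present, so only existing keys are used.
sub format_spec {
    my ($spec) = @_;
    return map { sprintf "%-12s => %s", $_, $spec->{$_} } sort keys %$spec;
}

print "Analysis '$spec->{type}::$spec->{name}':\n";
print "    $_\n" for format_spec($spec);
```

Iterating over whatever keys exist, rather than a fixed list, keeps the
sketch tolerant of missing or additional keys.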
132
133 =cut
134
135 sub analysis_spec { shift->throw_not_implemented(); }
136
137 # -----------------------------------------------------------------------------
138
139 =head2 describe
140
141 Usage : $tool->describe;
142 Returns : an XML detailed description of this analysis
143 Args : none
144
145 The returned XML string contains metadata describing this analysis
146 service. It also includes the metadata returned (in a more convenient
147 form) by the methods C<analysis_spec>, C<input_spec> and C<result_spec>.
148
149 The DTD used for returned metadata is based on the adopted standard
150 (BSA specification for analysis engine):
151
152 <!ELEMENT DsLSRAnalysis (analysis)+>
153
154 <!ELEMENT analysis (description?, input*, output*, extension?)>
155
156 <!ATTLIST analysis
157 type CDATA #REQUIRED
158 name CDATA #IMPLIED
159 version CDATA #IMPLIED
160 supplier CDATA #IMPLIED
161 installation CDATA #IMPLIED>
162
163 <!ELEMENT description ANY>
164 <!ELEMENT extension ANY>
165
166 <!ELEMENT input (default?, allowed*, extension?)>
167
168 <!ATTLIST input
169 type CDATA #REQUIRED
170 name CDATA #REQUIRED
171 mandatory (true|false) "false">
172
173 <!ELEMENT default (#PCDATA)>
174 <!ELEMENT allowed (#PCDATA)>
175
176 <!ELEMENT output (extension?)>
177
178 <!ATTLIST output
179 type CDATA #REQUIRED
180 name CDATA #REQUIRED>
181
182 But the DTD may be extended by provider-specific metadata. For
183 example, the EBI experimental SOAP-based service on top of EMBOSS uses
184 the DTD explained at C<http://industry.ebi.ac.uk/applab>.
185
186 =cut
187
188 sub describe { shift->throw_not_implemented(); }
189
190 # -----------------------------------------------------------------------------
191
192 =head2 input_spec
193
194 Usage : $tool->input_spec;
195 Returns : an array reference with hashes as elements
196 Args : none
197
198 The analysis input data are named, and can also be associated with a
199 default value, with allowed values and with a few other attributes. The
200 names are important for feeding the service with the input data (the
201 inputs are given to methods C<create_job>, C<run>, and/or C<wait_for>
202 as name/value pairs).
203
204 Here is a (slightly shortened) example of an input specification:
205
206 $input_spec = [
207 {
208 'mandatory' => 'false',
209 'type' => 'String',
210 'name' => 'sequence_usa'
211 },
212 {
213 'mandatory' => 'false',
214 'type' => 'String',
215 'name' => 'sequence_direct_data'
216 },
217 {
218 'mandatory' => 'false',
219 'allowed_values' => [
220 'gcg',
221 'gcg8',
222 ...
223 'raw'
224 ],
225 'type' => 'String',
226 'name' => 'sformat'
227 },
228 {
229 'mandatory' => 'false',
230 'type' => 'String',
231 'name' => 'sbegin'
232 },
233 {
234 'mandatory' => 'false',
235 'type' => 'String',
236 'name' => 'send'
237 },
238 {
239 'mandatory' => 'false',
240 'type' => 'String',
241 'name' => 'sprotein'
242 },
243 {
244 'mandatory' => 'false',
245 'type' => 'String',
246 'name' => 'snucleotide'
247 },
248 {
249 'mandatory' => 'false',
250 'type' => 'String',
251 'name' => 'sreverse'
252 },
253 {
254 'mandatory' => 'false',
255 'type' => 'String',
256 'name' => 'slower'
257 },
258 {
259 'mandatory' => 'false',
260 'type' => 'String',
261 'name' => 'supper'
262 },
263 {
264 'mandatory' => 'false',
265 'default' => 'false',
266 'type' => 'String',
267 'name' => 'firstonly'
268 },
269 {
270 'mandatory' => 'false',
271 'default' => 'fasta',
272 'allowed_values' => [
273 'gcg',
274 'gcg8',
275 'embl',
276 ...
277 'raw'
278 ],
279 'type' => 'String',
280 'name' => 'osformat'
281 }
282 ];
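
A sketch of how the specification could be used to check a name/value
pair before submitting it. The C<$input_spec> literal is a shortened
stand-in for the structure above (the elided allowed values are simply
left out), and C<check_input> is a hypothetical helper, not a method of
this interface.

```perl
use strict;
use warnings;

# Shortened stand-in for the value returned by $tool->input_spec.
my $input_spec = [
    { mandatory => 'false', type => 'String', name => 'sequence_usa' },
    { mandatory => 'false', type => 'String', name => 'sformat',
      allowed_values => [ 'gcg', 'gcg8', 'raw' ] },
    { mandatory => 'false', default => 'fasta', type => 'String',
      name => 'osformat', allowed_values => [ 'gcg', 'gcg8', 'embl', 'raw' ] },
];

# Return a problem description, or undef if the value looks acceptable.
sub check_input {
    my ($input_spec, $name, $value) = @_;
    my ($entry) = grep { $_->{name} eq $name } @$input_spec;
    return "unknown input '$name'" unless $entry;
    my $allowed = $entry->{allowed_values};
    return unless $allowed;    # no restriction recorded for this input
    return if grep { $_ eq $value } @$allowed;
    return "value '$value' not allowed for '$name'";
}

for my $pair ( [ osformat => 'embl' ], [ osformat => 'xml' ], [ nosuch => 1 ] ) {
    my $problem = check_input( $input_spec, @$pair );
    print "$pair->[0]=$pair->[1]: ", ( $problem // 'ok' ), "\n";
}
```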
283
284 =cut
285
286 sub input_spec { shift->throw_not_implemented(); }
287
288 # -----------------------------------------------------------------------------
289
290 =head2 result_spec
291
292 Usage : $tool->result_spec;
293 Returns : a hash reference with result names as keys
294 and result types as values
295 Args : none
296
297 The analysis results are named and can be retrieved using their names
298 by methods C<results> and C<result>.
299
300 Here is an example of the result specification (again for the service
301 I<edit::seqret>):
302
303 $result_spec = {
304 'outseq' => 'String',
305 'report' => 'String',
306 'detailed_status' => 'String'
307 };
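
A small sketch showing how the result names might be selected by type
before asking for them with C<results> or C<result>. The literal below
is a stand-in copied from the example above, and
C<result_names_of_type> is a hypothetical helper.

```perl
use strict;
use warnings;

# Stand-in for the value returned by $tool->result_spec (edit::seqret).
my $result_spec = {
    outseq          => 'String',
    report          => 'String',
    detailed_status => 'String',
};

# Collect the names of all results of a given type, sorted for stable output.
sub result_names_of_type {
    my ($result_spec, $type) = @_;
    my @names = grep { $result_spec->{$_} eq $type } keys %$result_spec;
    return sort @names;
}

print join( ', ', result_names_of_type( $result_spec, 'String' ) ), "\n";
```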
308
309 =cut
310
311 sub result_spec { shift->throw_not_implemented(); }
312
313 # -----------------------------------------------------------------------------
314
315 =head2 create_job
316
317 Usage : $tool->create_job ( {'sequence'=>'tatat'} )
318 Returns : Bio::Tools::Run::Analysis::Job
319 Args : data and parameters for this execution
320 (in various formats)
321
322 Create an object representing a single execution of this analysis
323 tool.
324
325 Call this method if you wish to "stage the scene" - to create a job
326 with all input data but without actually running it. This method is
327 called automatically from other methods (C<run> and C<wait_for>) so
328 usually you do not need to call it directly.
329
330 The input data and parameters for this execution can be specified in
331 various ways:
332
333 =over
334
335 =item array reference
336
337 The array has scalar elements of the form
338
339 name = [[@]value]
340
341 where C<name> is the name of an input data or input parameter (see
342 method C<input_spec> for finding what names are recognized by this
343 analysis) and C<value> is a value for this data/parameter. If C<value>
344 is missing, 1 is assumed (which is convenient for boolean
345 options). If C<value> starts with C<@> it is treated as a local
346 filename, and the contents of that file are used as the data/parameter value.
347
348 =item hash reference
349
350 The same as with the array reference but now there is no need to use
351 an equal sign. The hash keys are input names and hash values their
352 data. The values can again start with a C<@> sign indicating a local
353 filename.
354
355 =item scalar
356
357 In this case, the parameter represents a job ID obtained in some
358 previous invocation - such a job already exists on the server side, and
359 we are just re-creating it here using the same job ID.
360
361 I<TBD: here we should allow the same by using a reference to the
362 Bio::Tools::Run::Analysis::Job object.>
363
364 =item undef
365
366 Finally, if the parameter is undefined, the server is asked to create an
367 empty job. The input data may be added later using C<set_data...>
368 method(s) - see scripts/papplmaker.PLS for details.
369
370 =back
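
The relationship between the array-reference and hash-reference forms
can be sketched as follows. This is only an illustration of the rules
described above, not the code used by C<Bio::Tools::Run::Analysis>;
C<normalize_inputs> and C<is_file_reference> are hypothetical helpers,
and the scalar (job ID) and undef cases are deliberately not handled.

```perl
use strict;
use warnings;

# Normalize the array-reference form into the hash-reference form.
sub normalize_inputs {
    my ($args) = @_;
    return {%$args} if ref $args eq 'HASH';    # already name => value
    my %inputs;
    for my $item (@$args) {
        # Split on the first '=' only, allowing spaces around it.
        my ($name, $value) = split /\s*=\s*/, $item, 2;
        # A missing value means 1 (convenient for boolean options).
        $inputs{$name} = ( defined $value && length $value ) ? $value : 1;
    }
    return \%inputs;
}

# A leading '@' marks a local filename whose contents are the real value.
sub is_file_reference {
    my ($value) = @_;
    return $value =~ /^\@/ ? 1 : 0;
}

my $inputs = normalize_inputs( [ 'sequence=@my.seq', 'osformat=embl', 'firstonly' ] );
for my $name ( sort keys %$inputs ) {
    my $kind = is_file_reference( $inputs->{$name} ) ? 'file' : 'literal';
    print "$name => $inputs->{$name} ($kind)\n";
}
```

The sketch only classifies C<@> values; it does not open the named
file.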
371
372 =cut
373
374 sub create_job { shift->throw_not_implemented(); }
375
376 # -----------------------------------------------------------------------------
377
378 =head2 run
379
380 Usage : $tool->run ( ['sequence=@my.seq', 'osformat=embl'] )
381 Returns : Bio::Tools::Run::Analysis::Job,
382 representing started job (an execution)
383 Args : the same as for create_job
384
385 Create a job and start it, but do not wait for its completion.
386
387 =cut
388
389 sub run { shift->throw_not_implemented(); }
390
391 # -----------------------------------------------------------------------------
392
393 =head2 wait_for
394
395 Usage : $tool->wait_for ( { 'sequence' => '@my.file' } )
396 Returns : Bio::Tools::Run::Analysis::Job,
397 representing finished job
398 Args : the same as for create_job
399
400 Create a job, start it and wait for its completion.
401
402 Note that this is a blocking method. It returns only after the
403 executed job finishes, either normally or with an error.
404
405 Usually, after this call, you ask for results of the finished job:
406
407 $analysis->wait_for (...)->results;
408
409 =cut
410
411 sub wait_for { shift->throw_not_implemented(); }
412
413 # -----------------------------------------------------------------------------
414 #
415 # Bio::AnalysisI::JobI
416 #
417 # -----------------------------------------------------------------------------
418
419 package Bio::AnalysisI::JobI;
420
421 =head1 Module Bio::AnalysisI::JobI
422
423 An interface to the public methods provided by C<Bio::Tools::Run::Analysis::Job>
424 objects.
425
426 The C<Bio::Tools::Run::Analysis::Job> objects represent a created,
427 running, or finished execution of an analysis tool.
428
429 The factory for these objects is module C<Bio::Tools::Run::Analysis>
430 where the following methods return an
431 C<Bio::Tools::Run::Analysis::Job> object:
432
433 create_job (returning a prepared job)
434 run (returning a running job)
435 wait_for (returning a finished job)
436
437 =cut
438
439 use vars qw(@ISA);
440 use strict;
441 use Bio::Root::RootI;
442
443 @ISA = qw(Bio::Root::RootI);
444
445 # -----------------------------------------------------------------------------
446
447 =head2 id
448
449 Usage : $job->id;
450 Returns : this job ID
451 Args : none
452
453 Each job (an execution) is identifiable by this unique ID which can be
454 used later to re-create the same job (in other words: to re-connect to
455 the same job). It is useful in cases when a job takes a long time to
456 finish and your client program does not want to wait for it within the
457 same session.
458
459 =cut
460
461 sub id { shift->throw_not_implemented(); }
462
463 # -----------------------------------------------------------------------------
464
465 =head2 run
466
467 Usage : $job->run
468 Returns : itself
469 Args : none
470
471 It starts a previously created job. The job must already have all input
472 data filled in. This differs from the method of the same name of the
473 C<Bio::Tools::Run::Analysis> object, where the C<run> method also
474 creates a new job, allowing input data to be set.
475
476 =cut
477
478 sub run { shift->throw_not_implemented(); }
479
480 # -----------------------------------------------------------------------------
481
482 =head2 wait_for
483
484 Usage : $job->wait_for
485 Returns : itself
486 Args : none
487
488 It waits until a previously started execution of this job finishes.
489
490 =cut
491
492 sub wait_for { shift->throw_not_implemented(); }
493
494 # -----------------------------------------------------------------------------
495
496 =head2 terminate
497
498 Usage : $job->terminate
499 Returns : itself
500 Args : none
501
502 Stop the currently running job (represented by this object). This is a
503 definitive stop, there is no way to resume it later.
504
505 =cut
506
507 sub terminate { shift->throw_not_implemented(); }
508
509 # -----------------------------------------------------------------------------
510
511 =head2 last_event
512
513 Usage : $job->last_event
514 Returns : an XML string
515 Args : none
516
517 It returns a short XML document showing what happened last with this
518 job. This is the DTD used:
519
520 <!-- place for extensions -->
521 <!ENTITY % event_body_template "(state_changed | heartbeat_progress | percent_progress | time_progress | step_progress)">
522
523 <!ELEMENT analysis_event (message?, (%event_body_template;)?)>
524
525 <!ATTLIST analysis_event
526 timestamp CDATA #IMPLIED>
527
528 <!ELEMENT message (#PCDATA)>
529
530 <!ELEMENT state_changed EMPTY>
531 <!ENTITY % analysis_state "created | running | completed | terminated_by_request | terminated_by_error">
532 <!ATTLIST state_changed
533 previous_state (%analysis_state;) "created"
534 new_state (%analysis_state;) "created">
535
536 <!ELEMENT heartbeat_progress EMPTY>
537
538 <!ELEMENT percent_progress EMPTY>
539 <!ATTLIST percent_progress
540 percentage CDATA #REQUIRED>
541
542 <!ELEMENT time_progress EMPTY>
543 <!ATTLIST time_progress
544 remaining CDATA #REQUIRED>
545
546 <!ELEMENT step_progress EMPTY>
547 <!ATTLIST step_progress
548 total_steps CDATA #IMPLIED
549 steps_completed CDATA #REQUIRED>
550
551 Here is an example what is returned after a job was created and
552 started, but before it finishes (note that the example uses an
553 analysis 'showdb' which does not need any input data):
554
555 use Bio::Tools::Run::Analysis;
556 print new Bio::Tools::Run::Analysis (-name => 'display::showdb')
557 ->run
558 ->last_event;
559
560 It prints:
561
562 <?xml version = "1.0"?>
563 <analysis_event>
564 <message>Mar 3, 2003 5:14:46 PM (Europe/London)</message>
565 <state_changed previous_state="created" new_state="running"/>
566 </analysis_event>
567
568 The same example but now after it finishes:
569
570 use Bio::Tools::Run::Analysis;
571 print new Bio::Tools::Run::Analysis (-name => 'display::showdb')
572 ->wait_for
573 ->last_event;
574
575 <?xml version = "1.0"?>
576 <analysis_event>
577 <message>Mar 3, 2003 5:17:14 PM (Europe/London)</message>
578 <state_changed previous_state="running" new_state="completed"/>
579 </analysis_event>
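
A sketch of how a client might extract the state transition from such a
document. The event text is copied from the example above;
C<state_change> is a hypothetical helper. A regular expression is
enough for this flat format, but a real XML parser would be the safer
choice if a server extends the DTD.

```perl
use strict;
use warnings;

# Event document copied from the example output above.
my $event = <<'XML';
<?xml version = "1.0"?>
<analysis_event>
  <message>Mar 3, 2003 5:17:14 PM (Europe/London)</message>
  <state_changed previous_state="running" new_state="completed"/>
</analysis_event>
XML

# Pull the attributes of a state_changed element out of an event document.
# Returns (previous_state, new_state), or an empty list for other event kinds.
sub state_change {
    my ($xml) = @_;
    my ($attrs) = $xml =~ /<state_changed\b([^>]*)>/ or return;
    my %attr = $attrs =~ /(\w+)\s*=\s*"([^"]*)"/g;
    # Both attributes default to "created" in the DTD.
    return ( $attr{previous_state} // 'created', $attr{new_state} // 'created' );
}

my ($from, $to) = state_change($event);
print "job went from $from to $to\n";
```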
580
581 =cut
582
583 sub last_event { shift->throw_not_implemented(); }
584
585 # -----------------------------------------------------------------------------
586
587 =head2 status
588
589 Usage : $job->status
590 Returns : string describing the job status
591 Args : none
592
593 It returns one of the following strings (and perhaps more if a server
594 implementation extended possible job states):
595
596 CREATED
597 RUNNING
598 COMPLETED
599 TERMINATED_BY_REQUEST
600 TERMINATED_BY_ERROR
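
A sketch of how the status strings might be used to decide whether a
job can still change. C<is_finished> is a hypothetical helper; treating
unknown (server-specific) states as not finished is an assumption of
this sketch, not documented behaviour.

```perl
use strict;
use warnings;

# States after which a job will not change any more, per the list above.
my %FINAL = map { $_ => 1 } qw(COMPLETED TERMINATED_BY_REQUEST TERMINATED_BY_ERROR);

# True if the status string denotes a finished job.
# Unknown states are assumed to be not finished.
sub is_finished {
    my ($status) = @_;
    return $FINAL{ uc $status } ? 1 : 0;
}

print "$_: ", ( is_finished($_) ? 'finished' : 'still pending' ), "\n"
    for qw(CREATED RUNNING COMPLETED TERMINATED_BY_ERROR);
```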
601
602 =cut
603
604 sub status { shift->throw_not_implemented(); }
605
606 # -----------------------------------------------------------------------------
607
608 =head2 created
609
610 Usage : $job->created (1)
611 Returns : time when this job was created
612 Args : optional
613
614 Without any argument it returns a time of creation of this job in
615 seconds, counting from the beginning of the UNIX epoch
616 (1.1.1970). With a true argument it returns a formatted time, using
617 rules described in C<Bio::Tools::Run::Analysis::Utils::format_time>.
618
619 =cut
620
621 sub created { shift->throw_not_implemented(); }
622
623 # -----------------------------------------------------------------------------
624
625 =head2 started
626
627 Usage : $job->started (1)
628 Returns : time when this job was started
629 Args : optional
630
631 See C<created>.
632
633 =cut
634
635 sub started { shift->throw_not_implemented(); }
636
637 # -----------------------------------------------------------------------------
638
639 =head2 ended
640
641 Usage : $job->ended (1)
642 Returns : time when this job was terminated
643 Args : optional
644
645 See C<created>.
646
647 =cut
648
649 sub ended { shift->throw_not_implemented(); }
650
651 # -----------------------------------------------------------------------------
652
653 =head2 elapsed
654
655 Usage : $job->elapsed
656 Returns : elapsed time of the execution of the given job
657 (in milliseconds), or 0 of job was not yet started
658 Args : none
659
660 Note that some server implementations cannot count in millisecond - so
661 the returned time may be rounded to seconds.
662
663 =cut
664
665 sub elapsed { shift->throw_not_implemented(); }
666
667 # -----------------------------------------------------------------------------
668
669 =head2 times
670
671 Usage : $job->times ('formatted')
672 Returns : a hash refrence with all time characteristics
673 Args : optional
674
675 It is a convenient method returning a hash reference with the folowing
676 keys:
677
678 created
679 started
680 ended
681 elapsed
682
683 See C<create> for remarks on time formating.
684
685 An example - both for unformatted and formatted times:
686
687 use Data::Dumper;
688 use Bio::Tools::Run::Analysis;
689 my $rh = new Bio::Tools::Run::Analysis (-name => 'nucleic_cpg_islands::cpgplot')
690 ->wait_for ( { 'sequence_usa' => 'embl:hsu52852' } )
691 ->times (1);
692 print Data::Dumper->Dump ( [$rh], ['Times']);
693 $rh = new Bio::Tools::Run::Analysis (-name => 'nucleic_cpg_islands::cpgplot')
694 ->wait_for ( { 'sequence_usa' => 'embl:AL499624' } )
695 ->times;
696 print Data::Dumper->Dump ( [$rh], ['Times']);
697
698 $Times = {
699 'ended' => 'Mon Mar 3 17:52:06 2003',
700 'started' => 'Mon Mar 3 17:52:05 2003',
701 'elapsed' => '1000',
702 'created' => 'Mon Mar 3 17:52:05 2003'
703 };
704 $Times = {
705 'ended' => '1046713961',
706 'started' => '1046713926',
707 'elapsed' => '35000',
708 'created' => '1046713926'
709 };
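
A sketch of how the unformatted values relate to each other. The hash
is copied from the second example output above; C<summarize_times> is a
hypothetical helper. The final line uses Perl's built-in C<gmtime> as
one possible rendering of an epoch value; it does not reproduce the
rules of C<Bio::Tools::Run::Analysis::Utils::format_time>.

```perl
use strict;
use warnings;

# Unformatted times copied from the second example output above.
my $times = {
    created => '1046713926',
    started => '1046713926',
    ended   => '1046713961',
    elapsed => '35000',
};

# Seconds spent waiting before the start and running, from epoch values.
sub summarize_times {
    my ($t) = @_;
    return {
        queued_seconds  => $t->{started} - $t->{created},
        running_seconds => $t->{ended} - $t->{started},
        elapsed_seconds => $t->{elapsed} / 1000,    # elapsed is in milliseconds
    };
}

my $summary = summarize_times($times);
printf "%-16s %s\n", $_, $summary->{$_} for sort keys %$summary;

# One possible human-readable rendering of an epoch value (UTC).
print scalar gmtime( $times->{ended} ), "\n";
```

For this example the difference between C<ended> and C<started> (35
seconds) agrees with the reported C<elapsed> value of 35000
milliseconds.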
710
711 =cut
712
713 sub times { shift->throw_not_implemented(); }
714
715 # -----------------------------------------------------------------------------
716
717 =head2 results
718
719 Usage : $job->results (...)
720 Returns : one or more results created by this job
721 Args : various, see belou
722
723 This is a complex method trying to make sense for all kinds of
724 results. Especially it tries to help to put binary results (such as
725 images) into local files. Generally it deals with fhe following facts:
726
727 =over
728
729 =item *
730
731 Each analysis tool may produce more results.
732
733 =item *
734
735 Some results may contain binary data not suitable for printing into a
736 terminal window.
737
738 =item *
739
740 Some results may be split into variable number of parts (this is
741 mainly true for the image results that can consist of more *.png
742 files).
743
744 =back
745
746 Note also that results have names to distinguish if there are more of
747 them. The names can be obtained by method C<result_spec>.
748
749 Here are the rules how the method works:
750
751 Retrieving NAMED results:
752 -------------------------
753 results ('name1', ...) => return results as they are, no storing into files
754
755 results ( { 'name1' => '?', ... } ) => find out the type of this result and then use
756 {'name1'=>'@'} for binary files, and a regular
757 return for non-binary files
758 results ( 'name1=?', ...) => ditto
759 results ( 'name1=-', ...) => ditto
760
761 results ( { 'name1' => '@', ... } ) => store into file whose name is invented by
762 this method, perhaps using RESULT_NAME_TEMPLATE env
763 results ( 'name1=@', ...) => ditto
764
765 results ( { 'name1' => '?', ... } ) => find of what type is this result and then use
766 {'name1'=>'@' for binary files, and a regular
767 return for non-binary files
768 results ( 'name=?', ...) => ditto
769
770 Retrieving ALL results:
771 -----------------------
772 results() => return all results as they are, no storing into files
773
774 results ('@') => return all results, as if each of them given
775 as {'name' => '@'} (see above)
776
777 results ('?') => return all results, as if each of them given
778 as {'name' => '?'} (see above)
779
780 Misc:
781 -----
782 * any result can be returned as a scalar value, or as an array reference
783 (the latter is used for results consisting of more parts, such images);
784 this applies regardless whether the returned result is the result itself
785 or a filename created for the result
786
787 * look in the documentation of the C<panalysis[.PLS]> script for examples
788 (especially how to use various templates for inventing file names)
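
The rules for named results can be summarized in a short sketch. It
only describes what each request form asks for and does not fetch or
store anything; C<describe_request> is a hypothetical helper, and the
result name C<graph> is made up for illustration.

```perl
use strict;
use warnings;

# Describe what one named-result request asks for, following the rules above.
# Accepts either a 'name=target' / 'name' string, or a (name, target) pair
# taken from a hash reference.
sub describe_request {
    my ($name, $target) = @_;
    ($name, $target) = split /=/, $name, 2 unless defined $target;
    return "$name: return as is"                unless defined $target;
    return "$name: send to STDOUT"              if $target eq '-';
    return "$name: store in an invented file"   if $target eq '@';
    return "$name: file if binary, else return" if $target eq '?';
    return "$name: store in file '$target'";
}

print describe_request($_), "\n"
    for 'report', 'outseq=out.fasta', 'outseq=-', 'graph=@', 'graph=?';

# The hash-reference form carries the same information as name/target pairs.
my %wanted = ( outseq => 'out.fasta' );
print describe_request( $_, $wanted{$_} ), "\n" for sort keys %wanted;
```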
789
790 =cut
791
792 sub results { shift->throw_not_implemented(); }
793
794 # -----------------------------------------------------------------------------
795
796 =head2 result
797
798 Usage : $job->result (...)
799 Returns : the first result
800 Args : see 'results'
801
802 =cut
803
804 sub result { shift->throw_not_implemented(); }
805
806 # -----------------------------------------------------------------------------
807
808 =head2 remove
809
810 Usage : $job->remove
811 Returns : 1
812 Args : none
813
814 The job object is not actually removed in this time but it is marked
815 (setting 1 to C<_destroy_on_exit> attribute) as ready for deletion when
816 the client program ends (including a request to server to forget the job
817 mirror object on the server side).
818
819 =cut
820
821 sub remove { shift->throw_not_implemented(); }
822
823 # -----------------------------------------------------------------------------
824
825 1;
826 __END__
827