comparison variant_effect_predictor/Bio/Tools/SeqAnal.pm @ 0:1f6dce3d34e0

Uploaded
author mahtabm
date Thu, 11 Apr 2013 02:01:53 -0400
-1:000000000000 0:1f6dce3d34e0
1 #-------------------------------------------------------------------------------
2 # PACKAGE : Bio::Tools::SeqAnal
3 # PURPOSE : To provide a base class for different sequence analysis tools.
4 # AUTHOR : Steve Chervitz (sac@bioperl.org)
5 # CREATED : 27 Mar 1998
6 # REVISION: $Id: SeqAnal.pm,v 1.12 2002/10/22 07:38:46 lapp Exp $
7 # STATUS : Alpha
8 #
9 # For documentation, run this module through pod2html
10 # (preferably from Perl v5.004 or better).
11 #-------------------------------------------------------------------------------
12
13 package Bio::Tools::SeqAnal;
14
15 use Bio::Root::Object ();
16 use Bio::Root::Global qw(:std);
17
18 use strict;
19 use vars qw($ID $VERSION @ISA);
20
21 @ISA = qw( Bio::Root::Object );
22 $ID = 'Bio::Tools::SeqAnal';
23 $VERSION = 0.011;
24
25
26 ## POD Documentation:
27
28 =head1 NAME
29
30 Bio::Tools::SeqAnal - Bioperl sequence analysis base class.
31
32 =head1 SYNOPSIS
33
34 =head2 Object Creation
35
36 This module is an abstract base class. Perl will let you instantiate it,
37 but it provides little functionality on its own. This module
38 should be used via a specialized subclass. See L<_initialize()|_initialize>
39 for a description of constructor parameters.
40
41 require Bio::Tools::SeqAnal;
42
43 To run and parse a new report:
44
45 $hit = new Bio::Tools::SeqAnal ( -run => \%runParams,
46 -parse => 1);
47
48 To parse an existing report:
49
50 $hit = new Bio::Tools::SeqAnal ( -file => 'filename.data',
51 -parse => 1);
52
53 To run a report without parsing:
54
55 $hit = new Bio::Tools::SeqAnal ( -run => \%runParams
56 );
57
58 To read an existing report without parsing:
59
60 $hit = new Bio::Tools::SeqAnal ( -file => 'filename.data',
61 -read => 1);
62
63
64 =head1 INSTALLATION
65
66 This module is included with the central Bioperl distribution:
67
68 http://bio.perl.org/Core/Latest
69 ftp://bio.perl.org/pub/DIST
70
71 Follow the installation instructions included in the README file.
72
73
74 =head1 DESCRIPTION
75
76 Bio::Tools::SeqAnal.pm is a base class for specialized
77 sequence analysis modules such as B<Bio::Tools::Blast> and B<Bio::Tools::Fasta>.
78 It provides some basic data and functionalities that are not unique to
79 a specialized module such as:
80
81 =over 4
82
83 =item * reading raw data into memory.
84
85 =item * storing name and version of the program.
86
87 =item * storing name of the query sequence.
88
89 =item * storing name and version of the database.
90
91 =item * storing & determining the date on which the analysis was performed.
92
93 =item * basic file manipulations (compress, uncompress, delete).
94
95 =back
96
97 Some of these functionalities (reading, file manipulation) are inherited from
98 B<Bio::Root::Object>, from which Bio::Tools::SeqAnal.pm derives.
99
100
101
102 =head1 RUN, PARSE, and READ
103
104 A SeqAnal.pm object can be created using one of three modes: run, parse, or read.
105
106 MODE DESCRIPTION
107 ----- -----------
108 run Run a new sequence analysis report. New results can then
109 be parsed or saved for analysis later.
110
111 parse Parse the data from a sequence analysis report loading it
112 into the SeqAnal.pm object.
113
114 read Read in data from an existing raw analysis report without
115 parsing it. In the future, this may also permit persistent
116 SeqAnal.pm objects. This mode is considered experimental.
117
118 The mode is set by supplying switches to the constructor; see L<_initialize()|_initialize>.
119
120
121
122 A key feature of SeqAnal.pm is the ability to access raw data in a
123 generic fashion. Regardless of what sequence analysis method is used,
124 the raw data always need to be read into memory. The SeqAnal.pm class
125 utilizes the L<Bio::Root::Object::read()|Bio::Root::Object> method inherited from
126 B<Bio::Root::Object> to permit the following:
127
128 =over 4
129
130 =item * read from a file or STDIN.
131
132 =item * read a single record or a stream containing multiple records.
133
134 =item * specify a record separator.
135
136 =item * store all input data in memory or process the data stream as it is being read.
137
138 =back
139
140 By permitting the parsing of data as it is being read, each record can be
141 analyzed as it is being read and saved or discarded as necessary.
142 This can be useful when crunching through thousands of reports.
143 For examples of this, see the L<parse()|parse> methods defined in B<Bio::Tools::Blast> and
144 B<Bio::Tools::Fasta>.
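
The record-at-a-time processing described above can be sketched without Bioperl at all. The following is a minimal illustration, not the Bio::Root::Object::read() implementation: the sample data, the C<//> separator, and the filter pattern are assumptions made up for this example. It sets the record separator, reads one record per iteration, and keeps or discards each record as it arrives.

```perl
use strict;
use warnings;

# Hypothetical data: three tiny "reports", each terminated by a '//' line.
my $data = "report A\n//\nreport B\n//\nreport C\n//\n";

# Read from an in-memory filehandle so the sketch needs no external file.
open my $fh, '<', \$data or die "Cannot open in-memory handle: $!";

my @kept;
{
    local $/ = "//\n";                  # one record per read, not one line
    while (my $record = <$fh>) {
        chomp $record;                  # strips the trailing "//\n" separator
        next unless $record =~ /B|C/;   # discard records we do not want
        push @kept, $record;            # keep the rest
    }
}
close $fh;

print scalar(@kept), " of 3 records kept\n";   # prints "2 of 3 records kept"
```

Because each record is examined inside the loop, only the records that pass the filter stay in memory.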
145
146
147 =head2 Parsing & Running
148
149 Parsing and running of sequence analysis reports must be implemented for each
150 specific subclass of SeqAnal.pm. Stubs that throw an exception ("virtual methods") are provided here for
151 the L<parse()|parse> and L<run()|run> methods. See B<Bio::Tools::Blast> and B<Bio::Tools::Fasta>
152 for examples.
153
154
155 =head1 DEPENDENCIES
156
157 Bio::Tools::SeqAnal.pm is a base class that inherits from B<Bio::Root::Object>.
158 This module also makes use of a number of functionalities inherited from
159 B<Bio::Root::Object> (file manipulations such as reading, compressing, decompressing,
160 deleting, and obtaining the date).
161
162
163 =head1 FEEDBACK
164
165 =head2 Mailing Lists
166
167 User feedback is an integral part of the evolution of this and other Bioperl modules.
168 Send your comments and suggestions preferably to one of the Bioperl mailing lists.
169 Your participation is much appreciated.
170
171 bioperl-l@bioperl.org - General discussion
172 http://bio.perl.org/MailList.html - About the mailing lists
173
174 =head2 Reporting Bugs
175
176 Report bugs to the Bioperl bug tracking system to help us keep track of the bugs and
177 their resolution. Bug reports can be submitted via email or the web:
178
179 bioperl-bugs@bio.perl.org
180 http://bugzilla.bioperl.org/
181
182 =head1 AUTHOR
183
184 Steve Chervitz, sac@bioperl.org
185
186 See the L<FEEDBACK | FEEDBACK> section for where to send bug reports and comments.
187
188 =head1 VERSION
189
190 Bio::Tools::SeqAnal.pm, 0.011
191
192 =head1 COPYRIGHT
193
194 Copyright (c) 1998 Steve Chervitz. All Rights Reserved.
195 This module is free software; you can redistribute it and/or
196 modify it under the same terms as Perl itself.
197
198
199 =head1 SEE ALSO
200
201 http://bio.perl.org/Projects/modules.html - Online module documentation
202 http://bio.perl.org/Projects/Blast/ - Bioperl Blast Project
203 http://bio.perl.org/ - Bioperl Project Homepage
204
205
206 =cut
207
208
209
210 #
211 ##
212 ###
213 #### END of main POD documentation.
214 ###
215 ##
216 #
217
218 =head1 APPENDIX
219
220 Methods beginning with a leading underscore are considered private
221 and are intended for internal use by this module. They are
222 B<not> considered part of the public interface and are described here
223 for documentation purposes only.
224
225 =cut
226
227 ##############################################################################
228 ## CONSTRUCTOR ##
229 ##############################################################################
230
231
232 =head2 _initialize
233
234 Usage : n/a; automatically called by Bio::Root::Object::new()
235 Purpose : Calls private methods to extract the raw report data,
236 : Calls superclass constructor first (Bio::Root::Object.pm).
237 Returns : string containing the make parameter value.
238 Argument : Named parameters (TAGS CAN BE ALL UPPER OR ALL LOWER CASE).
239 : The SeqAnal.pm constructor only processes the following
240 : parameters passed from new()
241 : -RUN => hash reference for named parameters to be used
242 : for running a sequence analysis program.
243 : These are dereferenced and passed to the run() method.
244 : -PARSE => boolean,
245 : -READ => boolean,
246 :
247 : If -RUN is HASH ref, the run() method will be called with the
248 : dereferenced hash.
249 : If -PARSE is true, all parameters passed from new() are passed
250 : to the parse() method. This occurs after the run method call
251 : to enable combined running + parsing.
252 : If -READ is true, all parameters passed from new() are passed
253 : to the read() method.
254 : Either -PARSE or -READ should be true, not both.
255 Comments : Does not call _rearrange() to handle parameters since only
256 : a few are required and there may be potentially many.
257
258 See Also : B<Bio::Root::Object::new()>, B<Bio::Root::Object::_rearrange()>
259
260 =cut
261
262 #-----------------
263 sub _initialize {
264 #-----------------
265 my( $self, %param ) = @_;
266
267 my $make = $self->SUPER::_initialize(%param);
268
269 my($read, $parse, $runparam) = (
270 ($param{-READ}||$param{'-read'}), ($param{-PARSE}||$param{'-parse'}),
271 ($param{-RUN}||$param{'-run'})
272 );
273
274 # $self->_rearrange([qw(READ PARSE RUN)], @param);
275
276 # Issue: How to keep all the arguments for running the analysis
277 # separate from other arguments needed for parsing the results, etc?
278 # Solution: place all the run arguments in a separate hash.
279
280 $self->run(%$runparam) if ref $runparam eq 'HASH';
281
282 if($parse) { $self->parse(%param); }
283 elsif($read) { $self->read(%param) }
284
285 $make;
286 }
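
The dispatch order used by _initialize() (run first, then parse or read, with tags accepted in all upper or all lower case) can be tried in isolation. This is a minimal sketch under stated assumptions: the class My::Sketch::Analysis and its calls() method are hypothetical stand-ins, not Bioperl code, and the run(), parse(), and read() bodies only record that they were called.

```perl
use strict;
use warnings;

# Hypothetical stand-in class; not part of Bioperl.
package My::Sketch::Analysis;

sub new {
    my ($class, %param) = @_;
    my $self = bless { calls => [] }, $class;

    # Accept tags in all upper or all lower case, as _initialize() does.
    my $read     = $param{'-READ'}  || $param{'-read'};
    my $parse    = $param{'-PARSE'} || $param{'-parse'};
    my $runparam = $param{'-RUN'}   || $param{'-run'};

    # Run first, so that combined running + parsing is possible.
    $self->run(%$runparam) if ref $runparam eq 'HASH';

    # Parsing takes precedence over reading if both switches are true.
    if    ($parse) { $self->parse(%param) }
    elsif ($read)  { $self->read(%param)  }

    return $self;
}

# Stub methods that only record the order in which they were called.
sub run   { my $self = shift; push @{ $self->{calls} }, 'run';   return }
sub parse { my $self = shift; push @{ $self->{calls} }, 'parse'; return }
sub read  { my $self = shift; push @{ $self->{calls} }, 'read';  return }
sub calls { my $self = shift; return @{ $self->{calls} } }

package main;

my $obj = My::Sketch::Analysis->new(-run => { -prog => 'blastp' }, -parse => 1);
print join(',', $obj->calls), "\n";   # prints "run,parse"
```

Keeping the run arguments in their own hash reference is what lets the remaining parameters pass through to parse() or read() untouched.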
287
288 #--------------
289 sub destroy {
290 #--------------
291 my $self=shift;
292 $DEBUG==2 && print STDERR "DESTROYING $self ${\$self->name}";
293 undef $self->{'_rawData'};
294 $self->SUPER::destroy;
295 }
296
297
298 ###############################################################################
299 # ACCESSORS
300 ###############################################################################
301
302 # The mode of the SeqAnal object is no longer explicitly set.
303 # This simplifies the interface somewhat.
304
305 ##----------------------------------------------------------------------
306 #=head2 mode()
307
308 # Usage : $object->mode();
309 # :
310 # Purpose : Set/Get the mode for the sequence analysis object.
311 # :
312 # Returns : String
313 # :
314 # Argument : n/a
315 # :
316 # :
317 # Comments : The mode specifies how much detail to extract from the
318 # : sequence analysis report. There are three modes:
319 # :
320 # : 'parse' -- Parse the sequence analysis output data.
321 # :
322 # : 'read' -- Reads in the raw report but does not
323 # : attempt to parse it. Useful when you just
324 # : want to work with the output as-is
325 # : (e.g., create HTML-formatted output).
326 # :
327 # : 'run' -- Generates a new report.
328 # :
329 # : Allowable modes are defined by the exported package global array
330 # : @SeqAnal_modes.
331 #
332 #See Also : _set_mode()
333 #=cut
334 ##----------------------------------------------------------------------
335 #sub mode {
336 # my $self = shift;
337 # if(@_) { $self->{'_mode'} = lc(shift); }
338 # $self->{'_mode'};
339 #}
340 #
341
342
343 =head2 best
344
345 Usage : $object->best();
346 Purpose : Set/Get the indicator for processing only the best match.
347 Returns : Boolean (1 | 0)
348 Argument : n/a
349
350 =cut
351
352 #----------
353 sub best {
354 #----------
355 my $self = shift;
356 if(@_) { $self->{'_best'} = shift; }
357 $self->{'_best'};
358 }
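
The accessors in this module all follow the same combined set/get idiom shown in best(). A minimal, self-contained sketch of that idiom follows; the class My::Sketch::Report is hypothetical and exists only for this illustration.

```perl
use strict;
use warnings;

# Hypothetical class used only to illustrate the set/get idiom.
package My::Sketch::Report;

sub new { my $class = shift; return bless {}, $class }

sub program {
    my $self = shift;
    $self->{'_prog'} = shift if @_;   # act as a setter only when given an argument
    return $self->{'_prog'};          # always return the current value
}

package main;

my $report = My::Sketch::Report->new;
print defined($report->program) ? "set\n" : "unset\n";   # prints "unset"
$report->program('BLASTP');
print $report->program, "\n";                            # prints "BLASTP"
```

Testing C<@_> rather than the truth of the argument means a false value such as 0 or the empty string can still be stored.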
359
360
361
362 =head2 _set_db_stats
363
364 Usage : $object->_set_db_stats(<named parameters>);
365 Purpose : Set stats about the database searched.
366 Returns : n/a
367 Argument : named parameters:
368 : -NAME => <string>, -RELEASE => <string> (name and release of db)
369 : -LETTERS => <int>, -SEQS => <int> (number of letters and sequences in db)
370
371 =cut
372
373 #-------------------
374 sub _set_db_stats {
375 #-------------------
376 my ($self, %param) = @_;
377
378 $self->{'_db'} ||= $param{-NAME} || '';
379 $self->{'_dbRelease'} = $param{-RELEASE} || '';
380 ($self->{'_dbLetters'} = $param{-LETTERS} || 0) =~ s/,//g;
381 ($self->{'_dbSeqs'} = $param{-SEQS} || 0) =~ s/,//g;
382
383 }
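
The C<(... = ...) =~ s/,//g> construct above assigns first and then edits the assigned copy, so counts printed with thousands separators end up as plain digit strings. A minimal sketch of the same idiom, using a hypothetical helper clean_count() that is not part of this module:

```perl
use strict;
use warnings;

# Hypothetical helper: copy a report value (defaulting to 0) and strip
# thousands separators from the copy in a single statement.
sub clean_count {
    my ($raw) = @_;
    (my $count = $raw || 0) =~ s/,//g;   # assignment happens before the substitution
    return $count;
}

print clean_count('1,234,567'), "\n";   # prints "1234567"
print clean_count(undef), "\n";         # prints "0"
```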
384
385
386 =head2 database
387
388 Usage : $object->database();
389 Purpose : Set/Get the name of the database searched.
390 Returns : String
391 Argument : n/a
392
393 =cut
394
395 #---------------
396 sub database {
397 #---------------
398 my $self = shift;
399 if(@_) { $self->{'_db'} = shift; }
400 $self->{'_db'};
401 }
402
403
404
405 =head2 database_release
406
407 Usage : $object->database_release();
408 Purpose : Set/Get the release date of the queried database.
409 Returns : String
410 Argument : n/a
411
412 =cut
413
414 #-----------------------
415 sub database_release {
416 #-----------------------
417 my $self = shift;
418 if(@_) { $self->{'_dbRelease'} = shift; }
419 $self->{'_dbRelease'};
420 }
421
422
423 =head2 database_letters
424
425 Usage : $object->database_letters();
426 Purpose : Set/Get the number of letters in the queried database.
427 Returns : Integer
428 Argument : n/a
429
430 =cut
431
432 #----------------------
433 sub database_letters {
434 #----------------------
435 my $self = shift;
436 if(@_) { $self->{'_dbLetters'} = shift; }
437 $self->{'_dbLetters'};
438 }
439
440
441
442 =head2 database_seqs
443
444 Usage : $object->database_seqs();
445 Purpose : Set/Get the number of sequences in the queried database.
446 Returns : Integer
447 Argument : n/a
448
449 =cut
450
451 #------------------
452 sub database_seqs {
453 #------------------
454 my $self = shift;
455 if(@_) { $self->{'_dbSeqs'} = shift; }
456 $self->{'_dbSeqs'};
457 }
458
459
460
461 =head2 set_date
462
463 Usage : $object->set_date([<string>]);
464 Purpose : Set the date on which the analysis was performed.
465 Argument : The optional string argument can be the date or the
466 : string 'file' in which case the date will be obtained from
467 : the report file
468 Returns : String
469 Throws : Exception if no date is supplied and no file exists.
470 Comments : This method attempts to set the date in either of two ways:
471 : 1) using data passed in as an argument,
472 : 2) using the Bio::Root::Utilities.pm file_date() method
473 : on the output file.
474 : Another way is to extract the date from the contents of the
475 : raw output data. Such parsing will have to be specialized
476 : for different seq analysis reports. Override this method
477 : to create such custom parsing code if desired.
478
479 See Also : L<date()|date>, B<Bio::Root::Object::file_date()>
480
481 =cut
482
483 #---------------
484 sub set_date {
485 #---------------
486 my $self = shift;
487 my $date = shift;
488 my ($file);
489
490 if( !$date and ($file = $self->file)) {
491 # If no date is passed and a file exists, determine date from the file.
492 # (provided by superclass Bio::Root::Object.pm)
493 eval {
494 $date = $self->SUPER::file_date(-FMT => 'd m y');
495 };
496 if($@) {
497 $date = 'UNKNOWN';
498 $self->warn("Can't set date of report.");
499 }
500 }
501 $self->{'_date'} = $date;
502 }
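
The fallback logic in set_date() (use the supplied date, otherwise try the file, otherwise record 'UNKNOWN') can be sketched with core Perl only. This is an approximation under stated assumptions: report_date() is a hypothetical function, and stat() plus POSIX::strftime() stand in for the inherited file_date() method, whose exact output format is not reproduced here.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Hypothetical stand-in for set_date(): prefer an explicit date, otherwise
# derive one from the file's modification time, otherwise use 'UNKNOWN'.
sub report_date {
    my ($date, $file) = @_;

    if (!$date and $file) {
        $date = eval {
            my $mtime = (stat $file)[9];            # modification time, undef on failure
            die "cannot stat $file\n" unless defined $mtime;
            strftime('%d %b %Y', localtime $mtime); # format is an assumption
        };
        $date = 'UNKNOWN' if $@;                    # eval failed: fall back
    }
    return $date;
}

print report_date('11 Apr 2013', undef), "\n";       # prints "11 Apr 2013"
print report_date(undef, '/no/such/report'), "\n";   # prints "UNKNOWN" if the path does not exist
```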
503
504
505
506 =head2 date
507
508 Usage : $object->date();
509 Purpose : Get the date on which the analysis was performed.
510 Returns : String
511 Argument : n/a
512 Comments : This method is not a combination set/get, it only gets.
513
514 See Also : L<set_date()|set_date>
515
516 =cut
517
518 #----------
519 sub date { my $self = shift; $self->{'_date'}; }
520 #----------
521
522
523
524
525 =head2 length
526
527 Usage : $object->length();
528 Purpose : Set/Get the length of the query sequence (number of monomers).
529 Returns : Integer
530 Argument : n/a
531 Comments : Developer note: when using the built-in length function within
532 : this module, call it as CORE::length().
533
534 =cut
535
536 #------------
537 sub length {
538 #------------
539 my $self = shift;
540 if(@_) { $self->{'_length'} = shift; }
541 $self->{'_length'};
542 }
543
544 =head2 program
545
546 Usage : $object->program();
547 Purpose : Set/Get the name of the sequence analysis program (BLASTP, FASTA, etc.)
548 Returns : String
549 Argument : n/a
550
551 =cut
552
553 #-------------
554 sub program {
555 #-------------
556 my $self = shift;
557 if(@_) { $self->{'_prog'} = shift; }
558 $self->{'_prog'};
559 }
560
561
562
563 =head2 program_version
564
565 Usage : $object->program_version();
566 Purpose : Set/Get the version number of the sequence analysis program.
567 : (e.g., 1.4.9MP, 2.0a19MP-WashU).
568 Returns : String
569 Argument : n/a
570
571 =cut
572
573 #---------------------
574 sub program_version {
575 #---------------------
576 my $self = shift;
577 if(@_) { $self->{'_progVersion'} = shift; }
578 $self->{'_progVersion'};
579 }
580
581
582 =head2 query
583
584 Usage : $name = $object->query();
585 Purpose : Get the name of the query sequence used to generate the report.
586 Argument : n/a
587 Returns : String
588 Comments : Equivalent to $object->name().
589
590 =cut
591
592 #--------
593 sub query { my $self = shift; $self->name; }
594 #--------
595
596
597 =head2 query_desc
598
599 Usage : $object->query_desc();
600 Purpose : Set/Get the description of the query sequence for the analysis.
601 Returns : String
602 Argument : n/a
603
604 =cut
605
606 #--------------
607 sub query_desc {
608 #--------------
609 my $self = shift;
610 if(@_) { $self->{'_qDesc'} = shift; }
611 $self->{'_qDesc'};
612 }
613
614
615
616
617 =head2 display
618
619 Usage : $object->display(<named parameters>);
620 Purpose : Display information about Bio::Tools::SeqAnal.pm data members.
621 : Overrides Bio::Root::Object::display().
622 Example : $object->display(-SHOW=>'stats');
623 Argument : Named parameters: -SHOW => 'file' | 'stats'
624 : -WHERE => filehandle (default = STDOUT)
625 Returns : n/a
626 Status : Experimental
627
628 See Also : L<_display_stats()|_display_stats>, L<_display_file()|_display_file>, B<Bio::Root::Object::display()>
629
630 =cut
631
632 #---------------
633 sub display {
634 #---------------
635 my( $self, %param ) = @_;
636
637 $self->SUPER::display(%param);
638
639 my $OUT = $self->fh();
640 $self->show =~ /file/i and $self->_display_file($OUT);
641 1;
642 }
643
644
645
646 =head2 _display_file
647
648 Usage : n/a; called automatically by display()
649 Purpose : Print the contents of the raw report file.
650 Example : n/a
651 Argument : one argument = filehandle object.
652 Returns : true (1)
653 Status : Experimental
654
655 See Also : L<display()|display>
656
657 =cut
658
659 #------------------
660 sub _display_file {
661 #------------------
662 my( $self, $OUT) = @_;
663
664 print $OUT scalar($self->read);
665 1;
666 }
667
668
669
670 =head2 _display_stats
671
672 Usage : n/a; called automatically by display()
673 Purpose : Display information about Bio::Tools::SeqAnal.pm data members.
674 : Prints the query name, description, and length, the file name and date,
675 : the program name and version, and the database name, release, letters, and sequences.
676 Example : n/a
677 Argument : one argument = filehandle object.
678 Returns : printf call.
679 Status : Experimental
680
681 See Also : B<Bio::Root::Object::display()>
682
683 =cut
684
685 #--------------------
686 sub _display_stats {
687 #--------------------
688 my( $self, $OUT ) = @_;
689
690 printf( $OUT "\n%-15s: %s\n", "QUERY NAME", $self->query ||'UNKNOWN' );
691 printf( $OUT "%-15s: %s\n", "QUERY DESC", $self->query_desc || 'UNKNOWN');
692 printf( $OUT "%-15s: %s\n", "LENGTH", $self->length || 'UNKNOWN');
693 printf( $OUT "%-15s: %s\n", "FILE", $self->file || 'STDIN');
694 printf( $OUT "%-15s: %s\n", "DATE", $self->date || 'UNKNOWN');
695 printf( $OUT "%-15s: %s\n", "PROGRAM", $self->program || 'UNKNOWN');
696 printf( $OUT "%-15s: %s\n", "VERSION", $self->program_version || 'UNKNOWN');
697 printf( $OUT "%-15s: %s\n", "DB-NAME", $self->database || 'UNKNOWN');
698 printf( $OUT "%-15s: %s\n", "DB-RELEASE", ($self->database_release || 'UNKNOWN'));
699 printf( $OUT "%-15s: %s\n", "DB-LETTERS", ($self->database_letters) ? $self->database_letters : 'UNKNOWN');
700 printf( $OUT "%-15s: %s\n", "DB-SEQUENCES", ($self->database_seqs) ? $self->database_seqs : 'UNKNOWN');
701 }
702
703
704 #####################################################################################
705 ## VIRTUAL METHODS ##
706 #####################################################################################
707
708 =head1 VIRTUAL METHODS
709
710 =head2 parse
711
712 Usage : $object->parse( %named_parameters )
713 Purpose : Parse a raw sequence analysis report.
714 Returns : Integer (number of sequence analysis reports parsed).
715 Argument : Named parameters.
716 Throws : Exception: virtual method not defined.
717 : Propagates any exception thrown by read()
718 Status : Virtual
719 Comments : This is a virtual method that should be overridden to
720 : parse a specific type of data.
721
722 See Also : B<Bio::Root::Object::read()>
723
724 =cut
725
726 #---------
727 sub parse {
728 #---------
729 my ($self, @param) = @_;
730
731 $self->throw("Virtual method parse() not defined for " . ref($self) . " objects.");
732
733 # The first step in parsing is reading in the data:
734 $self->read(@param);
735 }
736
737
738
739 =head2 run
740
741 Usage : $object->run( %named_parameters )
742 Purpose : Run a sequence analysis program on one or more sequences.
743 Returns : n/a
744 : Run mode should be configurable to return a parsed object or
745 : the raw results data.
746 Argument : Named parameters:
747 Throws : Exception: virtual method not defined.
748 Status : Virtual
749
750 =cut
751
752 #--------
753 sub run {
754 #--------
755 my ($self, %param) = @_;
756 $self->throw("Virtual method run() not defined for " . ref($self) . " objects.");
757 }
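
The virtual-method convention used by parse() and run() can be illustrated without Bioperl. In this minimal sketch the packages My::Sketch::Base and My::Sketch::Blast are hypothetical, and plain die() stands in for the inherited throw() method: the base class refuses to do the work, and a subclass supplies the real behavior by overriding the method.

```perl
use strict;
use warnings;

# Hypothetical base class: parse() must be overridden by subclasses.
package My::Sketch::Base;

sub new { my $class = shift; return bless {}, $class }

sub parse {
    my $self = shift;
    die "Virtual method parse() not defined for " . ref($self) . " objects.\n";
}

# Hypothetical subclass that provides a working parse().
package My::Sketch::Blast;
our @ISA = ('My::Sketch::Base');

sub parse {
    my ($self, %param) = @_;
    return 1;   # pretend that one report was parsed
}

package main;

# Calling the base-class stub raises an exception that names the class.
my $ok = eval { My::Sketch::Base->new->parse; 1 };
print "base: $@" unless $ok;
# prints "base: Virtual method parse() not defined for My::Sketch::Base objects."

print "subclass parsed ", My::Sketch::Blast->new->parse, " report\n";
# prints "subclass parsed 1 report"
```

Building the message with ref($self) and string concatenation avoids interpolating a dereference, which fails under C<use strict> when the value is a class name rather than a reference.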
758
759
760 1;
761 __END__
762
763 #####################################################################################
764 # END OF CLASS #
765 #####################################################################################
766
767
768 =head1 FOR DEVELOPERS ONLY
769
770 =head2 Data Members
771
772 Information about the various data members of this module is provided for those
773 wishing to modify or understand the code. Two things to bear in mind:
774
775 =over 4
776
777 =item 1 Do NOT rely on these in any code outside of this module.
778
779 All data members are prefixed with an underscore to signify that they are private.
780 Always use accessor methods. If the accessor doesn't exist or is inadequate,
781 create or modify an accessor (and let me know, too!).
782
783 =item 2 This documentation may be incomplete and out of date.
784
785 It is easy for these data member descriptions to become obsolete as
786 this module is still evolving. Always double check this info and search
787 for members not described here.
788
789 =back
790
791 An instance of Bio::Tools::SeqAnal.pm is a blessed reference to a hash containing
792 all or some of the following fields:
793
794 FIELD VALUE
795 --------------------------------------------------------------
796 _file Full path to file containing raw sequence analysis report.
797
798 _mode Affects how much detail to extract from the raw report.
799 Future mode will also distinguish 'running' from 'parsing'
800
801
802 THE FOLLOWING MAY BE EXTRACTABLE FROM THE RAW REPORT FILE:
803
804 _prog Name of the sequence analysis program.
805
806 _progVersion Version number of the program.
807
808 _db Database searched.
809
810 _dbRelease Version or date of the database searched.
811
812 _dbLetters Total number of letters in the database.
813
814 _dbSeqs Total number of sequences in the database.
815
816 _query Name of query sequence.
817
818 _length Length of the query sequence.
819
820 _date Date on which the analysis was performed.
821
822
823 INHERITED DATA MEMBERS
824
825 _name From Bio::Root::Object.pm. String representing the name of the query sequence.
826 Typically obtained from the report file.
827
828 _parent From Bio::Root::Object.pm. This member contains a reference to the
829 object to which this seq anal report belongs. Optional & experimental.
830 (E.g., a protein object could create and own a Blast object.)
831
832 =cut
833
834 1;