diff variant_effect_predictor/Bio/Tools/SeqAnal.pm @ 0:1f6dce3d34e0

Uploaded
author mahtabm
date Thu, 11 Apr 2013 02:01:53 -0400
parents
children
line wrap: on
line diff
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/variant_effect_predictor/Bio/Tools/SeqAnal.pm	Thu Apr 11 02:01:53 2013 -0400
@@ -0,0 +1,834 @@
+#-------------------------------------------------------------------------------
+# PACKAGE : Bio::Tools::SeqAnal
+# PURPOSE : To provide a base class for different sequence analysis tools.
+# AUTHOR  : Steve Chervitz (sac@bioperl.org)
+# CREATED : 27 Mar 1998
+# REVISION: $Id: SeqAnal.pm,v 1.12 2002/10/22 07:38:46 lapp Exp $
+# STATUS  : Alpha
+#
+# For documentation, run this module through pod2html
+# (preferably from Perl v5.004 or better).
+#-------------------------------------------------------------------------------
+
+package Bio::Tools::SeqAnal;
+
+use Bio::Root::Object ();
+use Bio::Root::Global qw(:std);
+
+use strict;
+use vars qw($ID $VERSION @ISA);
+
+@ISA        = qw( Bio::Root::Object );
+$ID = 'Bio::Tools::SeqAnal';
+$VERSION  = 0.011;
+
+
+## POD Documentation:
+
+=head1 NAME
+
+Bio::Tools::SeqAnal - Bioperl sequence analysis base class.
+
+=head1 SYNOPSIS
+
+=head2 Object Creation
+
+This module is an abstract base class. Perl will let you instantiate it,
+but it provides little functionality on its own. This module
+should be used via a specialized subclass. See L<_initialize()|_initialize>
+for a description of constructor parameters.
+
+    require Bio::Tools::SeqAnal;
+
+To run and parse a new report:
+
+    $hit = new Bio::Tools::SeqAnal ( -run   => \%runParams,
+				     -parse => 1);
+
+To parse an existing report:
+
+    $hit = new Bio::Tools::SeqAnal ( -file  => 'filename.data',
+				     -parse => 1);
+
+To run a report without parsing:
+
+    $hit = new Bio::Tools::SeqAnal ( -run   => \%runParams
+				     );
+
+To read an existing report without parsing:
+
+    $hit = new Bio::Tools::SeqAnal ( -file  => 'filename.data',
+				     -read  => 1);
+
+
+=head1 INSTALLATION
+
+This module is included with the central Bioperl distribution:
+
+   http://bio.perl.org/Core/Latest
+   ftp://bio.perl.org/pub/DIST
+
+Follow the installation instructions included in the README file.
+
+
+=head1 DESCRIPTION
+
+Bio::Tools::SeqAnal.pm is a base class for specialized
+sequence analysis modules such as B<Bio::Tools::Blast> and B<Bio::Tools::Fasta>.
+It provides some basic data and functionalities that are not unique to
+a specialized module such as:
+
+=over 4
+
+=item * reading raw data into memory.
+
+=item * storing name and version of the program.
+
+=item * storing name of the query sequence.
+
+=item * storing name and version of the database.
+
+=item * storing & determining the date on which the analysis was performed.
+
+=item * basic file manipulations (compress, uncompress, delete).
+
+=back
+
+Some of these functionalities (reading, file maipulation) are inherited from
+B<Bio::Root::Object>, from which Bio::Tools::SeqAnal.pm derives.
+
+
+
+=head1 RUN, PARSE, and READ
+
+A SeqAnal.pm object can be created using one of three modes: run, parse, or read.
+
+  MODE      DESCRIPTION
+  -----     -----------
+  run       Run a new sequence analysis report. New results can then
+            be parsed or saved for analysis later.
+
+  parse     Parse the data from a sequence analysis report loading it
+            into the SeqAnal.pm object.
+
+  read      Read in data from an existing raw analysis report without
+            parsing it. In the future, this may also permit persistent
+            SeqAnal.pm objects. This mode is considered experimental.
+
+The mode is set by supplying switches to the constructor, see L<_initialize()|_initialize>.
+
+
+
+A key feature of SeqAnal.pm is the ability to access raw data in a
+generic fashion. Regardless of what sequence analysis method is used,
+the raw data always need to be read into memory.  The SeqAnal.pm class
+utilizes the L<Bio::Root::Object::read()|Bio::Root::Object> method inherited from
+B<Bio::Root::Object> to permit the following:
+
+=over 4
+
+=item * read from a file or STDIN.
+
+=item * read a single record or a stream containing multiple records.
+
+=item * specify a record separator.
+
+=item * store all input data in memory or process the data stream as it is being read.
+
+=back
+
+By permitting the parsing of data as it is being read, each record can be
+analyzed as it is being read and saved or discarded as necessary.
+This can be useful when cruching through thousands of reports.
+For examples of this, see the L<parse()|parse> methods defined in B<Bio::Tools::Blast> and
+B<Bio::Tools::Fasta>.
+
+
+=head2 Parsing & Running
+
+Parsing and running of sequence analysis reports must be implemented for each
+specific subclass of SeqAnal.pm. No-op stubs ("virtual methods") are provided here for
+the L<parse()|parse> and L<run()|run> methods. See B<Bio::Tools::Blast> and B<Bio::Tools::Fasta>
+for examples.
+
+
+=head1 DEPENDENCIES
+
+Bio::Tools::SeqAnal.pm is a concrete class that inherits from B<Bio::Root::Object>.
+This module also makes use of a number of functionalities inherited from
+B<Bio::Root::Object> (file manipulations such as reading, compressing, decompressing,
+deleting, and obtaining date.
+
+
+=head1 FEEDBACK
+
+=head2 Mailing Lists
+
+User feedback is an integral part of the evolution of this and other Bioperl modules.
+Send your comments and suggestions preferably to one of the Bioperl mailing lists.
+Your participation is much appreciated.
+
+    bioperl-l@bioperl.org          - General discussion
+    http://bio.perl.org/MailList.html             - About the mailing lists
+
+=head2 Reporting Bugs
+
+Report bugs to the Bioperl bug tracking system to help us keep track the bugs and
+their resolution. Bug reports can be submitted via email or the web:
+
+    bioperl-bugs@bio.perl.org
+    http://bugzilla.bioperl.org/
+
+=head1 AUTHOR
+
+Steve Chervitz, sac@bioperl.org
+
+See the L<FEEDBACK | FEEDBACK> section for where to send bug reports and comments.
+
+=head1 VERSION
+
+Bio::Tools::SeqAnal.pm, 0.011
+
+=head1 COPYRIGHT
+
+Copyright (c) 1998 Steve Chervitz. All Rights Reserved.
+This module is free software; you can redistribute it and/or
+modify it under the same terms as Perl itself.
+
+
+=head1 SEE ALSO
+
+ http://bio.perl.org/Projects/modules.html  - Online module documentation
+ http://bio.perl.org/Projects/Blast/        - Bioperl Blast Project
+ http://bio.perl.org/                       - Bioperl Project Homepage
+
+
+=cut
+
+
+
+#
+##
+###
+#### END of main POD documentation.
+###
+##
+#
+
+=head1 APPENDIX
+
+Methods beginning with a leading underscore are considered private
+and are intended for internal use by this module. They are
+B<not> considered part of the public interface and are described here
+for documentation purposes only.
+
+=cut
+
+##############################################################################
+##                          CONSTRUCTOR                                     ##
+##############################################################################
+
+
+=head2 _initialize
+
+ Usage     : n/a; automatically called by Bio::Root::Object::new()
+ Purpose   : Calls private methods to extract the raw report data,
+           : Calls superclass constructor first (Bio::Root::Object.pm).
+ Returns   : string containing the make parameter value.
+ Argument  : Named parameters (TAGS CAN BE ALL UPPER OR ALL LOWER CASE).
+           : The SeqAnal.pm constructor only processes the following
+           : parameters passed from new()
+           :     -RUN     => hash reference for named parameters to be used
+           :                 for running a sequence analysis program.
+           :                 These are dereferenced and passed to the run() method.
+	   :     -PARSE   => boolean,
+	   :     -READ    => boolean,
+           :
+           : If -RUN is HASH ref, the run() method will be called with the
+           :   dereferenced hash.
+           : If -PARSE is true, all parameters passed from new() are passed
+           :   to the parse() method. This occurs after the run method call
+           :   to enable combined running + parsing.
+           : If -READ is true, all parameters passed from new() are passed
+           :   to the read() method.
+           : Either -PARSE or -READ should be true, not both.
+ Comments  : Does not calls _rearrange() to handle parameters since only
+           : a few are required and there may be potentially many.
+
+See Also   : B<Bio::Root::Object::new()>, B<Bio::Root::Object::_rearrange()>
+
+=cut
+
+#-----------------
+sub _initialize {
+#-----------------
+    my( $self, %param ) = @_;
+
+    my $make = $self->SUPER::_initialize(%param);
+
+    my($read, $parse, $runparam) = (
+	($param{-READ}||$param{'-read'}), ($param{-PARSE}||$param{'-parse'}),
+	($param{-RUN}||$param{'-run'})
+				    );
+
+#	$self->_rearrange([qw(READ PARSE RUN)], @param);
+	
+    # Issue: How to keep all the arguments for running the analysis
+    # separate from other arguments needed for parsing the results, etc?
+    # Solution: place all the run arguments in a separate hash.
+
+    $self->run(%$runparam) if ref $runparam eq 'HASH';
+
+    if($parse) { $self->parse(%param); }
+    elsif($read) { $self->read(%param) }
+
+    $make;
+}
+
+#--------------
+sub destroy {
+#--------------
+    my $self=shift;
+    $DEBUG==2 && print STDERR "DESTROYING $self ${\$self->name}";
+    undef $self->{'_rawData'};
+    $self->SUPER::destroy;
+}
+
+
+###############################################################################
+#                                 ACCESSORS
+###############################################################################
+
+# The mode of the SeqAnal object is no longer explicitly set.
+# This simplifies the interface somewhat.
+
+##----------------------------------------------------------------------
+#=head2 mode()
+
+# Usage     : $object->mode();
+#	    :
+# Purpose   : Set/Get the mode for the sequence analysis object.
+#	    :
+# Returns   : String
+#	    :
+# Argument  : n/a
+#	    :
+#	    :
+# Comments  : The mode specifies how much detail to extract from the
+#	    : sequence analysis report. There are three modes:
+#	    :
+#	    :    'parse' -- Parse the sequence analysis output data.
+#	    :
+#	    :     'read' -- Reads in the raw report but does not
+#	    :               attempt to parse it. Useful when you just
+#	    :               want to work with the output as-is
+#	    :               (e.g., create HTML-formatted output).
+#	    :
+#	    :     'run'  -- Generates a new report.
+#	    :
+#	    : Allowable modes are defined by the exported package global array
+#	    : @SeqAnal_modes.
+#
+#See Also   : _set_mode()
+#=cut
+##----------------------------------------------------------------------
+#sub mode {
+#    my $self = shift;
+#    if(@_) { $self->{'_mode'} = lc(shift); }
+#    $self->{'_mode'};
+#}
+#
+
+
+=head2 best
+
+ Usage     : $object->best();
+ Purpose   : Set/Get the indicator for processing only the best match.
+ Returns   : Boolean (1 | 0)
+ Argument  : n/a
+
+=cut
+
+#----------
+sub best {
+#----------
+    my $self = shift;
+    if(@_) { $self->{'_best'} = shift; }
+    $self->{'_best'};
+}
+
+
+
+=head2 _set_db_stats
+
+ Usage     : $object->_set_db_stats(<named parameters>);
+ Purpose   : Set stats about the database searched.
+ Returns   : String
+ Argument  : named parameters:
+           :   -LETTERS => <int>  (number of letters in db)
+           :   -SEQS    => <int>  (number of sequences in db)
+
+=cut
+
+#-------------------
+sub _set_db_stats {
+#-------------------
+    my ($self, %param) = @_;
+
+    $self->{'_db'}        ||= $param{-NAME}    || '';
+    $self->{'_dbRelease'}   = $param{-RELEASE} || '';
+    ($self->{'_dbLetters'}  = $param{-LETTERS} || 0)  =~ s/,//g;
+    ($self->{'_dbSeqs'}     = $param{-SEQS}    || 0) =~ s/,//g;
+
+}
+
+
+=head2 database
+
+ Usage     : $object->database();
+ Purpose   : Set/Get the name of the database searched.
+ Returns   : String
+ Argument  : n/a
+
+=cut
+
+#---------------
+sub database {
+#---------------
+    my $self = shift;
+    if(@_) { $self->{'_db'} = shift; }
+    $self->{'_db'};
+}
+
+
+
+=head2 database_release
+
+ Usage     : $object->database_release();
+ Purpose   : Set/Get the release date of the queried database.
+ Returns   : String
+ Argument  : n/a
+
+=cut
+
+#-----------------------
+sub database_release {
+#-----------------------
+    my $self = shift;
+    if(@_) { $self->{'_dbRelease'} = shift; }
+    $self->{'_dbRelease'};
+}
+
+
+=head2 database_letters
+
+ Usage     : $object->database_letters();
+ Purpose   : Set/Get the number of letters in the queried database.
+ Returns   : Integer
+ Argument  : n/a
+
+=cut
+
+#----------------------
+sub database_letters {
+#----------------------
+    my $self = shift;
+    if(@_) { $self->{'_dbLetters'} = shift; }
+    $self->{'_dbLetters'};
+}
+
+
+
+=head2 database_seqs
+
+ Usage     : $object->database_seqs();
+ Purpose   : Set/Get the number of sequences in the queried database.
+ Returns   : Integer
+ Argument  : n/a
+
+=cut
+
+#------------------
+sub database_seqs {
+#------------------
+    my $self = shift;
+    if(@_) { $self->{'_dbSeqs'} = shift; }
+    $self->{'_dbSeqs'};
+}
+
+
+
+=head2 set_date
+
+ Usage     : $object->set_date([<string>]);
+ Purpose   : Set the name of the date on which the analysis was performed.
+ Argument  : The optional string argument ca be the date or the
+           : string 'file' in which case the date will be obtained from
+           : the report file
+ Returns   : String
+ Throws    : Exception if no date is supplied and no file exists.
+ Comments  : This method attempts to set the date in either of two ways:
+           :   1) using data passed in as an argument,
+           :   2) using the Bio::Root::Utilities.pm file_date() method
+           :      on the output file.
+           : Another way is to extract the date from the contents of the
+           : raw output data. Such parsing will have to be specialized
+           : for different seq analysis reports. Override this method
+           : to create such custom parsing code if desired.
+
+See Also   : L<date()|date>, B<Bio::Root::Object::file_date()>
+
+=cut
+
+#---------------
+sub set_date {
+#---------------
+    my $self = shift;
+    my $date = shift;
+    my ($file);
+
+    if( !$date and ($file = $self->file)) {
+	# If no date is passed and a file exists, determine date from the file.
+	# (provided by superclass Bio::Root::Object.pm)
+	eval {
+	    $date = $self->SUPER::file_date(-FMT => 'd m y');
+	};
+	if($@) {
+	    $date = 'UNKNOWN';
+	    $self->warn("Can't set date of report.");
+	}
+    }
+    $self->{'_date'} = $date;
+}
+
+
+
+=head2 date
+
+ Usage     : $object->date();
+ Purpose   : Get the name of the date on which the analysis was performed.
+ Returns   : String
+ Argument  : n/a
+ Comments  : This method is not a combination set/get, it only gets.
+
+See Also   : L<set_date()|set_date>
+
+=cut
+
+#----------
+sub date {  my $self = shift;  $self->{'_date'}; }
+#----------
+
+
+
+
+=head2 length
+
+ Usage     : $object->length();
+ Purpose   : Set/Get the length of the query sequence (number of monomers).
+ Returns   : Integer
+ Argument  : n/a
+ Comments  : Developer note: when using the built-in length function within
+           : this module, call it as CORE::length().
+
+=cut
+
+#------------
+sub length {
+#------------
+    my $self = shift;
+    if(@_) { $self->{'_length'} = shift; }
+    $self->{'_length'};
+}
+
+=head2 program
+
+ Usage     : $object->program();
+ Purpose   : Set/Get the name of the sequence analysis (BLASTP, FASTA, etc.)
+ Returns   : String
+ Argument  : n/a
+
+=cut
+
+#-------------
+sub program {
+#-------------
+    my $self = shift;
+    if(@_) { $self->{'_prog'} = shift; }
+    $self->{'_prog'};
+}
+
+
+
+=head2 program_version
+
+ Usage     : $object->program_version();
+ Purpose   : Set/Get the version number of the sequence analysis program.
+           : (e.g., 1.4.9MP, 2.0a19MP-WashU).
+ Returns   : String
+ Argument  : n/a
+
+=cut
+
+#---------------------
+sub program_version {
+#---------------------
+    my $self = shift;
+    if(@_) { $self->{'_progVersion'} = shift; }
+    $self->{'_progVersion'};
+}
+
+
+=head2 query
+
+ Usage     : $name = $object->query();
+ Purpose   : Get the name of the query sequence used to generate the report.
+ Argument  : n/a
+ Returns   : String
+ Comments  : Equivalent to $object->name().
+
+=cut
+
+#--------
+sub query { my $self = shift; $self->name; }
+#--------
+
+
+=head2 query_desc
+
+ Usage     : $object->desc();
+ Purpose   : Set/Get the description of the query sequence for the analysis.
+ Returns   : String
+ Argument  : n/a
+
+=cut
+
+#--------------
+sub query_desc {
+#--------------
+    my $self = shift;
+    if(@_) { $self->{'_qDesc'} = shift; }
+    $self->{'_qDesc'};
+}
+
+
+
+
+=head2 display
+
+ Usage     : $object->display(<named parameters>);
+ Purpose   : Display information about Bio::Tools::SeqAnal.pm data members.
+           : Overrides Bio::Root::Object::display().
+ Example   : $object->display(-SHOW=>'stats');
+ Argument  : Named parameters: -SHOW  => 'file' | 'stats'
+           :                   -WHERE => filehandle (default = STDOUT)
+ Returns   : n/a
+ Status    : Experimental
+
+See Also   : L<_display_stats()|_display_stats>, L<_display_file()|_display_file>, B<Bio::Root::Object::display()>
+
+=cut
+
+#---------------
+sub display {
+#---------------
+    my( $self, %param ) = @_;
+
+    $self->SUPER::display(%param);
+
+    my $OUT = $self->fh();
+    $self->show =~ /file/i and $self->_display_file($OUT);
+    1;
+}
+
+
+
+=head2 _display_file
+
+ Usage     : n/a; called automatically by display()
+ Purpose   : Print the contents of the raw report file.
+ Example   : n/a
+ Argument  : one argument = filehandle object.
+ Returns   : true (1)
+ Status    : Experimental
+
+See Also   : L<display()|display>
+
+=cut
+
+#------------------
+sub _display_file {
+#------------------
+    my( $self, $OUT) = @_;
+
+    print $OUT scalar($self->read);
+    1;
+}
+
+
+
+=head2 _display_stats
+
+ Usage     : n/a; called automatically by display()
+ Purpose   : Display information about Bio::Tools::SeqAnal.pm data members.
+           : Prints the file name, program, program version, database name,
+           : database version, query name, query length,
+ Example   : n/a
+ Argument  : one argument = filehandle object.
+ Returns   : printf call.
+ Status    : Experimental
+
+See Also   : B<Bio::Root::Object::display()>
+
+=cut
+
+#--------------------
+sub _display_stats {
+#--------------------
+    my( $self, $OUT ) = @_;
+
+    printf( $OUT "\n%-15s: %s\n", "QUERY NAME", $self->query ||'UNKNOWN' );
+    printf( $OUT "%-15s: %s\n", "QUERY DESC", $self->query_desc || 'UNKNOWN');
+    printf( $OUT "%-15s: %s\n", "LENGTH", $self->length || 'UNKNOWN');
+    printf( $OUT "%-15s: %s\n", "FILE", $self->file || 'STDIN');
+    printf( $OUT "%-15s: %s\n", "DATE", $self->date || 'UNKNOWN');
+    printf( $OUT "%-15s: %s\n", "PROGRAM", $self->program || 'UNKNOWN');
+    printf( $OUT "%-15s: %s\n", "VERSION", $self->program_version || 'UNKNOWN');
+    printf( $OUT "%-15s: %s\n", "DB-NAME", $self->database || 'UNKNOWN');
+    printf( $OUT "%-15s: %s\n", "DB-RELEASE", ($self->database_release || 'UNKNOWN'));
+    printf( $OUT "%-15s: %s\n", "DB-LETTERS", ($self->database_letters) ? $self->database_letters : 'UNKNOWN');
+    printf( $OUT "%-15s: %s\n", "DB-SEQUENCES", ($self->database_seqs) ? $self->database_seqs : 'UNKNOWN');
+}
+
+
+#####################################################################################
+##                                 VIRTUAL METHODS                                 ##
+#####################################################################################
+
+=head1 VIRTUAL METHODS
+
+=head2 parse
+
+ Usage     : $object->parse( %named_parameters )
+ Purpose   : Parse a raw sequence analysis report.
+ Returns   : Integer (number of sequence analysis reports parsed).
+ Argument  : Named parameters.
+ Throws    : Exception: virtual method not defined.
+           : Propagates any exception thrown by read()
+ Status    : Virtual
+ Comments  : This is virtual method that should be overridden to
+           : parse a specific type of data.
+
+See Also   : B<Bio::Root::Object::read()>
+
+=cut
+
+#---------
+sub parse {
+#---------
+    my ($self, @param) = @_;
+
+    $self->throw("Virtual method parse() not defined ${ref($self)} objects.");
+
+    # The first step in parsing is reading in the data:
+    $self->read(@param);
+}
+
+
+
+=head2 run
+
+ Usage     : $object->run( %named_parameters )
+ Purpose   : Run a sequence analysis program on one or more sequences.
+ Returns   : n/a
+           : Run mode should be configurable to return a parsed object or
+           : the raw results data.
+ Argument  : Named parameters:
+ Throws    : Exception: virtual method not defined.
+ Status    : Virtual
+
+=cut
+
+#--------
+sub run {
+#--------
+    my ($self, %param) = @_;
+    $self->throw("Virtual method run() not defined ${ref($self)} objects.");
+}
+
+
+1;
+__END__
+
+#####################################################################################
+#                                END OF CLASS                                       #
+#####################################################################################
+
+
+=head1 FOR DEVELOPERS ONLY
+
+=head2 Data Members
+
+Information about the various data members of this module is provided for those
+wishing to modify or understand the code. Two things to bear in mind:
+
+=over 4
+
+=item 1 Do NOT rely on these in any code outside of this module.
+
+All data members are prefixed with an underscore to signify that they are private.
+Always use accessor methods. If the accessor doesn't exist or is inadequate,
+create or modify an accessor (and let me know, too!).
+
+=item 2 This documentation may be incomplete and out of date.
+
+It is easy for these data member descriptions to become obsolete as
+this module is still evolving. Always double check this info and search
+for members not described here.
+
+=back
+
+An instance of Bio::Tools::SeqAnal.pm is a blessed reference to a hash containing
+all or some of the following fields:
+
+ FIELD           VALUE
+ --------------------------------------------------------------
+  _file            Full path to file containing raw sequence analysis report.
+
+  _mode            Affects how much detail to extract from the raw report.
+ 		   Future mode will also distinguish 'running' from 'parsing'
+
+
+ THE FOLLOWING MAY BE EXTRACTABLE FROM THE RAW REPORT FILE:
+
+  _prog            Name of the sequence analysis program.
+
+  _progVersion     Version number of the program.
+
+  _db              Database searched.
+
+  _dbRelease       Version or date of the database searched.
+
+  _dbLetters       Total number of letters in the database.
+
+  _dbSequences     Total number of sequences in the database.
+
+  _query           Name of query sequence.
+
+  _length          Length of the query sequence.
+
+  _date            Date on which the analysis was performed.
+
+
+  INHERITED DATA MEMBERS
+
+  _name            From Bio::Root::Object.pm. String representing the name of the query sequence.
+ 		   Typically obtained from the report file.
+
+  _parent          From Bio::Root::Object.pm. This member contains a reference to the
+ 		   object to which this seq anal report belongs. Optional & experimenta.
+                   (E.g., a protein object could create and own a Blast object.)
+
+=cut
+
+1;