Mercurial > repos > mahtabm > ensembl
diff variant_effect_predictor/Bio/Tools/SeqPattern.pm @ 0:1f6dce3d34e0
Uploaded
author | mahtabm |
---|---|
date | Thu, 11 Apr 2013 02:01:53 -0400 |
parents | |
children |
line wrap: on
line diff
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/variant_effect_predictor/Bio/Tools/SeqPattern.pm Thu Apr 11 02:01:53 2013 -0400 @@ -0,0 +1,926 @@ +# $Id: SeqPattern.pm,v 1.14 2002/10/22 07:38:46 lapp Exp $ +# +# bioperl module for Bio::Tools::SeqPattern +# +# Cared for by Steve Chervitz (sac@bioperl.org) +# +# Copyright Steve Chervitz +# +# You may distribute this module under the same terms as perl itself + +# POD documentation - main docs before the code + +=head1 NAME + +Bio::Tools::SeqPattern - Bioperl object for a sequence pattern or motif + +=head1 SYNOPSIS + +=head2 Object Creation + + use Bio::Tools::SeqPattern (); + + $pat1 = 'T[GA]AA...TAAT'; + $pattern1 = new Bio::Tools::SeqPattern(-SEQ =>$pattern, -TYPE =>'Dna'); + + $pat2 = '[VILM]R(GXX){3,2}...[^PG]'; + $pattern2 = new Bio::Tools::SeqPattern(-SEQ =>$pattern, -TYPE =>'Amino'); + +=head1 DESCRIPTION + +The Bio::Tools::SeqPattern.pm module encapsulates generic data and +methods for manipulating regular expressions describing nucleic or +amino acid sequence patterns (a.k.a, "motifs"). + +Bio::Tools::SeqPattern.pm is a concrete class that inherits from +B<Bio::Seq.pm>. + +This class grew out of a need to have a standard module for doing routine +tasks with sequence patterns such as: + + -- Forming a reverse-complement version of a nucleotide sequence pattern + -- Expanding patterns containing ambiguity codes + -- Checking for invalid regexp characters + -- Untainting yet preserving special characters in the pattern + +Other features to look for in the future: + + -- Full pattern syntax checking + -- Conversion between expanded and ondensed forms of the pattern + +=head1 MOTIVATIONS + +A key motivation for Bio::Tools::SeqPattern.pm is to have a way to +generate a reverse complement of a nucleotide sequence pattern. +This makes possible simultaneous pattern matching on both sense and +anti-sense strands of a query sequence. + +In principle, one could do such a search more inefficiently by testing +against both sense and anti-sense versions of a sequence. +It is entirely equivalent to test a regexp containing both sense and +anti-sense versions of the *pattern* against one copy of the sequence. +The latter approach is much more efficient since: + + 1) You need only one copy of the sequence. + 2) Only one regexp is executed. + 3) Regexp patterns are typically much smaller than sequences. + +Patterns can be quite complex and it is often difficult to +generate the reverse complement pattern. The Bioperl SeqPattern.pm +addresses this problem, providing a convenient set of tools +for working with biological sequence regular expressions. + +Not all patterns have been tested. If you discover a pattern that +is not handled properly by Bio::Tools::SeqPattern.pm, please +send me some email (sac@bioperl.org). Thanks. + +=head1 OTHER FEATURES + +=head2 Extended Alphabet Support + +This module supports the same set of ambiguity codes for nucleotide +sequences as supported by B<Bio::Seq.pm>. These ambiguity codes +define the behavior or the expand() method. + + ------------------------------------------ + Symbol Meaning Nucleic Acid + ------------------------------------------ + A A Adenine + C C Cytosine + G G Guanine + T T Thymine + U U Uracil + M A or C + R A or G Any purine + W A or T + S C or G + Y C or T Any pyrimidine + K G or T + V A or C or G + H A or C or T + D A or G or T + B C or G or T + X G or A or T or C + N G or A or T or C + . G or A or T or C + + + + ------------------------------------------ + Symbol Meaning + ------------------------------------------ + A Alanine + C Cysteine + D Aspartic Acid + E Glutamic Acid + F Phenylalanine + G Glycine + H Histidine + I Isoleucine + K Lysine + L Leucine + M Methionine + N Asparagine + P Proline + Q Glutamine + R Arginine + S Serine + T Threonine + V Valine + W Tryptophan + Y Tyrosine + + B Aspartic Acid, Asparagine + Z Glutamic Acid, Glutamine + X Any amino acid + . Any amino acid + + +=head2 Multiple Format Support + +Ultimately, this module should be able to build SeqPattern.pm objects +using a variety of pattern formats such as ProSite, Blocks, Prints, GCG, etc. +Currently, this module only supports patterns using a grep-like syntax. + +=head1 USAGE + +A simple demo script called seq_pattern.pl is included in the examples/ +directory of the central Bioperl distribution. + +=head1 SEE ALSO + +L<Bio::Root::Object> - Base class. +L<Bio::Seq> - Lightweight sequence object. + +http://bio.perl.org/Projects/modules.html - Online module documentation +http://bio.perl.org/ - Bioperl Project Homepage + +=head1 FEEDBACK + +=head2 Mailing Lists + +User feedback is an integral part of the evolution of this and other +Bioperl modules. Send your comments and suggestions preferably to one +of the Bioperl mailing lists. Your participation is much appreciated. + + bioperl-l@bioperl.org - General discussion + http://bio.perl.org/MailList.html - About the mailing lists + +=head2 Reporting Bugs + +Report bugs to the Bioperl bug tracking system to help us keep track +the bugs and their resolution. Bug reports can be submitted via email +or the web: + + bioperl-bugs@bio.perl.org + http://bugzilla.bioperl.org/ + +=head1 AUTHOR + +Steve Chervitz, sac@bioperl.org + +=head1 VERSION + +Bio::Tools::SeqPattern.pm, 0.011 + +=head1 COPYRIGHT + +Copyright (c) 1997-8 Steve Chervitz. All Rights Reserved. +This module is free software; you can redistribute it and/or +modify it under the same terms as Perl itself. + +=cut + +# +## +### +#### END of main POD documentation. +### +## +#' +# CREATED : 28 Aug 1997 + + +package Bio::Tools::SeqPattern; + +use Bio::Root::Root; +@ISA = qw(Bio::Root::Root); +use strict; +use vars qw ($ID $VERSION); +$ID = 'Bio::Tools::SeqPattern'; +$VERSION = 0.011; + +## These constants may be more appropriate in a Bio::Dictionary.pm +## type of class. +my $PURINES = 'AG'; +my $PYRIMIDINES = 'CT'; +my $BEE = 'DN'; +my $ZED = 'EQ'; +my $Regexp_chars = '\w,.\*()\[\]<>\{\}^\$'; # quoted for use in regexps + +## Package variables used in reverse complementing. +my (%Processed_braces, %Processed_asterics); + +##################################################################################### +## CONSTRUCTOR ## +##################################################################################### + + +=head1 new + + Title : new + Usage : my $seqpat = new Bio::Tools::SeqPattern(); + Purpose : Verifies that the type is correct for superclass (Bio::Seq.pm) + : and calls superclass constructor last. + Returns : n/a + Argument : Parameters passed to new() + Throws : Exception if the pattern string (seq) is empty. + Comments : The process of creating a new SeqPattern.pm object + : ensures that the pattern string is untained. + +See Also : B<Bio::Root::Root::new()>, + B<Bio::Seq::_initialize()> + +=cut + +#---------------- +sub new { +#---------------- + my($class, %param) = @_; + + my $self = $class->SUPER::new(%param); + my ($seq,$type) = $self->_rearrange([qw(SEQ TYPE)], %param); + + $seq || $self->throw("Empty pattern."); + my $t; + # Get the type ready for Bio::Seq.pm + if ($type =~ /nuc|[dr]na/i) { + $t = 'Dna'; + } elsif ($type =~ /amino|pep|prot/i) { + $t = 'Amino'; + } + $seq =~ tr/a-z/A-Z/; #ps 8/8/00 Canonicalize to upper case + $self->str($seq); + $self->type($t); + + return $self; +} + + +=head1 alphabet_ok + + Title : alphabet_ok + Usage : $mypat->alphabet_ok; + Purpose : Checks for invalid regexp characters. + : Overrides Bio::Seq::alphabet_ok() to allow + : additional regexp characters ,.*()[]<>{}^$ + : in addition to the standard genetic alphabet. + : Also untaints the pattern and sets the sequence + : object's sequence to the untained string. + Returns : Boolean (1 | 0) + Argument : n/a + Throws : Exception if the pattern contains invalid characters. + Comments : Does not call the superclass method. + : Actually permits any alphanumeric, not just the + : standard genetic alphabet. + +=cut + +#----------------' +sub alphabet_ok { +#---------------- + my( $self) = @_; + + return 1 if $self->{'_alphabet_checked'}; + + $self->{'_alphabet_checked'} = 1; + + my $pat = $self->seq(); + + if($pat =~ /[^$Regexp_chars]/io) { + $self->throw("Pattern contains invalid characters: $pat", + 'Legal characters: a-z,A-Z,0-9,,.*()[]<>{}^$ '); + } + + # Untaint pattern (makes code taint-safe). + $pat =~ /[$Regexp_chars]+/io; + $self->setseq(uc($&)); +# print STDERR "\npattern ok: $pat\n"; + 1; +} + +=head1 expand + + Title : expand + Usage : $seqpat_object->expand(); + Purpose : Expands the sequence pattern using special ambiguity codes. + Example : $pat = $seq_pat->expand(); + Returns : String containing fully expanded sequence pattern + Argument : n/a + Throws : Exception if sequence type is not recognized + : (i.e., is not one of [DR]NA, Amino) + +See Also : B<Extended Alphabet Support>, L<_expand_pep>(), L<_expand_nuc>() + +=cut + +#---------- +sub expand { +#---------- + my $self = shift; + + if($self->type =~ /[DR]na/i) { $self->_expand_nuc(); } + elsif($self->type =~ /Amino/i) { $self->_expand_pep(); } + else{ + $self->throw("Don't know how to expand ${\$self->type} patterns.\n"); + } +} + + +=head1 _expand_pep + + Title : _expand_pep + Usage : n/a; automatically called by expand() + Purpose : Expands peptide patterns + Returns : String (the expanded pattern) + Argument : String (the unexpanded pattern) + Throws : n/a + +See Also : L<expand>(), L<_expand_nuc>() + +=cut + +#---------------- +sub _expand_pep { +#---------------- + my ($self,$pat) = @_; + $pat ||= $self->str; + $pat =~ s/X/./g; + $pat =~ s/^</\^/; + $pat =~ s/>$/\$/; + + ## Avoid nested situations: [bmnq] --/--> [[$ZED]mnq] + ## Yet correctly deal with: fze[bmnq] ---> f[$BEE]e[$ZEDmnq] + if($pat =~ /\[\w*[BZ]\w*\]/) { + $pat =~ s/\[(\w*)B(\w*)\]/\[$1$ZED$2\]/g; + $pat =~ s/\[(\w*)Z(\w*)\]/\[$1$BEE$2\]/g; + $pat =~ s/B/\[$ZED\]/g; + $pat =~ s/Z/\[$BEE\]/g; + } else { + $pat =~ s/B/\[$ZED\]/g; + $pat =~ s/Z/\[$BEE\]/g; + } + $pat =~ s/\((.)\)/$1/g; ## Doing these last since: + $pat =~ s/\[(.)\]/$1/g; ## Pattern could contain [B] (for example) + + return $pat; +} + + + +=head1 _expand_nuc + + Title : _expand_nuc + Purpose : Expands nucleotide patterns + Returns : String (the expanded pattern) + Argument : String (the unexpanded pattern) + Throws : n/a + +See Also : L<expand>(), L<_expand_pep>() + +=cut + +#--------------- +sub _expand_nuc { +#--------------- + my ($self,$pat) = @_; + + $pat ||= $self->str; + $pat =~ s/N|X/./g; + $pat =~ s/pu/R/ig; + $pat =~ s/py/Y/ig; + $pat =~ s/U/T/g; + $pat =~ s/^</\^/; + $pat =~ s/>$/\$/; + + ## Avoid nested situations: [ya] --/--> [[ct]a] + ## Yet correctly deal with: sg[ya] ---> [gc]g[cta] + if($pat =~ /\[\w*[RYSWMK]\w*\]/) { + $pat =~ s/\[(\w*)R(\w*)\]/\[$1$PURINES$2\]/g; + $pat =~ s/\[(\w*)Y(\w*)\]/\[$1$PYRIMIDINES$2\]/g; + $pat =~ s/\[(\w*)S(\w*)\]/\[$1GC$2\]/g; + $pat =~ s/\[(\w*)W(\w*)\]/\[$1AT$2\]/g; + $pat =~ s/\[(\w*)M(\w*)\]/\[$1AC$2\]/g; + $pat =~ s/\[(\w*)K(\w*)\]/\[$1GT$2\]/g; + $pat =~ s/\[(\w*)V(\w*)\]/\[$1ACG$2\]/g; + $pat =~ s/\[(\w*)H(\w*)\]/\[$1ACT$2\]/g; + $pat =~ s/\[(\w*)D(\w*)\]/\[$1AGT$2\]/g; + $pat =~ s/\[(\w*)B(\w*)\]/\[$1CGT$2\]/g; + $pat =~ s/R/\[$PURINES\]/g; + $pat =~ s/Y/\[$PYRIMIDINES\]/g; + $pat =~ s/S/\[GC\]/g; + $pat =~ s/W/\[AT\]/g; + $pat =~ s/M/\[AC\]/g; + $pat =~ s/K/\[GT\]/g; + $pat =~ s/V/\[ACG\]/g; + $pat =~ s/H/\[ACT\]/g; + $pat =~ s/D/\[AGT\]/g; + $pat =~ s/B/\[CGT\]/g; + } else { + $pat =~ s/R/\[$PURINES\]/g; + $pat =~ s/Y/\[$PYRIMIDINES\]/g; + $pat =~ s/S/\[GC\]/g; + $pat =~ s/W/\[AT\]/g; + $pat =~ s/M/\[AC\]/g; + $pat =~ s/K/\[GT\]/g; + $pat =~ s/V/\[ACG\]/g; + $pat =~ s/H/\[ACT\]/g; + $pat =~ s/D/\[AGT\]/g; + $pat =~ s/B/\[CGT\]/g; + } + $pat =~ s/\((.)\)/$1/g; ## Doing thses last since: + $pat =~ s/\[(.)\]/$1/g; ## Pattern could contain [y] (for example) + + return $pat; +} + + + +=head1 revcom + + Title : revcom + Usage : revcom([1]); + Purpose : Forms a pattern capable of recognizing the reverse complement + : version of a nucleotide sequence pattern. + Example : $pattern_object->revcom(); + : $pattern_object->revcom(1); ## returns expanded rev complement pattern. + Returns : Object reference for a new Bio::Tools::SeqPattern containing + : the revcom of the current pattern as its sequence. + Argument : (1) boolean (optional) (default= false) + : true : expand the pattern before rev-complementing. + : false: don't expand pattern before or after rev-complementing. + Throws : Exception if called for amino acid sequence pattern. + Comments : This method permits the simultaneous searching of both + : sense and anti-sense versions of a nucleotide pattern + : by means of a grep-type of functionality in which any + : number of patterns may be or-ed into the recognition + : pattern. + : Overrides Bio::Seq::revcom() and calls it first thing. + : The order of _fixpat() calls is critical. + +See Also : B<Bio::Seq::revcom()>, L<_fixpat_1>(), L<_fixpat_2>(), L<_fixpat_3>(), L<_fixpat_4>(), L<_fixpat_5>() + +=cut + +#-----------' +sub revcom { +#----------- + my($self,$expand) = @_; + + if ($self->type !~ /Dna|Rna/i) { + $self->throw("Can't get revcom for ${\$self->type} sequence types.\n"); + } +# return $self->{'_rev'} if defined $self->{'_rev'}; + + $expand ||= 0; + my $str = $self->str; + $str =~ tr/acgtrymkswhbvdnxACGTRYMKSWHBVDNX/tgcayrkmswdvbhnxTGCAYRKMSWDVBHNX/; + my $rev = CORE::reverse $str; + $rev =~ tr/[](){}<>/][)(}{></; + + if($expand) { + $rev = $self->_expand_nuc($rev); +# print "\nExpanded: $rev\n"; + } + + %Processed_braces = (); + %Processed_asterics = (); + + my $fixrev = _fixpat_1($rev); +# print "FIX 1: $fixrev";<STDIN>; + + $fixrev = _fixpat_2($fixrev); +# print "FIX 2: $fixrev";<STDIN>; + + $fixrev = _fixpat_3($fixrev); +# print "FIX 3: $fixrev";<STDIN>; + + $fixrev = _fixpat_4($fixrev); +# print "FIX 4: $fixrev";<STDIN>; + + $fixrev = _fixpat_5($fixrev); +# print "FIX 5: $fixrev";<STDIN>; + +##### Added by ps 8/7/00 to allow non-greedy matching + $fixrev = _fixpat_6($fixrev); +# print "FIX 6: $fixrev";<STDIN>; + +# $self->{'_rev'} = $fixrev; + + return new Bio::Tools::SeqPattern(-seq =>$fixrev, -type =>$self->type); +} + + + +=head1 _fixpat_1 + + Title : _fixpat_1 + Usage : n/a; called automatically by revcom() + Purpose : Utility method for revcom() + : Converts all {7,5} --> {5,7} (Part I) + : and [T^] --> [^T] (Part II) + : and *N --> N* (Part III) + Returns : String (the new, partially reversed pattern) + Argument : String (the expanded pattern) + Throws : n/a + +See Also : L<revcom>() + +=cut + +#-------------- +sub _fixpat_1 { +#-------------- + my $pat = shift; + + ## Part I: + my (@done,@parts); + while(1) { + $pat =~ /(.*)\{(\S+?)\}(.*)/ or do{ push @done, $pat; last; }; + $pat = $1.'#{'.reverse($2).'}'.$3; +# print "1: $1\n2: $2\n3: $3\n"; +# print "modified pat: $pat";<STDIN>; + @parts = split '#', $pat; + push @done, $parts[1]; + $pat = $parts[0]; +# print "done: $parts[1]<---\nnew pat: $pat<---";<STDIN>; + last if not $pat; + } + $pat = join('', reverse @done); + + ## Part II: + @done = (); + while(1) { + $pat =~ /(.*)\[(\S+?)\](.*)/ or do{ push @done, $pat; last; }; + $pat = $1.'#['.reverse($2).']'.$3; +# print "1: $1\n2: $2\n3: $3\n"; +# print "modified pat: $pat";<STDIN>; + @parts = split '#', $pat; + push @done, $parts[1]; + $pat = $parts[0]; +# print "done: $parts[1]<---\nnew pat: $pat<---";<STDIN>; + last if not $pat; + } + $pat = join('', reverse @done); + + ## Part III: + @done = (); + while(1) { + $pat =~ /(.*)\*([\w.])(.*)/ or do{ push @done, $pat; last; }; + $pat = $1.'#'.$2.'*'.$3; + $Processed_asterics{$2}++; +# print "1: $1\n2: $2\n3: $3\n"; +# print "modified pat: $pat";<STDIN>; + @parts = split '#', $pat; + push @done, $parts[1]; + $pat = $parts[0]; +# print "done: $parts[1]<---\nnew pat: $pat<---";<STDIN>; + last if not $pat; + } + return join('', reverse @done); + +} + + +=head1 _fixpat_2 + + Title : _fixpat_2 + Usage : n/a; called automatically by revcom() + Purpose : Utility method for revcom() + : Converts all {5,7}Y ---> Y{5,7} + : and {10,}. ---> .{10,} + Returns : String (the new, partially reversed pattern) + Argument : String (the expanded, partially reversed pattern) + Throws : n/a + +See Also : L<revcom>() + +=cut + +#-------------- +sub _fixpat_2 { +#-------------- + my $pat = shift; + + local($^W) = 0; + my (@done,@parts,$braces); + while(1) { +# $pat =~ s/(.*)([^])])(\{\S+?\})([\w.])(.*)/$1$2#$4$3$5/ or do{ push @done, $pat; last; }; + $pat =~ s/(.*)(\{\S+?\})([\w.])(.*)/$1#$3$2$4/ or do{ push @done, $pat; last; }; + $braces = $2; + $braces =~ s/[{}]//g; + $Processed_braces{"$3$braces"}++; +# print "modified pat: $pat";<STDIN>; + @parts = split '#', $pat; + push @done, $parts[1]; + $pat = $parts[0]; +# print "done: $parts[1]<---\nnew pat: $pat<---";<STDIN>; + last if not $pat; + } + return join('', reverse @done); +} + + +=head1 _fixpat_3 + + Title : _fixpat_3 + Usage : n/a; called automatically by revcom() + Purpose : Utility method for revcom() + : Converts all {5,7}(XXX) ---> (XXX){5,7} + Returns : String (the new, partially reversed pattern) + Argument : String (the expanded, partially reversed pattern) + Throws : n/a + +See Also : L<revcom>() + +=cut + +#------------- +sub _fixpat_3 { +#------------- + my $pat = shift; + + my (@done,@parts,$braces,$newpat,$oldpat); + while(1) { +# $pat =~ s/(.+)(\{\S+\})(\(\w+\))(.*)/$1#$3$2$4/ or do{ push @done, $pat; last; }; + if( $pat =~ /(.*)(.)(\{\S+\})(\(\w+\))(.*)/) { + $newpat = "$1#$2$4$3$5"; +##ps $oldpat = "$1#$2$3$4$5"; +# print "1: $1\n2: $2\n3: $3\n4: $4\n5: $5\n"; +##ps $braces = $3; +##ps $braces =~ s/[{}]//g; +##ps if( exists $Processed_braces{"$2$braces"} || exists $Processed_asterics{$2}) { +##ps $pat = $oldpat; # Don't change it. Already processed. +# print "saved pat: $pat";<STDIN>; +##ps } else { +# print "new pat: $newpat";<STDIN>; + $pat = $newpat; # Change it. +##ps } + } elsif( $pat =~ /^(\{\S+\})(\(\w+\))(.*)/) { + $pat = "#$2$1$3"; + } else { + push @done, $pat; last; + } + @parts = split '#', $pat; + push @done, $parts[1]; + $pat = $parts[0]; +# print "done: $parts[1]<---\nnew pat: $pat<---";<STDIN>; + last if not $pat; + } + return join('', reverse @done); +} + + +=head1 _fixpat_4 + + Title : _fixpat_4 + Usage : n/a; called automatically by revcom() + Purpose : Utility method for revcom() + : Converts all {5,7}[XXX] ---> [XXX]{5,7} + Returns : String (the new, partially reversed pattern) + Argument : String (the expanded, partially reversed pattern) + Throws : n/a + +See Also : L<revcom>() + +=cut + +#--------------- +sub _fixpat_4 { +#--------------- + my $pat = shift; + + my (@done,@parts,$braces,$newpat,$oldpat); + while(1) { +# $pat =~ s/(.*)(\{\S+\})(\[\w+\])(.*)/$1#$3$2$4/ or do{ push @done, $pat; last; }; +# $pat =~ s/(.*)([^\w.])(\{\S+\})(\[\w+\])(.*)/$1$2#$4$3$5/ or do{ push @done, $pat; last; }; + if( $pat =~ /(.*)(.)(\{\S+\})(\[\w+\])(.*)/) { + $newpat = "$1#$2$4$3$5"; + $oldpat = "$1#$2$3$4$5"; +# print "1: $1\n2: $2\n3: $3\n4: $4\n5: $5\n"; + $braces = $3; + $braces =~ s/[{}]//g; + if( (defined $braces and defined $2) and + exists $Processed_braces{"$2$braces"} || exists $Processed_asterics{$2}) { + $pat = $oldpat; # Don't change it. Already processed. +# print "saved pat: $pat";<STDIN>; + } else { + $pat = $newpat; # Change it. +# print "new pat: $pat";<STDIN>; + } + } elsif( $pat =~ /^(\{\S+\})(\[\w+\])(.*)/) { + $pat = "#$2$1$3"; + } else { + push @done, $pat; last; + } + + @parts = split '#', $pat; + push @done, $parts[1]; + $pat = $parts[0]; +# print "done: $parts[1]<---\nnew pat: $pat<---";<STDIN>; + last if not $pat; + } + return join('', reverse @done); +} + + +=head1 _fixpat_5 + + Title : _fixpat_5 + Usage : n/a; called automatically by revcom() + Purpose : Utility method for revcom() + : Converts all *[XXX] ---> [XXX]* + : and *(XXX) ---> (XXX)* + Returns : String (the new, partially reversed pattern) + Argument : String (the expanded, partially reversed pattern) + Throws : n/a + +See Also : L<revcom>() + +=cut + +#-------------- +sub _fixpat_5 { +#-------------- + my $pat = shift; + + my (@done,@parts,$newpat,$oldpat); + while(1) { +# $pat =~ s/(.*)(\{\S+\})(\[\w+\])(.*)/$1#$3$2$4/ or do{ push @done, $pat; last; }; +# $pat =~ s/(.*)([^\w.])(\{\S+\})(\[\w+\])(.*)/$1$2#$4$3$5/ or do{ push @done, $pat; last; }; + if( $pat =~ /(.*)(.)\*(\[\w+\]|\(\w+\))(.*)/) { + $newpat = "$1#$2$3*$4"; + $oldpat = "$1#$2*$3$4"; +# print "1: $1\n2: $2\n3: $3\n4: $4\n"; + if( exists $Processed_asterics{$2}) { + $pat = $oldpat; # Don't change it. Already processed. +# print "saved pat: $pat";<STDIN>; + } else { + $pat = $newpat; # Change it. +# print "new pat: $pat";<STDIN>; + } + } elsif( $pat =~ /^\*(\[\w+\]|\(\w+\))(.*)/) { + $pat = "#$1*$3"; + } else { + push @done, $pat; last; + } + + @parts = split '#', $pat; + push @done, $parts[1]; + $pat = $parts[0]; +# print "done: $parts[1]<---\nnew pat: $pat<---";<STDIN>; + last if not $pat; + } + return join('', reverse @done); +} + + + + + +############################ +# +# PS: Added 8/7/00 to allow non-greedy matching patterns +# +###################################### + +=head1 _fixpat_6 + + Title : _fixpat_6 + Usage : n/a; called automatically by revcom() + Purpose : Utility method for revcom() + : Converts all ?Y{5,7} ---> Y{5,7}? + : and ?(XXX){5,7} ---> (XXX){5,7}? + : and ?[XYZ]{5,7} ---> [XYZ]{5,7}? + Returns : String (the new, partially reversed pattern) + Argument : String (the expanded, partially reversed pattern) + Throws : n/a + +See Also : L<revcom>() + +=cut + +#-------------- +sub _fixpat_6 { +#-------------- + my $pat = shift; + my (@done,@parts); + + @done = (); + while(1) { + $pat =~ /(.*)\?(\[\w+\]|\(\w+\)|\w)(\{\S+?\})?(.*)/ or do{ push @done, $pat; last; }; + my $quantifier = $3 ? $3 : ""; # Shut up warning if no explicit quantifier + $pat = $1.'#'.$2.$quantifier.'?'.$4; +# $pat = $1.'#'.$2.$3.'?'.$4; + +# print "1: $1\n2: $2\n3: $3\n"; +# print "modified pat: $pat";<STDIN>; + @parts = split '#', $pat; + push @done, $parts[1]; + $pat = $parts[0]; +# print "done: $parts[1]<---\nnew pat: $pat<---";<STDIN>; + last if not $pat; + } + return join('', reverse @done); + + } + +=head2 str + + Title : str + Usage : $obj->str($newval) + Function: + Returns : value of str + Args : newvalue (optional) + + +=cut + +sub str{ + my $obj = shift; + if( @_ ) { + my $value = shift; + $obj->{'str'} = $value; + } + return $obj->{'str'}; + +} + +=head2 type + + Title : type + Usage : $obj->type($newval) + Function: + Returns : value of type + Args : newvalue (optional) + + +=cut + +sub type{ + my $obj = shift; + if( @_ ) { + my $value = shift; + $obj->{'type'} = $value; + } + return $obj->{'type'}; + +} + +1; + +__END__ + +######################################################################### +# End of class +######################################################################### + +=head1 FOR DEVELOPERS ONLY + +=head2 Data Members + +Information about the various data members of this module is provided +for those wishing to modify or understand the code. Two things to bear +in mind: + +=over 2 + +=item 1 Do NOT rely on these in any code outside of this module. + +All data members are prefixed with an underscore to signify that they +are private. Always use accessor methods. If the accessor doesn't +exist or is inadequate, create or modify an accessor (and let me know, +too!). + +=item 2 This documentation may be incomplete and out of date. + +It is easy for this documentation to become obsolete as this module is +still evolving. Always double check this info and search for members +not described here. + +=back + +An instance of Bio::Tools::RestrictionEnzyme.pm is a blessed reference +to a hash containing all or some of the following fields: + + FIELD VALUE + ------------------------------------------------------------------------ + _rev : The corrected reverse complement of the fully expanded pattern. + + INHERITED DATA MEMBERS: + + _seq : (From Bio::Seq.pm) The original, unexpanded input sequence after untainting. + _type : (From Bio::Seq.pm) 'Dna' or 'Amino' + + +=cut