comparison variant_effect_predictor/Bio/Tools/SeqPattern.pm @ 0:2bc9b66ada89 draft default tip

Uploaded
author mahtabm
date Thu, 11 Apr 2013 06:29:17 -0400
parents
children
comparison
equal deleted inserted replaced
-1:000000000000 0:2bc9b66ada89
1 # $Id: SeqPattern.pm,v 1.14 2002/10/22 07:38:46 lapp Exp $
2 #
3 # bioperl module for Bio::Tools::SeqPattern
4 #
5 # Cared for by Steve Chervitz (sac@bioperl.org)
6 #
7 # Copyright Steve Chervitz
8 #
9 # You may distribute this module under the same terms as perl itself
10
11 # POD documentation - main docs before the code
12
13 =head1 NAME
14
15 Bio::Tools::SeqPattern - Bioperl object for a sequence pattern or motif
16
17 =head1 SYNOPSIS
18
19 =head2 Object Creation
20
21 use Bio::Tools::SeqPattern ();
22
23 $pat1 = 'T[GA]AA...TAAT';
24 $pattern1 = new Bio::Tools::SeqPattern(-SEQ =>$pattern, -TYPE =>'Dna');
25
26 $pat2 = '[VILM]R(GXX){3,2}...[^PG]';
27 $pattern2 = new Bio::Tools::SeqPattern(-SEQ =>$pattern, -TYPE =>'Amino');
28
29 =head1 DESCRIPTION
30
31 The Bio::Tools::SeqPattern.pm module encapsulates generic data and
32 methods for manipulating regular expressions describing nucleic or
33 amino acid sequence patterns (a.k.a, "motifs").
34
35 Bio::Tools::SeqPattern.pm is a concrete class that inherits from
36 B<Bio::Seq.pm>.
37
38 This class grew out of a need to have a standard module for doing routine
39 tasks with sequence patterns such as:
40
41 -- Forming a reverse-complement version of a nucleotide sequence pattern
42 -- Expanding patterns containing ambiguity codes
43 -- Checking for invalid regexp characters
44 -- Untainting yet preserving special characters in the pattern
45
46 Other features to look for in the future:
47
48 -- Full pattern syntax checking
49 -- Conversion between expanded and ondensed forms of the pattern
50
51 =head1 MOTIVATIONS
52
53 A key motivation for Bio::Tools::SeqPattern.pm is to have a way to
54 generate a reverse complement of a nucleotide sequence pattern.
55 This makes possible simultaneous pattern matching on both sense and
56 anti-sense strands of a query sequence.
57
58 In principle, one could do such a search more inefficiently by testing
59 against both sense and anti-sense versions of a sequence.
60 It is entirely equivalent to test a regexp containing both sense and
61 anti-sense versions of the *pattern* against one copy of the sequence.
62 The latter approach is much more efficient since:
63
64 1) You need only one copy of the sequence.
65 2) Only one regexp is executed.
66 3) Regexp patterns are typically much smaller than sequences.
67
68 Patterns can be quite complex and it is often difficult to
69 generate the reverse complement pattern. The Bioperl SeqPattern.pm
70 addresses this problem, providing a convenient set of tools
71 for working with biological sequence regular expressions.
72
73 Not all patterns have been tested. If you discover a pattern that
74 is not handled properly by Bio::Tools::SeqPattern.pm, please
75 send me some email (sac@bioperl.org). Thanks.
76
77 =head1 OTHER FEATURES
78
79 =head2 Extended Alphabet Support
80
81 This module supports the same set of ambiguity codes for nucleotide
82 sequences as supported by B<Bio::Seq.pm>. These ambiguity codes
83 define the behavior or the expand() method.
84
85 ------------------------------------------
86 Symbol Meaning Nucleic Acid
87 ------------------------------------------
88 A A Adenine
89 C C Cytosine
90 G G Guanine
91 T T Thymine
92 U U Uracil
93 M A or C
94 R A or G Any purine
95 W A or T
96 S C or G
97 Y C or T Any pyrimidine
98 K G or T
99 V A or C or G
100 H A or C or T
101 D A or G or T
102 B C or G or T
103 X G or A or T or C
104 N G or A or T or C
105 . G or A or T or C
106
107
108
109 ------------------------------------------
110 Symbol Meaning
111 ------------------------------------------
112 A Alanine
113 C Cysteine
114 D Aspartic Acid
115 E Glutamic Acid
116 F Phenylalanine
117 G Glycine
118 H Histidine
119 I Isoleucine
120 K Lysine
121 L Leucine
122 M Methionine
123 N Asparagine
124 P Proline
125 Q Glutamine
126 R Arginine
127 S Serine
128 T Threonine
129 V Valine
130 W Tryptophan
131 Y Tyrosine
132
133 B Aspartic Acid, Asparagine
134 Z Glutamic Acid, Glutamine
135 X Any amino acid
136 . Any amino acid
137
138
139 =head2 Multiple Format Support
140
141 Ultimately, this module should be able to build SeqPattern.pm objects
142 using a variety of pattern formats such as ProSite, Blocks, Prints, GCG, etc.
143 Currently, this module only supports patterns using a grep-like syntax.
144
145 =head1 USAGE
146
147 A simple demo script called seq_pattern.pl is included in the examples/
148 directory of the central Bioperl distribution.
149
150 =head1 SEE ALSO
151
152 L<Bio::Root::Object> - Base class.
153 L<Bio::Seq> - Lightweight sequence object.
154
155 http://bio.perl.org/Projects/modules.html - Online module documentation
156 http://bio.perl.org/ - Bioperl Project Homepage
157
158 =head1 FEEDBACK
159
160 =head2 Mailing Lists
161
162 User feedback is an integral part of the evolution of this and other
163 Bioperl modules. Send your comments and suggestions preferably to one
164 of the Bioperl mailing lists. Your participation is much appreciated.
165
166 bioperl-l@bioperl.org - General discussion
167 http://bio.perl.org/MailList.html - About the mailing lists
168
169 =head2 Reporting Bugs
170
171 Report bugs to the Bioperl bug tracking system to help us keep track
172 the bugs and their resolution. Bug reports can be submitted via email
173 or the web:
174
175 bioperl-bugs@bio.perl.org
176 http://bugzilla.bioperl.org/
177
178 =head1 AUTHOR
179
180 Steve Chervitz, sac@bioperl.org
181
182 =head1 VERSION
183
184 Bio::Tools::SeqPattern.pm, 0.011
185
186 =head1 COPYRIGHT
187
188 Copyright (c) 1997-8 Steve Chervitz. All Rights Reserved.
189 This module is free software; you can redistribute it and/or
190 modify it under the same terms as Perl itself.
191
192 =cut
193
194 #
195 ##
196 ###
197 #### END of main POD documentation.
198 ###
199 ##
200 #'
201 # CREATED : 28 Aug 1997
202
203
204 package Bio::Tools::SeqPattern;
205
206 use Bio::Root::Root;
207 @ISA = qw(Bio::Root::Root);
208 use strict;
209 use vars qw ($ID $VERSION);
210 $ID = 'Bio::Tools::SeqPattern';
211 $VERSION = 0.011;
212
213 ## These constants may be more appropriate in a Bio::Dictionary.pm
214 ## type of class.
215 my $PURINES = 'AG';
216 my $PYRIMIDINES = 'CT';
217 my $BEE = 'DN';
218 my $ZED = 'EQ';
219 my $Regexp_chars = '\w,.\*()\[\]<>\{\}^\$'; # quoted for use in regexps
220
221 ## Package variables used in reverse complementing.
222 my (%Processed_braces, %Processed_asterics);
223
224 #####################################################################################
225 ## CONSTRUCTOR ##
226 #####################################################################################
227
228
229 =head1 new
230
231 Title : new
232 Usage : my $seqpat = new Bio::Tools::SeqPattern();
233 Purpose : Verifies that the type is correct for superclass (Bio::Seq.pm)
234 : and calls superclass constructor last.
235 Returns : n/a
236 Argument : Parameters passed to new()
237 Throws : Exception if the pattern string (seq) is empty.
238 Comments : The process of creating a new SeqPattern.pm object
239 : ensures that the pattern string is untained.
240
241 See Also : B<Bio::Root::Root::new()>,
242 B<Bio::Seq::_initialize()>
243
244 =cut
245
246 #----------------
247 sub new {
248 #----------------
249 my($class, %param) = @_;
250
251 my $self = $class->SUPER::new(%param);
252 my ($seq,$type) = $self->_rearrange([qw(SEQ TYPE)], %param);
253
254 $seq || $self->throw("Empty pattern.");
255 my $t;
256 # Get the type ready for Bio::Seq.pm
257 if ($type =~ /nuc|[dr]na/i) {
258 $t = 'Dna';
259 } elsif ($type =~ /amino|pep|prot/i) {
260 $t = 'Amino';
261 }
262 $seq =~ tr/a-z/A-Z/; #ps 8/8/00 Canonicalize to upper case
263 $self->str($seq);
264 $self->type($t);
265
266 return $self;
267 }
268
269
270 =head1 alphabet_ok
271
272 Title : alphabet_ok
273 Usage : $mypat->alphabet_ok;
274 Purpose : Checks for invalid regexp characters.
275 : Overrides Bio::Seq::alphabet_ok() to allow
276 : additional regexp characters ,.*()[]<>{}^$
277 : in addition to the standard genetic alphabet.
278 : Also untaints the pattern and sets the sequence
279 : object's sequence to the untained string.
280 Returns : Boolean (1 | 0)
281 Argument : n/a
282 Throws : Exception if the pattern contains invalid characters.
283 Comments : Does not call the superclass method.
284 : Actually permits any alphanumeric, not just the
285 : standard genetic alphabet.
286
287 =cut
288
289 #----------------'
290 sub alphabet_ok {
291 #----------------
292 my( $self) = @_;
293
294 return 1 if $self->{'_alphabet_checked'};
295
296 $self->{'_alphabet_checked'} = 1;
297
298 my $pat = $self->seq();
299
300 if($pat =~ /[^$Regexp_chars]/io) {
301 $self->throw("Pattern contains invalid characters: $pat",
302 'Legal characters: a-z,A-Z,0-9,,.*()[]<>{}^$ ');
303 }
304
305 # Untaint pattern (makes code taint-safe).
306 $pat =~ /[$Regexp_chars]+/io;
307 $self->setseq(uc($&));
308 # print STDERR "\npattern ok: $pat\n";
309 1;
310 }
311
312 =head1 expand
313
314 Title : expand
315 Usage : $seqpat_object->expand();
316 Purpose : Expands the sequence pattern using special ambiguity codes.
317 Example : $pat = $seq_pat->expand();
318 Returns : String containing fully expanded sequence pattern
319 Argument : n/a
320 Throws : Exception if sequence type is not recognized
321 : (i.e., is not one of [DR]NA, Amino)
322
323 See Also : B<Extended Alphabet Support>, L<_expand_pep>(), L<_expand_nuc>()
324
325 =cut
326
327 #----------
328 sub expand {
329 #----------
330 my $self = shift;
331
332 if($self->type =~ /[DR]na/i) { $self->_expand_nuc(); }
333 elsif($self->type =~ /Amino/i) { $self->_expand_pep(); }
334 else{
335 $self->throw("Don't know how to expand ${\$self->type} patterns.\n");
336 }
337 }
338
339
340 =head1 _expand_pep
341
342 Title : _expand_pep
343 Usage : n/a; automatically called by expand()
344 Purpose : Expands peptide patterns
345 Returns : String (the expanded pattern)
346 Argument : String (the unexpanded pattern)
347 Throws : n/a
348
349 See Also : L<expand>(), L<_expand_nuc>()
350
351 =cut
352
353 #----------------
354 sub _expand_pep {
355 #----------------
356 my ($self,$pat) = @_;
357 $pat ||= $self->str;
358 $pat =~ s/X/./g;
359 $pat =~ s/^</\^/;
360 $pat =~ s/>$/\$/;
361
362 ## Avoid nested situations: [bmnq] --/--> [[$ZED]mnq]
363 ## Yet correctly deal with: fze[bmnq] ---> f[$BEE]e[$ZEDmnq]
364 if($pat =~ /\[\w*[BZ]\w*\]/) {
365 $pat =~ s/\[(\w*)B(\w*)\]/\[$1$ZED$2\]/g;
366 $pat =~ s/\[(\w*)Z(\w*)\]/\[$1$BEE$2\]/g;
367 $pat =~ s/B/\[$ZED\]/g;
368 $pat =~ s/Z/\[$BEE\]/g;
369 } else {
370 $pat =~ s/B/\[$ZED\]/g;
371 $pat =~ s/Z/\[$BEE\]/g;
372 }
373 $pat =~ s/\((.)\)/$1/g; ## Doing these last since:
374 $pat =~ s/\[(.)\]/$1/g; ## Pattern could contain [B] (for example)
375
376 return $pat;
377 }
378
379
380
381 =head1 _expand_nuc
382
383 Title : _expand_nuc
384 Purpose : Expands nucleotide patterns
385 Returns : String (the expanded pattern)
386 Argument : String (the unexpanded pattern)
387 Throws : n/a
388
389 See Also : L<expand>(), L<_expand_pep>()
390
391 =cut
392
393 #---------------
394 sub _expand_nuc {
395 #---------------
396 my ($self,$pat) = @_;
397
398 $pat ||= $self->str;
399 $pat =~ s/N|X/./g;
400 $pat =~ s/pu/R/ig;
401 $pat =~ s/py/Y/ig;
402 $pat =~ s/U/T/g;
403 $pat =~ s/^</\^/;
404 $pat =~ s/>$/\$/;
405
406 ## Avoid nested situations: [ya] --/--> [[ct]a]
407 ## Yet correctly deal with: sg[ya] ---> [gc]g[cta]
408 if($pat =~ /\[\w*[RYSWMK]\w*\]/) {
409 $pat =~ s/\[(\w*)R(\w*)\]/\[$1$PURINES$2\]/g;
410 $pat =~ s/\[(\w*)Y(\w*)\]/\[$1$PYRIMIDINES$2\]/g;
411 $pat =~ s/\[(\w*)S(\w*)\]/\[$1GC$2\]/g;
412 $pat =~ s/\[(\w*)W(\w*)\]/\[$1AT$2\]/g;
413 $pat =~ s/\[(\w*)M(\w*)\]/\[$1AC$2\]/g;
414 $pat =~ s/\[(\w*)K(\w*)\]/\[$1GT$2\]/g;
415 $pat =~ s/\[(\w*)V(\w*)\]/\[$1ACG$2\]/g;
416 $pat =~ s/\[(\w*)H(\w*)\]/\[$1ACT$2\]/g;
417 $pat =~ s/\[(\w*)D(\w*)\]/\[$1AGT$2\]/g;
418 $pat =~ s/\[(\w*)B(\w*)\]/\[$1CGT$2\]/g;
419 $pat =~ s/R/\[$PURINES\]/g;
420 $pat =~ s/Y/\[$PYRIMIDINES\]/g;
421 $pat =~ s/S/\[GC\]/g;
422 $pat =~ s/W/\[AT\]/g;
423 $pat =~ s/M/\[AC\]/g;
424 $pat =~ s/K/\[GT\]/g;
425 $pat =~ s/V/\[ACG\]/g;
426 $pat =~ s/H/\[ACT\]/g;
427 $pat =~ s/D/\[AGT\]/g;
428 $pat =~ s/B/\[CGT\]/g;
429 } else {
430 $pat =~ s/R/\[$PURINES\]/g;
431 $pat =~ s/Y/\[$PYRIMIDINES\]/g;
432 $pat =~ s/S/\[GC\]/g;
433 $pat =~ s/W/\[AT\]/g;
434 $pat =~ s/M/\[AC\]/g;
435 $pat =~ s/K/\[GT\]/g;
436 $pat =~ s/V/\[ACG\]/g;
437 $pat =~ s/H/\[ACT\]/g;
438 $pat =~ s/D/\[AGT\]/g;
439 $pat =~ s/B/\[CGT\]/g;
440 }
441 $pat =~ s/\((.)\)/$1/g; ## Doing thses last since:
442 $pat =~ s/\[(.)\]/$1/g; ## Pattern could contain [y] (for example)
443
444 return $pat;
445 }
446
447
448
449 =head1 revcom
450
451 Title : revcom
452 Usage : revcom([1]);
453 Purpose : Forms a pattern capable of recognizing the reverse complement
454 : version of a nucleotide sequence pattern.
455 Example : $pattern_object->revcom();
456 : $pattern_object->revcom(1); ## returns expanded rev complement pattern.
457 Returns : Object reference for a new Bio::Tools::SeqPattern containing
458 : the revcom of the current pattern as its sequence.
459 Argument : (1) boolean (optional) (default= false)
460 : true : expand the pattern before rev-complementing.
461 : false: don't expand pattern before or after rev-complementing.
462 Throws : Exception if called for amino acid sequence pattern.
463 Comments : This method permits the simultaneous searching of both
464 : sense and anti-sense versions of a nucleotide pattern
465 : by means of a grep-type of functionality in which any
466 : number of patterns may be or-ed into the recognition
467 : pattern.
468 : Overrides Bio::Seq::revcom() and calls it first thing.
469 : The order of _fixpat() calls is critical.
470
471 See Also : B<Bio::Seq::revcom()>, L<_fixpat_1>(), L<_fixpat_2>(), L<_fixpat_3>(), L<_fixpat_4>(), L<_fixpat_5>()
472
473 =cut
474
475 #-----------'
476 sub revcom {
477 #-----------
478 my($self,$expand) = @_;
479
480 if ($self->type !~ /Dna|Rna/i) {
481 $self->throw("Can't get revcom for ${\$self->type} sequence types.\n");
482 }
483 # return $self->{'_rev'} if defined $self->{'_rev'};
484
485 $expand ||= 0;
486 my $str = $self->str;
487 $str =~ tr/acgtrymkswhbvdnxACGTRYMKSWHBVDNX/tgcayrkmswdvbhnxTGCAYRKMSWDVBHNX/;
488 my $rev = CORE::reverse $str;
489 $rev =~ tr/[](){}<>/][)(}{></;
490
491 if($expand) {
492 $rev = $self->_expand_nuc($rev);
493 # print "\nExpanded: $rev\n";
494 }
495
496 %Processed_braces = ();
497 %Processed_asterics = ();
498
499 my $fixrev = _fixpat_1($rev);
500 # print "FIX 1: $fixrev";<STDIN>;
501
502 $fixrev = _fixpat_2($fixrev);
503 # print "FIX 2: $fixrev";<STDIN>;
504
505 $fixrev = _fixpat_3($fixrev);
506 # print "FIX 3: $fixrev";<STDIN>;
507
508 $fixrev = _fixpat_4($fixrev);
509 # print "FIX 4: $fixrev";<STDIN>;
510
511 $fixrev = _fixpat_5($fixrev);
512 # print "FIX 5: $fixrev";<STDIN>;
513
514 ##### Added by ps 8/7/00 to allow non-greedy matching
515 $fixrev = _fixpat_6($fixrev);
516 # print "FIX 6: $fixrev";<STDIN>;
517
518 # $self->{'_rev'} = $fixrev;
519
520 return new Bio::Tools::SeqPattern(-seq =>$fixrev, -type =>$self->type);
521 }
522
523
524
525 =head1 _fixpat_1
526
527 Title : _fixpat_1
528 Usage : n/a; called automatically by revcom()
529 Purpose : Utility method for revcom()
530 : Converts all {7,5} --> {5,7} (Part I)
531 : and [T^] --> [^T] (Part II)
532 : and *N --> N* (Part III)
533 Returns : String (the new, partially reversed pattern)
534 Argument : String (the expanded pattern)
535 Throws : n/a
536
537 See Also : L<revcom>()
538
539 =cut
540
541 #--------------
542 sub _fixpat_1 {
543 #--------------
544 my $pat = shift;
545
546 ## Part I:
547 my (@done,@parts);
548 while(1) {
549 $pat =~ /(.*)\{(\S+?)\}(.*)/ or do{ push @done, $pat; last; };
550 $pat = $1.'#{'.reverse($2).'}'.$3;
551 # print "1: $1\n2: $2\n3: $3\n";
552 # print "modified pat: $pat";<STDIN>;
553 @parts = split '#', $pat;
554 push @done, $parts[1];
555 $pat = $parts[0];
556 # print "done: $parts[1]<---\nnew pat: $pat<---";<STDIN>;
557 last if not $pat;
558 }
559 $pat = join('', reverse @done);
560
561 ## Part II:
562 @done = ();
563 while(1) {
564 $pat =~ /(.*)\[(\S+?)\](.*)/ or do{ push @done, $pat; last; };
565 $pat = $1.'#['.reverse($2).']'.$3;
566 # print "1: $1\n2: $2\n3: $3\n";
567 # print "modified pat: $pat";<STDIN>;
568 @parts = split '#', $pat;
569 push @done, $parts[1];
570 $pat = $parts[0];
571 # print "done: $parts[1]<---\nnew pat: $pat<---";<STDIN>;
572 last if not $pat;
573 }
574 $pat = join('', reverse @done);
575
576 ## Part III:
577 @done = ();
578 while(1) {
579 $pat =~ /(.*)\*([\w.])(.*)/ or do{ push @done, $pat; last; };
580 $pat = $1.'#'.$2.'*'.$3;
581 $Processed_asterics{$2}++;
582 # print "1: $1\n2: $2\n3: $3\n";
583 # print "modified pat: $pat";<STDIN>;
584 @parts = split '#', $pat;
585 push @done, $parts[1];
586 $pat = $parts[0];
587 # print "done: $parts[1]<---\nnew pat: $pat<---";<STDIN>;
588 last if not $pat;
589 }
590 return join('', reverse @done);
591
592 }
593
594
595 =head1 _fixpat_2
596
597 Title : _fixpat_2
598 Usage : n/a; called automatically by revcom()
599 Purpose : Utility method for revcom()
600 : Converts all {5,7}Y ---> Y{5,7}
601 : and {10,}. ---> .{10,}
602 Returns : String (the new, partially reversed pattern)
603 Argument : String (the expanded, partially reversed pattern)
604 Throws : n/a
605
606 See Also : L<revcom>()
607
608 =cut
609
610 #--------------
611 sub _fixpat_2 {
612 #--------------
613 my $pat = shift;
614
615 local($^W) = 0;
616 my (@done,@parts,$braces);
617 while(1) {
618 # $pat =~ s/(.*)([^])])(\{\S+?\})([\w.])(.*)/$1$2#$4$3$5/ or do{ push @done, $pat; last; };
619 $pat =~ s/(.*)(\{\S+?\})([\w.])(.*)/$1#$3$2$4/ or do{ push @done, $pat; last; };
620 $braces = $2;
621 $braces =~ s/[{}]//g;
622 $Processed_braces{"$3$braces"}++;
623 # print "modified pat: $pat";<STDIN>;
624 @parts = split '#', $pat;
625 push @done, $parts[1];
626 $pat = $parts[0];
627 # print "done: $parts[1]<---\nnew pat: $pat<---";<STDIN>;
628 last if not $pat;
629 }
630 return join('', reverse @done);
631 }
632
633
634 =head1 _fixpat_3
635
636 Title : _fixpat_3
637 Usage : n/a; called automatically by revcom()
638 Purpose : Utility method for revcom()
639 : Converts all {5,7}(XXX) ---> (XXX){5,7}
640 Returns : String (the new, partially reversed pattern)
641 Argument : String (the expanded, partially reversed pattern)
642 Throws : n/a
643
644 See Also : L<revcom>()
645
646 =cut
647
648 #-------------
649 sub _fixpat_3 {
650 #-------------
651 my $pat = shift;
652
653 my (@done,@parts,$braces,$newpat,$oldpat);
654 while(1) {
655 # $pat =~ s/(.+)(\{\S+\})(\(\w+\))(.*)/$1#$3$2$4/ or do{ push @done, $pat; last; };
656 if( $pat =~ /(.*)(.)(\{\S+\})(\(\w+\))(.*)/) {
657 $newpat = "$1#$2$4$3$5";
658 ##ps $oldpat = "$1#$2$3$4$5";
659 # print "1: $1\n2: $2\n3: $3\n4: $4\n5: $5\n";
660 ##ps $braces = $3;
661 ##ps $braces =~ s/[{}]//g;
662 ##ps if( exists $Processed_braces{"$2$braces"} || exists $Processed_asterics{$2}) {
663 ##ps $pat = $oldpat; # Don't change it. Already processed.
664 # print "saved pat: $pat";<STDIN>;
665 ##ps } else {
666 # print "new pat: $newpat";<STDIN>;
667 $pat = $newpat; # Change it.
668 ##ps }
669 } elsif( $pat =~ /^(\{\S+\})(\(\w+\))(.*)/) {
670 $pat = "#$2$1$3";
671 } else {
672 push @done, $pat; last;
673 }
674 @parts = split '#', $pat;
675 push @done, $parts[1];
676 $pat = $parts[0];
677 # print "done: $parts[1]<---\nnew pat: $pat<---";<STDIN>;
678 last if not $pat;
679 }
680 return join('', reverse @done);
681 }
682
683
684 =head1 _fixpat_4
685
686 Title : _fixpat_4
687 Usage : n/a; called automatically by revcom()
688 Purpose : Utility method for revcom()
689 : Converts all {5,7}[XXX] ---> [XXX]{5,7}
690 Returns : String (the new, partially reversed pattern)
691 Argument : String (the expanded, partially reversed pattern)
692 Throws : n/a
693
694 See Also : L<revcom>()
695
696 =cut
697
698 #---------------
699 sub _fixpat_4 {
700 #---------------
701 my $pat = shift;
702
703 my (@done,@parts,$braces,$newpat,$oldpat);
704 while(1) {
705 # $pat =~ s/(.*)(\{\S+\})(\[\w+\])(.*)/$1#$3$2$4/ or do{ push @done, $pat; last; };
706 # $pat =~ s/(.*)([^\w.])(\{\S+\})(\[\w+\])(.*)/$1$2#$4$3$5/ or do{ push @done, $pat; last; };
707 if( $pat =~ /(.*)(.)(\{\S+\})(\[\w+\])(.*)/) {
708 $newpat = "$1#$2$4$3$5";
709 $oldpat = "$1#$2$3$4$5";
710 # print "1: $1\n2: $2\n3: $3\n4: $4\n5: $5\n";
711 $braces = $3;
712 $braces =~ s/[{}]//g;
713 if( (defined $braces and defined $2) and
714 exists $Processed_braces{"$2$braces"} || exists $Processed_asterics{$2}) {
715 $pat = $oldpat; # Don't change it. Already processed.
716 # print "saved pat: $pat";<STDIN>;
717 } else {
718 $pat = $newpat; # Change it.
719 # print "new pat: $pat";<STDIN>;
720 }
721 } elsif( $pat =~ /^(\{\S+\})(\[\w+\])(.*)/) {
722 $pat = "#$2$1$3";
723 } else {
724 push @done, $pat; last;
725 }
726
727 @parts = split '#', $pat;
728 push @done, $parts[1];
729 $pat = $parts[0];
730 # print "done: $parts[1]<---\nnew pat: $pat<---";<STDIN>;
731 last if not $pat;
732 }
733 return join('', reverse @done);
734 }
735
736
737 =head1 _fixpat_5
738
739 Title : _fixpat_5
740 Usage : n/a; called automatically by revcom()
741 Purpose : Utility method for revcom()
742 : Converts all *[XXX] ---> [XXX]*
743 : and *(XXX) ---> (XXX)*
744 Returns : String (the new, partially reversed pattern)
745 Argument : String (the expanded, partially reversed pattern)
746 Throws : n/a
747
748 See Also : L<revcom>()
749
750 =cut
751
752 #--------------
753 sub _fixpat_5 {
754 #--------------
755 my $pat = shift;
756
757 my (@done,@parts,$newpat,$oldpat);
758 while(1) {
759 # $pat =~ s/(.*)(\{\S+\})(\[\w+\])(.*)/$1#$3$2$4/ or do{ push @done, $pat; last; };
760 # $pat =~ s/(.*)([^\w.])(\{\S+\})(\[\w+\])(.*)/$1$2#$4$3$5/ or do{ push @done, $pat; last; };
761 if( $pat =~ /(.*)(.)\*(\[\w+\]|\(\w+\))(.*)/) {
762 $newpat = "$1#$2$3*$4";
763 $oldpat = "$1#$2*$3$4";
764 # print "1: $1\n2: $2\n3: $3\n4: $4\n";
765 if( exists $Processed_asterics{$2}) {
766 $pat = $oldpat; # Don't change it. Already processed.
767 # print "saved pat: $pat";<STDIN>;
768 } else {
769 $pat = $newpat; # Change it.
770 # print "new pat: $pat";<STDIN>;
771 }
772 } elsif( $pat =~ /^\*(\[\w+\]|\(\w+\))(.*)/) {
773 $pat = "#$1*$3";
774 } else {
775 push @done, $pat; last;
776 }
777
778 @parts = split '#', $pat;
779 push @done, $parts[1];
780 $pat = $parts[0];
781 # print "done: $parts[1]<---\nnew pat: $pat<---";<STDIN>;
782 last if not $pat;
783 }
784 return join('', reverse @done);
785 }
786
787
788
789
790
791 ############################
792 #
793 # PS: Added 8/7/00 to allow non-greedy matching patterns
794 #
795 ######################################
796
797 =head1 _fixpat_6
798
799 Title : _fixpat_6
800 Usage : n/a; called automatically by revcom()
801 Purpose : Utility method for revcom()
802 : Converts all ?Y{5,7} ---> Y{5,7}?
803 : and ?(XXX){5,7} ---> (XXX){5,7}?
804 : and ?[XYZ]{5,7} ---> [XYZ]{5,7}?
805 Returns : String (the new, partially reversed pattern)
806 Argument : String (the expanded, partially reversed pattern)
807 Throws : n/a
808
809 See Also : L<revcom>()
810
811 =cut
812
813 #--------------
814 sub _fixpat_6 {
815 #--------------
816 my $pat = shift;
817 my (@done,@parts);
818
819 @done = ();
820 while(1) {
821 $pat =~ /(.*)\?(\[\w+\]|\(\w+\)|\w)(\{\S+?\})?(.*)/ or do{ push @done, $pat; last; };
822 my $quantifier = $3 ? $3 : ""; # Shut up warning if no explicit quantifier
823 $pat = $1.'#'.$2.$quantifier.'?'.$4;
824 # $pat = $1.'#'.$2.$3.'?'.$4;
825
826 # print "1: $1\n2: $2\n3: $3\n";
827 # print "modified pat: $pat";<STDIN>;
828 @parts = split '#', $pat;
829 push @done, $parts[1];
830 $pat = $parts[0];
831 # print "done: $parts[1]<---\nnew pat: $pat<---";<STDIN>;
832 last if not $pat;
833 }
834 return join('', reverse @done);
835
836 }
837
838 =head2 str
839
840 Title : str
841 Usage : $obj->str($newval)
842 Function:
843 Returns : value of str
844 Args : newvalue (optional)
845
846
847 =cut
848
849 sub str{
850 my $obj = shift;
851 if( @_ ) {
852 my $value = shift;
853 $obj->{'str'} = $value;
854 }
855 return $obj->{'str'};
856
857 }
858
859 =head2 type
860
861 Title : type
862 Usage : $obj->type($newval)
863 Function:
864 Returns : value of type
865 Args : newvalue (optional)
866
867
868 =cut
869
870 sub type{
871 my $obj = shift;
872 if( @_ ) {
873 my $value = shift;
874 $obj->{'type'} = $value;
875 }
876 return $obj->{'type'};
877
878 }
879
880 1;
881
882 __END__
883
884 #########################################################################
885 # End of class
886 #########################################################################
887
888 =head1 FOR DEVELOPERS ONLY
889
890 =head2 Data Members
891
892 Information about the various data members of this module is provided
893 for those wishing to modify or understand the code. Two things to bear
894 in mind:
895
896 =over 2
897
898 =item 1 Do NOT rely on these in any code outside of this module.
899
900 All data members are prefixed with an underscore to signify that they
901 are private. Always use accessor methods. If the accessor doesn't
902 exist or is inadequate, create or modify an accessor (and let me know,
903 too!).
904
905 =item 2 This documentation may be incomplete and out of date.
906
907 It is easy for this documentation to become obsolete as this module is
908 still evolving. Always double check this info and search for members
909 not described here.
910
911 =back
912
913 An instance of Bio::Tools::RestrictionEnzyme.pm is a blessed reference
914 to a hash containing all or some of the following fields:
915
916 FIELD VALUE
917 ------------------------------------------------------------------------
918 _rev : The corrected reverse complement of the fully expanded pattern.
919
920 INHERITED DATA MEMBERS:
921
922 _seq : (From Bio::Seq.pm) The original, unexpanded input sequence after untainting.
923 _type : (From Bio::Seq.pm) 'Dna' or 'Amino'
924
925
926 =cut