Mercurial > repos > mahtabm > ensemb_rep_gvl
comparison variant_effect_predictor/Bio/Tools/SeqPattern.pm @ 0:2bc9b66ada89 draft default tip
Uploaded
author | mahtabm |
---|---|
date | Thu, 11 Apr 2013 06:29:17 -0400 |
parents | |
children |
comparison
equal
deleted
inserted
replaced
-1:000000000000 | 0:2bc9b66ada89 |
---|---|
1 # $Id: SeqPattern.pm,v 1.14 2002/10/22 07:38:46 lapp Exp $ | |
2 # | |
3 # bioperl module for Bio::Tools::SeqPattern | |
4 # | |
5 # Cared for by Steve Chervitz (sac@bioperl.org) | |
6 # | |
7 # Copyright Steve Chervitz | |
8 # | |
9 # You may distribute this module under the same terms as perl itself | |
10 | |
11 # POD documentation - main docs before the code | |
12 | |
13 =head1 NAME | |
14 | |
15 Bio::Tools::SeqPattern - Bioperl object for a sequence pattern or motif | |
16 | |
17 =head1 SYNOPSIS | |
18 | |
19 =head2 Object Creation | |
20 | |
21 use Bio::Tools::SeqPattern (); | |
22 | |
23 $pat1 = 'T[GA]AA...TAAT'; | |
24 $pattern1 = new Bio::Tools::SeqPattern(-SEQ =>$pattern, -TYPE =>'Dna'); | |
25 | |
26 $pat2 = '[VILM]R(GXX){3,2}...[^PG]'; | |
27 $pattern2 = new Bio::Tools::SeqPattern(-SEQ =>$pattern, -TYPE =>'Amino'); | |
28 | |
29 =head1 DESCRIPTION | |
30 | |
31 The Bio::Tools::SeqPattern.pm module encapsulates generic data and | |
32 methods for manipulating regular expressions describing nucleic or | |
33 amino acid sequence patterns (a.k.a, "motifs"). | |
34 | |
35 Bio::Tools::SeqPattern.pm is a concrete class that inherits from | |
36 B<Bio::Seq.pm>. | |
37 | |
38 This class grew out of a need to have a standard module for doing routine | |
39 tasks with sequence patterns such as: | |
40 | |
41 -- Forming a reverse-complement version of a nucleotide sequence pattern | |
42 -- Expanding patterns containing ambiguity codes | |
43 -- Checking for invalid regexp characters | |
44 -- Untainting yet preserving special characters in the pattern | |
45 | |
46 Other features to look for in the future: | |
47 | |
48 -- Full pattern syntax checking | |
49 -- Conversion between expanded and ondensed forms of the pattern | |
50 | |
51 =head1 MOTIVATIONS | |
52 | |
53 A key motivation for Bio::Tools::SeqPattern.pm is to have a way to | |
54 generate a reverse complement of a nucleotide sequence pattern. | |
55 This makes possible simultaneous pattern matching on both sense and | |
56 anti-sense strands of a query sequence. | |
57 | |
58 In principle, one could do such a search more inefficiently by testing | |
59 against both sense and anti-sense versions of a sequence. | |
60 It is entirely equivalent to test a regexp containing both sense and | |
61 anti-sense versions of the *pattern* against one copy of the sequence. | |
62 The latter approach is much more efficient since: | |
63 | |
64 1) You need only one copy of the sequence. | |
65 2) Only one regexp is executed. | |
66 3) Regexp patterns are typically much smaller than sequences. | |
67 | |
68 Patterns can be quite complex and it is often difficult to | |
69 generate the reverse complement pattern. The Bioperl SeqPattern.pm | |
70 addresses this problem, providing a convenient set of tools | |
71 for working with biological sequence regular expressions. | |
72 | |
73 Not all patterns have been tested. If you discover a pattern that | |
74 is not handled properly by Bio::Tools::SeqPattern.pm, please | |
75 send me some email (sac@bioperl.org). Thanks. | |
76 | |
77 =head1 OTHER FEATURES | |
78 | |
79 =head2 Extended Alphabet Support | |
80 | |
81 This module supports the same set of ambiguity codes for nucleotide | |
82 sequences as supported by B<Bio::Seq.pm>. These ambiguity codes | |
83 define the behavior or the expand() method. | |
84 | |
85 ------------------------------------------ | |
86 Symbol Meaning Nucleic Acid | |
87 ------------------------------------------ | |
88 A A Adenine | |
89 C C Cytosine | |
90 G G Guanine | |
91 T T Thymine | |
92 U U Uracil | |
93 M A or C | |
94 R A or G Any purine | |
95 W A or T | |
96 S C or G | |
97 Y C or T Any pyrimidine | |
98 K G or T | |
99 V A or C or G | |
100 H A or C or T | |
101 D A or G or T | |
102 B C or G or T | |
103 X G or A or T or C | |
104 N G or A or T or C | |
105 . G or A or T or C | |
106 | |
107 | |
108 | |
109 ------------------------------------------ | |
110 Symbol Meaning | |
111 ------------------------------------------ | |
112 A Alanine | |
113 C Cysteine | |
114 D Aspartic Acid | |
115 E Glutamic Acid | |
116 F Phenylalanine | |
117 G Glycine | |
118 H Histidine | |
119 I Isoleucine | |
120 K Lysine | |
121 L Leucine | |
122 M Methionine | |
123 N Asparagine | |
124 P Proline | |
125 Q Glutamine | |
126 R Arginine | |
127 S Serine | |
128 T Threonine | |
129 V Valine | |
130 W Tryptophan | |
131 Y Tyrosine | |
132 | |
133 B Aspartic Acid, Asparagine | |
134 Z Glutamic Acid, Glutamine | |
135 X Any amino acid | |
136 . Any amino acid | |
137 | |
138 | |
139 =head2 Multiple Format Support | |
140 | |
141 Ultimately, this module should be able to build SeqPattern.pm objects | |
142 using a variety of pattern formats such as ProSite, Blocks, Prints, GCG, etc. | |
143 Currently, this module only supports patterns using a grep-like syntax. | |
144 | |
145 =head1 USAGE | |
146 | |
147 A simple demo script called seq_pattern.pl is included in the examples/ | |
148 directory of the central Bioperl distribution. | |
149 | |
150 =head1 SEE ALSO | |
151 | |
152 L<Bio::Root::Object> - Base class. | |
153 L<Bio::Seq> - Lightweight sequence object. | |
154 | |
155 http://bio.perl.org/Projects/modules.html - Online module documentation | |
156 http://bio.perl.org/ - Bioperl Project Homepage | |
157 | |
158 =head1 FEEDBACK | |
159 | |
160 =head2 Mailing Lists | |
161 | |
162 User feedback is an integral part of the evolution of this and other | |
163 Bioperl modules. Send your comments and suggestions preferably to one | |
164 of the Bioperl mailing lists. Your participation is much appreciated. | |
165 | |
166 bioperl-l@bioperl.org - General discussion | |
167 http://bio.perl.org/MailList.html - About the mailing lists | |
168 | |
169 =head2 Reporting Bugs | |
170 | |
171 Report bugs to the Bioperl bug tracking system to help us keep track | |
172 the bugs and their resolution. Bug reports can be submitted via email | |
173 or the web: | |
174 | |
175 bioperl-bugs@bio.perl.org | |
176 http://bugzilla.bioperl.org/ | |
177 | |
178 =head1 AUTHOR | |
179 | |
180 Steve Chervitz, sac@bioperl.org | |
181 | |
182 =head1 VERSION | |
183 | |
184 Bio::Tools::SeqPattern.pm, 0.011 | |
185 | |
186 =head1 COPYRIGHT | |
187 | |
188 Copyright (c) 1997-8 Steve Chervitz. All Rights Reserved. | |
189 This module is free software; you can redistribute it and/or | |
190 modify it under the same terms as Perl itself. | |
191 | |
192 =cut | |
193 | |
194 # | |
195 ## | |
196 ### | |
197 #### END of main POD documentation. | |
198 ### | |
199 ## | |
200 #' | |
201 # CREATED : 28 Aug 1997 | |
202 | |
203 | |
204 package Bio::Tools::SeqPattern; | |
205 | |
206 use Bio::Root::Root; | |
207 @ISA = qw(Bio::Root::Root); | |
208 use strict; | |
209 use vars qw ($ID $VERSION); | |
210 $ID = 'Bio::Tools::SeqPattern'; | |
211 $VERSION = 0.011; | |
212 | |
213 ## These constants may be more appropriate in a Bio::Dictionary.pm | |
214 ## type of class. | |
215 my $PURINES = 'AG'; | |
216 my $PYRIMIDINES = 'CT'; | |
217 my $BEE = 'DN'; | |
218 my $ZED = 'EQ'; | |
219 my $Regexp_chars = '\w,.\*()\[\]<>\{\}^\$'; # quoted for use in regexps | |
220 | |
221 ## Package variables used in reverse complementing. | |
222 my (%Processed_braces, %Processed_asterics); | |
223 | |
224 ##################################################################################### | |
225 ## CONSTRUCTOR ## | |
226 ##################################################################################### | |
227 | |
228 | |
229 =head1 new | |
230 | |
231 Title : new | |
232 Usage : my $seqpat = new Bio::Tools::SeqPattern(); | |
233 Purpose : Verifies that the type is correct for superclass (Bio::Seq.pm) | |
234 : and calls superclass constructor last. | |
235 Returns : n/a | |
236 Argument : Parameters passed to new() | |
237 Throws : Exception if the pattern string (seq) is empty. | |
238 Comments : The process of creating a new SeqPattern.pm object | |
239 : ensures that the pattern string is untained. | |
240 | |
241 See Also : B<Bio::Root::Root::new()>, | |
242 B<Bio::Seq::_initialize()> | |
243 | |
244 =cut | |
245 | |
246 #---------------- | |
247 sub new { | |
248 #---------------- | |
249 my($class, %param) = @_; | |
250 | |
251 my $self = $class->SUPER::new(%param); | |
252 my ($seq,$type) = $self->_rearrange([qw(SEQ TYPE)], %param); | |
253 | |
254 $seq || $self->throw("Empty pattern."); | |
255 my $t; | |
256 # Get the type ready for Bio::Seq.pm | |
257 if ($type =~ /nuc|[dr]na/i) { | |
258 $t = 'Dna'; | |
259 } elsif ($type =~ /amino|pep|prot/i) { | |
260 $t = 'Amino'; | |
261 } | |
262 $seq =~ tr/a-z/A-Z/; #ps 8/8/00 Canonicalize to upper case | |
263 $self->str($seq); | |
264 $self->type($t); | |
265 | |
266 return $self; | |
267 } | |
268 | |
269 | |
270 =head1 alphabet_ok | |
271 | |
272 Title : alphabet_ok | |
273 Usage : $mypat->alphabet_ok; | |
274 Purpose : Checks for invalid regexp characters. | |
275 : Overrides Bio::Seq::alphabet_ok() to allow | |
276 : additional regexp characters ,.*()[]<>{}^$ | |
277 : in addition to the standard genetic alphabet. | |
278 : Also untaints the pattern and sets the sequence | |
279 : object's sequence to the untained string. | |
280 Returns : Boolean (1 | 0) | |
281 Argument : n/a | |
282 Throws : Exception if the pattern contains invalid characters. | |
283 Comments : Does not call the superclass method. | |
284 : Actually permits any alphanumeric, not just the | |
285 : standard genetic alphabet. | |
286 | |
287 =cut | |
288 | |
289 #----------------' | |
290 sub alphabet_ok { | |
291 #---------------- | |
292 my( $self) = @_; | |
293 | |
294 return 1 if $self->{'_alphabet_checked'}; | |
295 | |
296 $self->{'_alphabet_checked'} = 1; | |
297 | |
298 my $pat = $self->seq(); | |
299 | |
300 if($pat =~ /[^$Regexp_chars]/io) { | |
301 $self->throw("Pattern contains invalid characters: $pat", | |
302 'Legal characters: a-z,A-Z,0-9,,.*()[]<>{}^$ '); | |
303 } | |
304 | |
305 # Untaint pattern (makes code taint-safe). | |
306 $pat =~ /[$Regexp_chars]+/io; | |
307 $self->setseq(uc($&)); | |
308 # print STDERR "\npattern ok: $pat\n"; | |
309 1; | |
310 } | |
311 | |
312 =head1 expand | |
313 | |
314 Title : expand | |
315 Usage : $seqpat_object->expand(); | |
316 Purpose : Expands the sequence pattern using special ambiguity codes. | |
317 Example : $pat = $seq_pat->expand(); | |
318 Returns : String containing fully expanded sequence pattern | |
319 Argument : n/a | |
320 Throws : Exception if sequence type is not recognized | |
321 : (i.e., is not one of [DR]NA, Amino) | |
322 | |
323 See Also : B<Extended Alphabet Support>, L<_expand_pep>(), L<_expand_nuc>() | |
324 | |
325 =cut | |
326 | |
327 #---------- | |
328 sub expand { | |
329 #---------- | |
330 my $self = shift; | |
331 | |
332 if($self->type =~ /[DR]na/i) { $self->_expand_nuc(); } | |
333 elsif($self->type =~ /Amino/i) { $self->_expand_pep(); } | |
334 else{ | |
335 $self->throw("Don't know how to expand ${\$self->type} patterns.\n"); | |
336 } | |
337 } | |
338 | |
339 | |
340 =head1 _expand_pep | |
341 | |
342 Title : _expand_pep | |
343 Usage : n/a; automatically called by expand() | |
344 Purpose : Expands peptide patterns | |
345 Returns : String (the expanded pattern) | |
346 Argument : String (the unexpanded pattern) | |
347 Throws : n/a | |
348 | |
349 See Also : L<expand>(), L<_expand_nuc>() | |
350 | |
351 =cut | |
352 | |
353 #---------------- | |
354 sub _expand_pep { | |
355 #---------------- | |
356 my ($self,$pat) = @_; | |
357 $pat ||= $self->str; | |
358 $pat =~ s/X/./g; | |
359 $pat =~ s/^</\^/; | |
360 $pat =~ s/>$/\$/; | |
361 | |
362 ## Avoid nested situations: [bmnq] --/--> [[$ZED]mnq] | |
363 ## Yet correctly deal with: fze[bmnq] ---> f[$BEE]e[$ZEDmnq] | |
364 if($pat =~ /\[\w*[BZ]\w*\]/) { | |
365 $pat =~ s/\[(\w*)B(\w*)\]/\[$1$ZED$2\]/g; | |
366 $pat =~ s/\[(\w*)Z(\w*)\]/\[$1$BEE$2\]/g; | |
367 $pat =~ s/B/\[$ZED\]/g; | |
368 $pat =~ s/Z/\[$BEE\]/g; | |
369 } else { | |
370 $pat =~ s/B/\[$ZED\]/g; | |
371 $pat =~ s/Z/\[$BEE\]/g; | |
372 } | |
373 $pat =~ s/\((.)\)/$1/g; ## Doing these last since: | |
374 $pat =~ s/\[(.)\]/$1/g; ## Pattern could contain [B] (for example) | |
375 | |
376 return $pat; | |
377 } | |
378 | |
379 | |
380 | |
381 =head1 _expand_nuc | |
382 | |
383 Title : _expand_nuc | |
384 Purpose : Expands nucleotide patterns | |
385 Returns : String (the expanded pattern) | |
386 Argument : String (the unexpanded pattern) | |
387 Throws : n/a | |
388 | |
389 See Also : L<expand>(), L<_expand_pep>() | |
390 | |
391 =cut | |
392 | |
393 #--------------- | |
394 sub _expand_nuc { | |
395 #--------------- | |
396 my ($self,$pat) = @_; | |
397 | |
398 $pat ||= $self->str; | |
399 $pat =~ s/N|X/./g; | |
400 $pat =~ s/pu/R/ig; | |
401 $pat =~ s/py/Y/ig; | |
402 $pat =~ s/U/T/g; | |
403 $pat =~ s/^</\^/; | |
404 $pat =~ s/>$/\$/; | |
405 | |
406 ## Avoid nested situations: [ya] --/--> [[ct]a] | |
407 ## Yet correctly deal with: sg[ya] ---> [gc]g[cta] | |
408 if($pat =~ /\[\w*[RYSWMK]\w*\]/) { | |
409 $pat =~ s/\[(\w*)R(\w*)\]/\[$1$PURINES$2\]/g; | |
410 $pat =~ s/\[(\w*)Y(\w*)\]/\[$1$PYRIMIDINES$2\]/g; | |
411 $pat =~ s/\[(\w*)S(\w*)\]/\[$1GC$2\]/g; | |
412 $pat =~ s/\[(\w*)W(\w*)\]/\[$1AT$2\]/g; | |
413 $pat =~ s/\[(\w*)M(\w*)\]/\[$1AC$2\]/g; | |
414 $pat =~ s/\[(\w*)K(\w*)\]/\[$1GT$2\]/g; | |
415 $pat =~ s/\[(\w*)V(\w*)\]/\[$1ACG$2\]/g; | |
416 $pat =~ s/\[(\w*)H(\w*)\]/\[$1ACT$2\]/g; | |
417 $pat =~ s/\[(\w*)D(\w*)\]/\[$1AGT$2\]/g; | |
418 $pat =~ s/\[(\w*)B(\w*)\]/\[$1CGT$2\]/g; | |
419 $pat =~ s/R/\[$PURINES\]/g; | |
420 $pat =~ s/Y/\[$PYRIMIDINES\]/g; | |
421 $pat =~ s/S/\[GC\]/g; | |
422 $pat =~ s/W/\[AT\]/g; | |
423 $pat =~ s/M/\[AC\]/g; | |
424 $pat =~ s/K/\[GT\]/g; | |
425 $pat =~ s/V/\[ACG\]/g; | |
426 $pat =~ s/H/\[ACT\]/g; | |
427 $pat =~ s/D/\[AGT\]/g; | |
428 $pat =~ s/B/\[CGT\]/g; | |
429 } else { | |
430 $pat =~ s/R/\[$PURINES\]/g; | |
431 $pat =~ s/Y/\[$PYRIMIDINES\]/g; | |
432 $pat =~ s/S/\[GC\]/g; | |
433 $pat =~ s/W/\[AT\]/g; | |
434 $pat =~ s/M/\[AC\]/g; | |
435 $pat =~ s/K/\[GT\]/g; | |
436 $pat =~ s/V/\[ACG\]/g; | |
437 $pat =~ s/H/\[ACT\]/g; | |
438 $pat =~ s/D/\[AGT\]/g; | |
439 $pat =~ s/B/\[CGT\]/g; | |
440 } | |
441 $pat =~ s/\((.)\)/$1/g; ## Doing thses last since: | |
442 $pat =~ s/\[(.)\]/$1/g; ## Pattern could contain [y] (for example) | |
443 | |
444 return $pat; | |
445 } | |
446 | |
447 | |
448 | |
449 =head1 revcom | |
450 | |
451 Title : revcom | |
452 Usage : revcom([1]); | |
453 Purpose : Forms a pattern capable of recognizing the reverse complement | |
454 : version of a nucleotide sequence pattern. | |
455 Example : $pattern_object->revcom(); | |
456 : $pattern_object->revcom(1); ## returns expanded rev complement pattern. | |
457 Returns : Object reference for a new Bio::Tools::SeqPattern containing | |
458 : the revcom of the current pattern as its sequence. | |
459 Argument : (1) boolean (optional) (default= false) | |
460 : true : expand the pattern before rev-complementing. | |
461 : false: don't expand pattern before or after rev-complementing. | |
462 Throws : Exception if called for amino acid sequence pattern. | |
463 Comments : This method permits the simultaneous searching of both | |
464 : sense and anti-sense versions of a nucleotide pattern | |
465 : by means of a grep-type of functionality in which any | |
466 : number of patterns may be or-ed into the recognition | |
467 : pattern. | |
468 : Overrides Bio::Seq::revcom() and calls it first thing. | |
469 : The order of _fixpat() calls is critical. | |
470 | |
471 See Also : B<Bio::Seq::revcom()>, L<_fixpat_1>(), L<_fixpat_2>(), L<_fixpat_3>(), L<_fixpat_4>(), L<_fixpat_5>() | |
472 | |
473 =cut | |
474 | |
475 #-----------' | |
476 sub revcom { | |
477 #----------- | |
478 my($self,$expand) = @_; | |
479 | |
480 if ($self->type !~ /Dna|Rna/i) { | |
481 $self->throw("Can't get revcom for ${\$self->type} sequence types.\n"); | |
482 } | |
483 # return $self->{'_rev'} if defined $self->{'_rev'}; | |
484 | |
485 $expand ||= 0; | |
486 my $str = $self->str; | |
487 $str =~ tr/acgtrymkswhbvdnxACGTRYMKSWHBVDNX/tgcayrkmswdvbhnxTGCAYRKMSWDVBHNX/; | |
488 my $rev = CORE::reverse $str; | |
489 $rev =~ tr/[](){}<>/][)(}{></; | |
490 | |
491 if($expand) { | |
492 $rev = $self->_expand_nuc($rev); | |
493 # print "\nExpanded: $rev\n"; | |
494 } | |
495 | |
496 %Processed_braces = (); | |
497 %Processed_asterics = (); | |
498 | |
499 my $fixrev = _fixpat_1($rev); | |
500 # print "FIX 1: $fixrev";<STDIN>; | |
501 | |
502 $fixrev = _fixpat_2($fixrev); | |
503 # print "FIX 2: $fixrev";<STDIN>; | |
504 | |
505 $fixrev = _fixpat_3($fixrev); | |
506 # print "FIX 3: $fixrev";<STDIN>; | |
507 | |
508 $fixrev = _fixpat_4($fixrev); | |
509 # print "FIX 4: $fixrev";<STDIN>; | |
510 | |
511 $fixrev = _fixpat_5($fixrev); | |
512 # print "FIX 5: $fixrev";<STDIN>; | |
513 | |
514 ##### Added by ps 8/7/00 to allow non-greedy matching | |
515 $fixrev = _fixpat_6($fixrev); | |
516 # print "FIX 6: $fixrev";<STDIN>; | |
517 | |
518 # $self->{'_rev'} = $fixrev; | |
519 | |
520 return new Bio::Tools::SeqPattern(-seq =>$fixrev, -type =>$self->type); | |
521 } | |
522 | |
523 | |
524 | |
525 =head1 _fixpat_1 | |
526 | |
527 Title : _fixpat_1 | |
528 Usage : n/a; called automatically by revcom() | |
529 Purpose : Utility method for revcom() | |
530 : Converts all {7,5} --> {5,7} (Part I) | |
531 : and [T^] --> [^T] (Part II) | |
532 : and *N --> N* (Part III) | |
533 Returns : String (the new, partially reversed pattern) | |
534 Argument : String (the expanded pattern) | |
535 Throws : n/a | |
536 | |
537 See Also : L<revcom>() | |
538 | |
539 =cut | |
540 | |
541 #-------------- | |
542 sub _fixpat_1 { | |
543 #-------------- | |
544 my $pat = shift; | |
545 | |
546 ## Part I: | |
547 my (@done,@parts); | |
548 while(1) { | |
549 $pat =~ /(.*)\{(\S+?)\}(.*)/ or do{ push @done, $pat; last; }; | |
550 $pat = $1.'#{'.reverse($2).'}'.$3; | |
551 # print "1: $1\n2: $2\n3: $3\n"; | |
552 # print "modified pat: $pat";<STDIN>; | |
553 @parts = split '#', $pat; | |
554 push @done, $parts[1]; | |
555 $pat = $parts[0]; | |
556 # print "done: $parts[1]<---\nnew pat: $pat<---";<STDIN>; | |
557 last if not $pat; | |
558 } | |
559 $pat = join('', reverse @done); | |
560 | |
561 ## Part II: | |
562 @done = (); | |
563 while(1) { | |
564 $pat =~ /(.*)\[(\S+?)\](.*)/ or do{ push @done, $pat; last; }; | |
565 $pat = $1.'#['.reverse($2).']'.$3; | |
566 # print "1: $1\n2: $2\n3: $3\n"; | |
567 # print "modified pat: $pat";<STDIN>; | |
568 @parts = split '#', $pat; | |
569 push @done, $parts[1]; | |
570 $pat = $parts[0]; | |
571 # print "done: $parts[1]<---\nnew pat: $pat<---";<STDIN>; | |
572 last if not $pat; | |
573 } | |
574 $pat = join('', reverse @done); | |
575 | |
576 ## Part III: | |
577 @done = (); | |
578 while(1) { | |
579 $pat =~ /(.*)\*([\w.])(.*)/ or do{ push @done, $pat; last; }; | |
580 $pat = $1.'#'.$2.'*'.$3; | |
581 $Processed_asterics{$2}++; | |
582 # print "1: $1\n2: $2\n3: $3\n"; | |
583 # print "modified pat: $pat";<STDIN>; | |
584 @parts = split '#', $pat; | |
585 push @done, $parts[1]; | |
586 $pat = $parts[0]; | |
587 # print "done: $parts[1]<---\nnew pat: $pat<---";<STDIN>; | |
588 last if not $pat; | |
589 } | |
590 return join('', reverse @done); | |
591 | |
592 } | |
593 | |
594 | |
595 =head1 _fixpat_2 | |
596 | |
597 Title : _fixpat_2 | |
598 Usage : n/a; called automatically by revcom() | |
599 Purpose : Utility method for revcom() | |
600 : Converts all {5,7}Y ---> Y{5,7} | |
601 : and {10,}. ---> .{10,} | |
602 Returns : String (the new, partially reversed pattern) | |
603 Argument : String (the expanded, partially reversed pattern) | |
604 Throws : n/a | |
605 | |
606 See Also : L<revcom>() | |
607 | |
608 =cut | |
609 | |
610 #-------------- | |
611 sub _fixpat_2 { | |
612 #-------------- | |
613 my $pat = shift; | |
614 | |
615 local($^W) = 0; | |
616 my (@done,@parts,$braces); | |
617 while(1) { | |
618 # $pat =~ s/(.*)([^])])(\{\S+?\})([\w.])(.*)/$1$2#$4$3$5/ or do{ push @done, $pat; last; }; | |
619 $pat =~ s/(.*)(\{\S+?\})([\w.])(.*)/$1#$3$2$4/ or do{ push @done, $pat; last; }; | |
620 $braces = $2; | |
621 $braces =~ s/[{}]//g; | |
622 $Processed_braces{"$3$braces"}++; | |
623 # print "modified pat: $pat";<STDIN>; | |
624 @parts = split '#', $pat; | |
625 push @done, $parts[1]; | |
626 $pat = $parts[0]; | |
627 # print "done: $parts[1]<---\nnew pat: $pat<---";<STDIN>; | |
628 last if not $pat; | |
629 } | |
630 return join('', reverse @done); | |
631 } | |
632 | |
633 | |
634 =head1 _fixpat_3 | |
635 | |
636 Title : _fixpat_3 | |
637 Usage : n/a; called automatically by revcom() | |
638 Purpose : Utility method for revcom() | |
639 : Converts all {5,7}(XXX) ---> (XXX){5,7} | |
640 Returns : String (the new, partially reversed pattern) | |
641 Argument : String (the expanded, partially reversed pattern) | |
642 Throws : n/a | |
643 | |
644 See Also : L<revcom>() | |
645 | |
646 =cut | |
647 | |
648 #------------- | |
649 sub _fixpat_3 { | |
650 #------------- | |
651 my $pat = shift; | |
652 | |
653 my (@done,@parts,$braces,$newpat,$oldpat); | |
654 while(1) { | |
655 # $pat =~ s/(.+)(\{\S+\})(\(\w+\))(.*)/$1#$3$2$4/ or do{ push @done, $pat; last; }; | |
656 if( $pat =~ /(.*)(.)(\{\S+\})(\(\w+\))(.*)/) { | |
657 $newpat = "$1#$2$4$3$5"; | |
658 ##ps $oldpat = "$1#$2$3$4$5"; | |
659 # print "1: $1\n2: $2\n3: $3\n4: $4\n5: $5\n"; | |
660 ##ps $braces = $3; | |
661 ##ps $braces =~ s/[{}]//g; | |
662 ##ps if( exists $Processed_braces{"$2$braces"} || exists $Processed_asterics{$2}) { | |
663 ##ps $pat = $oldpat; # Don't change it. Already processed. | |
664 # print "saved pat: $pat";<STDIN>; | |
665 ##ps } else { | |
666 # print "new pat: $newpat";<STDIN>; | |
667 $pat = $newpat; # Change it. | |
668 ##ps } | |
669 } elsif( $pat =~ /^(\{\S+\})(\(\w+\))(.*)/) { | |
670 $pat = "#$2$1$3"; | |
671 } else { | |
672 push @done, $pat; last; | |
673 } | |
674 @parts = split '#', $pat; | |
675 push @done, $parts[1]; | |
676 $pat = $parts[0]; | |
677 # print "done: $parts[1]<---\nnew pat: $pat<---";<STDIN>; | |
678 last if not $pat; | |
679 } | |
680 return join('', reverse @done); | |
681 } | |
682 | |
683 | |
684 =head1 _fixpat_4 | |
685 | |
686 Title : _fixpat_4 | |
687 Usage : n/a; called automatically by revcom() | |
688 Purpose : Utility method for revcom() | |
689 : Converts all {5,7}[XXX] ---> [XXX]{5,7} | |
690 Returns : String (the new, partially reversed pattern) | |
691 Argument : String (the expanded, partially reversed pattern) | |
692 Throws : n/a | |
693 | |
694 See Also : L<revcom>() | |
695 | |
696 =cut | |
697 | |
698 #--------------- | |
699 sub _fixpat_4 { | |
700 #--------------- | |
701 my $pat = shift; | |
702 | |
703 my (@done,@parts,$braces,$newpat,$oldpat); | |
704 while(1) { | |
705 # $pat =~ s/(.*)(\{\S+\})(\[\w+\])(.*)/$1#$3$2$4/ or do{ push @done, $pat; last; }; | |
706 # $pat =~ s/(.*)([^\w.])(\{\S+\})(\[\w+\])(.*)/$1$2#$4$3$5/ or do{ push @done, $pat; last; }; | |
707 if( $pat =~ /(.*)(.)(\{\S+\})(\[\w+\])(.*)/) { | |
708 $newpat = "$1#$2$4$3$5"; | |
709 $oldpat = "$1#$2$3$4$5"; | |
710 # print "1: $1\n2: $2\n3: $3\n4: $4\n5: $5\n"; | |
711 $braces = $3; | |
712 $braces =~ s/[{}]//g; | |
713 if( (defined $braces and defined $2) and | |
714 exists $Processed_braces{"$2$braces"} || exists $Processed_asterics{$2}) { | |
715 $pat = $oldpat; # Don't change it. Already processed. | |
716 # print "saved pat: $pat";<STDIN>; | |
717 } else { | |
718 $pat = $newpat; # Change it. | |
719 # print "new pat: $pat";<STDIN>; | |
720 } | |
721 } elsif( $pat =~ /^(\{\S+\})(\[\w+\])(.*)/) { | |
722 $pat = "#$2$1$3"; | |
723 } else { | |
724 push @done, $pat; last; | |
725 } | |
726 | |
727 @parts = split '#', $pat; | |
728 push @done, $parts[1]; | |
729 $pat = $parts[0]; | |
730 # print "done: $parts[1]<---\nnew pat: $pat<---";<STDIN>; | |
731 last if not $pat; | |
732 } | |
733 return join('', reverse @done); | |
734 } | |
735 | |
736 | |
737 =head1 _fixpat_5 | |
738 | |
739 Title : _fixpat_5 | |
740 Usage : n/a; called automatically by revcom() | |
741 Purpose : Utility method for revcom() | |
742 : Converts all *[XXX] ---> [XXX]* | |
743 : and *(XXX) ---> (XXX)* | |
744 Returns : String (the new, partially reversed pattern) | |
745 Argument : String (the expanded, partially reversed pattern) | |
746 Throws : n/a | |
747 | |
748 See Also : L<revcom>() | |
749 | |
750 =cut | |
751 | |
752 #-------------- | |
753 sub _fixpat_5 { | |
754 #-------------- | |
755 my $pat = shift; | |
756 | |
757 my (@done,@parts,$newpat,$oldpat); | |
758 while(1) { | |
759 # $pat =~ s/(.*)(\{\S+\})(\[\w+\])(.*)/$1#$3$2$4/ or do{ push @done, $pat; last; }; | |
760 # $pat =~ s/(.*)([^\w.])(\{\S+\})(\[\w+\])(.*)/$1$2#$4$3$5/ or do{ push @done, $pat; last; }; | |
761 if( $pat =~ /(.*)(.)\*(\[\w+\]|\(\w+\))(.*)/) { | |
762 $newpat = "$1#$2$3*$4"; | |
763 $oldpat = "$1#$2*$3$4"; | |
764 # print "1: $1\n2: $2\n3: $3\n4: $4\n"; | |
765 if( exists $Processed_asterics{$2}) { | |
766 $pat = $oldpat; # Don't change it. Already processed. | |
767 # print "saved pat: $pat";<STDIN>; | |
768 } else { | |
769 $pat = $newpat; # Change it. | |
770 # print "new pat: $pat";<STDIN>; | |
771 } | |
772 } elsif( $pat =~ /^\*(\[\w+\]|\(\w+\))(.*)/) { | |
773 $pat = "#$1*$3"; | |
774 } else { | |
775 push @done, $pat; last; | |
776 } | |
777 | |
778 @parts = split '#', $pat; | |
779 push @done, $parts[1]; | |
780 $pat = $parts[0]; | |
781 # print "done: $parts[1]<---\nnew pat: $pat<---";<STDIN>; | |
782 last if not $pat; | |
783 } | |
784 return join('', reverse @done); | |
785 } | |
786 | |
787 | |
788 | |
789 | |
790 | |
791 ############################ | |
792 # | |
793 # PS: Added 8/7/00 to allow non-greedy matching patterns | |
794 # | |
795 ###################################### | |
796 | |
797 =head1 _fixpat_6 | |
798 | |
799 Title : _fixpat_6 | |
800 Usage : n/a; called automatically by revcom() | |
801 Purpose : Utility method for revcom() | |
802 : Converts all ?Y{5,7} ---> Y{5,7}? | |
803 : and ?(XXX){5,7} ---> (XXX){5,7}? | |
804 : and ?[XYZ]{5,7} ---> [XYZ]{5,7}? | |
805 Returns : String (the new, partially reversed pattern) | |
806 Argument : String (the expanded, partially reversed pattern) | |
807 Throws : n/a | |
808 | |
809 See Also : L<revcom>() | |
810 | |
811 =cut | |
812 | |
813 #-------------- | |
814 sub _fixpat_6 { | |
815 #-------------- | |
816 my $pat = shift; | |
817 my (@done,@parts); | |
818 | |
819 @done = (); | |
820 while(1) { | |
821 $pat =~ /(.*)\?(\[\w+\]|\(\w+\)|\w)(\{\S+?\})?(.*)/ or do{ push @done, $pat; last; }; | |
822 my $quantifier = $3 ? $3 : ""; # Shut up warning if no explicit quantifier | |
823 $pat = $1.'#'.$2.$quantifier.'?'.$4; | |
824 # $pat = $1.'#'.$2.$3.'?'.$4; | |
825 | |
826 # print "1: $1\n2: $2\n3: $3\n"; | |
827 # print "modified pat: $pat";<STDIN>; | |
828 @parts = split '#', $pat; | |
829 push @done, $parts[1]; | |
830 $pat = $parts[0]; | |
831 # print "done: $parts[1]<---\nnew pat: $pat<---";<STDIN>; | |
832 last if not $pat; | |
833 } | |
834 return join('', reverse @done); | |
835 | |
836 } | |
837 | |
838 =head2 str | |
839 | |
840 Title : str | |
841 Usage : $obj->str($newval) | |
842 Function: | |
843 Returns : value of str | |
844 Args : newvalue (optional) | |
845 | |
846 | |
847 =cut | |
848 | |
849 sub str{ | |
850 my $obj = shift; | |
851 if( @_ ) { | |
852 my $value = shift; | |
853 $obj->{'str'} = $value; | |
854 } | |
855 return $obj->{'str'}; | |
856 | |
857 } | |
858 | |
859 =head2 type | |
860 | |
861 Title : type | |
862 Usage : $obj->type($newval) | |
863 Function: | |
864 Returns : value of type | |
865 Args : newvalue (optional) | |
866 | |
867 | |
868 =cut | |
869 | |
870 sub type{ | |
871 my $obj = shift; | |
872 if( @_ ) { | |
873 my $value = shift; | |
874 $obj->{'type'} = $value; | |
875 } | |
876 return $obj->{'type'}; | |
877 | |
878 } | |
879 | |
880 1; | |
881 | |
882 __END__ | |
883 | |
884 ######################################################################### | |
885 # End of class | |
886 ######################################################################### | |
887 | |
888 =head1 FOR DEVELOPERS ONLY | |
889 | |
890 =head2 Data Members | |
891 | |
892 Information about the various data members of this module is provided | |
893 for those wishing to modify or understand the code. Two things to bear | |
894 in mind: | |
895 | |
896 =over 2 | |
897 | |
898 =item 1 Do NOT rely on these in any code outside of this module. | |
899 | |
900 All data members are prefixed with an underscore to signify that they | |
901 are private. Always use accessor methods. If the accessor doesn't | |
902 exist or is inadequate, create or modify an accessor (and let me know, | |
903 too!). | |
904 | |
905 =item 2 This documentation may be incomplete and out of date. | |
906 | |
907 It is easy for this documentation to become obsolete as this module is | |
908 still evolving. Always double check this info and search for members | |
909 not described here. | |
910 | |
911 =back | |
912 | |
913 An instance of Bio::Tools::RestrictionEnzyme.pm is a blessed reference | |
914 to a hash containing all or some of the following fields: | |
915 | |
916 FIELD VALUE | |
917 ------------------------------------------------------------------------ | |
918 _rev : The corrected reverse complement of the fully expanded pattern. | |
919 | |
920 INHERITED DATA MEMBERS: | |
921 | |
922 _seq : (From Bio::Seq.pm) The original, unexpanded input sequence after untainting. | |
923 _type : (From Bio::Seq.pm) 'Dna' or 'Amino' | |
924 | |
925 | |
926 =cut |