Mercurial > repos > mahtabm > ensembl
comparison variant_effect_predictor/Bio/Tools/SeqPattern.pm @ 0:1f6dce3d34e0
Uploaded
| author | mahtabm |
|---|---|
| date | Thu, 11 Apr 2013 02:01:53 -0400 |
| parents | |
| children |
comparison
equal
deleted
inserted
replaced
| -1:000000000000 | 0:1f6dce3d34e0 |
|---|---|
| 1 # $Id: SeqPattern.pm,v 1.14 2002/10/22 07:38:46 lapp Exp $ | |
| 2 # | |
| 3 # bioperl module for Bio::Tools::SeqPattern | |
| 4 # | |
| 5 # Cared for by Steve Chervitz (sac@bioperl.org) | |
| 6 # | |
| 7 # Copyright Steve Chervitz | |
| 8 # | |
| 9 # You may distribute this module under the same terms as perl itself | |
| 10 | |
| 11 # POD documentation - main docs before the code | |
| 12 | |
| 13 =head1 NAME | |
| 14 | |
| 15 Bio::Tools::SeqPattern - Bioperl object for a sequence pattern or motif | |
| 16 | |
| 17 =head1 SYNOPSIS | |
| 18 | |
| 19 =head2 Object Creation | |
| 20 | |
| 21 use Bio::Tools::SeqPattern (); | |
| 22 | |
| 23 $pat1 = 'T[GA]AA...TAAT'; | |
| 24 $pattern1 = new Bio::Tools::SeqPattern(-SEQ =>$pattern, -TYPE =>'Dna'); | |
| 25 | |
| 26 $pat2 = '[VILM]R(GXX){3,2}...[^PG]'; | |
| 27 $pattern2 = new Bio::Tools::SeqPattern(-SEQ =>$pattern, -TYPE =>'Amino'); | |
| 28 | |
| 29 =head1 DESCRIPTION | |
| 30 | |
| 31 The Bio::Tools::SeqPattern.pm module encapsulates generic data and | |
| 32 methods for manipulating regular expressions describing nucleic or | |
| 33 amino acid sequence patterns (a.k.a, "motifs"). | |
| 34 | |
| 35 Bio::Tools::SeqPattern.pm is a concrete class that inherits from | |
| 36 B<Bio::Seq.pm>. | |
| 37 | |
| 38 This class grew out of a need to have a standard module for doing routine | |
| 39 tasks with sequence patterns such as: | |
| 40 | |
| 41 -- Forming a reverse-complement version of a nucleotide sequence pattern | |
| 42 -- Expanding patterns containing ambiguity codes | |
| 43 -- Checking for invalid regexp characters | |
| 44 -- Untainting yet preserving special characters in the pattern | |
| 45 | |
| 46 Other features to look for in the future: | |
| 47 | |
| 48 -- Full pattern syntax checking | |
| 49 -- Conversion between expanded and ondensed forms of the pattern | |
| 50 | |
| 51 =head1 MOTIVATIONS | |
| 52 | |
| 53 A key motivation for Bio::Tools::SeqPattern.pm is to have a way to | |
| 54 generate a reverse complement of a nucleotide sequence pattern. | |
| 55 This makes possible simultaneous pattern matching on both sense and | |
| 56 anti-sense strands of a query sequence. | |
| 57 | |
| 58 In principle, one could do such a search more inefficiently by testing | |
| 59 against both sense and anti-sense versions of a sequence. | |
| 60 It is entirely equivalent to test a regexp containing both sense and | |
| 61 anti-sense versions of the *pattern* against one copy of the sequence. | |
| 62 The latter approach is much more efficient since: | |
| 63 | |
| 64 1) You need only one copy of the sequence. | |
| 65 2) Only one regexp is executed. | |
| 66 3) Regexp patterns are typically much smaller than sequences. | |
| 67 | |
| 68 Patterns can be quite complex and it is often difficult to | |
| 69 generate the reverse complement pattern. The Bioperl SeqPattern.pm | |
| 70 addresses this problem, providing a convenient set of tools | |
| 71 for working with biological sequence regular expressions. | |
| 72 | |
| 73 Not all patterns have been tested. If you discover a pattern that | |
| 74 is not handled properly by Bio::Tools::SeqPattern.pm, please | |
| 75 send me some email (sac@bioperl.org). Thanks. | |
| 76 | |
| 77 =head1 OTHER FEATURES | |
| 78 | |
| 79 =head2 Extended Alphabet Support | |
| 80 | |
| 81 This module supports the same set of ambiguity codes for nucleotide | |
| 82 sequences as supported by B<Bio::Seq.pm>. These ambiguity codes | |
| 83 define the behavior or the expand() method. | |
| 84 | |
| 85 ------------------------------------------ | |
| 86 Symbol Meaning Nucleic Acid | |
| 87 ------------------------------------------ | |
| 88 A A Adenine | |
| 89 C C Cytosine | |
| 90 G G Guanine | |
| 91 T T Thymine | |
| 92 U U Uracil | |
| 93 M A or C | |
| 94 R A or G Any purine | |
| 95 W A or T | |
| 96 S C or G | |
| 97 Y C or T Any pyrimidine | |
| 98 K G or T | |
| 99 V A or C or G | |
| 100 H A or C or T | |
| 101 D A or G or T | |
| 102 B C or G or T | |
| 103 X G or A or T or C | |
| 104 N G or A or T or C | |
| 105 . G or A or T or C | |
| 106 | |
| 107 | |
| 108 | |
| 109 ------------------------------------------ | |
| 110 Symbol Meaning | |
| 111 ------------------------------------------ | |
| 112 A Alanine | |
| 113 C Cysteine | |
| 114 D Aspartic Acid | |
| 115 E Glutamic Acid | |
| 116 F Phenylalanine | |
| 117 G Glycine | |
| 118 H Histidine | |
| 119 I Isoleucine | |
| 120 K Lysine | |
| 121 L Leucine | |
| 122 M Methionine | |
| 123 N Asparagine | |
| 124 P Proline | |
| 125 Q Glutamine | |
| 126 R Arginine | |
| 127 S Serine | |
| 128 T Threonine | |
| 129 V Valine | |
| 130 W Tryptophan | |
| 131 Y Tyrosine | |
| 132 | |
| 133 B Aspartic Acid, Asparagine | |
| 134 Z Glutamic Acid, Glutamine | |
| 135 X Any amino acid | |
| 136 . Any amino acid | |
| 137 | |
| 138 | |
| 139 =head2 Multiple Format Support | |
| 140 | |
| 141 Ultimately, this module should be able to build SeqPattern.pm objects | |
| 142 using a variety of pattern formats such as ProSite, Blocks, Prints, GCG, etc. | |
| 143 Currently, this module only supports patterns using a grep-like syntax. | |
| 144 | |
| 145 =head1 USAGE | |
| 146 | |
| 147 A simple demo script called seq_pattern.pl is included in the examples/ | |
| 148 directory of the central Bioperl distribution. | |
| 149 | |
| 150 =head1 SEE ALSO | |
| 151 | |
| 152 L<Bio::Root::Object> - Base class. | |
| 153 L<Bio::Seq> - Lightweight sequence object. | |
| 154 | |
| 155 http://bio.perl.org/Projects/modules.html - Online module documentation | |
| 156 http://bio.perl.org/ - Bioperl Project Homepage | |
| 157 | |
| 158 =head1 FEEDBACK | |
| 159 | |
| 160 =head2 Mailing Lists | |
| 161 | |
| 162 User feedback is an integral part of the evolution of this and other | |
| 163 Bioperl modules. Send your comments and suggestions preferably to one | |
| 164 of the Bioperl mailing lists. Your participation is much appreciated. | |
| 165 | |
| 166 bioperl-l@bioperl.org - General discussion | |
| 167 http://bio.perl.org/MailList.html - About the mailing lists | |
| 168 | |
| 169 =head2 Reporting Bugs | |
| 170 | |
| 171 Report bugs to the Bioperl bug tracking system to help us keep track | |
| 172 the bugs and their resolution. Bug reports can be submitted via email | |
| 173 or the web: | |
| 174 | |
| 175 bioperl-bugs@bio.perl.org | |
| 176 http://bugzilla.bioperl.org/ | |
| 177 | |
| 178 =head1 AUTHOR | |
| 179 | |
| 180 Steve Chervitz, sac@bioperl.org | |
| 181 | |
| 182 =head1 VERSION | |
| 183 | |
| 184 Bio::Tools::SeqPattern.pm, 0.011 | |
| 185 | |
| 186 =head1 COPYRIGHT | |
| 187 | |
| 188 Copyright (c) 1997-8 Steve Chervitz. All Rights Reserved. | |
| 189 This module is free software; you can redistribute it and/or | |
| 190 modify it under the same terms as Perl itself. | |
| 191 | |
| 192 =cut | |
| 193 | |
| 194 # | |
| 195 ## | |
| 196 ### | |
| 197 #### END of main POD documentation. | |
| 198 ### | |
| 199 ## | |
| 200 #' | |
| 201 # CREATED : 28 Aug 1997 | |
| 202 | |
| 203 | |
| 204 package Bio::Tools::SeqPattern; | |
| 205 | |
| 206 use Bio::Root::Root; | |
| 207 @ISA = qw(Bio::Root::Root); | |
| 208 use strict; | |
| 209 use vars qw ($ID $VERSION); | |
| 210 $ID = 'Bio::Tools::SeqPattern'; | |
| 211 $VERSION = 0.011; | |
| 212 | |
| 213 ## These constants may be more appropriate in a Bio::Dictionary.pm | |
| 214 ## type of class. | |
| 215 my $PURINES = 'AG'; | |
| 216 my $PYRIMIDINES = 'CT'; | |
| 217 my $BEE = 'DN'; | |
| 218 my $ZED = 'EQ'; | |
| 219 my $Regexp_chars = '\w,.\*()\[\]<>\{\}^\$'; # quoted for use in regexps | |
| 220 | |
| 221 ## Package variables used in reverse complementing. | |
| 222 my (%Processed_braces, %Processed_asterics); | |
| 223 | |
| 224 ##################################################################################### | |
| 225 ## CONSTRUCTOR ## | |
| 226 ##################################################################################### | |
| 227 | |
| 228 | |
| 229 =head1 new | |
| 230 | |
| 231 Title : new | |
| 232 Usage : my $seqpat = new Bio::Tools::SeqPattern(); | |
| 233 Purpose : Verifies that the type is correct for superclass (Bio::Seq.pm) | |
| 234 : and calls superclass constructor last. | |
| 235 Returns : n/a | |
| 236 Argument : Parameters passed to new() | |
| 237 Throws : Exception if the pattern string (seq) is empty. | |
| 238 Comments : The process of creating a new SeqPattern.pm object | |
| 239 : ensures that the pattern string is untained. | |
| 240 | |
| 241 See Also : B<Bio::Root::Root::new()>, | |
| 242 B<Bio::Seq::_initialize()> | |
| 243 | |
| 244 =cut | |
| 245 | |
| 246 #---------------- | |
| 247 sub new { | |
| 248 #---------------- | |
| 249 my($class, %param) = @_; | |
| 250 | |
| 251 my $self = $class->SUPER::new(%param); | |
| 252 my ($seq,$type) = $self->_rearrange([qw(SEQ TYPE)], %param); | |
| 253 | |
| 254 $seq || $self->throw("Empty pattern."); | |
| 255 my $t; | |
| 256 # Get the type ready for Bio::Seq.pm | |
| 257 if ($type =~ /nuc|[dr]na/i) { | |
| 258 $t = 'Dna'; | |
| 259 } elsif ($type =~ /amino|pep|prot/i) { | |
| 260 $t = 'Amino'; | |
| 261 } | |
| 262 $seq =~ tr/a-z/A-Z/; #ps 8/8/00 Canonicalize to upper case | |
| 263 $self->str($seq); | |
| 264 $self->type($t); | |
| 265 | |
| 266 return $self; | |
| 267 } | |
| 268 | |
| 269 | |
| 270 =head1 alphabet_ok | |
| 271 | |
| 272 Title : alphabet_ok | |
| 273 Usage : $mypat->alphabet_ok; | |
| 274 Purpose : Checks for invalid regexp characters. | |
| 275 : Overrides Bio::Seq::alphabet_ok() to allow | |
| 276 : additional regexp characters ,.*()[]<>{}^$ | |
| 277 : in addition to the standard genetic alphabet. | |
| 278 : Also untaints the pattern and sets the sequence | |
| 279 : object's sequence to the untained string. | |
| 280 Returns : Boolean (1 | 0) | |
| 281 Argument : n/a | |
| 282 Throws : Exception if the pattern contains invalid characters. | |
| 283 Comments : Does not call the superclass method. | |
| 284 : Actually permits any alphanumeric, not just the | |
| 285 : standard genetic alphabet. | |
| 286 | |
| 287 =cut | |
| 288 | |
| 289 #----------------' | |
| 290 sub alphabet_ok { | |
| 291 #---------------- | |
| 292 my( $self) = @_; | |
| 293 | |
| 294 return 1 if $self->{'_alphabet_checked'}; | |
| 295 | |
| 296 $self->{'_alphabet_checked'} = 1; | |
| 297 | |
| 298 my $pat = $self->seq(); | |
| 299 | |
| 300 if($pat =~ /[^$Regexp_chars]/io) { | |
| 301 $self->throw("Pattern contains invalid characters: $pat", | |
| 302 'Legal characters: a-z,A-Z,0-9,,.*()[]<>{}^$ '); | |
| 303 } | |
| 304 | |
| 305 # Untaint pattern (makes code taint-safe). | |
| 306 $pat =~ /[$Regexp_chars]+/io; | |
| 307 $self->setseq(uc($&)); | |
| 308 # print STDERR "\npattern ok: $pat\n"; | |
| 309 1; | |
| 310 } | |
| 311 | |
| 312 =head1 expand | |
| 313 | |
| 314 Title : expand | |
| 315 Usage : $seqpat_object->expand(); | |
| 316 Purpose : Expands the sequence pattern using special ambiguity codes. | |
| 317 Example : $pat = $seq_pat->expand(); | |
| 318 Returns : String containing fully expanded sequence pattern | |
| 319 Argument : n/a | |
| 320 Throws : Exception if sequence type is not recognized | |
| 321 : (i.e., is not one of [DR]NA, Amino) | |
| 322 | |
| 323 See Also : B<Extended Alphabet Support>, L<_expand_pep>(), L<_expand_nuc>() | |
| 324 | |
| 325 =cut | |
| 326 | |
| 327 #---------- | |
| 328 sub expand { | |
| 329 #---------- | |
| 330 my $self = shift; | |
| 331 | |
| 332 if($self->type =~ /[DR]na/i) { $self->_expand_nuc(); } | |
| 333 elsif($self->type =~ /Amino/i) { $self->_expand_pep(); } | |
| 334 else{ | |
| 335 $self->throw("Don't know how to expand ${\$self->type} patterns.\n"); | |
| 336 } | |
| 337 } | |
| 338 | |
| 339 | |
| 340 =head1 _expand_pep | |
| 341 | |
| 342 Title : _expand_pep | |
| 343 Usage : n/a; automatically called by expand() | |
| 344 Purpose : Expands peptide patterns | |
| 345 Returns : String (the expanded pattern) | |
| 346 Argument : String (the unexpanded pattern) | |
| 347 Throws : n/a | |
| 348 | |
| 349 See Also : L<expand>(), L<_expand_nuc>() | |
| 350 | |
| 351 =cut | |
| 352 | |
| 353 #---------------- | |
| 354 sub _expand_pep { | |
| 355 #---------------- | |
| 356 my ($self,$pat) = @_; | |
| 357 $pat ||= $self->str; | |
| 358 $pat =~ s/X/./g; | |
| 359 $pat =~ s/^</\^/; | |
| 360 $pat =~ s/>$/\$/; | |
| 361 | |
| 362 ## Avoid nested situations: [bmnq] --/--> [[$ZED]mnq] | |
| 363 ## Yet correctly deal with: fze[bmnq] ---> f[$BEE]e[$ZEDmnq] | |
| 364 if($pat =~ /\[\w*[BZ]\w*\]/) { | |
| 365 $pat =~ s/\[(\w*)B(\w*)\]/\[$1$ZED$2\]/g; | |
| 366 $pat =~ s/\[(\w*)Z(\w*)\]/\[$1$BEE$2\]/g; | |
| 367 $pat =~ s/B/\[$ZED\]/g; | |
| 368 $pat =~ s/Z/\[$BEE\]/g; | |
| 369 } else { | |
| 370 $pat =~ s/B/\[$ZED\]/g; | |
| 371 $pat =~ s/Z/\[$BEE\]/g; | |
| 372 } | |
| 373 $pat =~ s/\((.)\)/$1/g; ## Doing these last since: | |
| 374 $pat =~ s/\[(.)\]/$1/g; ## Pattern could contain [B] (for example) | |
| 375 | |
| 376 return $pat; | |
| 377 } | |
| 378 | |
| 379 | |
| 380 | |
| 381 =head1 _expand_nuc | |
| 382 | |
| 383 Title : _expand_nuc | |
| 384 Purpose : Expands nucleotide patterns | |
| 385 Returns : String (the expanded pattern) | |
| 386 Argument : String (the unexpanded pattern) | |
| 387 Throws : n/a | |
| 388 | |
| 389 See Also : L<expand>(), L<_expand_pep>() | |
| 390 | |
| 391 =cut | |
| 392 | |
| 393 #--------------- | |
| 394 sub _expand_nuc { | |
| 395 #--------------- | |
| 396 my ($self,$pat) = @_; | |
| 397 | |
| 398 $pat ||= $self->str; | |
| 399 $pat =~ s/N|X/./g; | |
| 400 $pat =~ s/pu/R/ig; | |
| 401 $pat =~ s/py/Y/ig; | |
| 402 $pat =~ s/U/T/g; | |
| 403 $pat =~ s/^</\^/; | |
| 404 $pat =~ s/>$/\$/; | |
| 405 | |
| 406 ## Avoid nested situations: [ya] --/--> [[ct]a] | |
| 407 ## Yet correctly deal with: sg[ya] ---> [gc]g[cta] | |
| 408 if($pat =~ /\[\w*[RYSWMK]\w*\]/) { | |
| 409 $pat =~ s/\[(\w*)R(\w*)\]/\[$1$PURINES$2\]/g; | |
| 410 $pat =~ s/\[(\w*)Y(\w*)\]/\[$1$PYRIMIDINES$2\]/g; | |
| 411 $pat =~ s/\[(\w*)S(\w*)\]/\[$1GC$2\]/g; | |
| 412 $pat =~ s/\[(\w*)W(\w*)\]/\[$1AT$2\]/g; | |
| 413 $pat =~ s/\[(\w*)M(\w*)\]/\[$1AC$2\]/g; | |
| 414 $pat =~ s/\[(\w*)K(\w*)\]/\[$1GT$2\]/g; | |
| 415 $pat =~ s/\[(\w*)V(\w*)\]/\[$1ACG$2\]/g; | |
| 416 $pat =~ s/\[(\w*)H(\w*)\]/\[$1ACT$2\]/g; | |
| 417 $pat =~ s/\[(\w*)D(\w*)\]/\[$1AGT$2\]/g; | |
| 418 $pat =~ s/\[(\w*)B(\w*)\]/\[$1CGT$2\]/g; | |
| 419 $pat =~ s/R/\[$PURINES\]/g; | |
| 420 $pat =~ s/Y/\[$PYRIMIDINES\]/g; | |
| 421 $pat =~ s/S/\[GC\]/g; | |
| 422 $pat =~ s/W/\[AT\]/g; | |
| 423 $pat =~ s/M/\[AC\]/g; | |
| 424 $pat =~ s/K/\[GT\]/g; | |
| 425 $pat =~ s/V/\[ACG\]/g; | |
| 426 $pat =~ s/H/\[ACT\]/g; | |
| 427 $pat =~ s/D/\[AGT\]/g; | |
| 428 $pat =~ s/B/\[CGT\]/g; | |
| 429 } else { | |
| 430 $pat =~ s/R/\[$PURINES\]/g; | |
| 431 $pat =~ s/Y/\[$PYRIMIDINES\]/g; | |
| 432 $pat =~ s/S/\[GC\]/g; | |
| 433 $pat =~ s/W/\[AT\]/g; | |
| 434 $pat =~ s/M/\[AC\]/g; | |
| 435 $pat =~ s/K/\[GT\]/g; | |
| 436 $pat =~ s/V/\[ACG\]/g; | |
| 437 $pat =~ s/H/\[ACT\]/g; | |
| 438 $pat =~ s/D/\[AGT\]/g; | |
| 439 $pat =~ s/B/\[CGT\]/g; | |
| 440 } | |
| 441 $pat =~ s/\((.)\)/$1/g; ## Doing thses last since: | |
| 442 $pat =~ s/\[(.)\]/$1/g; ## Pattern could contain [y] (for example) | |
| 443 | |
| 444 return $pat; | |
| 445 } | |
| 446 | |
| 447 | |
| 448 | |
| 449 =head1 revcom | |
| 450 | |
| 451 Title : revcom | |
| 452 Usage : revcom([1]); | |
| 453 Purpose : Forms a pattern capable of recognizing the reverse complement | |
| 454 : version of a nucleotide sequence pattern. | |
| 455 Example : $pattern_object->revcom(); | |
| 456 : $pattern_object->revcom(1); ## returns expanded rev complement pattern. | |
| 457 Returns : Object reference for a new Bio::Tools::SeqPattern containing | |
| 458 : the revcom of the current pattern as its sequence. | |
| 459 Argument : (1) boolean (optional) (default= false) | |
| 460 : true : expand the pattern before rev-complementing. | |
| 461 : false: don't expand pattern before or after rev-complementing. | |
| 462 Throws : Exception if called for amino acid sequence pattern. | |
| 463 Comments : This method permits the simultaneous searching of both | |
| 464 : sense and anti-sense versions of a nucleotide pattern | |
| 465 : by means of a grep-type of functionality in which any | |
| 466 : number of patterns may be or-ed into the recognition | |
| 467 : pattern. | |
| 468 : Overrides Bio::Seq::revcom() and calls it first thing. | |
| 469 : The order of _fixpat() calls is critical. | |
| 470 | |
| 471 See Also : B<Bio::Seq::revcom()>, L<_fixpat_1>(), L<_fixpat_2>(), L<_fixpat_3>(), L<_fixpat_4>(), L<_fixpat_5>() | |
| 472 | |
| 473 =cut | |
| 474 | |
| 475 #-----------' | |
| 476 sub revcom { | |
| 477 #----------- | |
| 478 my($self,$expand) = @_; | |
| 479 | |
| 480 if ($self->type !~ /Dna|Rna/i) { | |
| 481 $self->throw("Can't get revcom for ${\$self->type} sequence types.\n"); | |
| 482 } | |
| 483 # return $self->{'_rev'} if defined $self->{'_rev'}; | |
| 484 | |
| 485 $expand ||= 0; | |
| 486 my $str = $self->str; | |
| 487 $str =~ tr/acgtrymkswhbvdnxACGTRYMKSWHBVDNX/tgcayrkmswdvbhnxTGCAYRKMSWDVBHNX/; | |
| 488 my $rev = CORE::reverse $str; | |
| 489 $rev =~ tr/[](){}<>/][)(}{></; | |
| 490 | |
| 491 if($expand) { | |
| 492 $rev = $self->_expand_nuc($rev); | |
| 493 # print "\nExpanded: $rev\n"; | |
| 494 } | |
| 495 | |
| 496 %Processed_braces = (); | |
| 497 %Processed_asterics = (); | |
| 498 | |
| 499 my $fixrev = _fixpat_1($rev); | |
| 500 # print "FIX 1: $fixrev";<STDIN>; | |
| 501 | |
| 502 $fixrev = _fixpat_2($fixrev); | |
| 503 # print "FIX 2: $fixrev";<STDIN>; | |
| 504 | |
| 505 $fixrev = _fixpat_3($fixrev); | |
| 506 # print "FIX 3: $fixrev";<STDIN>; | |
| 507 | |
| 508 $fixrev = _fixpat_4($fixrev); | |
| 509 # print "FIX 4: $fixrev";<STDIN>; | |
| 510 | |
| 511 $fixrev = _fixpat_5($fixrev); | |
| 512 # print "FIX 5: $fixrev";<STDIN>; | |
| 513 | |
| 514 ##### Added by ps 8/7/00 to allow non-greedy matching | |
| 515 $fixrev = _fixpat_6($fixrev); | |
| 516 # print "FIX 6: $fixrev";<STDIN>; | |
| 517 | |
| 518 # $self->{'_rev'} = $fixrev; | |
| 519 | |
| 520 return new Bio::Tools::SeqPattern(-seq =>$fixrev, -type =>$self->type); | |
| 521 } | |
| 522 | |
| 523 | |
| 524 | |
| 525 =head1 _fixpat_1 | |
| 526 | |
| 527 Title : _fixpat_1 | |
| 528 Usage : n/a; called automatically by revcom() | |
| 529 Purpose : Utility method for revcom() | |
| 530 : Converts all {7,5} --> {5,7} (Part I) | |
| 531 : and [T^] --> [^T] (Part II) | |
| 532 : and *N --> N* (Part III) | |
| 533 Returns : String (the new, partially reversed pattern) | |
| 534 Argument : String (the expanded pattern) | |
| 535 Throws : n/a | |
| 536 | |
| 537 See Also : L<revcom>() | |
| 538 | |
| 539 =cut | |
| 540 | |
| 541 #-------------- | |
| 542 sub _fixpat_1 { | |
| 543 #-------------- | |
| 544 my $pat = shift; | |
| 545 | |
| 546 ## Part I: | |
| 547 my (@done,@parts); | |
| 548 while(1) { | |
| 549 $pat =~ /(.*)\{(\S+?)\}(.*)/ or do{ push @done, $pat; last; }; | |
| 550 $pat = $1.'#{'.reverse($2).'}'.$3; | |
| 551 # print "1: $1\n2: $2\n3: $3\n"; | |
| 552 # print "modified pat: $pat";<STDIN>; | |
| 553 @parts = split '#', $pat; | |
| 554 push @done, $parts[1]; | |
| 555 $pat = $parts[0]; | |
| 556 # print "done: $parts[1]<---\nnew pat: $pat<---";<STDIN>; | |
| 557 last if not $pat; | |
| 558 } | |
| 559 $pat = join('', reverse @done); | |
| 560 | |
| 561 ## Part II: | |
| 562 @done = (); | |
| 563 while(1) { | |
| 564 $pat =~ /(.*)\[(\S+?)\](.*)/ or do{ push @done, $pat; last; }; | |
| 565 $pat = $1.'#['.reverse($2).']'.$3; | |
| 566 # print "1: $1\n2: $2\n3: $3\n"; | |
| 567 # print "modified pat: $pat";<STDIN>; | |
| 568 @parts = split '#', $pat; | |
| 569 push @done, $parts[1]; | |
| 570 $pat = $parts[0]; | |
| 571 # print "done: $parts[1]<---\nnew pat: $pat<---";<STDIN>; | |
| 572 last if not $pat; | |
| 573 } | |
| 574 $pat = join('', reverse @done); | |
| 575 | |
| 576 ## Part III: | |
| 577 @done = (); | |
| 578 while(1) { | |
| 579 $pat =~ /(.*)\*([\w.])(.*)/ or do{ push @done, $pat; last; }; | |
| 580 $pat = $1.'#'.$2.'*'.$3; | |
| 581 $Processed_asterics{$2}++; | |
| 582 # print "1: $1\n2: $2\n3: $3\n"; | |
| 583 # print "modified pat: $pat";<STDIN>; | |
| 584 @parts = split '#', $pat; | |
| 585 push @done, $parts[1]; | |
| 586 $pat = $parts[0]; | |
| 587 # print "done: $parts[1]<---\nnew pat: $pat<---";<STDIN>; | |
| 588 last if not $pat; | |
| 589 } | |
| 590 return join('', reverse @done); | |
| 591 | |
| 592 } | |
| 593 | |
| 594 | |
| 595 =head1 _fixpat_2 | |
| 596 | |
| 597 Title : _fixpat_2 | |
| 598 Usage : n/a; called automatically by revcom() | |
| 599 Purpose : Utility method for revcom() | |
| 600 : Converts all {5,7}Y ---> Y{5,7} | |
| 601 : and {10,}. ---> .{10,} | |
| 602 Returns : String (the new, partially reversed pattern) | |
| 603 Argument : String (the expanded, partially reversed pattern) | |
| 604 Throws : n/a | |
| 605 | |
| 606 See Also : L<revcom>() | |
| 607 | |
| 608 =cut | |
| 609 | |
| 610 #-------------- | |
| 611 sub _fixpat_2 { | |
| 612 #-------------- | |
| 613 my $pat = shift; | |
| 614 | |
| 615 local($^W) = 0; | |
| 616 my (@done,@parts,$braces); | |
| 617 while(1) { | |
| 618 # $pat =~ s/(.*)([^])])(\{\S+?\})([\w.])(.*)/$1$2#$4$3$5/ or do{ push @done, $pat; last; }; | |
| 619 $pat =~ s/(.*)(\{\S+?\})([\w.])(.*)/$1#$3$2$4/ or do{ push @done, $pat; last; }; | |
| 620 $braces = $2; | |
| 621 $braces =~ s/[{}]//g; | |
| 622 $Processed_braces{"$3$braces"}++; | |
| 623 # print "modified pat: $pat";<STDIN>; | |
| 624 @parts = split '#', $pat; | |
| 625 push @done, $parts[1]; | |
| 626 $pat = $parts[0]; | |
| 627 # print "done: $parts[1]<---\nnew pat: $pat<---";<STDIN>; | |
| 628 last if not $pat; | |
| 629 } | |
| 630 return join('', reverse @done); | |
| 631 } | |
| 632 | |
| 633 | |
| 634 =head1 _fixpat_3 | |
| 635 | |
| 636 Title : _fixpat_3 | |
| 637 Usage : n/a; called automatically by revcom() | |
| 638 Purpose : Utility method for revcom() | |
| 639 : Converts all {5,7}(XXX) ---> (XXX){5,7} | |
| 640 Returns : String (the new, partially reversed pattern) | |
| 641 Argument : String (the expanded, partially reversed pattern) | |
| 642 Throws : n/a | |
| 643 | |
| 644 See Also : L<revcom>() | |
| 645 | |
| 646 =cut | |
| 647 | |
| 648 #------------- | |
| 649 sub _fixpat_3 { | |
| 650 #------------- | |
| 651 my $pat = shift; | |
| 652 | |
| 653 my (@done,@parts,$braces,$newpat,$oldpat); | |
| 654 while(1) { | |
| 655 # $pat =~ s/(.+)(\{\S+\})(\(\w+\))(.*)/$1#$3$2$4/ or do{ push @done, $pat; last; }; | |
| 656 if( $pat =~ /(.*)(.)(\{\S+\})(\(\w+\))(.*)/) { | |
| 657 $newpat = "$1#$2$4$3$5"; | |
| 658 ##ps $oldpat = "$1#$2$3$4$5"; | |
| 659 # print "1: $1\n2: $2\n3: $3\n4: $4\n5: $5\n"; | |
| 660 ##ps $braces = $3; | |
| 661 ##ps $braces =~ s/[{}]//g; | |
| 662 ##ps if( exists $Processed_braces{"$2$braces"} || exists $Processed_asterics{$2}) { | |
| 663 ##ps $pat = $oldpat; # Don't change it. Already processed. | |
| 664 # print "saved pat: $pat";<STDIN>; | |
| 665 ##ps } else { | |
| 666 # print "new pat: $newpat";<STDIN>; | |
| 667 $pat = $newpat; # Change it. | |
| 668 ##ps } | |
| 669 } elsif( $pat =~ /^(\{\S+\})(\(\w+\))(.*)/) { | |
| 670 $pat = "#$2$1$3"; | |
| 671 } else { | |
| 672 push @done, $pat; last; | |
| 673 } | |
| 674 @parts = split '#', $pat; | |
| 675 push @done, $parts[1]; | |
| 676 $pat = $parts[0]; | |
| 677 # print "done: $parts[1]<---\nnew pat: $pat<---";<STDIN>; | |
| 678 last if not $pat; | |
| 679 } | |
| 680 return join('', reverse @done); | |
| 681 } | |
| 682 | |
| 683 | |
| 684 =head1 _fixpat_4 | |
| 685 | |
| 686 Title : _fixpat_4 | |
| 687 Usage : n/a; called automatically by revcom() | |
| 688 Purpose : Utility method for revcom() | |
| 689 : Converts all {5,7}[XXX] ---> [XXX]{5,7} | |
| 690 Returns : String (the new, partially reversed pattern) | |
| 691 Argument : String (the expanded, partially reversed pattern) | |
| 692 Throws : n/a | |
| 693 | |
| 694 See Also : L<revcom>() | |
| 695 | |
| 696 =cut | |
| 697 | |
| 698 #--------------- | |
| 699 sub _fixpat_4 { | |
| 700 #--------------- | |
| 701 my $pat = shift; | |
| 702 | |
| 703 my (@done,@parts,$braces,$newpat,$oldpat); | |
| 704 while(1) { | |
| 705 # $pat =~ s/(.*)(\{\S+\})(\[\w+\])(.*)/$1#$3$2$4/ or do{ push @done, $pat; last; }; | |
| 706 # $pat =~ s/(.*)([^\w.])(\{\S+\})(\[\w+\])(.*)/$1$2#$4$3$5/ or do{ push @done, $pat; last; }; | |
| 707 if( $pat =~ /(.*)(.)(\{\S+\})(\[\w+\])(.*)/) { | |
| 708 $newpat = "$1#$2$4$3$5"; | |
| 709 $oldpat = "$1#$2$3$4$5"; | |
| 710 # print "1: $1\n2: $2\n3: $3\n4: $4\n5: $5\n"; | |
| 711 $braces = $3; | |
| 712 $braces =~ s/[{}]//g; | |
| 713 if( (defined $braces and defined $2) and | |
| 714 exists $Processed_braces{"$2$braces"} || exists $Processed_asterics{$2}) { | |
| 715 $pat = $oldpat; # Don't change it. Already processed. | |
| 716 # print "saved pat: $pat";<STDIN>; | |
| 717 } else { | |
| 718 $pat = $newpat; # Change it. | |
| 719 # print "new pat: $pat";<STDIN>; | |
| 720 } | |
| 721 } elsif( $pat =~ /^(\{\S+\})(\[\w+\])(.*)/) { | |
| 722 $pat = "#$2$1$3"; | |
| 723 } else { | |
| 724 push @done, $pat; last; | |
| 725 } | |
| 726 | |
| 727 @parts = split '#', $pat; | |
| 728 push @done, $parts[1]; | |
| 729 $pat = $parts[0]; | |
| 730 # print "done: $parts[1]<---\nnew pat: $pat<---";<STDIN>; | |
| 731 last if not $pat; | |
| 732 } | |
| 733 return join('', reverse @done); | |
| 734 } | |
| 735 | |
| 736 | |
| 737 =head1 _fixpat_5 | |
| 738 | |
| 739 Title : _fixpat_5 | |
| 740 Usage : n/a; called automatically by revcom() | |
| 741 Purpose : Utility method for revcom() | |
| 742 : Converts all *[XXX] ---> [XXX]* | |
| 743 : and *(XXX) ---> (XXX)* | |
| 744 Returns : String (the new, partially reversed pattern) | |
| 745 Argument : String (the expanded, partially reversed pattern) | |
| 746 Throws : n/a | |
| 747 | |
| 748 See Also : L<revcom>() | |
| 749 | |
| 750 =cut | |
| 751 | |
| 752 #-------------- | |
| 753 sub _fixpat_5 { | |
| 754 #-------------- | |
| 755 my $pat = shift; | |
| 756 | |
| 757 my (@done,@parts,$newpat,$oldpat); | |
| 758 while(1) { | |
| 759 # $pat =~ s/(.*)(\{\S+\})(\[\w+\])(.*)/$1#$3$2$4/ or do{ push @done, $pat; last; }; | |
| 760 # $pat =~ s/(.*)([^\w.])(\{\S+\})(\[\w+\])(.*)/$1$2#$4$3$5/ or do{ push @done, $pat; last; }; | |
| 761 if( $pat =~ /(.*)(.)\*(\[\w+\]|\(\w+\))(.*)/) { | |
| 762 $newpat = "$1#$2$3*$4"; | |
| 763 $oldpat = "$1#$2*$3$4"; | |
| 764 # print "1: $1\n2: $2\n3: $3\n4: $4\n"; | |
| 765 if( exists $Processed_asterics{$2}) { | |
| 766 $pat = $oldpat; # Don't change it. Already processed. | |
| 767 # print "saved pat: $pat";<STDIN>; | |
| 768 } else { | |
| 769 $pat = $newpat; # Change it. | |
| 770 # print "new pat: $pat";<STDIN>; | |
| 771 } | |
| 772 } elsif( $pat =~ /^\*(\[\w+\]|\(\w+\))(.*)/) { | |
| 773 $pat = "#$1*$3"; | |
| 774 } else { | |
| 775 push @done, $pat; last; | |
| 776 } | |
| 777 | |
| 778 @parts = split '#', $pat; | |
| 779 push @done, $parts[1]; | |
| 780 $pat = $parts[0]; | |
| 781 # print "done: $parts[1]<---\nnew pat: $pat<---";<STDIN>; | |
| 782 last if not $pat; | |
| 783 } | |
| 784 return join('', reverse @done); | |
| 785 } | |
| 786 | |
| 787 | |
| 788 | |
| 789 | |
| 790 | |
| 791 ############################ | |
| 792 # | |
| 793 # PS: Added 8/7/00 to allow non-greedy matching patterns | |
| 794 # | |
| 795 ###################################### | |
| 796 | |
| 797 =head1 _fixpat_6 | |
| 798 | |
| 799 Title : _fixpat_6 | |
| 800 Usage : n/a; called automatically by revcom() | |
| 801 Purpose : Utility method for revcom() | |
| 802 : Converts all ?Y{5,7} ---> Y{5,7}? | |
| 803 : and ?(XXX){5,7} ---> (XXX){5,7}? | |
| 804 : and ?[XYZ]{5,7} ---> [XYZ]{5,7}? | |
| 805 Returns : String (the new, partially reversed pattern) | |
| 806 Argument : String (the expanded, partially reversed pattern) | |
| 807 Throws : n/a | |
| 808 | |
| 809 See Also : L<revcom>() | |
| 810 | |
| 811 =cut | |
| 812 | |
| 813 #-------------- | |
| 814 sub _fixpat_6 { | |
| 815 #-------------- | |
| 816 my $pat = shift; | |
| 817 my (@done,@parts); | |
| 818 | |
| 819 @done = (); | |
| 820 while(1) { | |
| 821 $pat =~ /(.*)\?(\[\w+\]|\(\w+\)|\w)(\{\S+?\})?(.*)/ or do{ push @done, $pat; last; }; | |
| 822 my $quantifier = $3 ? $3 : ""; # Shut up warning if no explicit quantifier | |
| 823 $pat = $1.'#'.$2.$quantifier.'?'.$4; | |
| 824 # $pat = $1.'#'.$2.$3.'?'.$4; | |
| 825 | |
| 826 # print "1: $1\n2: $2\n3: $3\n"; | |
| 827 # print "modified pat: $pat";<STDIN>; | |
| 828 @parts = split '#', $pat; | |
| 829 push @done, $parts[1]; | |
| 830 $pat = $parts[0]; | |
| 831 # print "done: $parts[1]<---\nnew pat: $pat<---";<STDIN>; | |
| 832 last if not $pat; | |
| 833 } | |
| 834 return join('', reverse @done); | |
| 835 | |
| 836 } | |
| 837 | |
| 838 =head2 str | |
| 839 | |
| 840 Title : str | |
| 841 Usage : $obj->str($newval) | |
| 842 Function: | |
| 843 Returns : value of str | |
| 844 Args : newvalue (optional) | |
| 845 | |
| 846 | |
| 847 =cut | |
| 848 | |
| 849 sub str{ | |
| 850 my $obj = shift; | |
| 851 if( @_ ) { | |
| 852 my $value = shift; | |
| 853 $obj->{'str'} = $value; | |
| 854 } | |
| 855 return $obj->{'str'}; | |
| 856 | |
| 857 } | |
| 858 | |
| 859 =head2 type | |
| 860 | |
| 861 Title : type | |
| 862 Usage : $obj->type($newval) | |
| 863 Function: | |
| 864 Returns : value of type | |
| 865 Args : newvalue (optional) | |
| 866 | |
| 867 | |
| 868 =cut | |
| 869 | |
| 870 sub type{ | |
| 871 my $obj = shift; | |
| 872 if( @_ ) { | |
| 873 my $value = shift; | |
| 874 $obj->{'type'} = $value; | |
| 875 } | |
| 876 return $obj->{'type'}; | |
| 877 | |
| 878 } | |
| 879 | |
| 880 1; | |
| 881 | |
| 882 __END__ | |
| 883 | |
| 884 ######################################################################### | |
| 885 # End of class | |
| 886 ######################################################################### | |
| 887 | |
| 888 =head1 FOR DEVELOPERS ONLY | |
| 889 | |
| 890 =head2 Data Members | |
| 891 | |
| 892 Information about the various data members of this module is provided | |
| 893 for those wishing to modify or understand the code. Two things to bear | |
| 894 in mind: | |
| 895 | |
| 896 =over 2 | |
| 897 | |
| 898 =item 1 Do NOT rely on these in any code outside of this module. | |
| 899 | |
| 900 All data members are prefixed with an underscore to signify that they | |
| 901 are private. Always use accessor methods. If the accessor doesn't | |
| 902 exist or is inadequate, create or modify an accessor (and let me know, | |
| 903 too!). | |
| 904 | |
| 905 =item 2 This documentation may be incomplete and out of date. | |
| 906 | |
| 907 It is easy for this documentation to become obsolete as this module is | |
| 908 still evolving. Always double check this info and search for members | |
| 909 not described here. | |
| 910 | |
| 911 =back | |
| 912 | |
| 913 An instance of Bio::Tools::RestrictionEnzyme.pm is a blessed reference | |
| 914 to a hash containing all or some of the following fields: | |
| 915 | |
| 916 FIELD VALUE | |
| 917 ------------------------------------------------------------------------ | |
| 918 _rev : The corrected reverse complement of the fully expanded pattern. | |
| 919 | |
| 920 INHERITED DATA MEMBERS: | |
| 921 | |
| 922 _seq : (From Bio::Seq.pm) The original, unexpanded input sequence after untainting. | |
| 923 _type : (From Bio::Seq.pm) 'Dna' or 'Amino' | |
| 924 | |
| 925 | |
| 926 =cut |
