comparison docs/modules/txt/MACCSKeys.txt @ 0:4816e4a8ae95 draft default tip

Uploaded
author deepakjadmin
date Wed, 20 Jan 2016 09:23:18 -0500
1 NAME
2 MACCSKeys
3
4 SYNOPSIS
5 use Fingerprints::MACCSKeys;
6
7 use Fingerprints::MACCSKeys qw(:all);
8
9 DESCRIPTION
10 The MACCSKeys class [ Ref 45-47 ] provides the following methods:
11
12 new, GenerateFingerprints, GenerateMACCSKeys, GetDescription, SetSize,
13 SetType, StringifyMACCSKeys
14
15 MACCSKeys is derived from the Fingerprints class, which in turn is
16 derived from the ObjectProperty base class. ObjectProperty uses Perl's
17 AUTOLOAD functionality to provide methods not explicitly defined in the
18 MACCSKeys, Fingerprints or ObjectProperty classes. These methods are
19 generated on-the-fly for a specified object property:
20
21 Set<PropertyName>(<PropertyValue>);
22 $PropertyValue = Get<PropertyName>();
23 Delete<PropertyName>();
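
The on-the-fly accessor pattern described above can be illustrated outside Perl. The following is a minimal Python sketch, not MayaChemTools code: the class name `ObjectPropertySketch` and its internal dictionary are hypothetical, and `__getattr__` is used only as a rough analogue of AUTOLOAD (both are invoked when a method is not explicitly defined).

```python
class ObjectPropertySketch:
    """Hypothetical stand-in mimicking Set<Name>/Get<Name>/Delete<Name> dispatch."""

    _PREFIXES = ("Set", "Get", "Delete")

    def __init__(self):
        self._properties = {}

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails, roughly like
        # Perl's AUTOLOAD being called for an undefined method.
        for prefix in self._PREFIXES:
            if name.startswith(prefix) and len(name) > len(prefix):
                return self._make_accessor(prefix, name[len(prefix):])
        raise AttributeError(name)

    def _make_accessor(self, action, prop):
        if action == "Set":
            def setter(value):
                self._properties[prop] = value
                return self
            return setter
        if action == "Get":
            return lambda: self._properties.get(prop)
        # Remaining action is "Delete": remove the property if present.
        return lambda: self._properties.pop(prop, None)


obj = ObjectPropertySketch()
obj.SetSize(166)
print(obj.GetSize())
obj.DeleteSize()
```

In this sketch a property that was never set, or was deleted, reads back as `None`; how the Perl classes treat that case is not specified in the text above.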
24
25 For each MACCS (Molecular ACCess System) key definition, atoms are
26 processed to determine their membership in the key, and the appropriate
27 molecular fingerprint strings are generated. An atom can belong to
28 multiple MACCS keys.
29
30 For the *MACCSKeyBits* value of the Type option, a fingerprint bit-vector
31 string containing zeros and ones is generated; for the *MACCSKeyCount*
32 value, a fingerprint vector string containing the count of each MACCS
33 key [ Ref 45-47 ] is generated.
34
35 The *MACCSKeyBits or MACCSKeyCount* values for Type, combined with the
36 two possible *166 | 322* values of Size, support generation of four
37 different types of MACCS keys fingerprints: *MACCS166KeyBits,
38 MACCS166KeyCount, MACCS322KeyBits, MACCS322KeyCount*.
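
As an illustration of how the two Type values and two Size values combine into the four fingerprint names listed above, here is a small standalone Python sketch. The function name `fingerprint_name` and the validation behavior are assumptions made for this example; they are not part of the MACCSKeys API.

```python
VALID_TYPES = ("MACCSKeyBits", "MACCSKeyCount")
VALID_SIZES = (166, 322)


def fingerprint_name(fp_type, size):
    """Combine a Type and a Size into a name such as 'MACCS166KeyBits'."""
    if fp_type not in VALID_TYPES:
        raise ValueError(f"unsupported Type: {fp_type!r}")
    if size not in VALID_SIZES:
        raise ValueError(f"unsupported Size: {size!r}")
    # Insert the size right after the leading 'MACCS' of the type name.
    return fp_type.replace("MACCS", f"MACCS{size}", 1)


for fp_type in VALID_TYPES:
    for size in VALID_SIZES:
        print(fingerprint_name(fp_type, size))
```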
39
40 The current release of MayaChemTools generates the following types of
41 MACCS keys fingerprints bit-vector and vector strings:
42
43 FingerprintsBitVector;MACCSKeyBits;166;BinaryString;Ascending;00000000
44 0000000000000000000000000000000001001000010010000000010010000000011100
45 0100101010111100011011000100110110000011011110100110111111111111011111
46 11111111111110111000
47
48 FingerprintsBitVector;MACCSKeyBits;166;HexadecimalString;Ascending;000
49 000000021210210e845f8d8c60b79dffbffffd1
50
51 FingerprintsBitVector;MACCSKeyBits;322;BinaryString;Ascending;11101011
52 1110011111100101111111000111101100110000000000000011100010000000000000
53 0000000000000000000000000000000000000000000000101000000000000000000000
54 0000000000000000000000000000000000000000000000000000000000000000000000
55 0000000000000000000000000000000000000011000000000000000000000000000000
56 0000000000000000000000000000000000000000
57
58 FingerprintsBitVector;MACCSKeyBits;322;HexadecimalString;Ascending;7d7
59 e7af3edc000c1100000000000000500000000000000000000000000000000300000000
60 000000000
61
62 FingerprintsVector;MACCSKeyCount;166;OrderedNumericalValues;ValuesStri
63 ng;0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
64 0 0 0 0 0 0 0 1 0 0 3 0 0 0 0 4 0 0 2 0 0 0 0 0 0 0 0 2 0 0 2 0 0 0 0
65 0 0 0 0 1 1 8 0 0 0 1 0 0 1 0 1 0 1 0 3 1 3 1 0 0 0 1 2 0 11 1 0 0 0
66 5 0 0 1 2 0 1 1 0 0 0 0 0 1 1 0 1 1 1 1 0 4 0 0 1 1 0 4 6 1 1 1 2 1 1
67 3 5 2 2 0 5 3 5 1 1 2 5 1 2 1 2 4 8 3 5 5 2 2 0 3 5 4 1
68
69 FingerprintsVector;MACCSKeyCount;322;OrderedNumericalValues;ValuesStri
70 ng;14 8 2 0 2 0 4 4 2 1 4 0 0 2 5 10 5 2 1 0 0 2 0 5 13 3 28 5 5 3 0 0
71 0 4 2 1 1 0 1 1 0 0 2 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 22 5 3 0 0 0 1 0
72 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
73 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 11 0 2 0 0 0 0 0 0 0 0 0
74 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ...
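
The example strings above share a semicolon-delimited layout with six fields. The following Python sketch shows one way such a string could be split apart and how a binary string could be converted to the hexadecimal form. The field names (`kind`, `fp_type`, and so on) are hypothetical labels chosen for this example. The bit order within each hex digit (least-significant bit first) is inferred from the examples above, where the trailing bits 10111000 pair with the trailing hex digits d1; the text does not state it explicitly.

```python
def parse_fingerprint(text):
    """Split a single-line fingerprint string into its six fields."""
    fields = text.strip().split(";")
    if len(fields) != 6:
        raise ValueError("expected six semicolon-separated fields")
    kind, fp_type, size, fmt, layout, data = fields
    if kind == "FingerprintsVector":
        values = [int(v) for v in data.split()]  # space-separated counts
    else:
        values = data  # bit-vector data is kept as a string
    return {"kind": kind, "fp_type": fp_type, "size": int(size),
            "format": fmt, "layout": layout, "values": values}


def binary_to_hex_lsb_first(bits):
    """Convert a bit string to hex, reading each 4-bit group LSB first."""
    padded = bits + "0" * (-len(bits) % 4)
    digits = []
    for i in range(0, len(padded), 4):
        nibble = padded[i:i + 4]
        value = sum(1 << k for k, bit in enumerate(nibble) if bit == "1")
        digits.append(format(value, "x"))
    return "".join(digits)


fp = parse_fingerprint(
    "FingerprintsBitVector;MACCSKeyBits;166;BinaryString;Ascending;10111000")
print(fp["size"], binary_to_hex_lsb_first(fp["values"]))
```

The data field in the usage line is a shortened, made-up bit string, not a real 166-key fingerprint.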
75
76 METHODS
77 new
78 $NewMACCSKeys = new MACCSKeys(%NamesAndValues);
79
80 Using specified *MACCSKeys* property names and values hash, new
81 method creates a new object and returns a reference to the newly
82 created MACCSKeys object. By default, the following properties
83 are initialized:
84
85 Molecule = '';
86 Type = ''
87 Size = ''
88
89 Examples:
90
91 $MACCSKeys = new MACCSKeys('Molecule' => $Molecule,
92 'Type' => 'MACCSKeyBits',
93 'Size' => 166);
94
95 $MACCSKeys = new MACCSKeys('Molecule' => $Molecule,
96 'Type' => 'MACCSKeyCount',
97 'Size' => 166);
98
99 $MACCSKeys = new MACCSKeys('Molecule' => $Molecule,
100 'Type' => 'MACCSKeyBits',
101 'Size' => 322);
102
103 $MACCSKeys = new MACCSKeys('Molecule' => $Molecule,
104 'Type' => 'MACCSKeyCount',
105 'Size' => 322);
106
107 $MACCSKeys->GenerateMACCSKeys();
108 print "$MACCSKeys\n";
109
110 GetDescription
111 $Description = $MACCSKeys->GetDescription();
112
113 Returns a string containing description of MACCS keys fingerprints.
114
115 GenerateMACCSKeys or GenerateFingerprints
116 $MACCSKeys = $MACCSKeys->GenerateMACCSKeys();
117
118 Generates MACCS keys fingerprints and returns *MACCSKeys*.
119
120 For the *MACCSKeyBits* value of Type, a fingerprint bit-vector string
121 containing zeros and ones is generated; for the *MACCSKeyCount*
122 value, a fingerprint vector string containing the count of each MACCS
123 key is generated.
124
125 The *MACCSKeyBits or MACCSKeyCount* values for the Type option, combined
126 with the two possible *166 | 322* values of Size, support generation of
127 four different types of MACCS keys fingerprints: *MACCS166KeyBits,
128 MACCS166KeyCount, MACCS322KeyBits, MACCS322KeyCount*.
129
130 Definition of MACCS keys uses the following atom and bond symbols to
131 define atom and bond environments:
132
133 Atom symbols for 166 keys [ Ref 47 ]:
134
135 A : Any valid periodic table element symbol
136 Q : Hetero atoms; any atom other than C or H
137 X : Halogens; F, Cl, Br, I
138 Z : Others; other than H, C, N, O, Si, P, S, F, Cl, Br, I
139
140 Atom symbols for 322 keys [ Ref 46 ]:
141
142 A : Any valid periodic table element symbol
143 Q : Hetero atoms; any atom other than C or H
144 X : Others; other than H, C, N, O, Si, P, S, F, Cl, Br, I
145 Z is neither defined nor used
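
The atom symbol definitions above can be expressed as a small lookup. This Python sketch is illustrative only: the function `matches_atom_symbol` and its `key_set` parameter are hypothetical names, and element symbols are assumed to be valid periodic table symbols (the sketch does not check them).

```python
HALOGENS = {"F", "Cl", "Br", "I"}
# Elements excluded from the "others" class in both key sets.
NOT_OTHERS = {"H", "C", "N", "O", "Si", "P", "S", "F", "Cl", "Br", "I"}


def matches_atom_symbol(symbol, element, key_set=166):
    """Return True if an element belongs to the atom class for a key set."""
    if symbol == "A":
        return True  # any valid element symbol (not validated here)
    if symbol == "Q":
        return element not in {"C", "H"}
    if key_set == 166:
        if symbol == "X":
            return element in HALOGENS
        if symbol == "Z":
            return element not in NOT_OTHERS
    elif key_set == 322:
        if symbol == "X":
            return element not in NOT_OTHERS
    raise ValueError(f"symbol {symbol!r} is not defined for {key_set} keys")


print(matches_atom_symbol("X", "Cl", key_set=166))
print(matches_atom_symbol("X", "Cl", key_set=322))
```

The two calls differ because X means halogens for the 166 keys but "others" for the 322 keys.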
146
147 Bond types:
148
149 - : Single
150 = : Double
151 T : Triple
152 # : Triple
153 ~ : Single or double query bond
154 % : An aromatic query bond
155
156 None : Any bond type; no explicit bond specified
157
158 $ : Ring bond; $ before a bond type specifies ring bond
159 ! : Chain or non-ring bond; ! before a bond type specifies chain bond
160
161 @ : A ring linkage; the number following it specifies the
162 atom's position in the line, thus @1 means linked back to the first
163 atom in the list.
164
165 Aromatic: Kekule or Arom5
166
167 Kekule: Bonds in 6-membered rings with alternate single/double bonds
168 or perimeter bonds
169 Arom5: Bonds in 5-membered rings with two double bonds and a hetero
170 atom at the apex of the ring.
171
172 MACCS 166 keys [ Ref 45-47 ] are defined as follows:
173
174 Key Description
175
176 1 ISOTOPE
177 2 103 < ATOMIC NO. < 256
178 3 GROUP IVA,VA,VIA PERIODS 4-6 (Ge...)
179 4 ACTINIDE
180 5 GROUP IIIB,IVB (Sc...)
181 6 LANTHANIDE
182 7 GROUP VB,VIB,VIIB (V...)
183 8 QAAA@1
184 9 GROUP VIII (Fe...)
185 10 GROUP IIA (ALKALINE EARTH)
186 11 4M RING
187 12 GROUP IB,IIB (Cu...)
188 13 ON(C)C
189 14 S-S
190 15 OC(O)O
191 16 QAA@1
192 17 CTC
193 18 GROUP IIIA (B...)
194 19 7M RING
195 20 SI
196 21 C=C(Q)Q
197 22 3M RING
198 23 NC(O)O
199 24 N-O
200 25 NC(N)N
201 26 C$=C($A)$A
202 27 I
203 28 QCH2Q
204 29 P
205 30 CQ(C)(C)A
206 31 QX
207 32 CSN
208 33 NS
209 34 CH2=A
210 35 GROUP IA (ALKALI METAL)
211 36 S HETEROCYCLE
212 37 NC(O)N
213 38 NC(C)N
214 39 OS(O)O
215 40 S-O
216 41 CTN
217 42 F
218 43 QHAQH
219 44 OTHER
220 45 C=CN
221 46 BR
222 47 SAN
223 48 OQ(O)O
224 49 CHARGE
225 50 C=C(C)C
226 51 CSO
227 52 NN
228 53 QHAAAQH
229 54 QHAAQH
230 55 OSO
231 56 ON(O)C
232 57 O HETEROCYCLE
233 58 QSQ
234 59 Snot%A%A
235 60 S=O
236 61 AS(A)A
237 62 A$A!A$A
238 63 N=O
239 64 A$A!S
240 65 C%N
241 66 CC(C)(C)A
242 67 QS
243 68 QHQH (&...)
244 69 QQH
245 70 QNQ
246 71 NO
247 72 OAAO
248 73 S=A
249 74 CH3ACH3
250 75 A!N$A
251 76 C=C(A)A
252 77 NAN
253 78 C=N
254 79 NAAN
255 80 NAAAN
256 81 SA(A)A
257 82 ACH2QH
258 83 QAAAA@1
259 84 NH2
260 85 CN(C)C
261 86 CH2QCH2
262 87 X!A$A
263 88 S
264 89 OAAAO
265 90 QHAACH2A
266 91 QHAAACH2A
267 92 OC(N)C
268 93 QCH3
269 94 QN
270 95 NAAO
271 96 5M RING
272 97 NAAAO
273 98 QAAAAA@1
274 99 C=C
275 100 ACH2N
276 101 8M RING
277 102 QO
278 103 CL
279 104 QHACH2A
280 105 A$A($A)$A
281 106 QA(Q)Q
282 107 XA(A)A
283 108 CH3AAACH2A
284 109 ACH2O
285 110 NCO
286 111 NACH2A
287 112 AA(A)(A)A
288 113 Onot%A%A
289 114 CH3CH2A
290 115 CH3ACH2A
291 116 CH3AACH2A
292 117 NAO
293 118 ACH2CH2A > 1
294 119 N=A
295 120 HETEROCYCLIC ATOM > 1 (&...)
296 121 N HETEROCYCLE
297 122 AN(A)A
298 123 OCO
299 124 QQ
300 125 AROMATIC RING > 1
301 126 A!O!A
302 127 A$A!O > 1 (&...)
303 128 ACH2AAACH2A
304 129 ACH2AACH2A
305 130 QQ > 1 (&...)
306 131 QH > 1
307 132 OACH2A
308 133 A$A!N
309 134 X (HALOGEN)
310 135 Nnot%A%A
311 136 O=A > 1
312 137 HETEROCYCLE
313 138 QCH2A > 1 (&...)
314 139 OH
315 140 O > 3 (&...)
316 141 CH3 > 2 (&...)
317 142 N > 1
318 143 A$A!O
319 144 Anot%A%Anot%A
320 145 6M RING > 1
321 146 O > 2
322 147 ACH2CH2A
323 148 AQ(A)A
324 149 CH3 > 1
325 150 A!A$A!A
326 151 NH
327 152 OC(C)C
328 153 QCH2A
329 154 C=O
330 155 A!CH2!A
331 156 NA(A)A
332 157 C-O
333 158 C-N
334 159 O > 1
335 160 CH3
336 161 N
337 162 AROMATIC
338 163 6M RING
339 164 O
340 165 RING
341 166 FRAGMENTS
342
343 MACCS 322 keys set as defined in tables 1, 2 and 3 [ Ref 46 ]
344 include:
345
346 o 26 atom properties of type P, as listed in Table 1
347 o 32 one-atom environments, as listed in Table 3
348 o 264 atom-bond-atom combinations listed in Table 4
349
350 Total number of keys in the three tables is: 322
351
352 Atom symbol, X, used for 322 keys [ Ref 46 ] doesn't refer to
353 Halogens as it does for 166 keys. In order to keep the definition of
354 322 keys consistent with the published definitions, the symbol X is
355 used to imply "others" atoms, but it's internally mapped to symbol Z
356 as defined for 166 keys during the generation of key values.
357
358 Atom properties-based keys (26):
359
360 Key Description
361 1 A(AAA) or AA(A)A - atom with at least three neighbors
362 2 Q - heteroatom
363 3 Anot%not-A - atom involved in one or more multiple bonds, not aromatic
364 4 A(AAAA) or AA(A)(A)A - atom with at least four neighbors
365 5 A(QQ) or QA(Q) - atom with at least two heteroatom neighbors
366 6 A(QQQ) or QA(Q)Q - atom with at least three heteroatom neighbors
367 7 QH - heteroatom with at least one hydrogen attached
368 8 CH2(AA) or ACH2A - carbon with at least two single bonds and at least
369 two hydrogens attached
370 9 CH3(A) or ACH3 - carbon with at least one single bond and at least three
371 hydrogens attached
372 10 Halogen
373 11 A(-A-A-A) or A-A(-A)-A - atom has at least three single bonds
374 12 AAAAAA@1 > 2 - atom is in at least two different six-membered rings
375 13 A($A$A$A) or A$A($A)$A - atom has more than two ring bonds
376 14 A$A!A$A - atom is at a ring/chain boundary. When a comparison is done
377 with another atom the path passes through the chain bond.
378 15 Anot%A%Anot%A - atom is at an aromatic/nonaromatic boundary. When a
379 comparison is done with another atom the path
380 passes through the aromatic bond.
381 16 A!A!A - atom with more than one chain bond
382 17 A!A$A!A - atom is at a ring/chain boundary. When a comparison is done
383 with another atom the path passes through the ring bond.
384 18 A%Anot%A%A - atom is at an aromatic/nonaromatic boundary. When a
385 comparison is done with another atom the
386 path passes through the nonaromatic bond.
387 19 HETEROCYCLE - atom is a heteroatom in a ring.
388 20 rare properties: atom with five or more neighbors, atom in
389 four or more rings, or atom types other than
390 H, C, N, O, S, F, Cl, Br, or I
391 21 rare properties: atom has a charge, is an isotope, has two or
392 more multiple bonds, or has a triple bond.
393 22 N - nitrogen
394 23 S - sulfur
395 24 O - oxygen
396 25 A(AA)A(A)A(AA) - atom has two neighbors, each with three or
397 more neighbors (including the central atom).
398 26 CHACH2 - atom has two hydrocarbon (CH2) neighbors
399
400 Atomic environments properties-based keys (32):
401
402 Key Description
403 27 C(CC)
404 28 C(CCC)
405 29 C(CN)
406 30 C(CCN)
407 31 C(NN)
408 32 C(NNC)
409 33 C(NNN)
410 34 C(CO)
411 35 C(CCO)
412 36 C(NO)
413 37 C(NCO)
414 38 C(NNO)
415 39 C(OO)
416 40 C(COO)
417 41 C(NOO)
418 42 C(OOO)
419 43 Q(CC)
420 44 Q(CCC)
421 45 Q(CN)
422 46 Q(CCN)
423 47 Q(NN)
424 48 Q(CNN)
425 49 Q(NNN)
426 50 Q(CO)
427 51 Q(CCO)
428 52 Q(NO)
429 53 Q(CNO)
430 54 Q(NNO)
431 55 Q(OO)
432 56 Q(COO)
433 57 Q(NOO)
434 58 Q(OOO)
435
436 Note: The first symbol is the central atom, with atoms bonded to the
437 central atom listed in parentheses. Q is any non-C, non-H atom. If
438 only two atoms are in parentheses, there is no implication
439 concerning the other atoms bonded to the central atom.
440
441 Atom-Bond-Atom properties-based keys: (264)
442
443 Key Description
444 59 C-C
445 60 C-N
446 61 C-O
447 62 C-S
448 63 C-Cl
449 64 C-P
450 65 C-F
451 66 C-Br
452 67 C-Si
453 68 C-I
454 69 C-X
455 70 N-N
456 71 N-O
457 72 N-S
458 73 N-Cl
459 74 N-P
460 75 N-F
461 76 N-Br
462 77 N-Si
463 78 N-I
464 79 N-X
465 80 O-O
466 81 O-S
467 82 O-Cl
468 83 O-P
469 84 O-F
470 85 O-Br
471 86 O-Si
472 87 O-I
473 88 O-X
474 89 S-S
475 90 S-Cl
476 91 S-P
477 92 S-F
478 93 S-Br
479 94 S-Si
480 95 S-I
481 96 S-X
482 97 Cl-Cl
483 98 Cl-P
484 99 Cl-F
485 100 Cl-Br
486 101 Cl-Si
487 102 Cl-I
488 103 Cl-X
489 104 P-P
490 105 P-F
491 106 P-Br
492 107 P-Si
493 108 P-I
494 109 P-X
495 110 F-F
496 111 F-Br
497 112 F-Si
498 113 F-I
499 114 F-X
500 115 Br-Br
501 116 Br-Si
502 117 Br-I
503 118 Br-X
504 119 Si-Si
505 120 Si-I
506 121 Si-X
507 122 I-I
508 123 I-X
509 124 X-X
510 125 C=C
511 126 C=N
512 127 C=O
513 128 C=S
514 129 C=Cl
515 130 C=P
516 131 C=F
517 132 C=Br
518 133 C=Si
519 134 C=I
520 135 C=X
521 136 N=N
522 137 N=O
523 138 N=S
524 139 N=Cl
525 140 N=P
526 141 N=F
527 142 N=Br
528 143 N=Si
529 144 N=I
530 145 N=X
531 146 O=O
532 147 O=S
533 148 O=Cl
534 149 O=P
535 150 O=F
536 151 O=Br
537 152 O=Si
538 153 O=I
539 154 O=X
540 155 S=S
541 156 S=Cl
542 157 S=P
543 158 S=F
544 159 S=Br
545 160 S=Si
546 161 S=I
547 162 S=X
548 163 Cl=Cl
549 164 Cl=P
550 165 Cl=F
551 166 Cl=Br
552 167 Cl=Si
553 168 Cl=I
554 169 Cl=X
555 170 P=P
556 171 P=F
557 172 P=Br
558 173 P=Si
559 174 P=I
560 175 P=X
561 176 F=F
562 177 F=Br
563 178 F=Si
564 179 F=I
565 180 F=X
566 181 Br=Br
567 182 Br=Si
568 183 Br=I
569 184 Br=X
570 185 Si=Si
571 186 Si=I
572 187 Si=X
573 188 I=I
574 189 I=X
575 190 X=X
576 191 C#C
577 192 C#N
578 193 C#O
579 194 C#S
580 195 C#Cl
581 196 C#P
582 197 C#F
583 198 C#Br
584 199 C#Si
585 200 C#I
586 201 C#X
587 202 N#N
588 203 N#O
589 204 N#S
590 205 N#Cl
591 206 N#P
592 207 N#F
593 208 N#Br
594 209 N#Si
595 210 N#I
596 211 N#X
597 212 O#O
598 213 O#S
599 214 O#Cl
600 215 O#P
601 216 O#F
602 217 O#Br
603 218 O#Si
604 219 O#I
605 220 O#X
606 221 S#S
607 222 S#Cl
608 223 S#P
609 224 S#F
610 225 S#Br
611 226 S#Si
612 227 S#I
613 228 S#X
614 229 Cl#Cl
615 230 Cl#P
616 231 Cl#F
617 232 Cl#Br
618 233 Cl#Si
619 234 Cl#I
620 235 Cl#X
621 236 P#P
622 237 P#F
623 238 P#Br
624 239 P#Si
625 240 P#I
626 241 P#X
627 242 F#F
628 243 F#Br
629 244 F#Si
630 245 F#I
631 246 F#X
632 247 Br#Br
633 248 Br#Si
634 249 Br#I
635 250 Br#X
636 251 Si#Si
637 252 Si#I
638 253 Si#X
639 254 I#I
640 255 I#X
641 256 X#X
642 257 C$C
643 258 C$N
644 259 C$O
645 260 C$S
646 261 C$Cl
647 262 C$P
648 263 C$F
649 264 C$Br
650 265 C$Si
651 266 C$I
652 267 C$X
653 268 N$N
654 269 N$O
655 270 N$S
656 271 N$Cl
657 272 N$P
658 273 N$F
659 274 N$Br
660 275 N$Si
661 276 N$I
662 277 N$X
663 278 O$O
664 279 O$S
665 280 O$Cl
666 281 O$P
667 282 O$F
668 283 O$Br
669 284 O$Si
670 285 O$I
671 286 O$X
672 287 S$S
673 288 S$Cl
674 289 S$P
675 290 S$F
676 291 S$Br
677 292 S$Si
678 293 S$I
679 294 S$X
680 295 Cl$Cl
681 296 Cl$P
682 297 Cl$F
683 298 Cl$Br
684 299 Cl$Si
685 300 Cl$I
686 301 Cl$X
687 302 P$P
688 303 P$F
689 304 P$Br
690 305 P$Si
691 306 P$I
692 307 P$X
693 308 F$F
694 309 F$Br
695 310 F$Si
696 311 F$I
697 312 F$X
698 313 Br$Br
699 314 Br$Si
700 315 Br$I
701 316 Br$X
702 317 Si$Si
703 318 Si$I
704 319 Si$X
705 320 I$I
706 321 I$X
707 322 X$X
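
The atom-bond-atom table above follows a regular pattern: 11 atom symbols taken as unordered pairs (66 pairs) for each of 4 bond symbols gives 264 keys, numbered from 59 to 322. The Python sketch below regenerates that numbering. It is a standalone illustration of the pattern seen in the table, not the code MayaChemTools uses; the names `ATOMS`, `BONDS` and `atom_bond_atom_keys` are made up for this example.

```python
from itertools import combinations_with_replacement

# Atom and bond symbols in the order they appear in the table above.
ATOMS = ["C", "N", "O", "S", "Cl", "P", "F", "Br", "Si", "I", "X"]
BONDS = ["-", "=", "#", "$"]
FIRST_KEY = 59


def atom_bond_atom_keys():
    """Map key numbers 59..322 to descriptions such as 'C-N'."""
    keys = {}
    number = FIRST_KEY
    for bond in BONDS:
        # Unordered atom pairs, including pairs of the same atom symbol.
        for first, second in combinations_with_replacement(ATOMS, 2):
            keys[number] = f"{first}{bond}{second}"
            number += 1
    return keys


keys = atom_bond_atom_keys()
print(len(keys), keys[59], keys[125], keys[322])
```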
708
709 SetSize
710 $MACCSKeys->SetSize($Size);
711
712 Sets size of MACCS keys and returns *MACCSKeys*. Possible values:
713 *166 or 322*.
714
715 SetType
716 $MACCSKeys->SetType($Type);
717
718 Sets type of MACCS keys and returns *MACCSKeys*. Possible values:
719 *MACCSKeyBits or MACCSKeyCount*.
720
721 StringifyMACCSKeys
722 $String = $MACCSKeys->StringifyMACCSKeys();
723
724 Returns a string containing information about *MACCSKeys* object.
725
726 AUTHOR
727 Manish Sud <msud@san.rr.com>
728
729 SEE ALSO
730 Fingerprints.pm, FingerprintsStringUtil.pm,
731 AtomNeighborhoodsFingerprints.pm, AtomTypesFingerprints.pm,
732 EStateIndiciesFingerprints.pm, ExtendedConnectivityFingerprints.pm,
733 PathLengthFingerprints.pm, TopologicalAtomPairsFingerprints.pm,
734 TopologicalAtomTripletsFingerprints.pm,
735 TopologicalAtomTorsionsFingerprints.pm,
736 TopologicalPharmacophoreAtomPairsFingerprints.pm,
737 TopologicalPharmacophoreAtomTripletsFingerprints.pm
738
739 COPYRIGHT
740 Copyright (C) 2015 Manish Sud. All rights reserved.
741
742 This file is part of MayaChemTools.
743
744 MayaChemTools is free software; you can redistribute it and/or modify it
745 under the terms of the GNU Lesser General Public License as published by
746 the Free Software Foundation; either version 3 of the License, or (at
747 your option) any later version.
748