NAME
    MolecularComplexityDescriptors

SYNOPSIS
    use MolecularDescriptors::MolecularComplexityDescriptors;

    use MolecularDescriptors::MolecularComplexityDescriptors qw(:all);

DESCRIPTION
    The MolecularComplexityDescriptors class provides the following methods:

    new, GenerateDescriptors, GetDescriptorNames,
    GetMolecularComplexityTypeAbbreviation, SetMACCSKeysSize,
    SetAtomIdentifierType, SetAtomicInvariantsToUse, SetDistanceBinSize,
    SetFunctionalClassesToUse, SetMaxDistance, SetMaxPathLength,
    SetMinDistance, SetMinPathLength, SetMolecularComplexityType,
    SetNeighborhoodRadius, SetNormalizationMethodology,
    StringifyMolecularComplexityDescriptors

    MolecularComplexityDescriptors is derived from the MolecularDescriptors
    class, which in turn is derived from the ObjectProperty base class.
    ObjectProperty uses Perl's AUTOLOAD functionality to provide methods
    that are not explicitly defined in the MolecularComplexityDescriptors,
    MolecularDescriptors or ObjectProperty classes. These methods are
    generated on-the-fly for a specified object property:

        Set<PropertyName>(<PropertyValue>);
        $PropertyValue = Get<PropertyName>();
        Delete<PropertyName>();

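The on-the-fly accessors described above can be pictured with a small, self-contained sketch. It is not the ObjectProperty implementation: the PropertyBag class is hypothetical, and storing properties in a blessed hash is an assumption made only to show how AUTOLOAD can serve Set, Get and Delete calls for arbitrary property names.

```perl
use strict;
use warnings;

package PropertyBag;    # hypothetical class, not part of MayaChemTools

our $AUTOLOAD;

# Store the initial property names and values in a blessed hash.
sub new {
    my ($Class, %NamesAndValues) = @_;
    return bless { %NamesAndValues }, $Class;
}

# Defined explicitly so object destruction never reaches AUTOLOAD.
sub DESTROY { }

# Handle Set<PropertyName>, Get<PropertyName> and Delete<PropertyName>.
sub AUTOLOAD {
    my ($This, @Args) = @_;
    my $Name = $AUTOLOAD;
    $Name =~ s/.*:://;    # strip the package qualifier

    if ($Name =~ /^(Set|Get|Delete)(\w+)$/) {
        my ($Action, $Property) = ($1, $2);
        if ($Action eq 'Set') {
            $This->{$Property} = $Args[0];
            return $This;    # setters return the object
        }
        return $This->{$Property} if $Action eq 'Get';
        return delete $This->{$Property};
    }
    die "Unknown method: $Name\n";
}

package main;

my $Bag = PropertyBag->new('MaxDistance' => 10);
$Bag->SetNeighborhoodRadius(3);
print "NeighborhoodRadius: ", $Bag->GetNeighborhoodRadius(), "\n";
$Bag->DeleteNeighborhoodRadius();
print "After delete: ",
    (defined $Bag->GetNeighborhoodRadius() ? "still set" : "not set"), "\n";
```

Any property name works without a matching method definition, which is why setters such as SetUseBondSymbols can be called even though they are not listed among the explicitly defined methods.
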
    The current release of MayaChemTools supports calculation of molecular
    complexity using the *MolecularComplexityType* parameter, which
    corresponds to the number of bits-set or unique keys [ Ref 117-119 ] in
    molecular fingerprints. The valid values for *MolecularComplexityType*
    are:

        AtomTypesFingerprints
        ExtendedConnectivityFingerprints
        MACCSKeys
        PathLengthFingerprints
        TopologicalAtomPairsFingerprints
        TopologicalAtomTripletsFingerprints
        TopologicalAtomTorsionsFingerprints
        TopologicalPharmacophoreAtomPairsFingerprints
        TopologicalPharmacophoreAtomTripletsFingerprints

    Default value for *MolecularComplexityType*: *MACCSKeys*.

    The *AtomIdentifierType* parameter corresponds to the atom types used
    during generation of fingerprints. The valid values for
    *AtomIdentifierType* are: *AtomicInvariantsAtomTypes, DREIDINGAtomTypes,
    EStateAtomTypes, FunctionalClassAtomTypes, MMFF94AtomTypes,
    SLogPAtomTypes, SYBYLAtomTypes, TPSAAtomTypes, UFFAtomTypes*.
    *AtomicInvariantsAtomTypes* is not supported for the following values
    of *MolecularComplexityType*: *MACCSKeys,
    TopologicalPharmacophoreAtomPairsFingerprints,
    TopologicalPharmacophoreAtomTripletsFingerprints*.
    *FunctionalClassAtomTypes* is the only valid value of
    *AtomIdentifierType* for topological pharmacophore fingerprints.

    Default value for *AtomIdentifierType*: *AtomicInvariantsAtomTypes* for
    all fingerprints other than topological pharmacophore fingerprints,
    which default to *FunctionalClassAtomTypes*.

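The compatibility rules above can be written down as a small check. This is only a sketch of the rules as stated in this section, not code from MayaChemTools: the helper names IsAtomIdentifierTypeAllowed and DefaultAtomIdentifierType are hypothetical, and for *MACCSKeys* the check encodes nothing beyond the one stated restriction.

```perl
use strict;
use warnings;

# Valid AtomIdentifierType values, as listed in this section.
my @ValidAtomIdentifierTypes = qw(
    AtomicInvariantsAtomTypes DREIDINGAtomTypes EStateAtomTypes
    FunctionalClassAtomTypes MMFF94AtomTypes SLogPAtomTypes
    SYBYLAtomTypes TPSAAtomTypes UFFAtomTypes
);

my %IsPharmacophoreType = map { $_ => 1 } qw(
    TopologicalPharmacophoreAtomPairsFingerprints
    TopologicalPharmacophoreAtomTripletsFingerprints
);

# Hypothetical helper: returns 1 when the combination is allowed by the
# rules stated above, 0 otherwise.
sub IsAtomIdentifierTypeAllowed {
    my ($ComplexityType, $IdentifierType) = @_;

    return 0 unless grep { $_ eq $IdentifierType } @ValidAtomIdentifierTypes;

    # Topological pharmacophore fingerprints accept only functional classes.
    if ($IsPharmacophoreType{$ComplexityType}) {
        return $IdentifierType eq 'FunctionalClassAtomTypes' ? 1 : 0;
    }
    # MACCSKeys does not support atomic invariants atom types.
    if ($ComplexityType eq 'MACCSKeys') {
        return $IdentifierType eq 'AtomicInvariantsAtomTypes' ? 0 : 1;
    }
    return 1;
}

# Hypothetical helper: default identifier type for a complexity type.
sub DefaultAtomIdentifierType {
    my ($ComplexityType) = @_;
    return $IsPharmacophoreType{$ComplexityType}
        ? 'FunctionalClassAtomTypes'
        : 'AtomicInvariantsAtomTypes';
}

for my $PairRef (
    ['PathLengthFingerprints', 'SYBYLAtomTypes'],
    ['MACCSKeys', 'AtomicInvariantsAtomTypes'],
    ['TopologicalPharmacophoreAtomPairsFingerprints', 'UFFAtomTypes'],
) {
    my ($ComplexityType, $IdentifierType) = @{$PairRef};
    my $Status = IsAtomIdentifierTypeAllowed($ComplexityType, $IdentifierType)
        ? 'allowed' : 'not allowed';
    print "$ComplexityType + $IdentifierType: $Status\n";
}
```
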
    The *AtomicInvariantsToUse* parameter is used when *AtomIdentifierType*
    is set to *AtomicInvariantsAtomTypes*. It's a space-separated list of
    valid atomic invariants.

    Possible values for atomic invariants are: *AS, X, BO, LBO, SB, DB, TB,
    H, Ar, RA, FC, MN, SM*. Default values for the *AtomicInvariantsToUse*
    parameter depend on the *MolecularComplexityType* parameter, as shown
    below:

        MolecularComplexityType                AtomicInvariantsToUse

        AtomTypesFingerprints                  AS X BO H FC
        TopologicalAtomPairsFingerprints       AS X BO H FC
        TopologicalAtomTripletsFingerprints    AS X BO H FC
        TopologicalAtomTorsionsFingerprints    AS X BO H FC

        ExtendedConnectivityFingerprints       AS X BO H FC MN
        PathLengthFingerprints                 AS

    The *FunctionalClassesToUse* parameter is used when *AtomIdentifierType*
    is set to *FunctionalClassAtomTypes*. It's a space-separated list of
    valid atom functional classes.

    Possible values for atom functional classes are: *Ar, CA, H, HBA, HBD,
    Hal, NI, PI, RA*.

    Default value for the *FunctionalClassesToUse* parameter is set to:

        HBD HBA PI NI Ar Hal

    for all fingerprints except for the following two
    *MolecularComplexityType* fingerprints:

        MolecularComplexityType                             FunctionalClassesToUse

        TopologicalPharmacophoreAtomPairsFingerprints       HBD HBA PI NI H
        TopologicalPharmacophoreAtomTripletsFingerprints    HBD HBA PI NI H Ar

    The *MACCSKeysSize* parameter is only used when
    *MolecularComplexityType* is *MACCSKeys* and corresponds to the size of
    the MACCS key set. Possible values: *166 or 322*. Default value: *166*.

    The *NeighborhoodRadius* parameter is only used when
    *MolecularComplexityType* is *ExtendedConnectivityFingerprints* and
    corresponds to the atomic neighborhood radius for generating extended
    connectivity fingerprints. Possible values: positive integer. Default
    value: *2*.

    The *MinPathLength* and *MaxPathLength* parameters are only used when
    *MolecularComplexityType* is *PathLengthFingerprints* and correspond to
    the minimum and maximum path lengths to use for generating path length
    fingerprints. Possible values: positive integers. Default values:
    *MinPathLength = 1*; *MaxPathLength = 8*.

    The *UseBondSymbols* parameter is only used when
    *MolecularComplexityType* is *PathLengthFingerprints* and indicates
    whether bond symbols are included in the atom path strings used to
    generate path length fingerprints. Possible values: *Yes or No*.
    Default value: *Yes*.

    The *MinDistance* and *MaxDistance* parameters are only used when
    *MolecularComplexityType* is *TopologicalAtomPairsFingerprints* or
    *TopologicalAtomTripletsFingerprints* and correspond to the minimum and
    maximum bond distance between atom pairs during fingerprints
    generation. Possible values: positive integers. Default values:
    *MinDistance = 1*; *MaxDistance = 10*.

    The *UseTriangleInequality* parameter is used for these values of
    *MolecularComplexityType*: *TopologicalAtomTripletsFingerprints* and
    *TopologicalPharmacophoreAtomTripletsFingerprints*. It determines
    whether to apply triangle inequality to distance triplets. Possible
    values: *Yes or No*. Default values:
    *TopologicalAtomTripletsFingerprints: No*;
    *TopologicalPharmacophoreAtomTripletsFingerprints: Yes*.

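As an illustration of what applying the triangle inequality to a distance triplet means, the sketch below checks whether the largest of three pairwise distances is no greater than the sum of the other two. The helper name is hypothetical, and treating the degenerate case (equality) as valid is an assumption; the section does not say how MayaChemTools handles it.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of MayaChemTools: returns 1 when three
# pairwise distances satisfy the triangle inequality, 0 otherwise.
sub SatisfiesTriangleInequality {
    my ($Distance12, $Distance13, $Distance23) = @_;

    # After a numeric sort, only the largest distance needs checking.
    my ($Short, $Middle, $Long) =
        sort { $a <=> $b } ($Distance12, $Distance13, $Distance23);

    return ($Short + $Middle >= $Long) ? 1 : 0;
}

for my $TripletRef ([2, 3, 4], [1, 1, 5], [2, 2, 4]) {
    my $Status = SatisfiesTriangleInequality(@{$TripletRef}) ? 'kept' : 'dropped';
    print join('-', @{$TripletRef}), ": $Status\n";
}
```

With these inputs the triplets 2-3-4 and 2-2-4 are kept and 1-1-5 is dropped.
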
    The *DistanceBinSize* parameter is used when *MolecularComplexityType*
    is *TopologicalPharmacophoreAtomTripletsFingerprints* and corresponds
    to the distance bin size used for binning distances during generation
    of topological pharmacophore atom triplets fingerprints. Possible
    values: positive integer. Default value: *2*.

    *NormalizationMethodology* is only used for these values of
    *MolecularComplexityType*: *ExtendedConnectivityFingerprints*,
    *TopologicalPharmacophoreAtomPairsFingerprints* and
    *TopologicalPharmacophoreAtomTripletsFingerprints*. It corresponds to
    the normalization methodology to use for scaling the number of bits-set
    or unique keys during generation of fingerprints. Possible values for
    *ExtendedConnectivityFingerprints*: *None or ByHeavyAtomsCount*; default
    value: *None*. Possible values for topological pharmacophore atom
    pairs and triplets fingerprints: *None or ByPossibleKeysCount*; default
    value: *None*. *ByPossibleKeysCount* corresponds to the total number of
    possible topological pharmacophore atom pairs or triplets in a molecule.

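A rough picture of the scaling step: the sketch below divides a raw count of bits-set or unique keys by a scale count. Plain division is an assumption, since this section only says the count is scaled; the helper name and its argument layout are hypothetical and are not taken from MayaChemTools.

```perl
use strict;
use warnings;

# Hypothetical helper: scales a raw count of bits-set or unique keys.
# $ScaleCount stands for the heavy atom count (ByHeavyAtomsCount) or the
# number of possible keys (ByPossibleKeysCount). Division is an assumption.
sub NormalizeComplexity {
    my ($RawCount, $Methodology, $ScaleCount) = @_;

    return $RawCount if $Methodology eq 'None';

    die "Unsupported methodology: $Methodology\n"
        unless $Methodology eq 'ByHeavyAtomsCount'
            || $Methodology eq 'ByPossibleKeysCount';
    die "Scale count must be a positive number\n"
        unless defined $ScaleCount && $ScaleCount > 0;

    return $RawCount / $ScaleCount;
}

# 36 unique keys in a molecule with 12 heavy atoms (made-up numbers).
printf "Scaled:   %.2f\n", NormalizeComplexity(36, 'ByHeavyAtomsCount', 12);
printf "Unscaled: %d\n", NormalizeComplexity(36, 'None');
```
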
METHODS
    new
            $NewMolecularComplexityDescriptors =
                new MolecularDescriptors::MolecularComplexityDescriptors(
                    %NamesAndValues);

        Using the specified *MolecularComplexityDescriptors* property names
        and values hash, the new method creates a new object and returns a
        reference to the newly created MolecularComplexityDescriptors
        object. By default, the following properties are initialized:

            Molecule = ''
            Type = 'MolecularComplexity'
            MolecularComplexityType = 'MACCSKeys'
            AtomIdentifierType = ''
            MACCSKeysSize = 166
            NeighborhoodRadius = 2
            MinPathLength = 1
            MaxPathLength = 8
            UseBondSymbols = 1
            MinDistance = 1
            MaxDistance = 10
            UseTriangleInequality = ''
            DistanceBinSize = 2
            NormalizationMethodology = 'None'
            @DescriptorNames = ('MolecularComplexity')
            @DescriptorValues = ('None')

        Examples:

            $MolecularComplexityDescriptors =
                new MolecularDescriptors::MolecularComplexityDescriptors(
                    'Molecule' => $Molecule);

            $MolecularComplexityDescriptors =
                new MolecularDescriptors::MolecularComplexityDescriptors();

            $MolecularComplexityDescriptors->SetMolecule($Molecule);
            $MolecularComplexityDescriptors->GenerateDescriptors();
            print "MolecularComplexityDescriptors: $MolecularComplexityDescriptors\n";

    GenerateDescriptors
            $MolecularComplexityDescriptors->GenerateDescriptors();

        Calculates the MolecularComplexity value for a molecule and returns
        *MolecularComplexityDescriptors*.

    GetDescriptorNames
            @DescriptorNames = $MolecularComplexityDescriptors->GetDescriptorNames();
            @DescriptorNames =
                MolecularDescriptors::MolecularComplexityDescriptors::GetDescriptorNames();

        Returns all available descriptor names as an array.

    GetMolecularComplexityTypeAbbreviation
            $Abbrev = $MolecularComplexityDescriptors->
                          GetMolecularComplexityTypeAbbreviation();
            $Abbrev =
                MolecularDescriptors::MolecularComplexityDescriptors::GetMolecularComplexityTypeAbbreviation(
                    $ComplexityType);

        Returns the abbreviation for a specified molecular complexity type
        or for the type set on the *MolecularComplexityDescriptors* object.

    SetMACCSKeysSize
            $MolecularComplexityDescriptors->SetMACCSKeysSize($Size);

        Sets MACCS keys size and returns *MolecularComplexityDescriptors*.

    SetAtomIdentifierType
            $MolecularComplexityDescriptors->SetAtomIdentifierType($IdentifierType);

        Sets atom *IdentifierType* to use during fingerprints generation
        corresponding to *MolecularComplexityType* and returns
        *MolecularComplexityDescriptors*.

        Possible values: *AtomicInvariantsAtomTypes, DREIDINGAtomTypes,
        EStateAtomTypes, FunctionalClassAtomTypes, MMFF94AtomTypes,
        SLogPAtomTypes, SYBYLAtomTypes, TPSAAtomTypes, UFFAtomTypes*.

    SetAtomicInvariantsToUse
            $MolecularComplexityDescriptors->SetAtomicInvariantsToUse($ValuesRef);
            $MolecularComplexityDescriptors->SetAtomicInvariantsToUse(@Values);

        Sets atomic invariants to use when *AtomIdentifierType* is
        *AtomicInvariantsAtomTypes* for fingerprints generation and
        returns *MolecularComplexityDescriptors*.

        Possible values for atomic invariants are: *AS, X, BO, LBO, SB, DB,
        TB, H, Ar, RA, FC, MN, SM*. Default value [ Ref 24 ]:
        *AS,X,BO,H,FC,MN*.

        The atomic invariants abbreviations correspond to:

            AS        = Atom symbol corresponding to element symbol

            X<n>      = Number of non-hydrogen atom neighbors or heavy atoms
            BO<n>     = Sum of bond orders to non-hydrogen atom neighbors or heavy atoms
            LBO<n>    = Largest bond order of non-hydrogen atom neighbors or heavy atoms
            SB<n>     = Number of single bonds to non-hydrogen atom neighbors or heavy atoms
            DB<n>     = Number of double bonds to non-hydrogen atom neighbors or heavy atoms
            TB<n>     = Number of triple bonds to non-hydrogen atom neighbors or heavy atoms
            H<n>      = Number of implicit and explicit hydrogens for atom
            Ar        = Aromatic annotation indicating whether atom is aromatic
            RA        = Ring atom annotation indicating whether atom is in a ring
            FC<+n/-n> = Formal charge assigned to atom
            MN<n>     = Mass number indicating isotope other than most abundant isotope
            SM<n>     = Spin multiplicity of atom. Possible values: 1 (singlet), 2 (doublet) or
                        3 (triplet)

        Atom type generated by the AtomTypes::AtomicInvariantsAtomTypes
        class corresponds to:

            AS.X<n>.BO<n>.LBO<n>.SB<n>.DB<n>.TB<n>.H<n>.Ar.RA.FC<+n/-n>.MN<n>.SM<n>

        Except for AS, which is a required atomic invariant in atom types,
        all other atomic invariants are optional. Atom type specification
        doesn't include atomic invariants with zero or undefined values.

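The atom type format described above can be sketched as a small string builder. This is an illustration of the format only, not the AtomTypes::AtomicInvariantsAtomTypes implementation: the helper name, the hash layout of the per-atom values, and the example atoms are all hypothetical.

```perl
use strict;
use warnings;

# Order of the optional invariants in the atom type string.
my @InvariantOrder = qw(X BO LBO SB DB TB H Ar RA FC MN SM);

# Hypothetical helper: joins the selected invariants of one atom into an
# atom type string, skipping zero or undefined values.
sub BuildAtomType {
    my ($ValuesRef, @InvariantsToUse) = @_;
    my %Use = map { $_ => 1 } @InvariantsToUse;

    my @Parts = ($ValuesRef->{AS});    # AS is always required

    for my $Name (@InvariantOrder) {
        next unless $Use{$Name};
        my $Value = $ValuesRef->{$Name};
        next unless defined $Value && $Value != 0;

        if ($Name eq 'Ar' || $Name eq 'RA') {
            push @Parts, $Name;                        # flags carry no number
        }
        elsif ($Name eq 'FC') {
            push @Parts, sprintf('FC%+d', $Value);     # explicit sign
        }
        else {
            push @Parts, "$Name$Value";
        }
    }
    return join('.', @Parts);
}

# Made-up per-atom values for a methyl carbon and a carboxylate oxygen.
my %MethylCarbon      = (AS => 'C', X => 1, BO => 1, H => 3, FC => 0);
my %CarboxylateOxygen = (AS => 'O', X => 1, BO => 1, H => 0, FC => -1);

my $CarbonType = BuildAtomType(\%MethylCarbon, qw(AS X BO H FC));
my $OxygenType = BuildAtomType(\%CarboxylateOxygen, qw(AS X BO H FC));
print "$CarbonType\n";    # C.X1.BO1.H3
print "$OxygenType\n";    # O.X1.BO1.FC-1
```

The zero formal charge on the carbon and the zero hydrogen count on the oxygen are left out of the strings, matching the rule that zero or undefined invariants are not included.
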
        In addition to usage of abbreviations for specifying atomic
        invariants, the following descriptive words are also allowed:

            X   : NumOfNonHydrogenAtomNeighbors or NumOfHeavyAtomNeighbors
            BO  : SumOfBondOrdersToNonHydrogenAtoms or SumOfBondOrdersToHeavyAtoms
            LBO : LargestBondOrderToNonHydrogenAtoms or LargestBondOrderToHeavyAtoms
            SB  : NumOfSingleBondsToNonHydrogenAtoms or NumOfSingleBondsToHeavyAtoms
            DB  : NumOfDoubleBondsToNonHydrogenAtoms or NumOfDoubleBondsToHeavyAtoms
            TB  : NumOfTripleBondsToNonHydrogenAtoms or NumOfTripleBondsToHeavyAtoms
            H   : NumOfImplicitAndExplicitHydrogens
            Ar  : Aromatic
            RA  : RingAtom
            FC  : FormalCharge
            MN  : MassNumber
            SM  : SpinMultiplicity

        *AtomTypes::AtomicInvariantsAtomTypes* module is used to assign
        atomic invariant atom types.

    SetDistanceBinSize
            $MolecularComplexityDescriptors->SetDistanceBinSize($BinSize);

        Sets distance bin size used to bin distances between atom pairs in
        atom triplets for topological pharmacophore atom triplets
        fingerprints generation and returns
        *MolecularComplexityDescriptors*.

    SetFunctionalClassesToUse
            $MolecularComplexityDescriptors->SetFunctionalClassesToUse($ValuesRef);
            $MolecularComplexityDescriptors->SetFunctionalClassesToUse(@Values);

        Sets functional classes to use when *AtomIdentifierType* is
        *FunctionalClassAtomTypes* for fingerprints generation and returns
        *MolecularComplexityDescriptors*.

        Possible values for atom functional classes are: *Ar, CA, H, HBA,
        HBD, Hal, NI, PI, RA*. Default value [ Ref 24 ]:
        *HBD,HBA,PI,NI,Ar,Hal*.

        The functional class abbreviations correspond to:

            HBD : HydrogenBondDonor
            HBA : HydrogenBondAcceptor
            PI  : PositivelyIonizable
            NI  : NegativelyIonizable
            Ar  : Aromatic
            Hal : Halogen
            H   : Hydrophobic
            RA  : RingAtom
            CA  : ChainAtom

        Functional class atom type specification for an atom corresponds to:

            Ar.CA.H.HBA.HBD.Hal.NI.PI.RA or None

        *AtomTypes::FunctionalClassAtomTypes* module is used to assign
        functional class atom types. It uses the following definitions
        [ Ref 60-61, Ref 65-66 ]:

            HydrogenBondDonor: NH, NH2, OH
            HydrogenBondAcceptor: N[!H], O
            PositivelyIonizable: +, NH2
            NegativelyIonizable: -, C(=O)OH, S(=O)OH, P(=O)OH

    SetMaxDistance
            $MolecularComplexityDescriptors->SetMaxDistance($MaxDistance);

        Sets maximum distance to use during topological atom pairs and
        triplets fingerprints generation and returns
        *MolecularComplexityDescriptors*.

    SetMaxPathLength
            $MolecularComplexityDescriptors->SetMaxPathLength($Length);

        Sets maximum path length to use during path length fingerprints
        generation and returns *MolecularComplexityDescriptors*.

    SetMinDistance
            $MolecularComplexityDescriptors->SetMinDistance($MinDistance);

        Sets minimum distance to use during topological atom pairs and
        triplets fingerprints generation and returns
        *MolecularComplexityDescriptors*.

    SetMinPathLength
            $MolecularComplexityDescriptors->SetMinPathLength($MinPathLength);

        Sets minimum path length to use during path length fingerprints
        generation and returns *MolecularComplexityDescriptors*.

    SetMolecularComplexityType
            $MolecularComplexityDescriptors->SetMolecularComplexityType($ComplexityType);

        Sets molecular complexity type to use for calculating its value and
        returns *MolecularComplexityDescriptors*.

    SetNeighborhoodRadius
            $MolecularComplexityDescriptors->SetNeighborhoodRadius($Radius);

        Sets neighborhood radius to use during extended connectivity
        fingerprints generation and returns
        *MolecularComplexityDescriptors*.

    SetNormalizationMethodology
            $MolecularComplexityDescriptors->SetNormalizationMethodology($Methodology);

        Sets normalization methodology to use during calculation of
        molecular complexity corresponding to extended connectivity,
        topological pharmacophore atom pairs and triplets fingerprints, and
        returns *MolecularComplexityDescriptors*.

    StringifyMolecularComplexityDescriptors
            $String = $MolecularComplexityDescriptors->
                          StringifyMolecularComplexityDescriptors();

        Returns a string containing information about
        *MolecularComplexityDescriptors* object.

AUTHOR
    Manish Sud <msud@san.rr.com>

SEE ALSO
    MolecularDescriptors.pm, MolecularDescriptorsGenerator.pm

COPYRIGHT
    Copyright (C) 2015 Manish Sud. All rights reserved.

    This file is part of MayaChemTools.

    MayaChemTools is free software; you can redistribute it and/or modify it
    under the terms of the GNU Lesser General Public License as published by
    the Free Software Foundation; either version 3 of the License, or (at
    your option) any later version.