0
|
1 NAME
|
|
2 CalculatePhysicochemicalProperties.pl - Calculate physicochemical
|
|
3 properties for SD files
|
|
4
|
|
5 SYNOPSIS
|
|
6 CalculatePhysicochemicalProperties.pl SDFile(s)...
|
|
7
|
|
8 CalculatePhysicochemicalProperties.pl [--AromaticityModel *AromaticityModelType*]
|
|
9 [--CompoundID DataFieldName or LabelPrefixString] [--CompoundIDLabel
|
|
10 text] [--CompoundIDMode DataField | MolName | LabelPrefix | MolNameOrLabelPrefix] [--DataFields "FieldLabel1,FieldLabel2,..."]
|
|
11 [-d, --DataFieldsMode All | Common | Specify | CompoundID] [-f, --Filter
|
|
12 Yes | No] [-h, --help] [--HydrogenBonds HBondsType1 | HBondsType2] [-k,
|
|
13 --KeepLargestComponent Yes | No] [-m, --mode All | RuleOf5 | RuleOf3 |
|
|
14 "name1, [name2,...]"] [--MolecularComplexity *Name,Value,
|
|
15 [Name,Value,...]*] [--OutDelim comma | tab | semicolon] [--output SD |
|
|
16 text | both] [-o, --overwrite] [--Precision
|
|
17 Name,Number,[Name,Number,..]] [--RotatableBonds Name,Value,
|
|
18 [Name,Value,...]] [--RuleOf3Violations Yes | No] [--RuleOf5Violations
|
|
19 Yes | No] [-q, --quote Yes | No] [-r, --root RootName] [--TPSA Name,Value,[Name,Value,...]] [-w, --WorkingDir
|
|
20 dirname] SDFile(s)...
|
|
21
|
|
22 DESCRIPTION
|
|
23 Calculate physicochemical properties for *SDFile(s)* and create
|
|
24 appropriate SD or CSV/TSV text file(s) containing calculated properties.
|
|
25
|
|
26 The current release of MayaChemTools supports the calculation of these
|
|
27 physicochemical properties:
|
|
28
|
|
29 MolecularWeight, ExactMass, HeavyAtoms, Rings, AromaticRings,
|
|
30 van der Waals MolecularVolume [ Ref 93 ], RotatableBonds,
|
|
31 HydrogenBondDonors, HydrogenBondAcceptors, LogP and
|
|
32 Molar Refractivity (SLogP and SMR) [ Ref 89 ], Topological Polar
|
|
33 Surface Area (TPSA) [ Ref 90 ], Fraction of SP3 carbons (Fsp3Carbons)
|
|
34 and SP3 carbons (Sp3Carbons) [ Ref 115-116, Ref 119 ],
|
|
35 MolecularComplexity [ Ref 117-119 ]
|
|
36
|
|
37 Multiple SDFile names are separated by spaces. The valid file extensions
|
|
38 are *.sdf* and *.sd*. All other file names are ignored. All the SD files
|
|
39 in the current directory can be specified either by **.sdf* or the current
|
|
40 directory name.
|
|
41
|
|
42 The calculation of molecular complexity using *MolecularComplexityType*
|
|
43 parameter corresponds to the number of bits-set or unique keys [ Ref
|
|
44 117-119 ] in molecular fingerprints. Default value for
|
|
45 *MolecularComplexityType*: *MACCSKeys* of size 166. The calculation of
|
|
46 MACCSKeys is relatively expensive and can take a rather substantial amount
|
|
47 of time.
|
|
48
|
|
49 OPTIONS
|
|
50 --AromaticityModel *MDLAromaticityModel | TriposAromaticityModel |
|
|
51 MMFFAromaticityModel | ChemAxonBasicAromaticityModel |
|
|
52 ChemAxonGeneralAromaticityModel | DaylightAromaticityModel |
|
|
53 MayaChemToolsAromaticityModel*
|
|
54 Specify aromaticity model to use during detection of aromaticity.
|
|
55 Possible values in the current release are: *MDLAromaticityModel,
|
|
56 TriposAromaticityModel, MMFFAromaticityModel,
|
|
57 ChemAxonBasicAromaticityModel, ChemAxonGeneralAromaticityModel,
|
|
58 DaylightAromaticityModel or MayaChemToolsAromaticityModel*. Default
|
|
59 value: *MayaChemToolsAromaticityModel*.
|
|
60
|
|
61 The supported aromaticity model names along with model specific
|
|
62 control parameters are defined in AromaticityModelsData.csv, which
|
|
63 is distributed with the current release and is available under
|
|
64 lib/data directory. Molecule.pm module retrieves data from this file
|
|
65 during class instantiation and makes it available to method
|
|
66 DetectAromaticity for detecting aromaticity corresponding to a
|
|
67 specific model.
|
|
68
|
|
69 --CompoundID *DataFieldName or LabelPrefixString*
|
|
70 This value is --CompoundIDMode specific and indicates how compound
|
|
71 ID is generated.
|
|
72
|
|
73 For *DataField* value of --CompoundIDMode option, it corresponds to
|
|
74 datafield label name whose value is used as compound ID; otherwise,
|
|
75 it's a prefix string used for generating compound IDs like
|
|
76 LabelPrefixString<Number>. Default value, *Cmpd*, generates compound
|
|
77 IDs which look like Cmpd<Number>.
|
|
78
|
|
79 Examples for *DataField* value of --CompoundIDMode:
|
|
80
|
|
81 MolID
|
|
82 ExtReg
|
|
83
|
|
84 Examples for *LabelPrefix* or *MolNameOrLabelPrefix* value of
|
|
85 --CompoundIDMode:
|
|
86
|
|
87 Compound
|
|
88
|
|
89 The value specified above generates compound IDs which correspond to
|
|
90 Compound<Number> instead of default value of Cmpd<Number>.
|
|
91
|
|
92 --CompoundIDLabel *text*
|
|
93 Specify compound ID column label for CSV/TSV text file(s) used
|
|
94 during *CompoundID* value of --DataFieldsMode option. Default value:
|
|
95 *CompoundID*.
|
|
96
|
|
97 --CompoundIDMode *DataField | MolName | LabelPrefix |
|
|
98 MolNameOrLabelPrefix*
|
|
99 Specify how to generate compound IDs and write to CSV/TSV text
|
|
100 file(s) along with calculated physicochemical properties for *text |
|
|
101 both* values of --output option: use a *SDFile(s)* datafield value;
|
|
102 use molname line from *SDFile(s)*; generate a sequential ID with
|
|
103 specific prefix; use combination of both MolName and LabelPrefix
|
|
104 with usage of LabelPrefix values for empty molname lines.
|
|
105
|
|
106 Possible values: *DataField | MolName | LabelPrefix |
|
|
107 MolNameOrLabelPrefix*. Default value: *LabelPrefix*.
|
|
108
|
|
109 For *MolNameOrLabelPrefix* value of --CompoundIDMode, molname line
|
|
110 in *SDFile(s)* takes precedence over sequential compound IDs
|
|
111 generated using *LabelPrefix* and only empty molname values are
|
|
112 replaced with sequential compound IDs.
|
|
113
|
|
114 This is only used for *CompoundID* value of --DataFieldsMode option.
|
|
115
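The four --CompoundIDMode values described above can be sketched as a small Python function. This is an illustration only, not the script's Perl implementation: the function name, its arguments, and the empty-string fallback for a missing data field are assumptions made for this example.

```python
from typing import Mapping, Optional


def make_compound_id(
    mode: str,
    index: int,
    molname: str = "",
    data_fields: Optional[Mapping[str, str]] = None,
    compound_id: str = "Cmpd",
) -> str:
    """Return the ID for the index-th compound (1-based) under a given mode.

    compound_id plays the role of the --CompoundID value: a data field
    label for DataField mode, otherwise a prefix string.
    """
    key = mode.lower()
    if key == "datafield":
        # Assumption: a missing data field yields an empty ID.
        return (data_fields or {}).get(compound_id, "")
    if key == "molname":
        return molname.strip()
    if key == "labelprefix":
        return f"{compound_id}{index}"
    if key == "molnameorlabelprefix":
        # The molname line takes precedence; only empty names are replaced.
        name = molname.strip()
        return name if name else f"{compound_id}{index}"
    raise ValueError(f"unknown CompoundIDMode: {mode}")
```

With the default prefix, make_compound_id("LabelPrefix", 3) gives Cmpd3, matching the Cmpd<Number> pattern described under --CompoundID.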
|
|
116 --DataFields *"FieldLabel1,FieldLabel2,..."*
|
|
117 Comma delimited list of *SDFile(s)* data fields to extract and
|
|
118 write to CSV/TSV text file(s) along with calculated physicochemical
|
|
119 properties for *text | both* values of --output option.
|
|
120
|
|
121 This is only used for *Specify* value of --DataFieldsMode option.
|
|
122
|
|
123 Examples:
|
|
124
|
|
125 Extreg
|
|
126 MolID,CompoundName
|
|
127
|
|
128 -d, --DataFieldsMode *All | Common | Specify | CompoundID*
|
|
129 Specify how data fields in *SDFile(s)* are transferred to output
|
|
130 CSV/TSV text file(s) along with calculated physicochemical
|
|
131 properties for *text | both* values of --output option: transfer all
|
|
132 SD data fields; transfer SD data fields common to all compounds;
|
|
133 extract specified data fields; generate a compound ID using molname
|
|
134 line, a compound prefix, or a combination of both. Possible values:
|
|
135 *All | Common | Specify | CompoundID*. Default value: *CompoundID*.
|
|
136
|
|
137 -f, --Filter *Yes | No*
|
|
138 Specify whether to check and filter compound data in SDFile(s).
|
|
139 Possible values: *Yes or No*. Default value: *Yes*.
|
|
140
|
|
141 By default, compound data is checked before calculating
|
|
142 physicochemical properties and compounds containing atom data
|
|
143 corresponding to non-element symbols or no atom data are ignored.
|
|
144
|
|
145 -h, --help
|
|
146 Print this help message.
|
|
147
|
|
148 --HydrogenBonds *HBondsType1 | HBondsType2*
|
|
149 Parameters to control calculation of hydrogen bond donors and
|
|
150 acceptors. Possible values: *HBondsType1, HydrogenBondsType1,
|
|
151 HBondsType2, HydrogenBondsType2*. Default value: *HBondsType2* which
|
|
152 corresponds to RuleOf5 definition for number of hydrogen bond donors
|
|
153 and acceptors.
|
|
154
|
|
155 The current release of MayaChemTools supports identification of two
|
|
156 types of hydrogen bond donor and acceptor atoms with these names:
|
|
157
|
|
158 HBondsType1 or HydrogenBondsType1
|
|
159 HBondsType2 or HydrogenBondsType2
|
|
160
|
|
161 The names of these hydrogen bond types are rather arbitrary.
|
|
162 However, their definitions have specific meaning and are as follows:
|
|
163
|
|
164 HydrogenBondsType1 [ Ref 60-61, Ref 65-66 ]:
|
|
165
|
|
166 Donor: NH, NH2, OH - Any N and O with available H
|
|
167 Acceptor: N[!H], O - Any N without available H and any O
|
|
168
|
|
169 HydrogenBondsType2 [ Ref 91 ]:
|
|
170
|
|
171 Donor: NH, NH2, OH - N and O with available H
|
|
172 Acceptor: N, O - Any N and O
|
|
173
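The difference between the two hydrogen bond definitions can be sketched in Python as follows. This is a simplified illustration, not the MayaChemTools implementation: atoms are given as hypothetical (element symbol, attached hydrogen count) pairs, and the sketch assumes donors and acceptors are counted per atom.

```python
from typing import Iterable, Tuple


def count_hbond_atoms(
    atoms: Iterable[Tuple[str, int]], hbonds_type: str = "HBondsType2"
) -> Tuple[int, int]:
    """Return (donors, acceptors) for atoms given as (symbol, h_count)."""
    key = hbonds_type.lower()
    if key in ("hbondstype1", "hydrogenbondstype1"):
        type1 = True
    elif key in ("hbondstype2", "hydrogenbondstype2"):
        type1 = False
    else:
        raise ValueError(f"unknown hydrogen bonds type: {hbonds_type}")

    donors = acceptors = 0
    for symbol, h_count in atoms:
        if symbol not in ("N", "O"):
            continue
        # Both types: any N or O with an available H is a donor.
        if h_count > 0:
            donors += 1
        # Type1: any O, but only N without H. Type2: any N or O.
        if symbol == "O" or not type1 or h_count == 0:
            acceptors += 1
    return donors, acceptors
```

For an amine nitrogen carrying two hydrogens plus a hydroxyl oxygen, the sketch returns (2, 1) under HBondsType1 and (2, 2) under HBondsType2, because only HBondsType2 counts a nitrogen with hydrogens as an acceptor.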
|
|
174 -k, --KeepLargestComponent *Yes | No*
|
|
175 Calculate physicochemical properties for only the largest component
|
|
176 in molecule. Possible values: *Yes or No*. Default value: *Yes*.
|
|
177
|
|
178 For molecules containing multiple connected components,
|
|
179 physicochemical properties can be calculated in two different ways:
|
|
180 use all connected components or just the largest connected
|
|
181 component. By default, all atoms except for the largest connected
|
|
182 component are deleted before calculation of physicochemical
|
|
183 properties.
|
|
184
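Keeping only the largest connected component amounts to a graph traversal over atoms and bonds. The Python sketch below illustrates the idea with a breadth-first search; the zero-based atom indices, the bond pair list, and the choice of "largest" as "most atoms" are assumptions of this example rather than details taken from the script.

```python
from collections import deque
from typing import List, Set, Tuple


def largest_component(num_atoms: int, bonds: List[Tuple[int, int]]) -> Set[int]:
    """Return the atom indices of the largest connected component."""
    neighbors = {atom: set() for atom in range(num_atoms)}
    for a, b in bonds:
        neighbors[a].add(b)
        neighbors[b].add(a)

    seen: Set[int] = set()
    best: Set[int] = set()
    for start in range(num_atoms):
        if start in seen:
            continue
        # Breadth-first search collects every atom reachable from start.
        component = {start}
        queue = deque([start])
        while queue:
            atom = queue.popleft()
            for nbr in neighbors[atom]:
                if nbr not in component:
                    component.add(nbr)
                    queue.append(nbr)
        seen |= component
        if len(component) > len(best):
            best = component
    return best
```

Atoms outside the returned set correspond to the ones that would be deleted before property calculation when --KeepLargestComponent is Yes.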
|
|
185 -m, --mode *All | RuleOf5 | RuleOf3 | "name1, [name2,...]"*
|
|
186 Specify physicochemical properties to calculate for SDFile(s):
|
|
187 calculate all available physicochemical properties; calculate
|
|
188 properties corresponding to Rule of 5 or Rule of 3; or use a comma delimited list
|
|
189 of supported physicochemical properties. Possible values: *All |
|
|
190 RuleOf5 | RuleOf3 | "name1, [name2,...]"*.
|
|
191
|
|
192 Default value: *MolecularWeight, HeavyAtoms, MolecularVolume,
|
|
193 RotatableBonds, HydrogenBondDonors, HydrogenBondAcceptors, SLogP,
|
|
194 TPSA*. These properties are calculated by default.
|
|
195
|
|
196 *RuleOf5* [ Ref 91 ] includes these properties: *MolecularWeight,
|
|
197 HydrogenBondDonors, HydrogenBondAcceptors, SLogP*. *RuleOf5* states:
|
|
198 MolecularWeight <= 500, HydrogenBondDonors <= 5,
|
|
199 HydrogenBondAcceptors <= 10, and logP <= 5.
|
|
200
|
|
201 *RuleOf3* [ Ref 92 ] includes these properties: *MolecularWeight,
|
|
202 RotatableBonds, HydrogenBondDonors, HydrogenBondAcceptors, SLogP,
|
|
203 TPSA*. *RuleOf3* states: MolecularWeight <= 300, RotatableBonds <=
|
|
204 3, HydrogenBondDonors <= 3, HydrogenBondAcceptors <= 3, logP <= 3,
|
|
205 and TPSA <= 60.
|
|
206
|
|
207 *All* calculates all supported physicochemical properties:
|
|
208 *MolecularWeight, ExactMass, HeavyAtoms, Rings, AromaticRings,
|
|
209 MolecularVolume, RotatableBonds, HydrogenBondDonors,
|
|
210 HydrogenBondAcceptors, SLogP, SMR, TPSA, Fsp3Carbons, Sp3Carbons,
|
|
211 MolecularComplexity*.
|
|
212
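How a --mode value maps to a list of property names can be sketched in Python using the property sets listed above. The function name, the case-insensitive matching, and the error raised for unknown names are assumptions of this illustration.

```python
from typing import List, Optional

ALL_PROPERTIES = [
    "MolecularWeight", "ExactMass", "HeavyAtoms", "Rings", "AromaticRings",
    "MolecularVolume", "RotatableBonds", "HydrogenBondDonors",
    "HydrogenBondAcceptors", "SLogP", "SMR", "TPSA", "Fsp3Carbons",
    "Sp3Carbons", "MolecularComplexity",
]
DEFAULT_PROPERTIES = [
    "MolecularWeight", "HeavyAtoms", "MolecularVolume", "RotatableBonds",
    "HydrogenBondDonors", "HydrogenBondAcceptors", "SLogP", "TPSA",
]
RULE_OF_5 = ["MolecularWeight", "HydrogenBondDonors", "HydrogenBondAcceptors", "SLogP"]
RULE_OF_3 = [
    "MolecularWeight", "RotatableBonds", "HydrogenBondDonors",
    "HydrogenBondAcceptors", "SLogP", "TPSA",
]


def properties_for_mode(mode: Optional[str] = None) -> List[str]:
    """Resolve a --mode value to the list of properties to calculate."""
    if mode is None:
        return list(DEFAULT_PROPERTIES)
    key = mode.strip().lower()
    if key == "all":
        return list(ALL_PROPERTIES)
    if key == "ruleof5":
        return list(RULE_OF_5)
    if key == "ruleof3":
        return list(RULE_OF_3)
    # Otherwise treat the value as a comma delimited list of names.
    known = {name.lower(): name for name in ALL_PROPERTIES}
    selected: List[str] = []
    for token in mode.split(","):
        name = known.get(token.strip().lower())
        if name is None:
            raise ValueError(f"unsupported property name: {token.strip()}")
        if name not in selected:
            selected.append(name)
    return selected
```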
|
|
213 --MolecularComplexity *Name,Value, [Name,Value,...]*
|
|
214 Parameters to control calculation of molecular complexity: it's a
|
|
215 comma delimited list of parameter name and value pairs.
|
|
216
|
|
217 Possible parameter names: *MolecularComplexityType,
|
|
218 AtomIdentifierType, AtomicInvariantsToUse, FunctionalClassesToUse,
|
|
219 MACCSKeysSize, NeighborhoodRadius, MinPathLength, MaxPathLength,
|
|
220 UseBondSymbols, MinDistance, MaxDistance, UseTriangleInequality,
|
|
221 DistanceBinSize, NormalizationMethodology*.
|
|
222
|
|
223 The valid parameter values for each parameter name are described in
|
|
224 the following sections.
|
|
225
|
|
226 The current release of MayaChemTools supports calculation of
|
|
227 molecular complexity using *MolecularComplexityType* parameter
|
|
228 corresponding to the number of bits-set or unique keys [ Ref 117-119
|
|
229 ] in molecular fingerprints. The valid values for
|
|
230 *MolecularComplexityType* are:
|
|
231
|
|
232 AtomTypesFingerprints
|
|
233 ExtendedConnectivityFingerprints
|
|
234 MACCSKeys
|
|
235 PathLengthFingerprints
|
|
236 TopologicalAtomPairsFingerprints
|
|
237 TopologicalAtomTripletsFingerprints
|
|
238 TopologicalAtomTorsionsFingerprints
|
|
239 TopologicalPharmacophoreAtomPairsFingerprints
|
|
240 TopologicalPharmacophoreAtomTripletsFingerprints
|
|
241
|
|
242 Default value for *MolecularComplexityType*: *MACCSKeys*.
|
|
243
|
|
244 *AtomIdentifierType* parameter name corresponds to atom types used
|
|
245 during generation of fingerprints. The valid values for
|
|
246 *AtomIdentifierType* are: *AtomicInvariantsAtomTypes,
|
|
247 DREIDINGAtomTypes, EStateAtomTypes, FunctionalClassAtomTypes,
|
|
248 MMFF94AtomTypes, SLogPAtomTypes, SYBYLAtomTypes, TPSAAtomTypes,
|
|
249 UFFAtomTypes*. *AtomicInvariantsAtomTypes* is not supported for
|
|
250 the following values of *MolecularComplexityType*:
|
|
251 *MACCSKeys, TopologicalPharmacophoreAtomPairsFingerprints,
|
|
252 TopologicalPharmacophoreAtomTripletsFingerprints*.
|
|
253 *FunctionalClassAtomTypes* is the only valid value for
|
|
254 *AtomIdentifierType* for topological pharmacophore fingerprints.
|
|
255
|
|
256 Default value for *AtomIdentifierType*: *AtomicInvariantsAtomTypes*
|
|
257 for all except topological pharmacophore fingerprints where it is
|
|
258 *FunctionalClassAtomTypes*.
|
|
259
|
|
260 *AtomicInvariantsToUse* parameter name and values are used during
|
|
261 *AtomicInvariantsAtomTypes* value of parameter *AtomIdentifierType*.
|
|
262 It's a list of space separated valid atomic invariant atom types.
|
|
263
|
|
264 Possible values for atomic invariants are: *AS, X, BO, LBO, SB, DB,
|
|
265 TB, H, Ar, RA, FC, MN, SM*. Default values for
|
|
266 *AtomicInvariantsToUse* parameter are set differently for different
|
|
267 fingerprints using *MolecularComplexityType* parameter as shown
|
|
268 below:
|
|
269
|
|
270 MolecularComplexityType AtomicInvariantsToUse
|
|
271
|
|
272 AtomTypesFingerprints AS X BO H FC
|
|
273 TopologicalAtomPairsFingerprints AS X BO H FC
|
|
274 TopologicalAtomTripletsFingerprints AS X BO H FC
|
|
275 TopologicalAtomTorsionsFingerprints AS X BO H FC
|
|
276
|
|
277 ExtendedConnectivityFingerprints AS X BO H FC MN
|
|
278 PathLengthFingerprints AS
|
|
279
|
|
280 The atomic invariants abbreviations correspond to:
|
|
281
|
|
282 AS = Atom symbol corresponding to element symbol
|
|
283
|
|
284 X<n> = Number of non-hydrogen atom neighbors or heavy atoms
|
|
285 BO<n> = Sum of bond orders to non-hydrogen atom neighbors or heavy atoms
|
|
286 LBO<n> = Largest bond order of non-hydrogen atom neighbors or heavy atoms
|
|
287 SB<n> = Number of single bonds to non-hydrogen atom neighbors or heavy atoms
|
|
288 DB<n> = Number of double bonds to non-hydrogen atom neighbors or heavy atoms
|
|
289 TB<n> = Number of triple bonds to non-hydrogen atom neighbors or heavy atoms
|
|
290 H<n> = Number of implicit and explicit hydrogens for atom
|
|
291 Ar = Aromatic annotation indicating whether atom is aromatic
|
|
292 RA = Ring atom annotation indicating whether atom is in a ring
|
|
293 FC<+n/-n> = Formal charge assigned to atom
|
|
294 MN<n> = Mass number indicating isotope other than most abundant isotope
|
|
295 SM<n> = Spin multiplicity of atom. Possible values: 1 (singlet), 2 (doublet) or
|
|
296 3 (triplet)
|
|
297
|
|
298 Atom type generated by AtomTypes::AtomicInvariantsAtomTypes class
|
|
299 corresponds to:
|
|
300
|
|
301 AS.X<n>.BO<n>.LBO<n>.SB<n>.DB<n>.TB<n>.H<n>.Ar.RA.FC<+n/-n>.MN<n>.SM<n>
|
|
302
|
|
303 Except for AS which is a required atomic invariant in atom types,
|
|
304 all other atomic invariants are optional. Atom type specification
|
|
305 doesn't include atomic invariants with zero or undefined values.
|
|
306
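The way an atom type string is assembled from atomic invariants can be sketched in Python. This is a hypothetical helper written for illustration, not the AtomTypes::AtomicInvariantsAtomTypes code: the invariant values are supplied by the caller in a dictionary, and the default set of invariants mirrors the AS X BO H FC default shown earlier.

```python
from typing import Mapping, Sequence

INVARIANT_ORDER = ["X", "BO", "LBO", "SB", "DB", "TB", "H", "Ar", "RA", "FC", "MN", "SM"]


def atomic_invariant_atom_type(
    symbol: str,
    invariants: Mapping[str, int],
    to_use: Sequence[str] = ("AS", "X", "BO", "H", "FC"),
) -> str:
    """Join the selected invariants into an AS.X<n>.BO<n>... string."""
    parts = [symbol]  # AS is the one required invariant.
    for name in INVARIANT_ORDER:
        if name not in to_use:
            continue
        value = invariants.get(name)
        if not value:
            # Zero or undefined invariants are left out of the atom type.
            continue
        if name in ("Ar", "RA"):
            parts.append(name)  # Annotations carry no number.
        elif name == "FC":
            parts.append(f"FC{value:+d}")  # Formal charge keeps its sign.
        else:
            parts.append(f"{name}{value}")
    return ".".join(parts)
```

For a methyl carbon with one heavy atom neighbor, a bond order sum of 1, and three hydrogens, the sketch yields C.X1.BO1.H3; the zero formal charge is omitted.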
|
|
307 In addition to usage of abbreviations for specifying atomic
|
|
308 invariants, the following descriptive words are also allowed:
|
|
309
|
|
310 X : NumOfNonHydrogenAtomNeighbors or NumOfHeavyAtomNeighbors
|
|
311 BO : SumOfBondOrdersToNonHydrogenAtoms or SumOfBondOrdersToHeavyAtoms
|
|
312 LBO : LargestBondOrderToNonHydrogenAtoms or LargestBondOrderToHeavyAtoms
|
|
313 SB : NumOfSingleBondsToNonHydrogenAtoms or NumOfSingleBondsToHeavyAtoms
|
|
314 DB : NumOfDoubleBondsToNonHydrogenAtoms or NumOfDoubleBondsToHeavyAtoms
|
|
315 TB : NumOfTripleBondsToNonHydrogenAtoms or NumOfTripleBondsToHeavyAtoms
|
|
316 H : NumOfImplicitAndExplicitHydrogens
|
|
317 Ar : Aromatic
|
|
318 RA : RingAtom
|
|
319 FC : FormalCharge
|
|
320 MN : MassNumber
|
|
321 SM : SpinMultiplicity
|
|
322
|
|
323 *AtomTypes::AtomicInvariantsAtomTypes* module is used to assign
|
|
324 atomic invariant atom types.
|
|
325
|
|
326 *FunctionalClassesToUse* parameter name and values are used during
|
|
327 *FunctionalClassAtomTypes* value of parameter *AtomIdentifierType*.
|
|
328 It's a list of space separated valid functional classes.
|
|
329
|
|
330 Possible values for atom functional classes are: *Ar, CA, H, HBA,
|
|
331 HBD, Hal, NI, PI, RA*.
|
|
332
|
|
333 Default value for *FunctionalClassesToUse* parameter is set to:
|
|
334
|
|
335 HBD HBA PI NI Ar Hal
|
|
336
|
|
337 for all fingerprints except for the following two
|
|
338 *MolecularComplexityType* fingerprints:
|
|
339
|
|
340 MolecularComplexityType FunctionalClassesToUse
|
|
341
|
|
342 TopologicalPharmacophoreAtomPairsFingerprints HBD HBA PI NI H
|
|
343 TopologicalPharmacophoreAtomTripletsFingerprints HBD HBA PI NI H Ar
|
|
344
|
|
345 The functional class abbreviations correspond to:
|
|
346
|
|
347 HBD: HydrogenBondDonor
|
|
348 HBA: HydrogenBondAcceptor
|
|
349 PI : PositivelyIonizable
|
|
350 NI : NegativelyIonizable
|
|
351 Ar : Aromatic
|
|
352 Hal : Halogen
|
|
353 H : Hydrophobic
|
|
354 RA : RingAtom
|
|
355 CA : ChainAtom
|
|
356
|
|
357 Functional class atom type specification for an atom corresponds to:
|
|
358
|
|
359 Ar.CA.H.HBA.HBD.Hal.NI.PI.RA
|
|
360
|
|
361 *AtomTypes::FunctionalClassAtomTypes* module is used to assign
|
|
362 functional class atom types. It uses the following definitions [ Ref
|
|
363 60-61, Ref 65-66 ]:
|
|
364
|
|
365 HydrogenBondDonor: NH, NH2, OH
|
|
366 HydrogenBondAcceptor: N[!H], O
|
|
367 PositivelyIonizable: +, NH2
|
|
368 NegativelyIonizable: -, C(=O)OH, S(=O)OH, P(=O)OH
|
|
369
|
|
370 *MACCSKeysSize* parameter name is only used during *MACCSKeys* value
|
|
371 of *MolecularComplexityType* and corresponds to the size of MACCS
|
|
372 key set. Possible values: *166 or 322*. Default value: *166*.
|
|
373
|
|
374 *NeighborhoodRadius* parameter name is only used during
|
|
375 *ExtendedConnectivityFingerprints* value of
|
|
376 *MolecularComplexityType* and corresponds to atomic neighborhoods
|
|
377 radius for generating extended connectivity fingerprints. Possible
|
|
378 values: positive integer. Default value: *2*.
|
|
379
|
|
380 *MinPathLength* and *MaxPathLength* parameters are only used during
|
|
381 *PathLengthFingerprints* value of *MolecularComplexityType* and
|
|
382 correspond to minimum and maximum path lengths to use for generating
|
|
383 path length fingerprints. Possible values: positive integers.
|
|
384 Default value: *MinPathLength - 1*; *MaxPathLength - 8*.
|
|
385
|
|
386 *UseBondSymbols* parameter is only used during
|
|
387 *PathLengthFingerprints* value of *MolecularComplexityType* and
|
|
388 indicates whether bond symbols are included in atom path strings
|
|
389 used to generate path length fingerprints. Possible value: *Yes or
|
|
390 No*. Default value: *Yes*.
|
|
391
|
|
392 *MinDistance* and *MaxDistance* parameters are only used during
|
|
393 *TopologicalAtomPairsFingerprints* and
|
|
394 *TopologicalAtomTripletsFingerprints* values of
|
|
395 *MolecularComplexityType* and correspond to minimum and maximum bond
|
|
396 distance between atom pairs during topological pharmacophore
|
|
397 fingerprints. Possible values: positive integers. Default value:
|
|
398 *MinDistance - 1*; *MaxDistance - 10*.
|
|
399
|
|
400 *UseTriangleInequality* parameter is used during these values for
|
|
401 *MolecularComplexityType*: *TopologicalAtomTripletsFingerprints* and
|
|
402 *TopologicalPharmacophoreAtomTripletsFingerprints*. Possible values:
|
|
403 *Yes or No*. It determines whether to apply triangle inequality to
|
|
404 distance triplets. Default value:
|
|
405 *TopologicalAtomTripletsFingerprints - No*;
|
|
406 *TopologicalPharmacophoreAtomTripletsFingerprints - Yes*.
|
|
407
|
|
408 *DistanceBinSize* parameter is used during
|
|
409 *TopologicalPharmacophoreAtomTripletsFingerprints* value of
|
|
410 *MolecularComplexityType* and corresponds to distance bin size used
|
|
411 for binning distances during generation of topological pharmacophore
|
|
412 atom triplets fingerprints. Possible value: positive integer.
|
|
413 Default value: *2*.
|
|
414
|
|
415 *NormalizationMethodology* is only used for these values for
|
|
416 *MolecularComplexityType*: *ExtendedConnectivityFingerprints*,
|
|
417 *TopologicalPharmacophoreAtomPairsFingerprints* and
|
|
418 *TopologicalPharmacophoreAtomTripletsFingerprints*. It corresponds
|
|
419 to normalization methodology to use for scaling the number of
|
|
420 bits-set or unique keys during generation of fingerprints. Possible
|
|
421 values during *ExtendedConnectivityFingerprints*: *None or
|
|
422 ByHeavyAtomsCount*; Default value: *None*. Possible values during
|
|
423 topological pharmacophore atom pairs and triplets fingerprints:
|
|
424 *None or ByPossibleKeysCount*; Default value: *None*.
|
|
425 *ByPossibleKeysCount* corresponds to total number of possible
|
|
426 topological pharmacophore atom pairs or triplets in a molecule.
|
|
427
|
|
428 Examples of *MolecularComplexity* name and value parameters:
|
|
429
|
|
430 MolecularComplexityType,AtomTypesFingerprints,AtomIdentifierType,
|
|
431 AtomicInvariantsAtomTypes,AtomicInvariantsToUse,AS X BO H FC
|
|
432
|
|
433 MolecularComplexityType,ExtendedConnectivityFingerprints,
|
|
434 AtomIdentifierType,AtomicInvariantsAtomTypes,
|
|
435 AtomicInvariantsToUse,AS X BO H FC MN,NeighborhoodRadius,2,
|
|
436 NormalizationMethodology,None
|
|
437
|
|
438 MolecularComplexityType,MACCSKeys,MACCSKeysSize,166
|
|
439
|
|
440 MolecularComplexityType,PathLengthFingerprints,AtomIdentifierType,
|
|
441 AtomicInvariantsAtomTypes,AtomicInvariantsToUse,AS,MinPathLength,
|
|
442 1,MaxPathLength,8,UseBondSymbols,Yes
|
|
443
|
|
444 MolecularComplexityType,TopologicalAtomPairsFingerprints,
|
|
445 AtomIdentifierType,AtomicInvariantsAtomTypes,AtomicInvariantsToUse,
|
|
446 AS X BO H FC,MinDistance,1,MaxDistance,10
|
|
447
|
|
448 MolecularComplexityType,TopologicalAtomTripletsFingerprints,
|
|
449 AtomIdentifierType,AtomicInvariantsAtomTypes,AtomicInvariantsToUse,
|
|
450 AS X BO H FC,MinDistance,1,MaxDistance,10,UseTriangleInequality,No
|
|
451
|
|
452 MolecularComplexityType,TopologicalAtomTorsionsFingerprints,
|
|
453 AtomIdentifierType,AtomicInvariantsAtomTypes,AtomicInvariantsToUse,
|
|
454 AS X BO H FC
|
|
455
|
|
456 MolecularComplexityType,TopologicalPharmacophoreAtomPairsFingerprints,
|
|
457 AtomIdentifierType,FunctionalClassAtomTypes,FunctionalClassesToUse,
|
|
458 HBD HBA PI NI H,MinDistance,1,MaxDistance,10,NormalizationMethodology,
|
|
459 None
|
|
460
|
|
461 MolecularComplexityType,TopologicalPharmacophoreAtomTripletsFingerprints,
|
|
462 AtomIdentifierType,FunctionalClassAtomTypes,FunctionalClassesToUse,
|
|
463 HBD HBA PI NI H Ar,MinDistance,1,MaxDistance,10,NormalizationMethodology,
|
|
464 None,UseTriangleInequality,Yes,
|
|
465 DistanceBinSize,2
|
|
466
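The comma delimited Name,Value lists shown above can be split into a dictionary as in the Python sketch below. The function and constant names are illustrative, and the case-insensitive name matching and the specific error messages are assumptions; the script's own validation may differ.

```python
from typing import Dict, Iterable

MOLECULAR_COMPLEXITY_PARAMETERS = [
    "MolecularComplexityType", "AtomIdentifierType", "AtomicInvariantsToUse",
    "FunctionalClassesToUse", "MACCSKeysSize", "NeighborhoodRadius",
    "MinPathLength", "MaxPathLength", "UseBondSymbols", "MinDistance",
    "MaxDistance", "UseTriangleInequality", "DistanceBinSize",
    "NormalizationMethodology",
]


def parse_name_value_pairs(spec: str, valid_names: Iterable[str]) -> Dict[str, str]:
    """Split 'Name,Value,Name,Value,...' into a {name: value} dictionary."""
    tokens = [token.strip() for token in spec.split(",")]
    if len(tokens) % 2 != 0:
        raise ValueError("expected an even number of comma delimited items")
    lookup = {name.lower(): name for name in valid_names}
    params: Dict[str, str] = {}
    for name, value in zip(tokens[0::2], tokens[1::2]):
        canonical = lookup.get(name.lower())
        if canonical is None:
            raise ValueError(f"unknown parameter name: {name}")
        if value == "":
            raise ValueError(f"empty value for parameter: {name}")
        # Space separated values such as 'AS X BO H FC' are kept whole.
        params[canonical] = value
    return params
```

Because values are only split on commas, a space separated value such as AS X BO H FC survives as a single string.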
|
|
467 --OutDelim *comma | tab | semicolon*
|
|
468 Delimiter for output CSV/TSV text file(s). Possible values: *comma,
|
|
469 tab, or semicolon*. Default value: *comma*.
|
|
470
|
|
471 --output *SD | text | both*
|
|
472 Type of output files to generate. Possible values: *SD, text, or
|
|
473 both*. Default value: *text*.
|
|
474
|
|
475 -o, --overwrite
|
|
476 Overwrite existing files.
|
|
477
|
|
478 --Precision *Name,Number,[Name,Number,..]*
|
|
479 Precision of calculated property values in the output file: it's a
|
|
480 comma delimited list of property name and precision value pairs.
|
|
481 Possible property names: *MolecularWeight, ExactMass*. Possible
|
|
482 values: positive integers. Default value: *MolecularWeight,2,
|
|
483 ExactMass,4*.
|
|
484
|
|
485 Examples:
|
|
486
|
|
487 ExactMass,3
|
|
488 MolecularWeight,1,ExactMass,2
|
|
489
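The effect of a --Precision value on written numbers can be sketched in Python. The defaults come from the option description above; the function names and the decision to leave other properties unformatted are assumptions of this example.

```python
from typing import Dict

DEFAULT_PRECISION = {"MolecularWeight": 2, "ExactMass": 4}


def resolve_precision(spec: str = "") -> Dict[str, int]:
    """Overlay a 'Name,Number,...' specification on the default precision."""
    precision = dict(DEFAULT_PRECISION)
    if not spec:
        return precision
    tokens = [token.strip() for token in spec.split(",")]
    if len(tokens) % 2 != 0:
        raise ValueError("expected property name and precision value pairs")
    for name, number in zip(tokens[0::2], tokens[1::2]):
        if name not in precision:
            raise ValueError(f"precision is not supported for: {name}")
        digits = int(number)
        if digits <= 0:
            raise ValueError("precision must be a positive integer")
        precision[name] = digits
    return precision


def format_property(name: str, value: float, precision: Dict[str, int]) -> str:
    """Format a value with its configured number of decimal places."""
    if name in precision:
        return f"{value:.{precision[name]}f}"
    return str(value)
```

With the specification ExactMass,3, a MolecularWeight of 180.1574 is written as 180.16 (default precision 2) and an ExactMass of 180.04226 as 180.042.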
|
|
490 -q, --quote *Yes | No*
|
|
491 Put quotes around column values in output CSV/TSV text file(s).
|
|
492 Possible values: *Yes or No*. Default value: *Yes*.
|
|
493
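The combined effect of --OutDelim and --quote on one output row can be illustrated with Python's standard csv module. This is a sketch, not the script's own writer: the function name is hypothetical, and for a *No* value it assumes quotes are added only when a value contains the delimiter, which may differ from what the script does.

```python
import csv
import io
from typing import Sequence

DELIMITERS = {"comma": ",", "tab": "\t", "semicolon": ";"}


def format_row(values: Sequence[str], out_delim: str = "comma", quote: str = "Yes") -> str:
    """Return one delimited text line for the given column values."""
    # Assumption: 'No' falls back to minimal quoting rather than none at all.
    quoting = csv.QUOTE_ALL if quote.lower() == "yes" else csv.QUOTE_MINIMAL
    buffer = io.StringIO()
    writer = csv.writer(
        buffer,
        delimiter=DELIMITERS[out_delim.lower()],
        quoting=quoting,
        lineterminator="\n",
    )
    writer.writerow(values)
    return buffer.getvalue().rstrip("\n")
```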
|
|
494 -r, --root *RootName*
|
|
495 New file name is generated using the root: <Root>.<Ext>. Default for
|
|
496 new file names: <SDFileName><PhysicochemicalProperties>.<Ext>. The
|
|
497 file type determines <Ext> value. The sdf, csv, and tsv <Ext> values
|
|
498 are used for SD, comma/semicolon, and tab delimited text files,
|
|
499 respectively. This option is ignored for multiple input files.
|
|
500
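The naming rules above can be sketched in Python. The function name and its arguments are illustrative assumptions; the extension mapping and the default <SDFileName>PhysicochemicalProperties root follow the option description.

```python
import os
from typing import List, Optional

TEXT_EXTENSIONS = {"comma": "csv", "semicolon": "csv", "tab": "tsv"}


def output_file_names(
    sd_file: str,
    output: str = "text",
    out_delim: str = "comma",
    root: Optional[str] = None,
) -> List[str]:
    """Return the output file names for a single input SD file."""
    if root:
        base = root
    else:
        # Default root: input file name without extension plus a suffix.
        stem = os.path.splitext(os.path.basename(sd_file))[0]
        base = stem + "PhysicochemicalProperties"
    names: List[str] = []
    mode = output.lower()
    if mode in ("sd", "both"):
        names.append(f"{base}.sdf")
    if mode in ("text", "both"):
        names.append(f"{base}.{TEXT_EXTENSIONS[out_delim.lower()]}")
    return names
```

For Sample.sdf with default options this gives SamplePhysicochemicalProperties.csv, the file name used in the first entry of the EXAMPLES section.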
|
|
501 --RotatableBonds *Name,Value, [Name,Value,...]*
|
|
502 Parameters to control calculation of rotatable bonds [ Ref 92 ]:
|
|
503 it's a comma delimited list of parameter name and value pairs.
|
|
504 Possible parameter names: *IgnoreTerminalBonds,
|
|
505 IgnoreBondsToTripleBonds, IgnoreAmideBonds, IgnoreThioamideBonds,
|
|
506 IgnoreSulfonamideBonds*. Possible parameter values: *Yes or No*. By
|
|
507 default, value of all parameters is set to *Yes*.
|
|
508
|
|
509 --RuleOf3Violations *Yes | No*
|
|
510 Specify whether to calculate RuleOf3Violations for SDFile(s).
|
|
511 Possible values: *Yes or No*. Default value: *No*.
|
|
512
|
|
513 For *Yes* value of RuleOf3Violations, in addition to calculating
|
|
514 total number of RuleOf3 violations, individual violations for
|
|
515 compounds are also written to output files.
|
|
516
|
|
517 RuleOf3 [ Ref 92 ] states: MolecularWeight <= 300, RotatableBonds <=
|
|
518 3, HydrogenBondDonors <= 3, HydrogenBondAcceptors <= 3, logP <= 3,
|
|
519 and TPSA <= 60.
|
|
520
|
|
521 --RuleOf5Violations *Yes | No*
|
|
522 Specify whether to calculate RuleOf5Violations for SDFile(s).
|
|
523 Possible values: *Yes or No*. Default value: *No*.
|
|
524
|
|
525 For *Yes* value of RuleOf5Violations, in addition to calculating
|
|
526 total number of RuleOf5 violations, individual violations for
|
|
527 compounds are also written to output files.
|
|
528
|
|
529 RuleOf5 [ Ref 91 ] states: MolecularWeight <= 500,
|
|
530 HydrogenBondDonors <= 5, HydrogenBondAcceptors <= 10, and logP <= 5.
|
|
531
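The RuleOf3 and RuleOf5 thresholds stated above translate directly into a violation count. The Python sketch below assumes the property values have already been calculated and are passed in as a dictionary keyed by property name, with SLogP standing in for logP; the function name and return shape are illustrative.

```python
from typing import Dict, Mapping, Tuple

RULES = {
    "RuleOf5": {
        "MolecularWeight": 500,
        "HydrogenBondDonors": 5,
        "HydrogenBondAcceptors": 10,
        "SLogP": 5,
    },
    "RuleOf3": {
        "MolecularWeight": 300,
        "RotatableBonds": 3,
        "HydrogenBondDonors": 3,
        "HydrogenBondAcceptors": 3,
        "SLogP": 3,
        "TPSA": 60,
    },
}


def rule_violations(rule: str, properties: Mapping[str, float]) -> Tuple[int, Dict[str, int]]:
    """Return (total violations, per-property 0/1 flags) for a rule."""
    # A property violates the rule when it exceeds its upper limit.
    flags = {
        name: int(properties[name] > limit)
        for name, limit in RULES[rule].items()
    }
    return sum(flags.values()), flags
```

The per-property flags correspond to the individual violations that are written alongside the total when --RuleOf3Violations or --RuleOf5Violations is set to Yes.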
|
|
532 --TPSA *Name,Value, [Name,Value,...]*
|
|
533 Parameters to control calculation of TPSA: it's a comma delimited
|
|
534 list of parameter name and value pairs. Possible parameter names:
|
|
535 *IgnorePhosphorus, IgnoreSulfur*. Possible parameter values: *Yes or
|
|
536 No*. By default, value of all parameters is set to *Yes*.
|
|
537
|
|
538 By default, TPSA atom contributions from Phosphorus and Sulfur atoms
|
|
539 are not included during TPSA calculations. [ Ref 91 ]
|
|
540
|
|
541 -w, --WorkingDir *DirName*
|
|
542 Location of working directory. Default value: current directory.
|
|
543
|
|
544 EXAMPLES
|
|
545 To calculate default set of physicochemical properties -
|
|
546 MolecularWeight, HeavyAtoms, MolecularVolume, RotatableBonds,
|
|
547 HydrogenBondDonors, HydrogenBondAcceptors, SLogP, TPSA - and generate a
|
|
548 SamplePhysicochemicalProperties.csv file containing sequential compound
|
|
549 IDs along with properties data, type:
|
|
550
|
|
551 % CalculatePhysicochemicalProperties.pl -o Sample.sdf
|
|
552
|
|
553 To calculate all available physicochemical properties and generate both
|
|
554 SampleAllProperties.csv and SampleAllProperties.sdf files containing
|
|
555 sequential compound IDs in CSV file along with properties data, type:
|
|
556
|
|
557 % CalculatePhysicochemicalProperties.pl -m All --output both
|
|
558 -r SampleAllProperties -o Sample.sdf
|
|
559
|
|
560 To calculate RuleOf5 physicochemical properties and generate a
|
|
561 SampleRuleOf5Properties.csv file containing sequential compound IDs
|
|
562 along with properties data, type:
|
|
563
|
|
564 % CalculatePhysicochemicalProperties.pl -m RuleOf5
|
|
565 -r SampleRuleOf5Properties -o Sample.sdf
|
|
566
|
|
567 To calculate RuleOf5 physicochemical properties along with counting
|
|
568 RuleOf5 violations and generate a SampleRuleOf5Properties.csv file
|
|
569 containing sequential compound IDs along with properties data, type:
|
|
570
|
|
571 % CalculatePhysicochemicalProperties.pl -m RuleOf5 --RuleOf5Violations Yes
|
|
572 -r SampleRuleOf5Properties -o Sample.sdf
|
|
573
|
|
574 To calculate RuleOf3 physicochemical properties and generate a
|
|
575 SampleRuleOf3Properties.csv file containing sequential compound IDs
|
|
576 along with properties data, type:
|
|
577
|
|
578 % CalculatePhysicochemicalProperties.pl -m RuleOf3
|
|
579 -r SampleRuleOf3Properties -o Sample.sdf
|
|
580
|
|
581 To calculate RuleOf3 physicochemical properties along with counting
|
|
582 RuleOf3 violations and generate a SampleRuleOf3Properties.csv file
|
|
583 containing sequential compound IDs along with properties data, type:
|
|
584
|
|
585 % CalculatePhysicochemicalProperties.pl -m RuleOf3 --RuleOf3Violations Yes
|
|
586 -r SampleRuleOf3Properties -o Sample.sdf
|
|
587
|
|
588 To calculate a specific set of physicochemical properties and generate a
|
|
589 SampleProperties.csv file containing sequential compound IDs along with
|
|
590 properties data, type:
|
|
591
|
|
592 % CalculatePhysicochemicalProperties.pl -m "Rings,AromaticRings"
|
|
593 -r SampleProperties -o Sample.sdf
|
|
594
|
|
595 To calculate HydrogenBondDonors and HydrogenBondAcceptors using
|
|
596 HydrogenBondsType1 definition and generate a SampleProperties.csv file
|
|
597 containing sequential compound IDs along with properties data, type:
|
|
598
|
|
599 % CalculatePhysicochemicalProperties.pl -m "HydrogenBondDonors,HydrogenBondAcceptors"
|
|
600 --HydrogenBonds HBondsType1 -r SampleProperties -o Sample.sdf
|
|
601
|
|
602 To calculate TPSA using sulfur and phosphorus atoms along with nitrogen
|
|
603 and oxygen atoms and generate a SampleProperties.csv file containing
|
|
604 sequential compound IDs along with properties data, type:
|
|
605
|
|
606 % CalculatePhysicochemicalProperties.pl -m "TPSA" --TPSA "IgnorePhosphorus,No,
|
|
607 IgnoreSulfur,No" -r SampleProperties -o Sample.sdf
|
|
608
|
|
609 To calculate MolecularComplexity using extended connectivity
|
|
610 fingerprints corresponding to atom neighborhood radius of 2 with atomic
|
|
611 invariant atom types without any scaling and generate a
|
|
612 SampleProperties.csv file containing sequential compound IDs along with
|
|
613 properties data, type:
|
|
614
|
|
615 % CalculatePhysicochemicalProperties.pl -m MolecularComplexity --MolecularComplexity
|
|
616 "MolecularComplexityType,ExtendedConnectivityFingerprints,NeighborhoodRadius,2,
|
|
617 AtomIdentifierType,AtomicInvariantsAtomTypes,
|
|
618 AtomicInvariantsToUse,AS X BO H FC MN,NormalizationMethodology,None"
|
|
619 -r SampleProperties -o Sample.sdf
|
|
620
|
|
621 To calculate RuleOf5 physicochemical properties along with counting
|
|
622 RuleOf5 violations and generate a SampleRuleOf5Properties.csv file
|
|
623 containing compound IDs from molecule name line along with properties
|
|
624 data, type:
|
|
625
|
|
626 % CalculatePhysicochemicalProperties.pl -m RuleOf5 --RuleOf5Violations Yes
|
|
627 --DataFieldsMode CompoundID --CompoundIDMode MolName
|
|
628 -r SampleRuleOf5Properties -o Sample.sdf
|
|
629
|
|
630 To calculate all available physicochemical properties and generate a
|
|
631 SampleAllProperties.csv file containing compound ID using specified data
|
|
632 field along with properties data, type:
|
|
633
|
|
634 % CalculatePhysicochemicalProperties.pl -m All
|
|
635 --DataFieldsMode CompoundID --CompoundIDMode DataField --CompoundID Mol_ID
|
|
636 -r SampleAllProperties -o Sample.sdf
|
|
637
|
|
638 To calculate all available physicochemical properties and generate a
|
|
639 SampleAllProperties.csv file containing compound ID using combination of
|
|
640 molecule name line and an explicit compound prefix along with properties
|
|
641 data, type:
|
|
642
|
|
643 % CalculatePhysicochemicalProperties.pl -m All
|
|
644 --DataFieldsMode CompoundID --CompoundIDMode MolNameOrLabelPrefix
|
|
645 --CompoundID Cmpd --CompoundIDLabel MolID -r SampleAllProperties
|
|
646 -o Sample.sdf
|
|
647
|
|
648 To calculate all available physicochemical properties and generate a
|
|
649 SampleAllProperties.csv file containing specific data fields columns
|
|
650 along with properties data, type:
|
|
651
|
|
652 % CalculatePhysicochemicalProperties.pl -m All
|
|
653 --DataFieldsMode Specify --DataFields Mol_ID -r SampleAllProperties
|
|
654 -o Sample.sdf
|
|
655
|
|
656 To calculate all available physicochemical properties and generate a
|
|
657 SampleAllProperties.csv file containing common data fields columns along
|
|
658 with properties data, type:
|
|
659
|
|
660 % CalculatePhysicochemicalProperties.pl -m All
|
|
661 --DataFieldsMode Common -r SampleAllProperties -o Sample.sdf
|
|
662
|
|
663 To calculate all available physicochemical properties and generate both
|
|
664 SampleAllProperties.csv and SampleAllProperties.sdf files containing all data fields columns
|
|
665 in CSV files along with properties data, type:
|
|
666
|
|
667 % CalculatePhysicochemicalProperties.pl -m All
|
|
668 --DataFieldsMode All --output both -r SampleAllProperties
|
|
669 -o Sample.sdf
|
|
670
|
|
671 AUTHOR
|
|
672 Manish Sud <msud@san.rr.com>
|
|
673
|
|
674 SEE ALSO
|
|
675 ExtractFromSDFiles.pl, ExtractFromTextFiles.pl, InfoSDFiles.pl,
|
|
676 InfoTextFiles.pl
|
|
677
|
|
678 COPYRIGHT
|
|
679 Copyright (C) 2015 Manish Sud. All rights reserved.
|
|
680
|
|
681 This file is part of MayaChemTools.
|
|
682
|
|
683 MayaChemTools is free software; you can redistribute it and/or modify it
|
|
684 under the terms of the GNU Lesser General Public License as published by
|
|
685 the Free Software Foundation; either version 3 of the License, or (at
|
|
686 your option) any later version.
|
|
687
|