comparison docs/scripts/txt/CalculatePhysicochemicalProperties.txt @ 0:4816e4a8ae95 draft default tip

Uploaded
author deepakjadmin
date Wed, 20 Jan 2016 09:23:18 -0500
1 NAME
2 CalculatePhysicochemicalProperties.pl - Calculate physicochemical
3 properties for SD files
4
5 SYNOPSIS
6 CalculatePhysicochemicalProperties.pl SDFile(s)...
7
8 CalculatePhysicochemicalProperties.pl [--AromaticityModel *AromaticityModelType*]
9 [--CompoundID DataFieldName or LabelPrefixString] [--CompoundIDLabel
10 text] [--CompoundIDMode] [--DataFields "FieldLabel1, FieldLabel2,..."]
11 [-d, --DataFieldsMode All | Common | Specify | CompoundID] [-f, --Filter
12 Yes | No] [-h, --help] [--HydrogenBonds HBondsType1 | HBondsType2] [-k,
13 --KeepLargestComponent Yes | No] [-m, --mode All | RuleOf5 | RuleOf3 |
14 "name1, [name2,...]"] [--MolecularComplexity *Name,Value,
15 [Name,Value,...]*] [--OutDelim comma | tab | semicolon] [--output SD |
16 text | both] [-o, --overwrite] [--Precision
17 Name,Number,[Name,Number,..]] [--RotatableBonds Name,Value,
18 [Name,Value,...]] [--RuleOf3Violations Yes | No] [--RuleOf5Violations
19 Yes | No] [-q, --quote Yes | No] [-r, --root RootName] [--TPSA
20 Name,Value,[Name,Value,...]] [-w, --WorkingDir dirname] SDFile(s)...
21
22 DESCRIPTION
23 Calculate physicochemical properties for *SDFile(s)* and create
24 appropriate SD or CSV/TSV text file(s) containing calculated properties.
25
26 The current release of MayaChemTools supports the calculation of these
27 physicochemical properties:
28
29 MolecularWeight, ExactMass, HeavyAtoms, Rings, AromaticRings,
30 van der Waals MolecularVolume [ Ref 93 ], RotatableBonds,
31 HydrogenBondDonors, HydrogenBondAcceptors, LogP and
32 Molar Refractivity (SLogP and SMR) [ Ref 89 ], Topological Polar
33 Surface Area (TPSA) [ Ref 90 ], Fraction of SP3 carbons (Fsp3Carbons)
34 and SP3 carbons (Sp3Carbons) [ Ref 115-116, Ref 119 ],
35 MolecularComplexity [ Ref 117-119 ]
36
37 Multiple SDFile names are separated by spaces. The valid file extensions
38 are *.sdf* and *.sd*. All other file names are ignored. All the SD files
39 in the current directory can be specified either by **.sdf* or the current
40 directory name.
41
42 The calculation of molecular complexity using *MolecularComplexityType*
43 parameter corresponds to the number of bits-set or unique keys [ Ref
44 117-119 ] in molecular fingerprints. Default value for
45 *MolecularComplexityType*: *MACCSKeys* of size 166. The calculation of
46 MACCSKeys is relatively expensive and can take a substantial amount
47 of time.
48
49 OPTIONS
50 --AromaticityModel *MDLAromaticityModel | TriposAromaticityModel |
51 MMFFAromaticityModel | ChemAxonBasicAromaticityModel |
52 ChemAxonGeneralAromaticityModel | DaylightAromaticityModel |
53 MayaChemToolsAromaticityModel*
54 Specify aromaticity model to use during detection of aromaticity.
55 Possible values in the current release are: *MDLAromaticityModel,
56 TriposAromaticityModel, MMFFAromaticityModel,
57 ChemAxonBasicAromaticityModel, ChemAxonGeneralAromaticityModel,
58 DaylightAromaticityModel or MayaChemToolsAromaticityModel*. Default
59 value: *MayaChemToolsAromaticityModel*.
60
61 The supported aromaticity model names along with model specific
62 control parameters are defined in AromaticityModelsData.csv, which
63 is distributed with the current release and is available under
64 lib/data directory. Molecule.pm module retrieves data from this file
65 during class instantiation and makes it available to method
66 DetectAromaticity for detecting aromaticity corresponding to a
67 specific model.
68
69 --CompoundID *DataFieldName or LabelPrefixString*
70 This value is --CompoundIDMode specific and indicates how compound
71 ID is generated.
72
73 For *DataField* value of --CompoundIDMode option, it corresponds to
74 datafield label name whose value is used as compound ID; otherwise,
75 it's a prefix string used for generating compound IDs like
76 LabelPrefixString<Number>. Default value, *Cmpd*, generates compound
77 IDs which look like Cmpd<Number>.
78
79 Examples for *DataField* value of --CompoundIDMode:
80
81 MolID
82 ExtReg
83
84 Examples for *LabelPrefix* or *MolNameOrLabelPrefix* value of
85 --CompoundIDMode:
86
87 Compound
88
89 The value specified above generates compound IDs which correspond to
90 Compound<Number> instead of default value of Cmpd<Number>.
91
92 --CompoundIDLabel *text*
93 Specify compound ID column label for CSV/TSV text file(s) used
94 during *CompoundID* value of --DataFieldsMode option. Default value:
95 *CompoundID*.
96
97 --CompoundIDMode *DataField | MolName | LabelPrefix |
98 MolNameOrLabelPrefix*
99 Specify how to generate compound IDs and write to CSV/TSV text
100 file(s) along with calculated physicochemical properties for *text |
101 both* values of --output option: use a *SDFile(s)* datafield value;
102 use molname line from *SDFile(s)*; generate a sequential ID with
103 specific prefix; use combination of both MolName and LabelPrefix
104 with usage of LabelPrefix values for empty molname lines.
105
106 Possible values: *DataField | MolName | LabelPrefix |
107 MolNameOrLabelPrefix*. Default value: *LabelPrefix*.
108
109 For *MolNameOrLabelPrefix* value of --CompoundIDMode, molname line
110 in *SDFile(s)* takes precedence over sequential compound IDs
111 generated using *LabelPrefix* and only empty molname values are
112 replaced with sequential compound IDs.
113
114 This is only used for *CompoundID* value of --DataFieldsMode option.
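
The four --CompoundIDMode values described above can be sketched in a few lines of Python. This is an illustration only, not the script's own code: the function name make_compound_id, the 1-based numbering, and the empty-string result for a missing data field are assumptions.

```python
from typing import Mapping, Optional


def make_compound_id(
    mode: str,
    index: int,
    mol_name: str = "",
    data_fields: Optional[Mapping[str, str]] = None,
    compound_id: str = "Cmpd",
) -> str:
    """Return an ID for the index-th compound (1-based, an assumption)."""
    data_fields = data_fields or {}
    if mode == "DataField":
        # --CompoundID names the data field whose value becomes the ID.
        # Returning "" for a missing field is an assumption of this sketch.
        return data_fields.get(compound_id, "")
    if mode == "MolName":
        return mol_name.strip()
    if mode == "LabelPrefix":
        # --CompoundID is a prefix: Cmpd1, Cmpd2, ...
        return f"{compound_id}{index}"
    if mode == "MolNameOrLabelPrefix":
        # The molname line takes precedence; only empty names fall back
        # to a sequential ID built from the prefix.
        name = mol_name.strip()
        return name if name else f"{compound_id}{index}"
    raise ValueError(f"unknown CompoundIDMode: {mode}")
```

For example, make_compound_id("LabelPrefix", 3) yields Cmpd3, while the same call with compound_id="Compound" yields Compound3.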
115
116 --DataFields *"FieldLabel1,FieldLabel2,..."*
117 Comma delimited list of *SDFile(s)* data fields to extract and
118 write to CSV/TSV text file(s) along with calculated physicochemical
119 properties for *text | both* values of --output option.
120
121 This is only used for *Specify* value of --DataFieldsMode option.
122
123 Examples:
124
125 Extreg
126 MolID,CompoundName
127
128 -d, --DataFieldsMode *All | Common | Specify | CompoundID*
129 Specify how data fields in *SDFile(s)* are transferred to output
130 CSV/TSV text file(s) along with calculated physicochemical
131 properties for *text | both* values of --output option: transfer all
132 SD data fields; transfer SD data fields common to all compounds;
133 extract specified data fields; generate a compound ID using molname
134 line, a compound prefix, or a combination of both. Possible values:
135 *All | Common | Specify | CompoundID*. Default value: *CompoundID*.
136
137 -f, --Filter *Yes | No*
138 Specify whether to check and filter compound data in SDFile(s).
139 Possible values: *Yes or No*. Default value: *Yes*.
140
141 By default, compound data is checked before calculating
142 physicochemical properties and compounds containing atom data
143 corresponding to non-element symbols or no atom data are ignored.
144
145 -h, --help
146 Print this help message.
147
148 --HydrogenBonds *HBondsType1 | HBondsType2*
149 Parameters to control calculation of hydrogen bond donors and
150 acceptors. Possible values: *HBondsType1, HydrogenBondsType1,
151 HBondsType2, HydrogenBondsType2*. Default value: *HBondsType2* which
152 corresponds to RuleOf5 definition for number of hydrogen bond donors
153 and acceptors.
154
155 The current release of MayaChemTools supports identification of two
156 types of hydrogen bond donor and acceptor atoms with these names:
157
158 HBondsType1 or HydrogenBondsType1
159 HBondsType2 or HydrogenBondsType2
160
161 The names of these hydrogen bond types are rather arbitrary.
162 However, their definitions have specific meaning and are as follows:
163
164 HydrogenBondsType1 [ Ref 60-61, Ref 65-66 ]:
165
166 Donor: NH, NH2, OH - Any N and O with available H
167 Acceptor: N[!H], O - Any N without available H and any O
168
169 HydrogenBondsType2 [ Ref 91 ]:
170
171 Donor: NH, NH2, OH - N and O with available H
172 Acceptor: N, O - Any N and O
173
174 -k, --KeepLargestComponent *Yes | No*
175 Calculate physicochemical properties for only the largest component
176 in molecule. Possible values: *Yes or No*. Default value: *Yes*.
177
178 For molecules containing multiple connected components,
179 physicochemical properties can be calculated in two different ways:
180 use all connected components or just the largest connected
181 component. By default, all atoms except for the largest connected
182 component are deleted before calculation of physicochemical
183 properties.
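
One way to picture the default behavior is a breadth-first search over the bond graph that keeps only the biggest connected set of atoms. The sketch below is an assumption-laden illustration: the adjacency-dictionary input, the function name largest_component, and measuring size by atom count are not taken from the script.

```python
from collections import deque
from typing import Dict, List, Set


def largest_component(bonds: Dict[int, List[int]]) -> Set[int]:
    """Return the atom IDs of the largest connected component.

    bonds maps each atom ID to the IDs of its bonded neighbors.
    Size is measured by atom count (an assumption of this sketch).
    """
    seen: Set[int] = set()
    best: Set[int] = set()
    for start in bonds:
        if start in seen:
            continue
        # Breadth-first traversal collects one connected component.
        component = {start}
        queue = deque([start])
        while queue:
            atom = queue.popleft()
            for neighbor in bonds.get(atom, ()):
                if neighbor not in component:
                    component.add(neighbor)
                    queue.append(neighbor)
        seen |= component
        if len(component) > len(best):
            best = component
    return best
```

For a salt-like input with a three-atom fragment and a lone counterion, only the three-atom fragment is returned; all other atoms would be dropped before the property calculation.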
184
185 -m, --mode *All | RuleOf5 | RuleOf3 | "name1, [name2,...]"*
186 Specify physicochemical properties to calculate for SDFile(s):
187 calculate all available physicochemical properties; calculate
188 properties corresponding to Rule of 5 or Rule of 3; or use a comma
189 delimited list of supported physicochemical properties. Possible
190 values: *All | RuleOf5 | RuleOf3 | "name1, [name2,...]"*.
191
192 Default value: *MolecularWeight, HeavyAtoms, MolecularVolume,
193 RotatableBonds, HydrogenBondDonors, HydrogenBondAcceptors, SLogP,
194 TPSA*. These properties are calculated by default.
195
196 *RuleOf5* [ Ref 91 ] includes these properties: *MolecularWeight,
197 HydrogenBondDonors, HydrogenBondAcceptors, SLogP*. *RuleOf5* states:
198 MolecularWeight <= 500, HydrogenBondDonors <= 5,
199 HydrogenBondAcceptors <= 10, and logP <= 5.
200
201 *RuleOf3* [ Ref 92 ] includes these properties: *MolecularWeight,
202 RotatableBonds, HydrogenBondDonors, HydrogenBondAcceptors, SLogP,
203 TPSA*. *RuleOf3* states: MolecularWeight <= 300, RotatableBonds <=
204 3, HydrogenBondDonors <= 3, HydrogenBondAcceptors <= 3, logP <= 3,
205 and TPSA <= 60.
206
207 *All* calculates all supported physicochemical properties:
208 *MolecularWeight, ExactMass, HeavyAtoms, Rings, AromaticRings,
209 MolecularVolume, RotatableBonds, HydrogenBondDonors,
210 HydrogenBondAcceptors, SLogP, SMR, TPSA, Fsp3Carbons, Sp3Carbons,
211 MolecularComplexity*.
212
213 --MolecularComplexity *Name,Value, [Name,Value,...]*
214 Parameters to control calculation of molecular complexity: it's a
215 comma delimited list of parameter name and value pairs.
216
217 Possible parameter names: *MolecularComplexityType,
218 AtomIdentifierType, AtomicInvariantsToUse, FunctionalClassesToUse,
219 MACCSKeysSize, NeighborhoodRadius, MinPathLength, MaxPathLength,
220 UseBondSymbols, MinDistance, MaxDistance, UseTriangleInequality,
221 DistanceBinSize, NormalizationMethodology*.
222
223 The valid parameter values for each parameter name are described in
224 the following sections.
225
226 The current release of MayaChemTools supports calculation of
227 molecular complexity using *MolecularComplexityType* parameter
228 corresponding to the number of bits-set or unique keys [ Ref 117-119
229 ] in molecular fingerprints. The valid values for
230 *MolecularComplexityType* are:
231
232 AtomTypesFingerprints
233 ExtendedConnectivityFingerprints
234 MACCSKeys
235 PathLengthFingerprints
236 TopologicalAtomPairsFingerprints
237 TopologicalAtomTripletsFingerprints
238 TopologicalAtomTorsionsFingerprints
239 TopologicalPharmacophoreAtomPairsFingerprints
240 TopologicalPharmacophoreAtomTripletsFingerprints
241
242 Default value for *MolecularComplexityType*: *MACCSKeys*.
243
244 *AtomIdentifierType* parameter name corresponds to atom types used
245 during generation of fingerprints. The valid values for
246 *AtomIdentifierType* are: *AtomicInvariantsAtomTypes,
247 DREIDINGAtomTypes, EStateAtomTypes, FunctionalClassAtomTypes,
248 MMFF94AtomTypes, SLogPAtomTypes, SYBYLAtomTypes, TPSAAtomTypes,
249 UFFAtomTypes*. *AtomicInvariantsAtomTypes* is not supported for
250 the following values of *MolecularComplexityType*:
251 *MACCSKeys, TopologicalPharmacophoreAtomPairsFingerprints,
252 TopologicalPharmacophoreAtomTripletsFingerprints*.
253 *FunctionalClassAtomTypes* is the only valid value for
254 *AtomIdentifierType* for topological pharmacophore fingerprints.
255
256 Default value for *AtomIdentifierType*: *AtomicInvariantsAtomTypes*
257 for all except topological pharmacophore fingerprints where it is
258 *FunctionalClassAtomTypes*.
259
260 *AtomicInvariantsToUse* parameter name and values are used during
261 *AtomicInvariantsAtomTypes* value of parameter *AtomIdentifierType*.
262 It's a list of space separated valid atomic invariant atom types.
263
264 Possible values for atomic invariants are: *AS, X, BO, LBO, SB, DB,
265 TB, H, Ar, RA, FC, MN, SM*. Default values for
266 *AtomicInvariantsToUse* parameter are set differently for different
267 fingerprints using *MolecularComplexityType* parameter as shown
268 below:
269
270 MolecularComplexityType AtomicInvariantsToUse
271
272 AtomTypesFingerprints AS X BO H FC
273 TopologicalAtomPairsFingerprints AS X BO H FC
274 TopologicalAtomTripletsFingerprints AS X BO H FC
275 TopologicalAtomTorsionsFingerprints AS X BO H FC
276
277 ExtendedConnectivityFingerprints AS X BO H FC MN
278 PathLengthFingerprints AS
279
280 The atomic invariants abbreviations correspond to:
281
282 AS = Atom symbol corresponding to element symbol
283
284 X<n> = Number of non-hydrogen atom neighbors or heavy atoms
285 BO<n> = Sum of bond orders to non-hydrogen atom neighbors or heavy atoms
286 LBO<n> = Largest bond order of non-hydrogen atom neighbors or heavy atoms
287 SB<n> = Number of single bonds to non-hydrogen atom neighbors or heavy atoms
288 DB<n> = Number of double bonds to non-hydrogen atom neighbors or heavy atoms
289 TB<n> = Number of triple bonds to non-hydrogen atom neighbors or heavy atoms
290 H<n> = Number of implicit and explicit hydrogens for atom
291 Ar = Aromatic annotation indicating whether atom is aromatic
292 RA = Ring atom annotation indicating whether atom is in a ring
293 FC<+n/-n> = Formal charge assigned to atom
294 MN<n> = Mass number indicating isotope other than most abundant isotope
295 SM<n> = Spin multiplicity of atom. Possible values: 1 (singlet), 2 (doublet) or
296 3 (triplet)
297
298 Atom type generated by AtomTypes::AtomicInvariantsAtomTypes class
299 corresponds to:
300
301 AS.X<n>.BO<n>.LBO<n>.SB<n>.DB<n>.TB<n>.H<n>.Ar.RA.FC<+n/-n>.MN<n>.SM<n>
302
303 Except for AS which is a required atomic invariant in atom types,
304 all other atomic invariants are optional. Atom type specification
305 doesn't include atomic invariants with zero or undefined values.
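
The assembly rule above (AS first, then the remaining invariants in fixed order, skipping zero or undefined values) can be sketched as follows. The function name atom_type and the dictionary input are illustrative assumptions; the actual assignment is done by the AtomTypes::AtomicInvariantsAtomTypes module.

```python
from typing import Iterable, Mapping, Optional

# Order of the optional invariants after the required atom symbol (AS).
INVARIANT_ORDER = ["X", "BO", "LBO", "SB", "DB", "TB", "H",
                   "Ar", "RA", "FC", "MN", "SM"]


def atom_type(symbol: str,
              invariants: Mapping[str, Optional[int]],
              use: Iterable[str]) -> str:
    """Build an atom type string such as C.X3.BO4 for one atom."""
    selected = set(use)
    parts = [symbol]  # AS is always present
    for name in INVARIANT_ORDER:
        if name not in selected:
            continue
        value = invariants.get(name)
        if not value:
            # Zero or undefined invariants are left out of the atom type.
            continue
        if name in ("Ar", "RA"):
            parts.append(name)  # plain annotations, no number
        elif name == "FC":
            parts.append(f"FC{value:+d}")  # signed formal charge
        else:
            parts.append(f"{name}{value}")
    return ".".join(parts)
```

With the invariants AS X BO H FC, a neutral carbon with three heavy atom neighbors, a bond order sum of 4 and no hydrogens comes out as C.X3.BO4; the H and FC parts are dropped because their values are zero.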
306
307 In addition to usage of abbreviations for specifying atomic
308 invariants, the following descriptive words are also allowed:
309
310 X : NumOfNonHydrogenAtomNeighbors or NumOfHeavyAtomNeighbors
311 BO : SumOfBondOrdersToNonHydrogenAtoms or SumOfBondOrdersToHeavyAtoms
312 LBO : LargestBondOrderToNonHydrogenAtoms or LargestBondOrderToHeavyAtoms
313 SB : NumOfSingleBondsToNonHydrogenAtoms or NumOfSingleBondsToHeavyAtoms
314 DB : NumOfDoubleBondsToNonHydrogenAtoms or NumOfDoubleBondsToHeavyAtoms
315 TB : NumOfTripleBondsToNonHydrogenAtoms or NumOfTripleBondsToHeavyAtoms
316 H : NumOfImplicitAndExplicitHydrogens
317 Ar : Aromatic
318 RA : RingAtom
319 FC : FormalCharge
320 MN : MassNumber
321 SM : SpinMultiplicity
322
323 *AtomTypes::AtomicInvariantsAtomTypes* module is used to assign
324 atomic invariant atom types.
325
326 *FunctionalClassesToUse* parameter name and values are used during
327 *FunctionalClassAtomTypes* value of parameter *AtomIdentifierType*.
328 It's a list of space separated valid functional classes.
329
330 Possible values for atom functional classes are: *Ar, CA, H, HBA,
331 HBD, Hal, NI, PI, RA*.
332
333 Default value for *FunctionalClassesToUse* parameter is set to:
334
335 HBD HBA PI NI Ar Hal
336
337 for all fingerprints except for the following two
338 *MolecularComplexityType* fingerprints:
339
340 MolecularComplexityType FunctionalClassesToUse
341
342 TopologicalPharmacophoreAtomPairsFingerprints HBD HBA PI NI H
343 TopologicalPharmacophoreAtomTripletsFingerprints HBD HBA PI NI H Ar
344
345 The functional class abbreviations correspond to:
346
347 HBD: HydrogenBondDonor
348 HBA: HydrogenBondAcceptor
349 PI : PositivelyIonizable
350 NI : NegativelyIonizable
351 Ar : Aromatic
352 Hal : Halogen
353 H : Hydrophobic
354 RA : RingAtom
355 CA : ChainAtom
356
357 Functional class atom type specification for an atom corresponds to:
358
359 Ar.CA.H.HBA.HBD.Hal.NI.PI.RA
360
361 *AtomTypes::FunctionalClassAtomTypes* module is used to assign
362 functional class atom types. It uses following definitions [ Ref
363 60-61, Ref 65-66 ]:
364
365 HydrogenBondDonor: NH, NH2, OH
366 HydrogenBondAcceptor: N[!H], O
367 PositivelyIonizable: +, NH2
368 NegativelyIonizable: -, C(=O)OH, S(=O)OH, P(=O)OH
369
370 *MACCSKeysSize* parameter name is only used during *MACCSKeys* value
371 of *MolecularComplexityType* and corresponds to the size of MACCS
372 key set. Possible values: *166 or 322*. Default value: *166*.
373
374 *NeighborhoodRadius* parameter name is only used during
375 *ExtendedConnectivityFingerprints* value of
376 *MolecularComplexityType* and corresponds to atomic neighborhoods
377 radius for generating extended connectivity fingerprints. Possible
378 values: positive integer. Default value: *2*.
379
380 *MinPathLength* and *MaxPathLength* parameters are only used during
381 *PathLengthFingerprints* value of *MolecularComplexityType* and
382 correspond to minimum and maximum path lengths to use for generating
383 path length fingerprints. Possible values: positive integers.
384 Default value: *MinPathLength - 1*; *MaxPathLength - 8*.
385
386 *UseBondSymbols* parameter is only used during
387 *PathLengthFingerprints* value of *MolecularComplexityType* and
388 indicates whether bond symbols are included in atom path strings
389 used to generate path length fingerprints. Possible values: *Yes or
390 No*. Default value: *Yes*.
391
392 *MinDistance* and *MaxDistance* parameters are only used during
393 topological atom pairs and triplets fingerprints, including their
394 topological pharmacophore variants, as values of
395 *MolecularComplexityType* and correspond to minimum and maximum bond
396 distance between atom pairs during generation of these
397 fingerprints. Possible values: positive integers. Default value:
398 *MinDistance - 1*; *MaxDistance - 10*.
399
400 *UseTriangleInequality* parameter is used during these values for
401 *MolecularComplexityType*: *TopologicalAtomTripletsFingerprints* and
402 *TopologicalPharmacophoreAtomTripletsFingerprints*. Possible values:
403 *Yes or No*. It determines whether to apply triangle inequality to
404 distance triplets. Default value:
405 *TopologicalAtomTripletsFingerprints - No*;
406 *TopologicalPharmacophoreAtomTripletsFingerprints - Yes*.
407
408 *DistanceBinSize* parameter is used during
409 *TopologicalPharmacophoreAtomTripletsFingerprints* value of
410 *MolecularComplexityType* and corresponds to distance bin size used
411 for binning distances during generation of topological pharmacophore
412 atom triplets fingerprints. Possible value: positive integer.
413 Default value: *2*.
414
415 *NormalizationMethodology* is only used for these values for
416 *MolecularComplexityType*: *ExtendedConnectivityFingerprints*,
417 *TopologicalPharmacophoreAtomPairsFingerprints* and
418 *TopologicalPharmacophoreAtomTripletsFingerprints*. It corresponds
419 to normalization methodology to use for scaling the number of
420 bits-set or unique keys during generation of fingerprints. Possible
421 values during *ExtendedConnectivityFingerprints*: *None or
422 ByHeavyAtomsCount*; Default value: *None*. Possible values during
423 topological pharmacophore atom pairs and triplets fingerprints:
424 *None or ByPossibleKeysCount*; Default value: *None*.
425 *ByPossibleKeysCount* corresponds to total number of possible
426 topological pharmacophore atom pairs or triplets in a molecule.
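
As a rough picture of what counting bits-set with optional scaling means, consider the sketch below. It is not the script's implementation: the function name is hypothetical, and reading *ByHeavyAtomsCount* as a plain division by the number of heavy atoms is an assumption based only on the option name.

```python
from typing import Sequence


def molecular_complexity(bits: Sequence[int],
                         normalization: str = "None",
                         heavy_atoms: int = 0) -> float:
    """Count the bits set in a fingerprint, optionally scaled."""
    bits_set = sum(1 for bit in bits if bit)
    if normalization == "None":
        return float(bits_set)
    if normalization == "ByHeavyAtomsCount":
        # Assumption: scaling is a simple division by heavy atom count.
        if heavy_atoms <= 0:
            raise ValueError("heavy_atoms must be positive")
        return bits_set / heavy_atoms
    raise ValueError(f"unsupported normalization: {normalization}")
```

Under this reading, a fingerprint with three bits set gives 3 without scaling and 0.5 when scaled for a molecule with six heavy atoms.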
427
428 Examples of *MolecularComplexity* name and value parameters:
429
430 MolecularComplexityType,AtomTypesFingerprints,AtomIdentifierType,
431 AtomicInvariantsAtomTypes,AtomicInvariantsToUse,AS X BO H FC
432
433 MolecularComplexityType,ExtendedConnectivityFingerprints,
434 AtomIdentifierType,AtomicInvariantsAtomTypes,
435 AtomicInvariantsToUse,AS X BO H FC MN,NeighborhoodRadius,2,
436 NormalizationMethodology,None
437
438 MolecularComplexityType,MACCSKeys,MACCSKeysSize,166
439
440 MolecularComplexityType,PathLengthFingerprints,AtomIdentifierType,
441 AtomicInvariantsAtomTypes,AtomicInvariantsToUse,AS,MinPathLength,
442 1,MaxPathLength,8,UseBondSymbols,Yes
443
444 MolecularComplexityType,TopologicalAtomPairsFingerprints,
445 AtomIdentifierType,AtomicInvariantsAtomTypes,AtomicInvariantsToUse,
446 AS X BO H FC,MinDistance,1,MaxDistance,10
447
448 MolecularComplexityType,TopologicalAtomTripletsFingerprints,
449 AtomIdentifierType,AtomicInvariantsAtomTypes,AtomicInvariantsToUse,
450 AS X BO H FC,MinDistance,1,MaxDistance,10,UseTriangleInequality,No
451
452 MolecularComplexityType,TopologicalAtomTorsionsFingerprints,
453 AtomIdentifierType,AtomicInvariantsAtomTypes,AtomicInvariantsToUse,
454 AS X BO H FC
455
456 MolecularComplexityType,TopologicalPharmacophoreAtomPairsFingerprints,
457 AtomIdentifierType,FunctionalClassAtomTypes,FunctionalClassesToUse,
458 HBD HBA PI NI H,MinDistance,1,MaxDistance,10,NormalizationMethodology,
459 None
460
461 MolecularComplexityType,TopologicalPharmacophoreAtomTripletsFingerprints,
462 AtomIdentifierType,FunctionalClassAtomTypes,FunctionalClassesToUse,
463 HBD HBA PI NI H Ar,MinDistance,1,MaxDistance,10,
464 UseTriangleInequality,Yes,NormalizationMethodology,None,
465 DistanceBinSize,2
466
467 --OutDelim *comma | tab | semicolon*
468 Delimiter for output CSV/TSV text file(s). Possible values: *comma,
469 tab, or semicolon*. Default value: *comma*.
470
471 --output *SD | text | both*
472 Type of output files to generate. Possible values: *SD, text, or
473 both*. Default value: *text*.
474
475 -o, --overwrite
476 Overwrite existing files.
477
478 --Precision *Name,Number,[Name,Number,..]*
479 Precision of calculated property values in the output file: it's a
480 comma delimited list of property name and precision value pairs.
481 Possible property names: *MolecularWeight, ExactMass*. Possible
482 values: positive integers. Default value: *MolecularWeight,2,
483 ExactMass,4*.
484
485 Examples:
486
487 ExactMass,3
488 MolecularWeight,1,ExactMass,2
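
The effect of these precision pairs on output values can be sketched as below. The helper name format_property is hypothetical, and leaving other properties unformatted is an assumption of the sketch rather than documented behavior.

```python
from typing import Mapping, Optional, Union

# Defaults taken from the option description above.
DEFAULT_PRECISION = {"MolecularWeight": 2, "ExactMass": 4}


def format_property(name: str,
                    value: Union[int, float],
                    precision: Optional[Mapping[str, int]] = None) -> str:
    """Format one property value using the default or overridden precision."""
    digits = {**DEFAULT_PRECISION, **(precision or {})}.get(name)
    if digits is None:
        # Properties without a precision setting are written unchanged
        # (an assumption of this sketch).
        return str(value)
    return f"{value:.{digits}f}"
```

Passing {"ExactMass": 3} corresponds to the first example above and changes only the number of decimal places written for ExactMass.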
489
490 -q, --quote *Yes | No*
491 Put quotes around column values in output CSV/TSV text file(s).
492 Possible values: *Yes or No*. Default value: *Yes*.
493
494 -r, --root *RootName*
495 New file name is generated using the root: <Root>.<Ext>. Default for
496 new file names: <SDFileName><PhysicochemicalProperties>.<Ext>. The
497 file type determines <Ext> value. The sdf, csv, and tsv <Ext> values
498 are used for SD, comma/semicolon, and tab delimited text files,
499 respectively. This option is ignored for multiple input files.
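
The naming rules above, combined with the --output and --OutDelim options, can be sketched as follows. The function name and the case-insensitive matching of option values are assumptions of this illustration.

```python
import os
from typing import List, Optional


def output_file_names(sd_file: str,
                      output: str = "text",
                      out_delim: str = "comma",
                      root: Optional[str] = None) -> List[str]:
    """Return the output file names for a single input SD file."""
    if root:
        base = root
    else:
        # Default: <SDFileName>PhysicochemicalProperties
        name = os.path.splitext(os.path.basename(sd_file))[0]
        base = name + "PhysicochemicalProperties"
    # tsv for tab delimited files; csv for comma and semicolon.
    text_ext = "tsv" if out_delim.lower() == "tab" else "csv"
    mode = output.lower()
    names = []
    if mode in ("sd", "both"):
        names.append(f"{base}.sdf")
    if mode in ("text", "both"):
        names.append(f"{base}.{text_ext}")
    return names
```

With the defaults, Sample.sdf maps to SamplePhysicochemicalProperties.csv, which matches the first entry in the EXAMPLES section.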
500
501 --RotatableBonds *Name,Value, [Name,Value,...]*
502 Parameters to control calculation of rotatable bonds [ Ref 92 ]:
503 it's a comma delimited list of parameter name and value pairs.
504 Possible parameter names: *IgnoreTerminalBonds,
505 IgnoreBondsToTripleBonds, IgnoreAmideBonds, IgnoreThioamideBonds,
506 IgnoreSulfonamideBonds*. Possible parameter values: *Yes or No*. By
507 default, value of all parameters is set to *Yes*.
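
Options of the form Name,Value,[Name,Value,...] with Yes or No values, such as this one, can be read by pairing up the comma separated tokens and overriding defaults. The sketch below is a hypothetical parser, not the script's own; its error handling in particular is an assumption.

```python
from typing import Dict, Mapping

# Defaults from the option description: every parameter is Yes.
ROTATABLE_BONDS_DEFAULTS = {
    "IgnoreTerminalBonds": "Yes",
    "IgnoreBondsToTripleBonds": "Yes",
    "IgnoreAmideBonds": "Yes",
    "IgnoreThioamideBonds": "Yes",
    "IgnoreSulfonamideBonds": "Yes",
}


def parse_name_value_pairs(spec: str,
                           defaults: Mapping[str, str]) -> Dict[str, str]:
    """Parse 'Name,Value,Name,Value' and overlay the pairs on defaults."""
    params = dict(defaults)
    tokens = [t.strip() for t in spec.split(",")] if spec.strip() else []
    if len(tokens) % 2:
        raise ValueError("expected an even number of comma delimited values")
    # Even positions hold names, odd positions hold their values.
    for name, value in zip(tokens[0::2], tokens[1::2]):
        if name not in params:
            raise ValueError(f"unknown parameter name: {name}")
        if value not in ("Yes", "No"):
            raise ValueError(f"invalid value for {name}: {value}")
        params[name] = value
    return params
```

For instance, "IgnoreAmideBonds,No" changes that single parameter and leaves the other four at Yes.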
508
509 --RuleOf3Violations *Yes | No*
510 Specify whether to calculate RuleOf3Violations for SDFile(s).
511 Possible values: *Yes or No*. Default value: *No*.
512
513 For *Yes* value of RuleOf3Violations, in addition to calculating
514 total number of RuleOf3 violations, individual violations for
515 compounds are also written to output files.
516
517 RuleOf3 [ Ref 92 ] states: MolecularWeight <= 300, RotatableBonds <=
518 3, HydrogenBondDonors <= 3, HydrogenBondAcceptors <= 3, logP <= 3,
519 and TPSA <= 60.
520
521 --RuleOf5Violations *Yes | No*
522 Specify whether to calculate RuleOf5Violations for SDFile(s).
523 Possible values: *Yes or No*. Default value: *No*.
524
525 For *Yes* value of RuleOf5Violations, in addition to calculating
526 total number of RuleOf5 violations, individual violations for
527 compounds are also written to output files.
528
529 RuleOf5 [ Ref 91 ] states: MolecularWeight <= 500,
530 HydrogenBondDonors <= 5, HydrogenBondAcceptors <= 10, and logP <= 5.
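
The RuleOf3 and RuleOf5 thresholds stated above translate directly into a violation count plus per-property flags. The sketch below is illustrative: the function name, the dictionary layout, and the use of SLogP as the logP value are assumptions, and the sample values are made up.

```python
from typing import Dict, Mapping, Tuple

# Upper limits as stated above; a value above the limit is a violation.
RULE_OF_5 = {
    "MolecularWeight": 500,
    "HydrogenBondDonors": 5,
    "HydrogenBondAcceptors": 10,
    "SLogP": 5,
}
RULE_OF_3 = {
    "MolecularWeight": 300,
    "RotatableBonds": 3,
    "HydrogenBondDonors": 3,
    "HydrogenBondAcceptors": 3,
    "SLogP": 3,
    "TPSA": 60,
}


def rule_violations(properties: Mapping[str, float],
                    limits: Mapping[str, float]) -> Tuple[int, Dict[str, int]]:
    """Return (total violations, per-property 0/1 flags)."""
    flags = {name: int(properties[name] > limit)
             for name, limit in limits.items()}
    return sum(flags.values()), flags


# Hypothetical property values for one compound.
sample = {
    "MolecularWeight": 180.16,
    "RotatableBonds": 3,
    "HydrogenBondDonors": 1,
    "HydrogenBondAcceptors": 4,
    "SLogP": 1.31,
    "TPSA": 63.6,
}
```

For the sample values, rule_violations(sample, RULE_OF_5) reports no violations, while the RuleOf3 limits flag HydrogenBondAcceptors and TPSA.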
531
532 --TPSA *Name,Value, [Name,Value,...]*
533 Parameters to control calculation of TPSA: it's a comma delimited
534 list of parameter name and value pairs. Possible parameter names:
535 *IgnorePhosphorus, IgnoreSulfur*. Possible parameter values: *Yes or
536 No*. By default, value of all parameters is set to *Yes*.
537
538 By default, TPSA atom contributions from Phosphorus and Sulfur atoms
539 are not included during TPSA calculations. [ Ref 91 ]
540
541 -w, --WorkingDir *DirName*
542 Location of working directory. Default value: current directory.
543
544 EXAMPLES
545 To calculate default set of physicochemical properties -
546 MolecularWeight, HeavyAtoms, MolecularVolume, RotatableBonds,
547 HydrogenBondDonors, HydrogenBondAcceptors, SLogP, TPSA - and generate a
548 SamplePhysicochemicalProperties.csv file containing sequential compound
549 IDs along with properties data, type:
550
551 % CalculatePhysicochemicalProperties.pl -o Sample.sdf
552
553 To calculate all available physicochemical properties and generate both
554 SampleAllProperties.csv and SampleAllProperties.sdf files containing
555 sequential compound IDs in CSV file along with properties data, type:
556
557 % CalculatePhysicochemicalProperties.pl -m All --output both
558 -r SampleAllProperties -o Sample.sdf
559
560 To calculate RuleOf5 physicochemical properties and generate a
561 SampleRuleOf5Properties.csv file containing sequential compound IDs
562 along with properties data, type:
563
564 % CalculatePhysicochemicalProperties.pl -m RuleOf5
565 -r SampleRuleOf5Properties -o Sample.sdf
566
567 To calculate RuleOf5 physicochemical properties along with counting
568 RuleOf5 violations and generate a SampleRuleOf5Properties.csv file
569 containing sequential compound IDs along with properties data, type:
570
571 % CalculatePhysicochemicalProperties.pl -m RuleOf5 --RuleOf5Violations Yes
572 -r SampleRuleOf5Properties -o Sample.sdf
573
574 To calculate RuleOf3 physicochemical properties and generate a
575 SampleRuleOf3Properties.csv file containing sequential compound IDs
576 along with properties data, type:
577
578 % CalculatePhysicochemicalProperties.pl -m RuleOf3
579 -r SampleRuleOf3Properties -o Sample.sdf
580
581 To calculate RuleOf3 physicochemical properties along with counting
582 RuleOf3 violations and generate a SampleRuleOf3Properties.csv file
583 containing sequential compound IDs along with properties data, type:
584
585 % CalculatePhysicochemicalProperties.pl -m RuleOf3 --RuleOf3Violations Yes
586 -r SampleRuleOf3Properties -o Sample.sdf
587
588 To calculate a specific set of physicochemical properties and generate a
589 SampleProperties.csv file containing sequential compound IDs along with
590 properties data, type:
591
592 % CalculatePhysicochemicalProperties.pl -m "Rings,AromaticRings"
593 -r SampleProperties -o Sample.sdf
594
595 To calculate HydrogenBondDonors and HydrogenBondAcceptors using
596 HydrogenBondsType1 definition and generate a SampleProperties.csv file
597 containing sequential compound IDs along with properties data, type:
598
599 % CalculatePhysicochemicalProperties.pl -m "HydrogenBondDonors,HydrogenBondAcceptors"
600 --HydrogenBonds HBondsType1 -r SampleProperties -o Sample.sdf
601
602 To calculate TPSA using sulfur and phosphorus atoms along with nitrogen
603 and oxygen atoms and generate a SampleProperties.csv file containing
604 sequential compound IDs along with properties data, type:
605
606 % CalculatePhysicochemicalProperties.pl -m "TPSA" --TPSA "IgnorePhosphorus,No,
607 IgnoreSulfur,No" -r SampleProperties -o Sample.sdf
608
609 To calculate MolecularComplexity using extended connectivity
610 fingerprints corresponding to atom neighborhood radius of 2 with atomic
611 invariant atom types without any scaling and generate a
612 SampleProperties.csv file containing sequential compound IDs along with
613 properties data, type:
614
615 % CalculatePhysicochemicalProperties.pl -m MolecularComplexity --MolecularComplexity
616 "MolecularComplexityType,ExtendedConnectivityFingerprints,NeighborhoodRadius,2,
617 AtomIdentifierType, AtomicInvariantsAtomTypes,
618 AtomicInvariantsToUse,AS X BO H FC MN,NormalizationMethodology,None"
619 -r SampleProperties -o Sample.sdf
620
621 To calculate RuleOf5 physicochemical properties along with counting
622 RuleOf5 violations and generate a SampleRuleOf5Properties.csv file
623 containing compound IDs from molecule name line along with properties
624 data, type:
625
626 % CalculatePhysicochemicalProperties.pl -m RuleOf5 --RuleOf5Violations Yes
627 --DataFieldsMode CompoundID --CompoundIDMode MolName
628 -r SampleRuleOf5Properties -o Sample.sdf
629
630 To calculate all available physicochemical properties and generate a
631 SampleAllProperties.csv file containing compound ID using specified data
632 field along with properties data, type:
633
634 % CalculatePhysicochemicalProperties.pl -m All
635 --DataFieldsMode CompoundID --CompoundIDMode DataField --CompoundID Mol_ID
636 -r SampleAllProperties -o Sample.sdf
637
638 To calculate all available physicochemical properties and generate a
639 SampleAllProperties.csv file containing compound ID using combination of
640 molecule name line and an explicit compound prefix along with properties
641 data, type:
642
643 % CalculatePhysicochemicalProperties.pl -m All
644 --DataFieldsMode CompoundID --CompoundIDMode MolNameOrLabelPrefix
645 --CompoundID Cmpd --CompoundIDLabel MolID -r SampleAllProperties
646 -o Sample.sdf
647
648 To calculate all available physicochemical properties and generate a
649 SampleAllProperties.csv file containing specific data fields columns
650 along with properties data, type:
651
652 % CalculatePhysicochemicalProperties.pl -m All
653 --DataFieldsMode Specify --DataFields Mol_ID -r SampleAllProperties
654 -o Sample.sdf
655
656 To calculate all available physicochemical properties and generate a
657 SampleAllProperties.csv file containing common data fields columns along
658 with properties data, type:
659
660 % CalculatePhysicochemicalProperties.pl -m All
661 --DataFieldsMode Common -r SampleAllProperties -o Sample.sdf
662
663 To calculate all available physicochemical properties and generate both
664 SampleAllProperties.csv and SampleAllProperties.sdf files containing all
665 data fields columns in CSV file along with properties data, type:
666
667 % CalculatePhysicochemicalProperties.pl -m All
668 --DataFieldsMode All --output both -r SampleAllProperties
669 -o Sample.sdf
670
671 AUTHOR
672 Manish Sud <msud@san.rr.com>
673
674 SEE ALSO
675 ExtractFromSDFiles.pl, ExtractFromTextFiles.pl, InfoSDFiles.pl,
676 InfoTextFiles.pl
677
678 COPYRIGHT
679 Copyright (C) 2015 Manish Sud. All rights reserved.
680
681 This file is part of MayaChemTools.
682
683 MayaChemTools is free software; you can redistribute it and/or modify it
684 under the terms of the GNU Lesser General Public License as published by
685 the Free Software Foundation; either version 3 of the License, or (at
686 your option) any later version.
687