| 0 | 1 NAME | 
|  | 2     CalculatePhysicochemicalProperties.pl - Calculate physicochemical | 
|  | 3     properties for SD files | 
|  | 4 | 
|  | 5 SYNOPSIS | 
|  | 6     CalculatePhysicochemicalProperties.pl SDFile(s)... | 
|  | 7 | 
|  | 8     CalculatePhysicochemicalProperties.pl [--AromaticityModel *AromaticityModelType*] | 
|  | 9     [--CompoundID DataFieldName or LabelPrefixString] [--CompoundIDLabel | 
|  | 10     text] [--CompoundIDMode] [--DataFields "FieldLabel1, FieldLabel2,..."] | 
|  | 11     [-d, --DataFieldsMode All | Common | Specify | CompoundID] [-f, --Filter | 
|  | 12     Yes | No] [-h, --help] [--HydrogenBonds HBondsType1 | HBondsType2] [-k, | 
|  | 13     --KeepLargestComponent Yes | No] [-m, --mode All | RuleOf5 | RuleOf3 | | 
|  | 14     "name1, [name2,...]"] [--MolecularComplexity *Name,Value, | 
|  | 15     [Name,Value,...]*] [--OutDelim comma | tab | semicolon] [--output SD | | 
|  | 16     text | both] [-o, --overwrite] [--Precision | 
|  | 17     Name,Number,[Name,Number,..]] [--RotatableBonds Name,Value, | 
|  | 18     [Name,Value,...]] [--RuleOf3Violations Yes | No] [--RuleOf5Violations | 
|  | 19     Yes | No] [-q, --quote Yes | No] [-r, --root RootName] [--TPSA | 
|  | 20     Name,Value,[Name,Value,...]] [-w, --WorkingDir dirname] SDFile(s)... | 
|  | 21 | 
|  | 22 DESCRIPTION | 
|  | 23     Calculate physicochemical properties for *SDFile(s)* and create | 
|  | 24     appropriate SD or CSV/TSV text file(s) containing calculated properties. | 
|  | 25 | 
|  | 26     The current release of MayaChemTools supports the calculation of these | 
|  | 27     physicochemical properties: | 
|  | 28 | 
|  | 29         MolecularWeight, ExactMass, HeavyAtoms, Rings, AromaticRings, | 
|  | 30         van der Waals MolecularVolume [ Ref 93 ], RotatableBonds, | 
|  | 31         HydrogenBondDonors, HydrogenBondAcceptors, LogP and | 
|  | 32         Molar Refractivity (SLogP and SMR) [ Ref 89 ], Topological Polar | 
|  | 33         Surface Area (TPSA) [ Ref 90 ], Fraction of SP3 carbons (Fsp3Carbons) | 
|  | 34         and SP3 carbons (Sp3Carbons) [ Ref 115-116, Ref 119 ], | 
|  | 35         MolecularComplexity [ Ref 117-119 ] | 
|  | 36 | 
|  | 37     Multiple SDFile names are separated by spaces. The valid file extensions | 
|  | 38     are *.sdf* and *.sd*. All other file names are ignored. All the SD files | 
|  | 39     in the current directory can be specified either by **.sdf* or the current | 
|  | 40     directory name. | 
|  | 41 | 
|  | 42     Molecular complexity, as selected by the *MolecularComplexityType* | 
|  | 43     parameter, corresponds to the number of bits-set or unique keys [ Ref | 
|  | 44     117-119 ] in molecular fingerprints. Default value for | 
|  | 45     *MolecularComplexityType*: *MACCSKeys* of size 166. The calculation of | 
|  | 46     MACCSKeys is relatively expensive and can take a substantial amount of | 
|  | 47     time. | 
|  | 48 | 
|  | 49 OPTIONS | 
|  | 50     --AromaticityModel *MDLAromaticityModel | TriposAromaticityModel | | 
|  | 51     MMFFAromaticityModel | ChemAxonBasicAromaticityModel | | 
|  | 52     ChemAxonGeneralAromaticityModel | DaylightAromaticityModel | | 
|  | 53     MayaChemToolsAromaticityModel* | 
|  | 54         Specify aromaticity model to use during detection of aromaticity. | 
|  | 55         Possible values in the current release are: *MDLAromaticityModel, | 
|  | 56         TriposAromaticityModel, MMFFAromaticityModel, | 
|  | 57         ChemAxonBasicAromaticityModel, ChemAxonGeneralAromaticityModel, | 
|  | 58         DaylightAromaticityModel or MayaChemToolsAromaticityModel*. Default | 
|  | 59         value: *MayaChemToolsAromaticityModel*. | 
|  | 60 | 
|  | 61         The supported aromaticity model names along with model specific | 
|  | 62         control parameters are defined in AromaticityModelsData.csv, which | 
|  | 63         is distributed with the current release and is available under | 
|  | 64         lib/data directory. Molecule.pm module retrieves data from this file | 
|  | 65         during class instantiation and makes it available to method | 
|  | 66         DetectAromaticity for detecting aromaticity corresponding to a | 
|  | 67         specific model. | 
|  | 68 | 
|  | 69     --CompoundID *DataFieldName or LabelPrefixString* | 
|  | 70         This value is --CompoundIDMode specific and indicates how compound | 
|  | 71         ID is generated. | 
|  | 72 | 
|  | 73         For *DataField* value of --CompoundIDMode option, it corresponds to | 
|  | 74         datafield label name whose value is used as compound ID; otherwise, | 
|  | 75         it's a prefix string used for generating compound IDs like | 
|  | 76         LabelPrefixString<Number>. Default value, *Cmpd*, generates compound | 
|  | 77         IDs which look like Cmpd<Number>. | 
|  | 78 | 
|  | 79         Examples for *DataField* value of --CompoundIDMode: | 
|  | 80 | 
|  | 81             MolID | 
|  | 82             ExtReg | 
|  | 83 | 
|  | 84         Examples for *LabelPrefix* or *MolNameOrLabelPrefix* value of | 
|  | 85         --CompoundIDMode: | 
|  | 86 | 
|  | 87             Compound | 
|  | 88 | 
|  | 89         The value specified above generates compound IDs which correspond to | 
|  | 90         Compound<Number> instead of default value of Cmpd<Number>. | 
|  | 91 | 
|  | 92     --CompoundIDLabel *text* | 
|  | 93         Specify compound ID column label for CSV/TSV text file(s) used | 
|  | 94         during *CompoundID* value of --DataFieldsMode option. Default value: | 
|  | 95         *CompoundID*. | 
|  | 96 | 
|  | 97     --CompoundIDMode *DataField | MolName | LabelPrefix | | 
|  | 98     MolNameOrLabelPrefix* | 
|  | 99         Specify how to generate compound IDs and write to CSV/TSV text | 
|  | 100         file(s) along with calculated physicochemical properties for *text | | 
|  | 101         both* values of --output option: use a *SDFile(s)* datafield value; | 
|  | 102         use molname line from *SDFile(s)*; generate a sequential ID with | 
|  | 103         specific prefix; use combination of both MolName and LabelPrefix | 
|  | 104         with usage of LabelPrefix values for empty molname lines. | 
|  | 105 | 
|  | 106         Possible values: *DataField | MolName | LabelPrefix | | 
|  | 107         MolNameOrLabelPrefix*. Default value: *LabelPrefix*. | 
|  | 108 | 
|  | 109         For *MolNameOrLabelPrefix* value of --CompoundIDMode, molname line | 
|  | 110         in *SDFile(s)* takes precedence over sequential compound IDs | 
|  | 111         generated using *LabelPrefix* and only empty molname values are | 
|  | 112         replaced with sequential compound IDs. | 
|  | 113 | 
|  | 114         This is only used for *CompoundID* value of --DataFieldsMode option. | 
|  | 115 | 
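The four --CompoundIDMode values can be pictured with the minimal Python sketch below. It only illustrates the rules described above and is not the script's Perl implementation. The function name `generate_compound_id` and its arguments are hypothetical, the sequential number is assumed to start at 1, and returning an empty string for a missing data field is an assumption.

```python
def generate_compound_id(mode, count, mol_name="", data_fields=None,
                         compound_id="Cmpd"):
    """Return a compound ID for the count-th compound (assumed 1-based).

    For DataField mode, compound_id names the data field to read; for the
    other modes it is the label prefix, as described for --CompoundID.
    """
    data_fields = data_fields or {}
    if mode == "DataField":
        # Assumption: a missing data field yields an empty ID.
        return data_fields.get(compound_id, "")
    if mode == "MolName":
        return mol_name
    if mode == "LabelPrefix":
        return f"{compound_id}{count}"
    if mode == "MolNameOrLabelPrefix":
        # The molname line wins; only empty names get a sequential ID.
        return mol_name if mol_name.strip() else f"{compound_id}{count}"
    raise ValueError(f"unknown CompoundIDMode: {mode}")


print(generate_compound_id("LabelPrefix", 3))                      # Cmpd3
print(generate_compound_id("LabelPrefix", 7, compound_id="Compound"))  # Compound7
```

With the default prefix the IDs look like Cmpd<Number>; passing `Compound` as the prefix gives Compound<Number>, matching the --CompoundID examples.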
|  | 116     --DataFields *"FieldLabel1,FieldLabel2,..."* | 
|  | 117         Comma delimited list of *SDFile(s)* data fields to extract and | 
|  | 118         write to CSV/TSV text file(s) along with calculated physicochemical | 
|  | 119         properties for *text | both* values of --output option. | 
|  | 120 | 
|  | 121         This is only used for *Specify* value of --DataFieldsMode option. | 
|  | 122 | 
|  | 123         Examples: | 
|  | 124 | 
|  | 125             Extreg | 
|  | 126             MolID,CompoundName | 
|  | 127 | 
|  | 128     -d, --DataFieldsMode *All | Common | Specify | CompoundID* | 
|  | 129         Specify how data fields in *SDFile(s)* are transferred to output | 
|  | 130         CSV/TSV text file(s) along with calculated physicochemical | 
|  | 131         properties for *text | both* values of --output option: transfer all | 
|  | 132         SD data fields; transfer SD data fields common to all compounds; | 
|  | 133         extract specified data fields; generate a compound ID using molname | 
|  | 134         line, a compound prefix, or a combination of both. Possible values: | 
|  | 135         *All | Common | Specify | CompoundID*. Default value: *CompoundID*. | 
|  | 136 | 
|  | 137     -f, --Filter *Yes | No* | 
|  | 138         Specify whether to check and filter compound data in SDFile(s). | 
|  | 139         Possible values: *Yes or No*. Default value: *Yes*. | 
|  | 140 | 
|  | 141         By default, compound data is checked before calculating | 
|  | 142         physicochemical properties and compounds containing atom data | 
|  | 143         corresponding to non-element symbols or no atom data are ignored. | 
|  | 144 | 
|  | 145     -h, --help | 
|  | 146         Print this help message. | 
|  | 147 | 
|  | 148     --HydrogenBonds *HBondsType1 | HBondsType2* | 
|  | 149         Parameters to control calculation of hydrogen bond donors and | 
|  | 150         acceptors. Possible values: *HBondsType1, HydrogenBondsType1, | 
|  | 151         HBondsType2, HydrogenBondsType2*. Default value: *HBondsType2* which | 
|  | 152         corresponds to RuleOf5 definition for number of hydrogen bond donors | 
|  | 153         and acceptors. | 
|  | 154 | 
|  | 155         The current release of MayaChemTools supports identification of two | 
|  | 156         types of hydrogen bond donor and acceptor atoms with these names: | 
|  | 157 | 
|  | 158             HBondsType1 or HydrogenBondsType1 | 
|  | 159             HBondsType2 or HydrogenBondsType2 | 
|  | 160 | 
|  | 161         The names of these hydrogen bond types are rather arbitrary. | 
|  | 162         However, their definitions have specific meaning and are as follows: | 
|  | 163 | 
|  | 164             HydrogenBondsType1 [ Ref 60-61, Ref 65-66 ]: | 
|  | 165 | 
|  | 166                 Donor: NH, NH2, OH - Any N and O with available H | 
|  | 167                 Acceptor: N[!H], O - Any N without available H and any O | 
|  | 168 | 
|  | 169             HydrogenBondsType2 [ Ref 91 ]: | 
|  | 170 | 
|  | 171                 Donor: NH, NH2, OH - N and O with available H | 
|  | 172                 Acceptor: N, O - Any N and O | 
|  | 173 | 
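The difference between the two hydrogen bond types comes down to how nitrogen atoms carrying hydrogens are treated as acceptors. The sketch below restates the definitions above in Python for a single atom, given only its element symbol and attached hydrogen count. The function names and the toy atom list are hypothetical, and the script's own atom typing may apply additional checks.

```python
def is_donor(symbol, h_count):
    """Donor under both types: N or O with at least one available H."""
    return symbol in ("N", "O") and h_count > 0


def is_acceptor(symbol, h_count, hbonds_type="HBondsType2"):
    """Type1 excludes N carrying H; Type2 counts every N and O."""
    if symbol not in ("N", "O"):
        return False
    if hbonds_type in ("HBondsType1", "HydrogenBondsType1"):
        return symbol == "O" or h_count == 0
    if hbonds_type in ("HBondsType2", "HydrogenBondsType2"):
        return True
    raise ValueError(f"unknown hydrogen bonds type: {hbonds_type}")


# Hypothetical atoms as (element symbol, attached hydrogens): NH2, OH, CH3.
atoms = [("N", 2), ("O", 1), ("C", 3)]
donors = sum(is_donor(s, h) for s, h in atoms)
type1 = sum(is_acceptor(s, h, "HBondsType1") for s, h in atoms)
type2 = sum(is_acceptor(s, h, "HBondsType2") for s, h in atoms)
print(donors, type1, type2)
```

For this toy input the donor count is the same under both types, while the NH2 nitrogen counts as an acceptor only under HBondsType2.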
|  | 174     -k, --KeepLargestComponent *Yes | No* | 
|  | 175         Calculate physicochemical properties for only the largest component | 
|  | 176         in molecule. Possible values: *Yes or No*. Default value: *Yes*. | 
|  | 177 | 
|  | 178         For molecules containing multiple connected components, | 
|  | 179         physicochemical properties can be calculated in two different ways: | 
|  | 180         use all connected components or just the largest connected | 
|  | 181         component. By default, all atoms except for the largest connected | 
|  | 182         component are deleted before calculation of physicochemical | 
|  | 183         properties. | 
|  | 184 | 
|  | 185     -m, --mode *All | RuleOf5 | RuleOf3 | "name1, [name2,...]"* | 
|  | 186         Specify physicochemical properties to calculate for SDFile(s): | 
|  | 187         calculate all available physicochemical properties; calculate | 
|  | 188         properties corresponding to Rule of 5 or Rule of 3; or use a comma | 
|  | 189         delimited list of supported physicochemical properties. Possible | 
|  | 190         values: *All | RuleOf5 | RuleOf3 | "name1, [name2,...]"*. | 
|  | 191 | 
|  | 192         Default value: *MolecularWeight, HeavyAtoms, MolecularVolume, | 
|  | 193         RotatableBonds, HydrogenBondDonors, HydrogenBondAcceptors, SLogP, | 
|  | 194         TPSA*. These properties are calculated by default. | 
|  | 195 | 
|  | 196         *RuleOf5* [ Ref 91 ] includes these properties: *MolecularWeight, | 
|  | 197         HydrogenBondDonors, HydrogenBondAcceptors, SLogP*. *RuleOf5* states: | 
|  | 198         MolecularWeight <= 500, HydrogenBondDonors <= 5, | 
|  | 199         HydrogenBondAcceptors <= 10, and logP <= 5. | 
|  | 200 | 
|  | 201         *RuleOf3* [ Ref 92 ] includes these properties: *MolecularWeight, | 
|  | 202         RotatableBonds, HydrogenBondDonors, HydrogenBondAcceptors, SLogP, | 
|  | 203         TPSA*. *RuleOf3* states: MolecularWeight <= 300, RotatableBonds <= | 
|  | 204         3, HydrogenBondDonors <= 3, HydrogenBondAcceptors <= 3, logP <= 3, | 
|  | 205         and TPSA <= 60. | 
|  | 206 | 
|  | 207         *All* calculates all supported physicochemical properties: | 
|  | 208         *MolecularWeight, ExactMass, HeavyAtoms, Rings, AromaticRings, | 
|  | 209         MolecularVolume, RotatableBonds, HydrogenBondDonors, | 
|  | 210         HydrogenBondAcceptors, SLogP, SMR, TPSA, Fsp3Carbons, Sp3Carbons, | 
|  | 211         MolecularComplexity*. | 
|  | 212 | 
|  | 213     --MolecularComplexity *Name,Value, [Name,Value,...]* | 
|  | 214         Parameters to control calculation of molecular complexity: it's a | 
|  | 215         comma delimited list of parameter name and value pairs. | 
|  | 216 | 
|  | 217         Possible parameter names: *MolecularComplexityType, | 
|  | 218         AtomIdentifierType, AtomicInvariantsToUse, FunctionalClassesToUse, | 
|  | 219         MACCSKeysSize, NeighborhoodRadius, MinPathLength, MaxPathLength, | 
|  | 220         UseBondSymbols, MinDistance, MaxDistance, UseTriangleInequality, | 
|  | 221         DistanceBinSize, NormalizationMethodology*. | 
|  | 222 | 
|  | 223         The valid parameter values for each parameter name are described in | 
|  | 224         the following sections. | 
|  | 225 | 
|  | 226         The current release of MayaChemTools supports calculation of | 
|  | 227         molecular complexity using *MolecularComplexityType* parameter | 
|  | 228         corresponding to the number of bits-set or unique keys [ Ref 117-119 | 
|  | 229         ] in molecular fingerprints. The valid values for | 
|  | 230         *MolecularComplexityType* are: | 
|  | 231 | 
|  | 232             AtomTypesFingerprints | 
|  | 233             ExtendedConnectivityFingerprints | 
|  | 234             MACCSKeys | 
|  | 235             PathLengthFingerprints | 
|  | 236             TopologicalAtomPairsFingerprints | 
|  | 237             TopologicalAtomTripletsFingerprints | 
|  | 238             TopologicalAtomTorsionsFingerprints | 
|  | 239             TopologicalPharmacophoreAtomPairsFingerprints | 
|  | 240             TopologicalPharmacophoreAtomTripletsFingerprints | 
|  | 241 | 
|  | 242         Default value for *MolecularComplexityType*: *MACCSKeys*. | 
|  | 243 | 
|  | 244         *AtomIdentifierType* parameter name corresponds to atom types used | 
|  | 245         during generation of fingerprints. The valid values for | 
|  | 246         *AtomIdentifierType* are: *AtomicInvariantsAtomTypes, | 
|  | 247         DREIDINGAtomTypes, EStateAtomTypes, FunctionalClassAtomTypes, | 
|  | 248         MMFF94AtomTypes, SLogPAtomTypes, SYBYLAtomTypes, TPSAAtomTypes, | 
|  | 249         UFFAtomTypes*. *AtomicInvariantsAtomTypes* is not supported for the | 
|  | 250         following values of *MolecularComplexityType*: | 
|  | 251         *MACCSKeys, TopologicalPharmacophoreAtomPairsFingerprints, | 
|  | 252         TopologicalPharmacophoreAtomTripletsFingerprints*. | 
|  | 253         *FunctionalClassAtomTypes* is the only valid value for | 
|  | 254         *AtomIdentifierType* for topological pharmacophore fingerprints. | 
|  | 255 | 
|  | 256         Default value for *AtomIdentifierType*: *AtomicInvariantsAtomTypes* | 
|  | 257         for all except topological pharmacophore fingerprints where it is | 
|  | 258         *FunctionalClassAtomTypes*. | 
|  | 259 | 
|  | 260         *AtomicInvariantsToUse* parameter name and values are used during | 
|  | 261         *AtomicInvariantsAtomTypes* value of parameter *AtomIdentifierType*. | 
|  | 262         It's a list of space separated valid atomic invariant atom types. | 
|  | 263 | 
|  | 264         Possible values for atomic invariants are: *AS, X, BO, LBO, SB, DB, | 
|  | 265         TB, H, Ar, RA, FC, MN, SM*. Default values for the | 
|  | 266         *AtomicInvariantsToUse* parameter are set differently for different | 
|  | 267         fingerprints using *MolecularComplexityType* parameter as shown | 
|  | 268         below: | 
|  | 269 | 
|  | 270             MolecularComplexityType              AtomicInvariantsToUse | 
|  | 271 | 
|  | 272             AtomTypesFingerprints                AS X BO H FC | 
|  | 273             TopologicalAtomPairsFingerprints     AS X BO H FC | 
|  | 274             TopologicalAtomTripletsFingerprints  AS X BO H FC | 
|  | 275             TopologicalAtomTorsionsFingerprints  AS X BO H FC | 
|  | 276 | 
|  | 277             ExtendedConnectivityFingerprints     AS X BO H FC MN | 
|  | 278             PathLengthFingerprints               AS | 
|  | 279 | 
|  | 280         The atomic invariants abbreviations correspond to: | 
|  | 281 | 
|  | 282             AS = Atom symbol corresponding to element symbol | 
|  | 283 | 
|  | 284             X<n>      = Number of non-hydrogen atom neighbors or heavy atoms | 
|  | 285             BO<n>     = Sum of bond orders to non-hydrogen atom neighbors or heavy atoms | 
|  | 286             LBO<n>    = Largest bond order of non-hydrogen atom neighbors or heavy atoms | 
|  | 287             SB<n>     = Number of single bonds to non-hydrogen atom neighbors or heavy atoms | 
|  | 288             DB<n>     = Number of double bonds to non-hydrogen atom neighbors or heavy atoms | 
|  | 289             TB<n>     = Number of triple bonds to non-hydrogen atom neighbors or heavy atoms | 
|  | 290             H<n>      = Number of implicit and explicit hydrogens for atom | 
|  | 291             Ar        = Aromatic annotation indicating whether atom is aromatic | 
|  | 292             RA        = Ring atom annotation indicating whether atom is a ring atom | 
|  | 293             FC<+n/-n> = Formal charge assigned to atom | 
|  | 294             MN<n>     = Mass number indicating isotope other than most abundant isotope | 
|  | 295             SM<n>     = Spin multiplicity of atom. Possible values: 1 (singlet), 2 (doublet) or | 
|  | 296                         3 (triplet) | 
|  | 297 | 
|  | 298         Atom type generated by AtomTypes::AtomicInvariantsAtomTypes class | 
|  | 299         corresponds to: | 
|  | 300 | 
|  | 301             AS.X<n>.BO<n>.LBO<n>.SB<n>.DB<n>.TB<n>.H<n>.Ar.RA.FC<+n/-n>.MN<n>.SM<n> | 
|  | 302 | 
|  | 303         Except for AS which is a required atomic invariant in atom types, | 
|  | 304         all other atomic invariants are optional. Atom type specification | 
|  | 305         doesn't include atomic invariants with zero or undefined values. | 
|  | 306 | 
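To make the atom type format concrete, the Python sketch below assembles such a string from precomputed invariant values. It is an illustration only and does not reproduce AtomTypes::AtomicInvariantsAtomTypes: the function name, the dictionary input, and the sample values are hypothetical, and the invariants are assumed to be already calculated for the atom.

```python
# Order of optional invariants after the required atom symbol (AS).
INVARIANT_ORDER = ["X", "BO", "LBO", "SB", "DB", "TB", "H",
                   "Ar", "RA", "FC", "MN", "SM"]


def atomic_invariant_atom_type(symbol, invariants, to_use):
    """Join the requested non-zero invariants into an atom type string."""
    parts = [symbol]  # AS is always present
    for name in INVARIANT_ORDER:
        if name not in to_use:
            continue
        value = invariants.get(name)
        if not value:
            continue  # zero or undefined values are left out
        if name in ("Ar", "RA"):
            parts.append(name)  # plain annotation, no number
        elif name == "FC":
            parts.append(f"FC{value:+d}")  # signed formal charge
        else:
            parts.append(f"{name}{value}")
    return ".".join(parts)


to_use = "AS X BO H FC".split()
print(atomic_invariant_atom_type("C", {"X": 3, "BO": 4, "H": 0, "FC": 0}, to_use))
print(atomic_invariant_atom_type("O", {"X": 1, "BO": 1, "FC": -1}, to_use))
```

The first call prints `C.X3.BO4` because the zero-valued H and FC invariants are dropped; the second prints `O.X1.BO1.FC-1`.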
|  | 307         In addition to usage of abbreviations for specifying atomic | 
|  | 308         invariants, the following descriptive words are also allowed: | 
|  | 309 | 
|  | 310             X   : NumOfNonHydrogenAtomNeighbors or NumOfHeavyAtomNeighbors | 
|  | 311             BO  : SumOfBondOrdersToNonHydrogenAtoms or SumOfBondOrdersToHeavyAtoms | 
|  | 312             LBO : LargestBondOrderToNonHydrogenAtoms or LargestBondOrderToHeavyAtoms | 
|  | 313             SB  : NumOfSingleBondsToNonHydrogenAtoms or NumOfSingleBondsToHeavyAtoms | 
|  | 314             DB  : NumOfDoubleBondsToNonHydrogenAtoms or NumOfDoubleBondsToHeavyAtoms | 
|  | 315             TB  : NumOfTripleBondsToNonHydrogenAtoms or NumOfTripleBondsToHeavyAtoms | 
|  | 316             H   : NumOfImplicitAndExplicitHydrogens | 
|  | 317             Ar  : Aromatic | 
|  | 318             RA  : RingAtom | 
|  | 319             FC  : FormalCharge | 
|  | 320             MN  : MassNumber | 
|  | 321             SM  : SpinMultiplicity | 
|  | 322 | 
|  | 323         *AtomTypes::AtomicInvariantsAtomTypes* module is used to assign | 
|  | 324         atomic invariant atom types. | 
|  | 325 | 
|  | 326         *FunctionalClassesToUse* parameter name and values are used during | 
|  | 327         *FunctionalClassAtomTypes* value of parameter *AtomIdentifierType*. | 
|  | 328         It's a list of space separated valid functional classes. | 
|  | 329 | 
|  | 330         Possible values for atom functional classes are: *Ar, CA, H, HBA, | 
|  | 331         HBD, Hal, NI, PI, RA*. | 
|  | 332 | 
|  | 333         Default value for *FunctionalClassesToUse* parameter is set to: | 
|  | 334 | 
|  | 335             HBD HBA PI NI Ar Hal | 
|  | 336 | 
|  | 337         for all fingerprints except for the following two | 
|  | 338         *MolecularComplexityType* fingerprints: | 
|  | 339 | 
|  | 340             MolecularComplexityType                           FunctionalClassesToUse | 
|  | 341 | 
|  | 342             TopologicalPharmacophoreAtomPairsFingerprints     HBD HBA PI NI H | 
|  | 343             TopologicalPharmacophoreAtomTripletsFingerprints  HBD HBA PI NI H Ar | 
|  | 344 | 
|  | 345         The functional class abbreviations correspond to: | 
|  | 346 | 
|  | 347             HBD : HydrogenBondDonor | 
|  | 348             HBA : HydrogenBondAcceptor | 
|  | 349             PI  : PositivelyIonizable | 
|  | 350             NI  : NegativelyIonizable | 
|  | 351             Ar  : Aromatic | 
|  | 352             Hal : Halogen | 
|  | 353             H   : Hydrophobic | 
|  | 354             RA  : RingAtom | 
|  | 355             CA  : ChainAtom | 
|  | 356 | 
|  | 357         Functional class atom type specification for an atom corresponds to: | 
|  | 358 | 
|  | 359             Ar.CA.H.HBA.HBD.Hal.NI.PI.RA | 
|  | 360 | 
|  | 361         *AtomTypes::FunctionalClassAtomTypes* module is used to assign | 
|  | 362         functional class atom types. It uses following definitions [ Ref | 
|  | 363         60-61, Ref 65-66 ]: | 
|  | 364 | 
|  | 365             HydrogenBondDonor: NH, NH2, OH | 
|  | 366             HydrogenBondAcceptor: N[!H], O | 
|  | 367             PositivelyIonizable: +, NH2 | 
|  | 368             NegativelyIonizable: -, C(=O)OH, S(=O)OH, P(=O)OH | 
|  | 369 | 
|  | 370         *MACCSKeysSize* parameter name is only used during *MACCSKeys* value | 
|  | 371         of *MolecularComplexityType* and corresponds to the size of MACCS | 
|  | 372         key set. Possible values: *166 or 322*. Default value: *166*. | 
|  | 373 | 
|  | 374         *NeighborhoodRadius* parameter name is only used during | 
|  | 375         *ExtendedConnectivityFingerprints* value of | 
|  | 376         *MolecularComplexityType* and corresponds to atomic neighborhoods | 
|  | 377         radius for generating extended connectivity fingerprints. Possible | 
|  | 378         values: positive integer. Default value: *2*. | 
|  | 379 | 
|  | 380         *MinPathLength* and *MaxPathLength* parameters are only used during | 
|  | 381         *PathLengthFingerprints* value of *MolecularComplexityType* and | 
|  | 382         correspond to minimum and maximum path lengths to use for generating | 
|  | 383         path length fingerprints. Possible values: positive integers. | 
|  | 384         Default value: *MinPathLength - 1*; *MaxPathLength - 8*. | 
|  | 385 | 
|  | 386         *UseBondSymbols* parameter is only used during | 
|  | 387         *PathLengthFingerprints* value of *MolecularComplexityType* and | 
|  | 388         indicates whether bond symbols are included in atom path strings | 
|  | 389         used to generate path length fingerprints. Possible value: *Yes or | 
|  | 390         No*. Default value: *Yes*. | 
|  | 391 | 
|  | 392         *MinDistance* and *MaxDistance* parameters are only used during | 
|  | 393         *TopologicalAtomPairsFingerprints* and | 
|  | 394         *TopologicalAtomTripletsFingerprints* values of | 
|  | 395         *MolecularComplexityType* and correspond to minimum and maximum bond | 
|  | 396         distance between atom pairs during topological pharmacophore | 
|  | 397         fingerprints. Possible values: positive integers. Default value: | 
|  | 398         *MinDistance - 1*; *MaxDistance - 10*. | 
|  | 399 | 
|  | 400         *UseTriangleInequality* parameter is used during these values for | 
|  | 401         *MolecularComplexityType*: *TopologicalAtomTripletsFingerprints* and | 
|  | 402         *TopologicalPharmacophoreAtomTripletsFingerprints*. Possible values: | 
|  | 403         *Yes or No*. It determines whether to apply triangle inequality to | 
|  | 404         distance triplets. Default value: | 
|  | 405         *TopologicalAtomTripletsFingerprints - No*; | 
|  | 406         *TopologicalPharmacophoreAtomTripletsFingerprints - Yes*. | 
|  | 407 | 
|  | 408         *DistanceBinSize* parameter is used during | 
|  | 409         *TopologicalPharmacophoreAtomTripletsFingerprints* value of | 
|  | 410         *MolecularComplexityType* and corresponds to distance bin size used | 
|  | 411         for binning distances during generation of topological pharmacophore | 
|  | 412         atom triplets fingerprints. Possible value: positive integer. | 
|  | 413         Default value: *2*. | 
|  | 414 | 
|  | 415         *NormalizationMethodology* is only used for these values for | 
|  | 416         *MolecularComplexityType*: *ExtendedConnectivityFingerprints*, | 
|  | 417         *TopologicalPharmacophoreAtomPairsFingerprints* and | 
|  | 418         *TopologicalPharmacophoreAtomTripletsFingerprints*. It corresponds | 
|  | 419         to normalization methodology to use for scaling the number of | 
|  | 420         bits-set or unique keys during generation of fingerprints. Possible | 
|  | 421         values during *ExtendedConnectivityFingerprints*: *None or | 
|  | 422         ByHeavyAtomsCount*; Default value: *None*. Possible values during | 
|  | 423         topological pharmacophore atom pairs and triplets fingerprints: | 
|  | 424         *None or ByPossibleKeysCount*; Default value: *None*. | 
|  | 425         *ByPossibleKeysCount* corresponds to total number of possible | 
|  | 426         topological pharmacophore atom pairs or triplets in a molecule. | 
|  | 427 | 
|  | 428         Examples of *MolecularComplexity* name and value parameters: | 
|  | 429 | 
|  | 430             MolecularComplexityType,AtomTypesFingerprints,AtomIdentifierType, | 
|  | 431             AtomicInvariantsAtomTypes,AtomicInvariantsToUse,AS X BO H FC | 
|  | 432 | 
|  | 433             MolecularComplexityType,ExtendedConnectivityFingerprints, | 
|  | 434             AtomIdentifierType,AtomicInvariantsAtomTypes, | 
|  | 435             AtomicInvariantsToUse,AS X BO H FC MN,NeighborhoodRadius,2, | 
|  | 436             NormalizationMethodology,None | 
|  | 437 | 
|  | 438             MolecularComplexityType,MACCSKeys,MACCSKeysSize,166 | 
|  | 439 | 
|  | 440             MolecularComplexityType,PathLengthFingerprints,AtomIdentifierType, | 
|  | 441             AtomicInvariantsAtomTypes,AtomicInvariantsToUse,AS,MinPathLength, | 
|  | 442             1,MaxPathLength,8,UseBondSymbols,Yes | 
|  | 443 | 
|  | 444             MolecularComplexityType,TopologicalAtomPairsFingerprints, | 
|  | 445             AtomIdentifierType,AtomicInvariantsAtomTypes,AtomicInvariantsToUse, | 
|  | 446             AS X BO H FC,MinDistance,1,MaxDistance,10 | 
|  | 447 | 
|  | 448             MolecularComplexityType,TopologicalAtomTripletsFingerprints, | 
|  | 449             AtomIdentifierType,AtomicInvariantsAtomTypes,AtomicInvariantsToUse, | 
|  | 450             AS X BO H FC,MinDistance,1,MaxDistance,10,UseTriangleInequality,No | 
|  | 451 | 
|  | 452             MolecularComplexityType,TopologicalAtomTorsionsFingerprints, | 
|  | 453             AtomIdentifierType,AtomicInvariantsAtomTypes,AtomicInvariantsToUse, | 
|  | 454             AS X BO H FC | 
|  | 455 | 
|  | 456             MolecularComplexityType,TopologicalPharmacophoreAtomPairsFingerprints, | 
|  | 457             AtomIdentifierType,FunctionalClassAtomTypes,FunctionalClassesToUse, | 
|  | 458             HBD HBA PI NI H,MinDistance,1,MaxDistance,10,NormalizationMethodology, | 
|  | 459             None | 
|  | 460 | 
|  | 461             MolecularComplexityType,TopologicalPharmacophoreAtomTripletsFingerprints, | 
|  | 462             AtomIdentifierType,FunctionalClassAtomTypes,FunctionalClassesToUse, | 
|  | 463             HBD HBA PI NI H Ar,MinDistance,1,MaxDistance,10, | 
|  | 464             UseTriangleInequality,Yes,DistanceBinSize,2, | 
|  | 465             NormalizationMethodology,None | 
|  | 466 | 
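Options such as --MolecularComplexity take a flat comma delimited list that alternates names and values, with some values (for example AtomicInvariantsToUse) being space separated lists themselves. The Python sketch below shows one way to read such a string into a dictionary. The helper name is hypothetical, it performs no validation of names or values, and it is not the parser used by the script.

```python
def parse_name_value_pairs(spec):
    """Split 'Name,Value,Name,Value,...' into a {name: value} dictionary."""
    tokens = [token.strip() for token in spec.split(",")]
    if len(tokens) % 2 != 0:
        raise ValueError("expected an even number of comma delimited values")
    # Even positions are names, odd positions are the matching values.
    # If a name is repeated, the last value wins in this sketch.
    return dict(zip(tokens[0::2], tokens[1::2]))


spec = ("MolecularComplexityType,AtomTypesFingerprints,AtomIdentifierType,"
        "AtomicInvariantsAtomTypes,AtomicInvariantsToUse,AS X BO H FC")
params = parse_name_value_pairs(spec)
print(params["MolecularComplexityType"])
print(params["AtomicInvariantsToUse"].split())
```

Space separated values survive the comma split intact, so `AS X BO H FC` can be split into individual invariants afterwards.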
|  | 467     --OutDelim *comma | tab | semicolon* | 
|  | 468         Delimiter for output CSV/TSV text file(s). Possible values: *comma, | 
|  | 469         tab, or semicolon*. Default value: *comma*. | 
|  | 470 | 
|  | 471     --output *SD | text | both* | 
|  | 472         Type of output files to generate. Possible values: *SD, text, or | 
|  | 473         both*. Default value: *text*. | 
|  | 474 | 
|  | 475     -o, --overwrite | 
|  | 476         Overwrite existing files. | 
|  | 477 | 
|  | 478     --Precision *Name,Number,[Name,Number,..]* | 
|  | 479         Precision of calculated property values in the output file: it's a | 
|  | 480         comma delimited list of property name and precision value pairs. | 
|  | 481         Possible property names: *MolecularWeight, ExactMass*. Possible | 
|  | 482         values: positive integers. Default value: *MolecularWeight,2, | 
|  | 483         ExactMass,4*. | 
|  | 484 | 
|  | 485         Examples: | 
|  | 486 | 
|  | 487             ExactMass,3 | 
|  | 488             MolecularWeight,1,ExactMass,2 | 
|  | 489 | 
|  | 490     -q, --quote *Yes | No* | 
|  | 491         Put quotes around column values in output CSV/TSV text file(s). | 
|  | 492         Possible values: *Yes or No*. Default value: *Yes*. | 
|  | 493 | 
|  | 494     -r, --root *RootName* | 
|  | 495         New file name is generated using the root: <Root>.<Ext>. Default for | 
|  | 496         new file names: <SDFileName>PhysicochemicalProperties.<Ext>. The | 
|  | 497         file type determines <Ext> value. The sdf, csv, and tsv <Ext> values | 
|  | 498         are used for SD, comma/semicolon, and tab delimited text files, | 
|  | 499         respectively. This option is ignored for multiple input files. | 
|  | 500 | 
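The naming rules for --root, --output and --OutDelim can be summarized by the small Python sketch below for a single input file. The helper `output_file_names` is hypothetical and only mirrors the description above; it does not cover the multiple input file case, where the root name is ignored.

```python
import os

# Text file extension by --OutDelim value, as described above.
EXT_BY_DELIM = {"comma": "csv", "semicolon": "csv", "tab": "tsv"}


def output_file_names(sd_file, output="text", out_delim="comma", root=None):
    """Return the output file names expected for one input SD file."""
    base = os.path.splitext(os.path.basename(sd_file))[0]
    stem = root if root else base + "PhysicochemicalProperties"
    names = []
    if output in ("SD", "both"):
        names.append(stem + ".sdf")
    if output in ("text", "both"):
        names.append(f"{stem}.{EXT_BY_DELIM[out_delim]}")
    return names


print(output_file_names("Sample.sdf"))
print(output_file_names("Sample.sdf", output="both", root="SampleAllProperties"))
```

The two calls correspond to the first two entries in EXAMPLES: the default run yields SamplePhysicochemicalProperties.csv, and the run with `-r SampleAllProperties --output both` yields both an .sdf and a .csv file.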
|  | 501     --RotatableBonds *Name,Value, [Name,Value,...]* | 
|  | 502         Parameters to control calculation of rotatable bonds [ Ref 92 ]: | 
|  | 503         it's a comma delimited list of parameter name and value pairs. | 
|  | 504         Possible parameter names: *IgnoreTerminalBonds, | 
|  | 505         IgnoreBondsToTripleBonds, IgnoreAmideBonds, IgnoreThioamideBonds, | 
|  | 506         IgnoreSulfonamideBonds*. Possible parameter values: *Yes or No*. By | 
|  | 507         default, value of all parameters is set to *Yes*. | 
|  | 508 | 
|  | 509     --RuleOf3Violations *Yes | No* | 
|  | 510         Specify whether to calculate RuleOf3Violations for SDFile(s). | 
|  | 511         Possible values: *Yes or No*. Default value: *No*. | 
|  | 512 | 
|  | 513         For *Yes* value of RuleOf3Violations, in addition to calculating | 
|  | 514         total number of RuleOf3 violations, individual violations for | 
|  | 515         compounds are also written to output files. | 
|  | 516 | 
|  | 517         RuleOf3 [ Ref 92 ] states: MolecularWeight <= 300, RotatableBonds <= | 
|  | 518         3, HydrogenBondDonors <= 3, HydrogenBondAcceptors <= 3, logP <= 3, | 
|  | 519         and TPSA <= 60. | 
|  | 520 | 
|  | 521     --RuleOf5Violations *Yes | No* | 
|  | 522         Specify whether to calculate RuleOf5Violations for SDFile(s). | 
|  | 523         Possible values: *Yes or No*. Default value: *No*. | 
|  | 524 | 
|  | 525         For *Yes* value of RuleOf5Violations, in addition to calculating | 
|  | 526         total number of RuleOf5 violations, individual violations for | 
|  | 527         compounds are also written to output files. | 
|  | 528 | 
|  | 529         RuleOf5 [ Ref 91 ] states: MolecularWeight <= 500, | 
|  | 530         HydrogenBondDonors <= 5, HydrogenBondAcceptors <= 10, and logP <= 5. | 
|  | 531 | 
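Both violation options apply the same idea: compare each calculated property against its upper limit and count the ones that exceed it. The Python sketch below uses the RuleOf5 and RuleOf3 limits stated above, with SLogP standing in for logP. The function name, the 0/1 flag representation, and the property values are hypothetical and do not describe the script's actual output columns.

```python
RULE_OF_5 = {"MolecularWeight": 500, "HydrogenBondDonors": 5,
             "HydrogenBondAcceptors": 10, "SLogP": 5}
RULE_OF_3 = {"MolecularWeight": 300, "RotatableBonds": 3,
             "HydrogenBondDonors": 3, "HydrogenBondAcceptors": 3,
             "SLogP": 3, "TPSA": 60}


def rule_violations(properties, limits):
    """Return (total, per-property flags); a flag is 1 when value > limit."""
    flags = {name: int(properties[name] > limit)
             for name, limit in limits.items()}
    return sum(flags.values()), flags


# Hypothetical property values for a single compound.
props = {"MolecularWeight": 612.7, "RotatableBonds": 9,
         "HydrogenBondDonors": 2, "HydrogenBondAcceptors": 11,
         "SLogP": 4.1, "TPSA": 95.0}
ro5_total, ro5_flags = rule_violations(props, RULE_OF_5)
ro3_total, ro3_flags = rule_violations(props, RULE_OF_3)
print(ro5_total, ro5_flags)
print(ro3_total, ro3_flags)
```

Values exactly equal to a limit do not count as violations, since each rule is stated with `<=`.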
|  | 532     --TPSA *Name,Value, [Name,Value,...]* | 
|  | 533         Parameters to control calculation of TPSA, specified as a | 
|  | 534         comma-delimited list of parameter name and value pairs. Possible | 
|  | 535         parameter names: *IgnorePhosphorus, IgnoreSulfur*. Possible parameter | 
|  | 536         values: *Yes or No*. By default, all parameters are set to *Yes*. | 
|  | 537 | 
|  | 538         By default, TPSA atom contributions from Phosphorus and Sulfur atoms | 
|  | 539         are not included during TPSA calculations. [ Ref 91 ] | 
|  | 540 | 
|  | 541     -w, --WorkingDir *DirName* | 
|  | 542         Location of working directory. Default value: current directory. | 
|  | 543 | 
|  | 544 EXAMPLES | 
|  | 545     To calculate default set of physicochemical properties - | 
|  | 546     MolecularWeight, HeavyAtoms, MolecularVolume, RotatableBonds, | 
|  | 547     HydrogenBondDonors, HydrogenBondAcceptors, SLogP, TPSA - and generate a | 
|  | 548     SamplePhysicochemicalProperties.csv file containing sequential compound | 
|  | 549     IDs along with properties data, type: | 
|  | 550 | 
|  | 551         % CalculatePhysicochemicalProperties.pl -o Sample.sdf | 
|  | 552 | 
|  | 553     To calculate all available physicochemical properties and generate both | 
|  | 554     SampleAllProperties.csv and SampleAllProperties.sdf files containing | 
|  | 555     sequential compound IDs in CSV file along with properties data, type: | 
|  | 556 | 
|  | 557         % CalculatePhysicochemicalProperties.pl -m All --output both | 
|  | 558           -r SampleAllProperties -o Sample.sdf | 
|  | 559 | 
|  | 560     To calculate RuleOf5 physicochemical properties and generate a | 
|  | 561     SampleRuleOf5Properties.csv file containing sequential compound IDs | 
|  | 562     along with properties data, type: | 
|  | 563 | 
|  | 564         % CalculatePhysicochemicalProperties.pl -m RuleOf5 | 
|  | 565           -r SampleRuleOf5Properties -o Sample.sdf | 
|  | 566 | 
|  | 567     To calculate RuleOf5 physicochemical properties along with counting | 
|  | 568     RuleOf5 violations and generate a SampleRuleOf5Properties.csv file | 
|  | 569     containing sequential compound IDs along with properties data, type: | 
|  | 570 | 
|  | 571         % CalculatePhysicochemicalProperties.pl -m RuleOf5 --RuleOf5Violations Yes | 
|  | 572           -r SampleRuleOf5Properties -o Sample.sdf | 
|  | 573 | 
|  | 574     To calculate RuleOf3 physicochemical properties and generate a | 
|  | 575     SampleRuleOf3Properties.csv file containing sequential compound IDs | 
|  | 576     along with properties data, type: | 
|  | 577 | 
|  | 578         % CalculatePhysicochemicalProperties.pl -m RuleOf3 | 
|  | 579           -r SampleRuleOf3Properties -o Sample.sdf | 
|  | 580 | 
|  | 581     To calculate RuleOf3 physicochemical properties along with counting | 
|  | 582     RuleOf3 violations and generate a SampleRuleOf3Properties.csv file | 
|  | 583     containing sequential compound IDs along with properties data, type: | 
|  | 584 | 
|  | 585         % CalculatePhysicochemicalProperties.pl -m RuleOf3 --RuleOf3Violations Yes | 
|  | 586           -r SampleRuleOf3Properties -o Sample.sdf | 
|  | 587 | 
|  | 588     To calculate a specific set of physicochemical properties and generate a | 
|  | 589     SampleProperties.csv file containing sequential compound IDs along with | 
|  | 590     properties data, type: | 
|  | 591 | 
|  | 592         % CalculatePhysicochemicalProperties.pl -m "Rings,AromaticRings" | 
|  | 593           -r SampleProperties -o Sample.sdf | 
|  | 594 | 
|  | 595     To calculate HydrogenBondDonors and HydrogenBondAcceptors using | 
|  | 596     HydrogenBondsType1 definition and generate a SampleProperties.csv file | 
|  | 597     containing sequential compound IDs along with properties data, type: | 
|  | 598 | 
|  | 599         % CalculatePhysicochemicalProperties.pl -m "HydrogenBondDonors,HydrogenBondAcceptors" | 
|  | 600           --HydrogenBonds HBondsType1 -r SampleProperties -o Sample.sdf | 
|  | 601 | 
|  | 602     To calculate TPSA using sulfur and phosphorus atoms along with nitrogen | 
|  | 603     and oxygen atoms and generate a SampleProperties.csv file containing | 
|  | 604     sequential compound IDs along with properties data, type: | 
|  | 605 | 
|  | 606         % CalculatePhysicochemicalProperties.pl -m "TPSA" --TPSA "IgnorePhosphorus,No, | 
|  | 607           IgnoreSulfur,No" -r SampleProperties -o Sample.sdf | 
|  | 608 | 
|  | 609     To calculate MolecularComplexity using extended connectivity | 
|  | 610     fingerprints corresponding to atom neighborhood radius of 2 with atomic | 
|  | 611     invariant atom types without any scaling and generate a | 
|  | 612     SampleProperties.csv file containing sequential compound IDs along with | 
|  | 613     properties data, type: | 
|  | 614 | 
|  | 615         % CalculatePhysicochemicalProperties.pl -m MolecularComplexity --MolecularComplexity | 
|  | 616           "MolecularComplexityType,ExtendedConnectivityFingerprints,NeighborhoodRadius,2, | 
|  | 617           AtomIdentifierType, AtomicInvariantsAtomTypes, | 
|  | 618           AtomicInvariantsToUse,AS X BO H FC MN,NormalizationMethodology,None" | 
|  | 619           -r SampleProperties -o Sample.sdf | 
|  | 620 | 
|  | 621     To calculate RuleOf5 physicochemical properties along with counting | 
|  | 622     RuleOf5 violations and generate a SampleRuleOf5Properties.csv file | 
|  | 623     containing compound IDs from molecule name line along with properties | 
|  | 624     data, type: | 
|  | 625 | 
|  | 626         % CalculatePhysicochemicalProperties.pl -m RuleOf5 --RuleOf5Violations Yes | 
|  | 627           --DataFieldsMode CompoundID --CompoundIDMode MolName | 
|  | 628           -r SampleRuleOf5Properties -o Sample.sdf | 
|  | 629 | 
|  | 630     To calculate all available physicochemical properties and generate a | 
|  | 631     SampleAllProperties.csv file containing compound ID using specified data | 
|  | 632     field along with properties data, type: | 
|  | 633 | 
|  | 634         % CalculatePhysicochemicalProperties.pl -m All | 
|  | 635           --DataFieldsMode CompoundID --CompoundIDMode DataField --CompoundID Mol_ID | 
|  | 636           -r SampleAllProperties -o Sample.sdf | 
|  | 637 | 
|  | 638     To calculate all available physicochemical properties and generate a | 
|  | 639     SampleAllProperties.csv file containing compound ID using combination of | 
|  | 640     molecule name line and an explicit compound prefix along with properties | 
|  | 641     data, type: | 
|  | 642 | 
|  | 643         % CalculatePhysicochemicalProperties.pl -m All | 
|  | 644           --DataFieldsMode CompoundID --CompoundIDMode MolNameOrLabelPrefix | 
|  | 645           --CompoundID Cmpd --CompoundIDLabel MolID -r SampleAllProperties | 
|  | 646           -o Sample.sdf | 
|  | 647 | 
|  | 648     To calculate all available physicochemical properties and generate a | 
|  | 649     SampleAllProperties.csv file containing specific data fields columns | 
|  | 650     along with properties data, type: | 
|  | 651 | 
|  | 652         % CalculatePhysicochemicalProperties.pl -m All | 
|  | 653           --DataFieldsMode Specify --DataFields Mol_ID -r SampleAllProperties | 
|  | 654           -o Sample.sdf | 
|  | 655 | 
|  | 656     To calculate all available physicochemical properties and generate a | 
|  | 657     SampleAllProperties.csv file containing common data fields columns along | 
|  | 658     with properties data, type: | 
|  | 659 | 
|  | 660         % CalculatePhysicochemicalProperties.pl -m All | 
|  | 661           --DataFieldsMode Common -r SampleAllProperties -o Sample.sdf | 
|  | 662 | 
|  | 663     To calculate all available physicochemical properties and generate both | 
|  | 664     SampleAllProperties.csv and SampleAllProperties.sdf files containing all | 
|  | 665     data fields columns in CSV file along with properties data, type: | 
|  | 666 | 
|  | 667         % CalculatePhysicochemicalProperties.pl -m All | 
|  | 668           --DataFieldsMode All --output both -r SampleAllProperties | 
|  | 669           -o Sample.sdf | 
|  | 670 | 
|  | 671 AUTHOR | 
|  | 672     Manish Sud <msud@san.rr.com> | 
|  | 673 | 
|  | 674 SEE ALSO | 
|  | 675     ExtractFromSDFiles.pl, ExtractFromTextFiles.pl, InfoSDFiles.pl, | 
|  | 676     InfoTextFiles.pl | 
|  | 677 | 
|  | 678 COPYRIGHT | 
|  | 679     Copyright (C) 2015 Manish Sud. All rights reserved. | 
|  | 680 | 
|  | 681     This file is part of MayaChemTools. | 
|  | 682 | 
|  | 683     MayaChemTools is free software; you can redistribute it and/or modify it | 
|  | 684     under the terms of the GNU Lesser General Public License as published by | 
|  | 685     the Free Software Foundation; either version 3 of the License, or (at | 
|  | 686     your option) any later version. | 
|  | 687 |