comparison docs/scripts/txt/TopologicalPharmacophoreAtomPairsFingerprints.txt @ 0:4816e4a8ae95 draft default tip
Uploaded
| author | deepakjadmin |
|---|---|
| date | Wed, 20 Jan 2016 09:23:18 -0500 |
| parents | |
| children |
comparison
| -1:000000000000 | 0:4816e4a8ae95 |
|---|---|
| 1 NAME | |
| 2 TopologicalPharmacophoreAtomPairsFingerprints.pl - Generate topological | |
| 3 pharmacophore atom pairs fingerprints for SD files | |
| 4 | |
| 5 SYNOPSIS | |
| 6 TopologicalPharmacophoreAtomPairsFingerprints.pl SDFile(s)... | |
| 7 | |
| 8 TopologicalPharmacophoreAtomPairsFingerprints.pl [--AromaticityModel | |
| 9 *AromaticityModelType*] [--AtomPairsSetSizeToUse *ArbitrarySize | | |
| 10 FixedSize*] [-a, --AtomTypesToUse *"AtomType1, AtomType2..."*] | |
| 11 [--AtomTypesWeight *"AtomType1, Weight1, AtomType2, Weight2..."*] | |
| 12 [--CompoundID *DataFieldName or LabelPrefixString*] [--CompoundIDLabel | |
| 13 *text*] [--CompoundIDMode] [--DataFields *"FieldLabel1, | |
| 14 FieldLabel2,..."*] [-d, --DataFieldsMode *All | Common | Specify | | |
| 15 CompoundID*] [-f, --Filter *Yes | No*] [--FingerprintsLabelMode | |
| 16 *FingerprintsLabelOnly | FingerprintsLabelWithIDs*] [--FingerprintsLabel | |
| 17 *text*] [--FuzzifyAtomPairsCount *Yes | No*] [--FuzzificationMode | |
| 18 *BeforeNormalization | AfterNormalization*] [--FuzzificationMethodology | |
| 19 *FuzzyBinning | FuzzyBinSmoothing*] [--FuzzFactor *number*] [-h, --help] | |
| 20 [-k, --KeepLargestComponent *Yes | No*] [--MinDistance *number*] | |
| 21 [--MaxDistance *number*] [-n, --NormalizationMethodology *None | | |
| 22 ByHeavyAtomsCount | ByAtomTypesCount*] [--OutDelim *comma | tab | | |
| 23 semicolon*] [--output *SD | FP | text | all*] [-o, --overwrite] [-q, | |
| 24 --quote *Yes | No*] [-r, --root *RootName*] [--ValuesPrecision *number*] | |
| 25 [-v, --VectorStringFormat *ValuesString | IDsAndValuesString | | |
| 26 IDsAndValuesPairsString | ValuesAndIDsString | ValuesAndIDsPairsString*] | |
| 27 [-w, --WorkingDir dirname] SDFile(s)... | |
| 28 | |
| 29 DESCRIPTION | |
| 30 Generate topological pharmacophore atom pairs fingerprints [ Ref 60-62, | |
| 31 Ref 65, Ref 68 ] for *SDFile(s)* and create appropriate SD, FP or | |
| 32 CSV/TSV text file(s) containing fingerprints vector strings | |
| 33 corresponding to molecular fingerprints. | |
| 34 | |
| 35 Multiple SDFile names are separated by spaces. The valid file extensions | |
| 36 are *.sdf* and *.sd*. All other file names are ignored. All the SD files | |
| 37 in a current directory can be specified either by **.sdf* or the current | |
| 38 directory name. | |
| 39 | |
| 40 Based on the values specified for --AtomTypesToUse, pharmacophore atom | |
| 41 types are assigned to all non-hydrogen atoms in a molecule and a | |
| 42 distance matrix is generated. A pharmacophore atom pairs basis set is | |
| 43 initialized for all unique possible pairs within --MinDistance and | |
| 44 --MaxDistance range. | |
| 45 | |
| 46 Let: | |
| 47 | |
| 48 P = Valid pharmacophore atom type | |
| 49 | |
| 50 Px = Pharmacophore atom type x | |
| 51 Py = Pharmacophore atom type y | |
| 52 | |
| 53 Dmin = Minimum distance corresponding to number of bonds between | |
| 54 two atoms | |
| 55 Dmax = Maximum distance corresponding to number of bonds between | |
| 56 two atoms | |
| 57 D = Distance corresponding to number of bonds between two atoms | |
| 58 | |
| 59 Px-Dn-Py = Pharmacophore atom pair ID for atom types Px and Py at | |
| 60 distance Dn | |
| 61 | |
| 62 P = Number of pharmacophore atom types to consider | |
| 63 PPD = Number of possible unique pharmacophore atom pairs at a distance Dn | |
| 64 | |
| 65 PPT = Total number of possible pharmacophore atom pairs at all distances | |
| 66 between Dmin and Dmax | |
| 67 | |
| 68 Then: | |
| 69 | |
| 70 PPD = (P * (P - 1))/2 + P | |
| 71 | |
| 72 PPT = ((Dmax - Dmin) + 1) * ((P * (P - 1))/2 + P) | |
| 73 = ((Dmax - Dmin) + 1) * PPD | |
| 74 | |
| 75 So for default values of Dmin = 1, Dmax = 10 and P = 5, | |
| 76 | |
| 77 PPD = (5 * (5 - 1))/2 + 5 = 15 | |
| 78 PPT = ((10 - 1) + 1) * 15 = 150 | |
| 79 | |
| 80 The pharmacophore atom pairs basis set includes 150 values. | |
| 81 | |
| 82 The atom pair IDs correspond to: | |
| 83 | |
| 84 Px-Dn-Py = Pharmacophore atom pair ID for atom types Px and Py at | |
| 85 distance Dn | |
| 86 | |
| 87 For example: H-D1-H, H-D2-HBA, PI-D5-PI and so on | |
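The counting above can be checked with a short Python sketch that enumerates the basis set for the default atom types and distance range. This is not code from MayaChemTools: the function name `atom_pair_ids` is hypothetical, and the alphabetical ordering of atom types within each distance is an assumption inferred from the FixedSize example strings shown in this document.

```python
from itertools import combinations_with_replacement


def atom_pair_ids(atom_types, min_distance=1, max_distance=10):
    """Return all Px-Dn-Py IDs for the given atom types and distance range."""
    # Assumption: atom types are ordered alphabetically within each distance.
    ordered = sorted(atom_types)
    ids = []
    for distance in range(min_distance, max_distance + 1):
        # Unordered pairs including Px-Px: (P * (P - 1)) / 2 + P per distance.
        for px, py in combinations_with_replacement(ordered, 2):
            ids.append(f"{px}-D{distance}-{py}")
    return ids


ids = atom_pair_ids(["HBD", "HBA", "PI", "NI", "H"])
print(len(ids))   # 150
print(ids[:3])    # ['H-D1-H', 'H-D1-HBA', 'H-D1-HBD']
```

With P = 5 and distances 1 through 10, the list holds 15 IDs per distance and 150 IDs in total, matching PPD and PPT above.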
| 88 | |
| 89 Using distance matrix and pharmacophore atom types, occurrence of unique | |
| 90 pharmacophore atom pairs is counted. The contribution of each atom type | |
| 91 to atom pair interaction is optionally weighted by specified | |
| 92 --AtomTypesWeight before assigning its count to appropriate distance | |
| 93 bin. Based on --NormalizationMethodology option, pharmacophore atom | |
| 94 pairs count is optionally normalized. Additionally, pharmacophore atom | |
| 95 pairs count is optionally fuzzified before or after the normalization | |
| 96 controlled by values of --FuzzifyAtomPairsCount, --FuzzificationMode, | |
| 97 --FuzzificationMethodology and --FuzzFactor options. | |
| 98 | |
| 99 The final pharmacophore atom pairs count along with atom pair | |
| 100 identifiers involving all non-hydrogen atoms, with optional | |
| 101 normalization and fuzzification, constitute pharmacophore topological | |
| 102 atom pairs fingerprints of the molecule. | |
| 103 | |
| 104 For *ArbitrarySize* value of --AtomPairsSetSizeToUse option, the | |
| 105 fingerprint vector corresponds to only those topological pharmacophore | |
| 106 atom pairs which are present and have non-zero count. However, for | |
| 107 *FixedSize* value of --AtomPairsSetSizeToUse option, the fingerprint | |
| 108 vector contains all possible valid topological pharmacophore atom pairs | |
| 109 with both zero and non-zero count values. | |
| 110 | |
| 111 Example of *SD* file containing topological pharmacophore atom pairs | |
| 112 fingerprints string data: | |
| 113 | |
| 114 ... ... | |
| 115 ... ... | |
| 116 $$$$ | |
| 117 ... ... | |
| 118 ... ... | |
| 119 ... ... | |
| 120 41 44 0 0 0 0 0 0 0 0999 V2000 | |
| 121 -3.3652 1.4499 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 | |
| 122 ... ... | |
| 123 2 3 1 0 0 0 0 | |
| 124 ... ... | |
| 125 M END | |
| 126 > <CmpdID> | |
| 127 Cmpd1 | |
| 128 | |
| 129 > <TopologicalPharmacophoreAtomPairsFingerprints> | |
| 130 FingerprintsVector;TopologicalPharmacophoreAtomPairs:ArbitrarySize:Min | |
| 131 Distance1:MaxDistance10;54;NumericalValues;IDsAndValuesString;H-D1-H H | |
| 132 -D1-NI HBA-D1-NI HBD-D1-NI H-D2-H H-D2-HBA H-D2-HBD HBA-D2-HBA HBA-D2- | |
| 133 HBD H-D3-H H-D3-HBA H-D3-HBD H-D3-NI HBA-D3-NI HBD-D3-NI H-D4-H H-D...; | |
| 134 18 1 2 1 22 12 8 1 2 18 6 3 1 1 1 22 13 6 5 7 2 28 9 5 1 1 1 36 16 10 3 | |
| 135 4 1 37 10 8 1 35 10 9 3 3 1 28 7 7 4 18 16 12 5 1 2 1 | |
| 136 | |
| 137 $$$$ | |
| 138 ... ... | |
| 139 ... ... | |
| 140 | |
| 141 Example of *FP* file containing topological pharmacophore atom pairs | |
| 142 fingerprints string data: | |
| 143 | |
| 144 # | |
| 145 # Package = MayaChemTools 7.4 | |
| 146 # Release Date = Oct 21, 2010 | |
| 147 # | |
| 148 # TimeStamp = Fri Mar 11 15:32:48 2011 | |
| 149 # | |
| 150 # FingerprintsStringType = FingerprintsVector | |
| 151 # | |
| 152 # Description = TopologicalPharmacophoreAtomPairs:ArbitrarySize:MinDistance1:MaxDistance10 | |
| 153 # VectorStringFormat = IDsAndValuesString | |
| 154 # VectorValuesType = NumericalValues | |
| 155 # | |
| 156 Cmpd1 54;H-D1-H H-D1-NI HBA-D1-NI HBD-D1-NI H-D2-H H-D2-HBA...;18 1 2... | |
| 157 Cmpd2 61;H-D1-H H-D1-NI HBA-D1-NI HBD-D1-NI H-D2-H H-D2-HBA...;5 1 2 ... | |
| 158 ... ... | |
| 159 ... .. | |
| 160 | |
| 161 Example of CSV *Text* file containing topological pharmacophore atom | |
| 162 pairs fingerprints string data: | |
| 163 | |
| 164 "CompoundID","TopologicalPharmacophoreAtomPairsFingerprints" | |
| 165 "Cmpd1","FingerprintsVector;TopologicalPharmacophoreAtomPairs:Arbitrary | |
| 166 Size:MinDistance1:MaxDistance10;54;NumericalValues;IDsAndValuesString;H | |
| 167 -D1-H H-D1-NI HBA-D1-NI HBD-D1-NI H-D2-H H-D2-HBA H-D2-HBD HBA-D2-HBA H | |
| 168 BA-D2-HBD H-D3-H H-D3-HBA H-D3-HBD H-D3-NI HBA-D3-NI HBD-D3-NI H-D4...; | |
| 169 18 1 2 1 22 12 8 1 2 18 6 3 1 1 1 22 13 6 5 7 2 28 9 5 1 1 1 36 16 10 3 | |
| 170 4 1 37 10 8 1 35 10 9 3 3 1 28 7 7 4 18 16 12 5 1 2 1" | |
| 171 ... ... | |
| 172 ... ... | |
| 173 | |
| 174 The current release of MayaChemTools generates the following types of | |
| 175 topological pharmacophore atom pairs fingerprints vector strings: | |
| 176 | |
| 177 FingerprintsVector;TopologicalPharmacophoreAtomPairs:ArbitrarySize:Min | |
| 178 Distance1:MaxDistance10;54;NumericalValues;IDsAndValuesString;H-D1-H H | |
| 179 -D1-NI HBA-D1-NI HBD-D1-NI H-D2-H H-D2-HBA H-D2-HBD HBA-D2-HBA HBA-D2- | |
| 180 HBD H-D3-H H-D3-HBA H-D3-HBD H-D3-NI HBA-D3-NI HBD-D3-NI H-D4-H H-D4-H | |
| 181 BA H-D4-HBD HBA-D4-HBA HBA-D4-HBD HBD-D4-HBD H-D5-H H-D5-HBA H-D5-...; | |
| 182 18 1 2 1 22 12 8 1 2 18 6 3 1 1 1 22 13 6 5 7 2 28 9 5 1 1 1 36 16 10 | |
| 183 3 4 1 37 10 8 1 35 10 9 3 3 1 28 7 7 4 18 16 12 5 1 2 1 | |
| 184 | |
| 185 FingerprintsVector;TopologicalPharmacophoreAtomPairs:FixedSize:MinDist | |
| 186 ance1:MaxDistance10;150;OrderedNumericalValues;ValuesString;18 0 0 1 0 | |
| 187 0 0 2 0 0 1 0 0 0 0 22 12 8 0 0 1 2 0 0 0 0 0 0 0 0 18 6 3 1 0 0 0 1 | |
| 188 0 0 1 0 0 0 0 22 13 6 0 0 5 7 0 0 2 0 0 0 0 0 28 9 5 1 0 0 0 1 0 0 1 0 | |
| 189 0 0 0 36 16 10 0 0 3 4 0 0 1 0 0 0 0 0 37 10 8 0 0 0 0 1 0 0 0 0 0 0 | |
| 190 0 35 10 9 0 0 3 3 0 0 1 0 0 0 0 0 28 7 7 4 0 0 0 0 0 0 0 0 0 0 0 18... | |
| 191 | |
| 192 FingerprintsVector;TopologicalPharmacophoreAtomPairs:FixedSize:MinDist | |
| 193 ance1:MaxDistance10;150;OrderedNumericalValues;IDsAndValuesString;H-D1 | |
| 194 -H H-D1-HBA H-D1-HBD H-D1-NI H-D1-PI HBA-D1-HBA HBA-D1-HBD HBA-D1-NI H | |
| 195 BA-D1-PI HBD-D1-HBD HBD-D1-NI HBD-D1-PI NI-D1-NI NI-D1-PI PI-D1-PI H-D | |
| 196 2-H H-D2-HBA H-D2-HBD H-D2-NI H-D2-PI HBA-D2-HBA HBA-D2-HBD HBA-D2...; | |
| 197 18 0 0 1 0 0 0 2 0 0 1 0 0 0 0 22 12 8 0 0 1 2 0 0 0 0 0 0 0 0 18 6 3 | |
| 198 1 0 0 0 1 0 0 1 0 0 0 0 22 13 6 0 0 5 7 0 0 2 0 0 0 0 0 28 9 5 1 0 0 0 | |
| 199 1 0 0 1 0 0 0 0 36 16 10 0 0 3 4 0 0 1 0 0 0 0 | |
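The layout of an *IDsAndValuesString* vector string can be illustrated with a small Python parser. This is a sketch under stated assumptions, not part of MayaChemTools: the function name `parse_fingerprints_vector` is hypothetical, the semicolon-delimited field order is read off the examples above, and the sample string is a shortened, made-up vector that reuses only the first three IDs and counts from the examples (so its size field is 3 rather than 54).

```python
def parse_fingerprints_vector(text):
    """Split an IDsAndValuesString fingerprints vector into its parts."""
    fields = text.strip().split(";")
    if len(fields) != 7 or fields[0] != "FingerprintsVector":
        raise ValueError("expected a 7-field FingerprintsVector string")
    _, description, size, values_type, fmt, ids, values = fields
    if fmt != "IDsAndValuesString":
        raise ValueError(f"unsupported vector string format: {fmt}")
    ids = ids.split()
    values = [float(v) for v in values.split()]
    # The size field is expected to match the number of IDs and values.
    if not (len(ids) == len(values) == int(size)):
        raise ValueError("size field does not match IDs/values")
    return {
        "description": description,
        "values_type": values_type,
        "counts": dict(zip(ids, values)),
    }


# Hypothetical shortened vector: three pairs only, for illustration.
sample = (
    "FingerprintsVector;"
    "TopologicalPharmacophoreAtomPairs:ArbitrarySize:MinDistance1:MaxDistance10;"
    "3;NumericalValues;IDsAndValuesString;"
    "H-D1-H H-D1-NI HBA-D1-NI;18 1 2"
)
parsed = parse_fingerprints_vector(sample)
print(parsed["counts"]["H-D1-H"])   # 18.0
```

Other vector string formats, such as *ValuesString*, carry fewer or differently arranged fields and would need their own handling.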
| 200 | |
| 201 OPTIONS | |
| 202 --AromaticityModel *MDLAromaticityModel | TriposAromaticityModel | | |
| 203 MMFFAromaticityModel | ChemAxonBasicAromaticityModel | | |
| 204 ChemAxonGeneralAromaticityModel | DaylightAromaticityModel | | |
| 205 MayaChemToolsAromaticityModel* | |
| 206 Specify aromaticity model to use during detection of aromaticity. | |
| 207 Possible values in the current release are: *MDLAromaticityModel, | |
| 208 TriposAromaticityModel, MMFFAromaticityModel, | |
| 209 ChemAxonBasicAromaticityModel, ChemAxonGeneralAromaticityModel, | |
| 210 DaylightAromaticityModel or MayaChemToolsAromaticityModel*. Default | |
| 211 value: *MayaChemToolsAromaticityModel*. | |
| 212 | |
| 213 The supported aromaticity model names along with model specific | |
| 214 control parameters are defined in AromaticityModelsData.csv, which | |
| 215 is distributed with the current release and is available under | |
| 216 lib/data directory. Molecule.pm module retrieves data from this file | |
| 217 during class instantiation and makes it available to method | |
| 218 DetectAromaticity for detecting aromaticity corresponding to a | |
| 219 specific model. | |
| 220 | |
| 221 --AtomPairsSetSizeToUse *ArbitrarySize | FixedSize* | |
| 222 Atom pairs set size to use during generation of topological | |
| 223 pharmacophore atom pairs fingerprints. | |
| 224 | |
| 225 Possible values: *ArbitrarySize | FixedSize*; Default value: | |
| 226 *ArbitrarySize*. | |
| 227 | |
| 228 For *ArbitrarySize* value of --AtomPairsSetSizeToUse option, the | |
| 229 fingerprint vector corresponds to only those topological | |
| 230 pharmacophore atom pairs which are present and have non-zero count. | |
| 231 However, for *FixedSize* value of --AtomPairsSetSizeToUse option, | |
| 232 the fingerprint vector contains all possible valid topological | |
| 233 pharmacophore atom pairs with both zero and non-zero count values. | |
| 234 | |
| 235 -a, --AtomTypesToUse *"AtomType1,AtomType2,..."* | |
| 236 Pharmacophore atom types to use during generation of topological | |
| 237 pharmacophore atom pairs. It's a list of comma separated valid | |
| 238 pharmacophore atom types. | |
| 239 | |
| 240 Possible values for pharmacophore atom types are: *Ar, CA, H, HBA, | |
| 241 HBD, Hal, NI, PI, RA*. Default value [ Ref 60-62 ] : | |
| 242 *HBD,HBA,PI,NI,H*. | |
| 243 | |
| 244 The pharmacophore atom types abbreviations correspond to: | |
| 245 | |
| 246 HBD: HydrogenBondDonor | |
| 247 HBA: HydrogenBondAcceptor | |
| 248 PI : PositivelyIonizable | |
| 249 NI : NegativelyIonizable | |
| 250 Ar : Aromatic | |
| 251 Hal : Halogen | |
| 252 H : Hydrophobic | |
| 253 RA : RingAtom | |
| 254 CA : ChainAtom | |
| 255 | |
| 256 *AtomTypes::FunctionalClassAtomTypes* module is used to assign | |
| 257 pharmacophore atom types. It uses following definitions [ Ref 60-61, | |
| 258 Ref 65-66 ]: | |
| 259 | |
| 260 HydrogenBondDonor: NH, NH2, OH | |
| 261 HydrogenBondAcceptor: N[!H], O | |
| 262 PositivelyIonizable: +, NH2 | |
| 263 NegativelyIonizable: -, C(=O)OH, S(=O)OH, P(=O)OH | |
| 264 | |
| 265 --AtomTypesWeight *"AtomType1,Weight1,AtomType2,Weight2..."* | |
| 266 Weights of specified pharmacophore atom types to use during | |
| 267 calculation of their contribution to atom pair count. Default value: | |
| 268 *None*. Valid values: real numbers greater than 0. In general it's | |
| 269 comma delimited list of valid atom type and its weight. | |
| 270 | |
| 271 The weight values allow increasing the importance of a specific | |
| 272 pharmacophore atom type in the generated fingerprints. A weight | |
| 273 value of 0 for an atom type eliminates its contribution to atom pair | |
| 274 count, whereas a weight value of 2 doubles its contribution. | |
| 275 | |
| 276 --CompoundID *DataFieldName or LabelPrefixString* | |
| 277 This value is --CompoundIDMode specific and indicates how compound | |
| 278 ID is generated. | |
| 279 | |
| 280 For *DataField* value of --CompoundIDMode option, it corresponds to | |
| 281 datafield label name whose value is used as compound ID; otherwise, | |
| 282 it's a prefix string used for generating compound IDs like | |
| 283 LabelPrefixString<Number>. Default value, *Cmpd*, generates compound | |
| 284 IDs which look like Cmpd<Number>. | |
| 285 | |
| 286 Examples for *DataField* value of --CompoundIDMode: | |
| 287 | |
| 288 MolID | |
| 289 ExtReg | |
| 290 | |
| 291 Examples for *LabelPrefix* or *MolNameOrLabelPrefix* value of | |
| 292 --CompoundIDMode: | |
| 293 | |
| 294 Compound | |
| 295 | |
| 296 The value specified above generates compound IDs which correspond to | |
| 297 Compound<Number> instead of default value of Cmpd<Number>. | |
| 298 | |
| 299 --CompoundIDLabel *text* | |
| 300 Specify compound ID column label for CSV/TSV text file(s) used | |
| 301 during *CompoundID* value of --DataFieldsMode option. Default value: | |
| 302 *CompoundID*. | |
| 303 | |
| 304 --CompoundIDMode *DataField | MolName | LabelPrefix | | |
| 305 MolNameOrLabelPrefix* | |
| 306 Specify how to generate compound IDs and write to FP or CSV/TSV text | |
| 307 file(s) along with generated fingerprints for *FP | text | all* | |
| 308 values of --output option: use a *SDFile(s)* datafield value; use | |
| 309 molname line from *SDFile(s)*; generate a sequential ID with | |
| 310 specific prefix; use combination of both MolName and LabelPrefix | |
| 311 with usage of LabelPrefix values for empty molname lines. | |
| 312 | |
| 313 Possible values: *DataField | MolName | LabelPrefix | | |
| 314 MolNameOrLabelPrefix*. Default value: *LabelPrefix*. | |
| 315 | |
| 316 For *MolNameOrLabelPrefix* value of --CompoundIDMode, molname line | |
| 317 in *SDFile(s)* takes precedence over sequential compound IDs | |
| 318 generated using *LabelPrefix* and only empty molname values are | |
| 319 replaced with sequential compound IDs. | |
| 320 | |
| 321 This is only used for *CompoundID* value of --DataFieldsMode option. | |
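The four --CompoundIDMode values described above can be summarized in a short Python sketch. It is illustrative only and not the script's implementation: the function `make_compound_id` and its parameters are hypothetical, sequential numbering is assumed to start at 1 (as suggested by the Cmpd1 and Cmpd2 IDs in the examples), and the behavior for a missing data field is an assumption.

```python
def make_compound_id(mode, index, molname="", data_fields=None, compound_id="Cmpd"):
    """Return a compound ID for the molecule at 1-based position `index`."""
    sequential = f"{compound_id}{index}"
    if mode == "DataField":
        # Here compound_id names the SD data field holding the ID.
        # Assumption: a missing field raises KeyError.
        return (data_fields or {})[compound_id]
    if mode == "MolName":
        return molname
    if mode == "LabelPrefix":
        return sequential
    if mode == "MolNameOrLabelPrefix":
        # The molname line wins; only empty names fall back to the prefix.
        return molname.strip() or sequential
    raise ValueError(f"unknown CompoundIDMode: {mode}")


print(make_compound_id("LabelPrefix", 1))                        # Cmpd1
print(make_compound_id("MolNameOrLabelPrefix", 2, molname=""))   # Cmpd2
print(make_compound_id("DataField", 3, data_fields={"MolID": "X7"},
                       compound_id="MolID"))                     # X7
```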
| 322 | |
| 323 --DataFields *"FieldLabel1,FieldLabel2,..."* | |
| 324 Comma delimited list of *SDFile(s)* data fields to extract and | |
| 325 write to CSV/TSV text file(s) along with generated fingerprints for | |
| 326 *text | all* values of --output option. | |
| 327 | |
| 328 This is only used for *Specify* value of --DataFieldsMode option. | |
| 329 | |
| 330 Examples: | |
| 331 | |
| 332 Extreg | |
| 333 MolID,CompoundName | |
| 334 | |
| 335 -d, --DataFieldsMode *All | Common | Specify | CompoundID* | |
| 336 Specify how data fields in *SDFile(s)* are transferred to output | |
| 337 CSV/TSV text file(s) along with generated fingerprints for *text | | |
| 338 all* values of --output option: transfer all SD data fields; transfer | |
| 339 SD data fields common to all compounds; extract specified data | |
| 340 fields; generate a compound ID using molname line, a compound | |
| 341 prefix, or a combination of both. Possible values: *All | Common | | |
| 342 Specify | CompoundID*. Default value: *CompoundID*. | |
| 343 | |
| 344 -f, --Filter *Yes | No* | |
| 345 Specify whether to check and filter compound data in SDFile(s). | |
| 346 Possible values: *Yes or No*. Default value: *Yes*. | |
| 347 | |
| 348 By default, compound data is checked before calculating fingerprints | |
| 349 and compounds containing atom data corresponding to non-element | |
| 350 symbols or no atom data are ignored. | |
| 351 | |
| 352 --FingerprintsLabelMode *FingerprintsLabelOnly | | |
| 353 FingerprintsLabelWithIDs* | |
| 354 Specify how fingerprints label is generated in conjunction with | |
| 355 --FingerprintsLabel option value: use fingerprints label generated | |
| 356 only by --FingerprintsLabel option value or append topological atom | |
| 357 pair count value IDs to --FingerprintsLabel option value. | |
| 358 | |
| 359 Possible values: *FingerprintsLabelOnly | FingerprintsLabelWithIDs*. | |
| 360 Default value: *FingerprintsLabelOnly*. | |
| 361 | |
| 362 Topological atom pairs IDs appended to --FingerprintsLabel value | |
| 363 during *FingerprintsLabelWithIDs* values of --FingerprintsLabelMode | |
| 364 correspond to atom pair count values in fingerprint vector string. | |
| 365 | |
| 366 *FingerprintsLabelWithIDs* value of --FingerprintsLabelMode is | |
| 367 ignored during *ArbitrarySize* value of --AtomPairsSetSizeToUse | |
| 368 option and topological atom pairs IDs are not appended to the label. | |
| 369 | |
| 370 --FingerprintsLabel *text* | |
| 371 SD data label or text file column label to use for fingerprints | |
| 372 string in output SD or CSV/TSV text file(s) specified by --output. | |
| 373 Default value: *TopologicalPharmacophoreAtomPairsFingerprints*. | |
| 374 | |
| 375 --FuzzifyAtomPairsCount *Yes | No* | |
| 376 To fuzzify or not to fuzzify atom pairs count. Possible values: *Yes | |
| 377 or No*. Default value: *No*. | |
| 378 | |
| 379 --FuzzificationMode *BeforeNormalization | AfterNormalization* | |
| 380 When to fuzzify atom pairs count. Possible values: | |
| 381 *BeforeNormalization | AfterNormalization*. Default value: | |
| 382 *AfterNormalization*. | |
| 383 | |
| 384 --FuzzificationMethodology *FuzzyBinning | FuzzyBinSmoothing* | |
| 385 How to fuzzify atom pairs count. Possible values: *FuzzyBinning | | |
| 386 FuzzyBinSmoothing*. Default value: *FuzzyBinning*. | |
| 387 | |
| 388 In conjunction with values for options --FuzzifyAtomPairsCount, | |
| 389 --FuzzificationMode and --FuzzFactor, --FuzzificationMethodology | |
| 390 option is used to fuzzify pharmacophore atom pairs count. | |
| 391 | |
| 392 Let: | |
| 393 | |
| 394 Px = Pharmacophore atom type x | |
| 395 Py = Pharmacophore atom type y | |
| 396 PPxy = Pharmacophore atom pair between atom type Px and Py | |
| 397 | |
| 398 PPxyDn = Pharmacophore atom pairs count between atom type Px and Py | |
| 399 at distance Dn | |
| 400 PPxyDn-1 = Pharmacophore atom pairs count between atom type Px and Py | |
| 401 at distance Dn - 1 | |
| 402 PPxyDn+1 = Pharmacophore atom pairs count between atom type Px and Py | |
| 403 at distance Dn + 1 | |
| 404 | |
| 405 FF = FuzzFactor for FuzzyBinning and FuzzyBinSmoothing | |
| 406 | |
| 407 Then: | |
| 408 | |
| 409 For *FuzzyBinning*: | |
| 410 | |
| 411 PPxyDn = PPxyDn (Unchanged) | |
| 412 | |
| 413 PPxyDn-1 = PPxyDn-1 + PPxyDn * FF | |
| 414 PPxyDn+1 = PPxyDn+1 + PPxyDn * FF | |
| 415 | |
| 416 For *FuzzyBinSmoothing*: | |
| 417 | |
| 418 PPxyDn = PPxyDn - PPxyDn * 2FF for Dmin < Dn < Dmax | |
| 419 PPxyDn = PPxyDn - PPxyDn * FF for Dn = Dmin or Dmax | |
| 420 | |
| 421 PPxyDn-1 = PPxyDn-1 + PPxyDn * FF | |
| 422 PPxyDn+1 = PPxyDn+1 + PPxyDn * FF | |
| 423 | |
| 424 In both fuzzification schemes, a value of 0 for FF implies no | |
| 425 fuzzification of occurrence counts. A value of 1 during | |
| 426 *FuzzyBinning* corresponds to maximum fuzzification of occurrence | |
| 427 counts; however, a value of 0.5 during *FuzzyBinSmoothing* ends up | |
| 428 completely distributing the value over the previous and next | |
| 429 distance bins. | |
| 430 | |
| 431 So for the default value of --FuzzFactor (FF) 0.15, the occurrence count | |
| 432 of pharmacophore atom pairs at distance Dn during *FuzzyBinning* is | |
| 433 left unchanged and the counts at distances Dn - 1 and Dn + 1 are | |
| 434 incremented by PPxyDn * 0.15. | |
| 435 | |
| 436 And during *FuzzyBinSmoothing* the occurrence count at distance Dn | |
| 437 is scaled back using a multiplicative factor of (1 - 2*0.15) and the | |
| 438 occurrence counts at distances Dn - 1 and Dn + 1 are incremented by | |
| 439 PPxyDn * 0.15. In other words, the occurrence bin count is smoothed out | |
| 440 by distributing it over the previous and next distance values. | |
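The two fuzzification formulas above can be written out as a minimal Python sketch. It is not the MayaChemTools implementation: the function name `fuzzify` is hypothetical, the input is assumed to be the counts of a single atom pair type PPxy listed from Dmin to Dmax, and every increment is assumed to be computed from the original (unfuzzified) counts rather than from already updated bins.

```python
def fuzzify(counts, fuzz_factor=0.15, methodology="FuzzyBinning"):
    """Fuzzify counts of one atom pair type listed from Dmin to Dmax."""
    if methodology not in ("FuzzyBinning", "FuzzyBinSmoothing"):
        raise ValueError(f"unknown methodology: {methodology}")
    last = len(counts) - 1
    result = [float(c) for c in counts]
    for n, count in enumerate(counts):
        # Assumption: spread is based on the original count at Dn.
        spread = count * fuzz_factor
        if methodology == "FuzzyBinSmoothing":
            # Interior bins give up 2*FF of their count, end bins give up FF.
            result[n] -= 2 * spread if 0 < n < last else spread
        if n > 0:
            result[n - 1] += spread   # PPxyDn-1 += PPxyDn * FF
        if n < last:
            result[n + 1] += spread   # PPxyDn+1 += PPxyDn * FF
    return result


binned = fuzzify([10, 20, 30], 0.15, "FuzzyBinning")
smoothed = fuzzify([10, 20, 30], 0.15, "FuzzyBinSmoothing")
print(binned, smoothed)
```

Under these assumptions *FuzzyBinning* increases the total count, while *FuzzyBinSmoothing* only redistributes it, so the sum over all distance bins stays the same whenever there are at least two bins.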
| 441 | |
| 442 --FuzzFactor *number* | |
| 443 Specify by how much to fuzzify atom pairs count. Default value: | |
| 444 *0.15*. Valid values: For *FuzzyBinning* value of | |
| 445 --FuzzificationMethodology option: *between 0 and 1.0*; For | |
| 446 *FuzzyBinSmoothing* value of --FuzzificationMethodology option: | |
| 447 *between 0 and 0.5*. | |
| 448 | |
| 449 -h, --help | |
| 450 Print this help message. | |
| 451 | |
| 452 -k, --KeepLargestComponent *Yes | No* | |
| 453 Generate fingerprints for only the largest component in molecule. | |
| 454 Possible values: *Yes or No*. Default value: *Yes*. | |
| 455 | |
| 456 For molecules containing multiple connected components, fingerprints | |
| 457 can be generated in two different ways: use all connected components | |
| 458 or just the largest connected component. By default, all atoms | |
| 459 except for the largest connected component are deleted before | |
| 460 generation of fingerprints. | |
| 461 | |
| 462 --MinDistance *number* | |
| 463 Minimum bond distance between atom pairs for generating topological | |
| 464 pharmacophore atom pairs. Default value: *1*. Valid values: positive | |
| 465 integers including 0 and less than --MaxDistance. | |
| 466 | |
| 467 --MaxDistance *number* | |
| 468 Maximum bond distance between atom pairs for generating topological | |
| 469 pharmacophore atom pairs. Default value: *10*. Valid values: | |
| 470 positive integers and greater than --MinDistance. | |
| 471 | |
| 472 -n, --NormalizationMethodology *None | ByHeavyAtomsCount | | |
| 473 ByAtomTypesCount* | |
| 474 Normalization methodology to use for scaling the occurrence count of | |
| 475 pharmacophore atom pairs within specified distance range. Possible | |
| 476 values: *None, ByHeavyAtomsCount or ByAtomTypesCount*. Default | |
| 477 value: *None*. | |
| 478 | |
| 479 --OutDelim *comma | tab | semicolon* | |
| 480 Delimiter for output CSV/TSV text file(s). Possible values: *comma, | |
| 481 tab, or semicolon*. Default value: *comma*. | |
| 482 | |
| 483 --output *SD | FP | text | all* | |
| 484 Type of output files to generate. Possible values: *SD, FP, text, or | |
| 485 all*. Default value: *text*. | |
| 486 | |
| 487 -o, --overwrite | |
| 488 Overwrite existing files. | |
| 489 | |
| 490 -q, --quote *Yes | No* | |
| 491 Put quote around column values in output CSV/TSV text file(s). | |
| 492 Possible values: *Yes or No*. Default value: *Yes*. | |
| 493 | |
| 494 -r, --root *RootName* | |
| 495 New file name is generated using the root: <Root>.<Ext>. Default for | |
| 496 new file names: | |
| 497 <SDFileName><TopologicalPharmacophoreAtomPairsFP>.<Ext>. The file | |
| 498 type determines <Ext> value. The sdf, fpf, csv, and tsv <Ext> values | |
| 499 are used for SD, FP, comma/semicolon, and tab delimited text files, | |
| 500 respectively. This option is ignored for multiple input files. | |
| 501 | |
| 502 --ValuesPrecision *number* | |
| 503 Precision of atom pairs count real values which might be generated | |
| 504 after normalization or fuzzification. Default value: up to *2* | |
| 505 decimal places. Valid values: positive integers. | |
| 506 | |
| 507 -v, --VectorStringFormat *ValuesString | IDsAndValuesString | | |
| 508 IDsAndValuesPairsString | ValuesAndIDsString | ValuesAndIDsPairsString* | |
| 509 Format of fingerprints vector string data in output SD, FP or | |
| 510 CSV/TSV text file(s) specified by --output option. Possible values: | |
| 511 *ValuesString | IDsAndValuesString | IDsAndValuesPairsString | | |
| 512 ValuesAndIDsString | ValuesAndIDsPairsString*. | |
| 513 | |
| 514 Default value during *FixedSize* value of --AtomPairsSetSizeToUse | |
| 515 option: *ValuesString*. Default value during *ArbitrarySize* value | |
| 516 of --AtomPairsSetSizeToUse option: *IDsAndValuesString*. | |
| 517 | |
| 518 *ValuesString* option value is not allowed for *ArbitrarySize* value | |
| 519 of --AtomPairsSetSizeToUse option. | |
| 520 | |
| 521 Examples: | |
| 522 | |
| 523 FingerprintsVector;TopologicalPharmacophoreAtomPairs:ArbitrarySize:Min | |
| 524 Distance1:MaxDistance10;54;NumericalValues;IDsAndValuesString;H-D1-H H | |
| 525 -D1-NI HBA-D1-NI HBD-D1-NI H-D2-H H-D2-HBA H-D2-HBD HBA-D2-HBA HBA-D2- | |
| 526 HBD H-D3-H H-D3-HBA H-D3-HBD H-D3-NI HBA-D3-NI HBD-D3-NI H-D4-H H-D4-H | |
| 527 BA H-D4-HBD HBA-D4-HBA HBA-D4-HBD HBD-D4-HBD H-D5-H H-D5-HBA H-D5-...; | |
| 528 18 1 2 1 22 12 8 1 2 18 6 3 1 1 1 22 13 6 5 7 2 28 9 5 1 1 1 36 16 10 | |
| 529 3 4 1 37 10 8 1 35 10 9 3 3 1 28 7 7 4 18 16 12 5 1 2 1 | |
| 530 | |
| 531 FingerprintsVector;TopologicalPharmacophoreAtomPairs:FixedSize:MinDist | |
| 532 ance1:MaxDistance10;150;OrderedNumericalValues;ValuesString;18 0 0 1 0 | |
| 533 0 0 2 0 0 1 0 0 0 0 22 12 8 0 0 1 2 0 0 0 0 0 0 0 0 18 6 3 1 0 0 0 1 | |
| 534 0 0 1 0 0 0 0 22 13 6 0 0 5 7 0 0 2 0 0 0 0 0 28 9 5 1 0 0 0 1 0 0 1 0 | |
| 535 0 0 0 36 16 10 0 0 3 4 0 0 1 0 0 0 0 0 37 10 8 0 0 0 0 1 0 0 0 0 0 0 | |
| 536 0 35 10 9 0 0 3 3 0 0 1 0 0 0 0 0 28 7 7 4 0 0 0 0 0 0 0 0 0 0 0 18... | |
| 537 | |
| 538 FingerprintsVector;TopologicalPharmacophoreAtomPairs:FixedSize:MinDist | |
| 539 ance1:MaxDistance10;150;OrderedNumericalValues;IDsAndValuesString;H-D1 | |
| 540 -H H-D1-HBA H-D1-HBD H-D1-NI H-D1-PI HBA-D1-HBA HBA-D1-HBD HBA-D1-NI H | |
| 541 BA-D1-PI HBD-D1-HBD HBD-D1-NI HBD-D1-PI NI-D1-NI NI-D1-PI PI-D1-PI H-D | |
| 542 2-H H-D2-HBA H-D2-HBD H-D2-NI H-D2-PI HBA-D2-HBA HBA-D2-HBD HBA-D2...; | |
| 543 18 0 0 1 0 0 0 2 0 0 1 0 0 0 0 22 12 8 0 0 1 2 0 0 0 0 0 0 0 0 18 6 3 | |
| 544 1 0 0 0 1 0 0 1 0 0 0 0 22 13 6 0 0 5 7 0 0 2 0 0 0 0 0 28 9 5 1 0 0 0 | |
| 545 1 0 0 1 0 0 0 0 36 16 10 0 0 3 4 0 0 1 0 0 0 0 | |
| 546 | |
| 547 -w, --WorkingDir *DirName* | |
| 548 Location of working directory. Default value: current directory. | |
| 549 | |
| 550 EXAMPLES | |
| 551 To generate topological pharmacophore atom pairs fingerprints of | |
| 552 arbitrary size corresponding to distances from 1 through 10 using | |
| 553 default atom types with no weighting, normalization, and fuzzification | |
| 554 of atom pairs count and create a SampleTPAPFP.csv file containing | |
| 555 sequential compound IDs along with fingerprints vector strings data in | |
| 556 IDsAndValuesString format, type: | |
| 557 | |
| 558 % TopologicalPharmacophoreAtomPairsFingerprints.pl -r SampleTPAPFP | |
| 559 -o Sample.sdf | |
| 560 | |
| 561 To generate topological pharmacophore atom pairs fingerprints of fixed | |
| 562 size corresponding to distances from 1 through 10 using default atom | |
| 563 types with no weighting, normalization, and fuzzification of atom pairs | |
| 564 count and create a SampleTPAPFP.csv file containing sequential compound | |
| 565 IDs along with fingerprints vector strings data in ValuesString format, | |
| 566 type: | |
| 567 | |
| 568 % TopologicalPharmacophoreAtomPairsFingerprints.pl | |
| 569 --AtomPairsSetSizeToUse FixedSize -r SampleTPAPFP -o Sample.sdf | |
| 570 | |
| 571 To generate topological pharmacophore atom pairs fingerprints of | |
| 572 arbitrary size corresponding to distances from 1 through 10 using | |
| 573 default atom types with no weighting, normalization, and fuzzification | |
| 574 of atom pairs count and create SampleTPAPFP.sdf, SampleTPAPFP.fpf and | |
| 575 SampleTPAPFP.csv files containing sequential compound IDs in CSV file | |
| 576 along with fingerprints vector strings data in IDsAndValuesString format, | |
| 577 type: | |
| 578 | |
| 579 % TopologicalPharmacophoreAtomPairsFingerprints.pl --output all | |
| 580 -r SampleTPAPFP -o Sample.sdf | |
| 581 | |
| 582 To generate topological pharmacophore atom pairs fingerprints of | |
| 583 arbitrary size corresponding to distances from 1 through 10 using | |
| 584 default atom types with no weighting, normalization, and fuzzification | |
| 585 of atom pairs count and create a SampleTPAPFP.csv file containing | |
| 586 sequential compound IDs along with fingerprints vector strings data in | |
| 587 IDsAndValuesPairsString format, type: | |
| 588 | |
| 589 % TopologicalPharmacophoreAtomPairsFingerprints.pl --VectorStringFormat | |
| 590 IDsAndValuesPairsString -r SampleTPAPFP -o Sample.sdf | |
| 591 | |
| 592 To generate topological pharmacophore atom pairs fingerprints of | |
| 593 arbitrary size corresponding to distances from 1 through 6 using default | |
| 594 atom types with no weighting, normalization, and fuzzification of atom | |
| 595 pairs count and create a SampleTPAPFP.csv file containing sequential | |
| 596 compound IDs along with fingerprints vector strings data in IDsAndValuesString | |
| 597 format, type: | |
| 598 | |
| 599 % TopologicalPharmacophoreAtomPairsFingerprints.pl --MinDistance 1 | |
| 600 --MaxDistance 6 -r SampleTPAPFP -o Sample.sdf | |
| 601 | |
| 602 To generate topological pharmacophore atom pairs fingerprints of | |
| 603 arbitrary size corresponding to distances from 1 through 10 using | |
| 604 "HBD,HBA,PI,NI" atom types with double the weighting for "HBD,HBA" and | |
| 605 normalization by HeavyAtomCount but no fuzzification of atom pairs count | |
| 606 and create a SampleTPAPFP.csv file containing sequential compound IDs | |
| 607 along with fingerprints vector strings data in IDsAndValuesString format, | |
| 608 type: | |
| 609 | |
| 610 % TopologicalPharmacophoreAtomPairsFingerprints.pl --MinDistance 1 | |
| 611 --MaxDistance 10 --AtomTypesToUse "HBD,HBA,PI,NI" --AtomTypesWeight | |
| 612 "HBD,2,HBA,2,PI,1,NI,1" --NormalizationMethodology ByHeavyAtomsCount | |
| 613 --FuzzifyAtomPairsCount No -r SampleTPAPFP -o Sample.sdf | |
| 614 | |
| 615 To generate topological pharmacophore atom pairs fingerprints of | |
| 616 arbitrary size corresponding to distances from 1 through 10 using | |
| 617 "HBD,HBA,PI,NI,H" atom types with no weighting of atom types and | |
| 618 normalization but with fuzzification of atom pairs count using | |
| 619 FuzzyBinning methodology with FuzzFactor value 0.15 and create a | |
| 620 SampleTPAPFP.csv file containing sequential compound IDs along with | |
| 621 fingerprints vector strings data in ValuesString format, type: | |
| 622 | |
| 623 % TopologicalPharmacophoreAtomPairsFingerprints.pl --MinDistance 1 | |
| 624 --MaxDistance 10 --AtomTypesToUse "HBD,HBA,PI, NI,H" --AtomTypesWeight | |
| 625 "HBD,1,HBA,1,PI,1,NI,1,H,1" --NormalizationMethodology None | |
| 626 --FuzzifyAtomPairsCount Yes --FuzzificationMethodology FuzzyBinning | |
| 627 --FuzzFactor 0.5 -r SampleTPAPFP -o Sample.sdf | |
| 628 | |
| 629 To generate topological pharmacophore atom pairs fingerprints of | |
| 630 arbitrary size corresponding to distances from 1 through 10 | |
| 631 using default atom types with no weighting, normalization, or | |
| 632 fuzzification of atom pairs count and create a SampleTPAPFP.csv file | |
| 633 containing compound ID from molecule name line along with fingerprints | |
| 634 vector strings data, type: | |
| 635 | |
| 636 % TopologicalPharmacophoreAtomPairsFingerprints.pl --DataFieldsMode | |
| 637 CompoundID --CompoundIDMode MolName -r SampleTPAPFP -o Sample.sdf | |
| 638 | |
| 639 To generate topological pharmacophore atom pairs fingerprints of | |
| 640 arbitrary size corresponding to distances from 1 through 10 using | |
| 641 default atom types with no weighting, normalization, or fuzzification | |
| 642 of atom pairs count and create a SampleTPAPFP.csv file containing | |
| 643 compound IDs using specified data field along with fingerprints vector | |
| 644 strings data, type: | |
| 645 | |
| 646 % TopologicalPharmacophoreAtomPairsFingerprints.pl --DataFieldsMode | |
| 647 CompoundID --CompoundIDMode DataField --CompoundID Mol_ID | |
| 648 -r SampleTPAPFP -o Sample.sdf | |
| 649 | |
| 650 To generate topological pharmacophore atom pairs fingerprints of | |
| 651 arbitrary size corresponding to distances from 1 through 10 using | |
| 652 default atom types with no weighting, normalization, or fuzzification | |
| 653 of atom pairs count and create a SampleTPAPFP.csv file containing | |
| 654 compound ID using combination of molecule name line and an explicit | |
| 655 compound prefix along with fingerprints vector strings data, type: | |
| 656 | |
| 657 % TopologicalPharmacophoreAtomPairsFingerprints.pl --DataFieldsMode | |
| 658 CompoundID --CompoundIDMode MolnameOrLabelPrefix | |
| 659 --CompoundID Cmpd --CompoundIDLabel MolID -r SampleTPAPFP -o Sample.sdf | |
| 660 | |
| 661 To generate topological pharmacophore atom pairs fingerprints of | |
| 662 arbitrary size corresponding to distances from 1 through 10 using | |
| 663 default atom types with no weighting, normalization, or fuzzification | |
| 664 of atom pairs count and create a SampleTPAPFP.csv file containing | |
| 665 specific data fields columns along with fingerprints vector strings | |
| 666 data, type: | |
| 667 | |
| 668 % TopologicalPharmacophoreAtomPairsFingerprints.pl --DataFieldsMode | |
| 669 Specify --DataFields Mol_ID -r SampleTPAPFP -o Sample.sdf | |
| 670 | |
| 671 To generate topological pharmacophore atom pairs fingerprints of | |
| 672 arbitrary size corresponding to distances from 1 through 10 using | |
| 673 default atom types with no weighting, normalization, or fuzzification | |
| 674 of atom pairs count and create a SampleTPAPFP.csv file containing common | |
| 675 data fields columns along with fingerprints vector strings data, type: | |
| 676 | |
| 677 % TopologicalPharmacophoreAtomPairsFingerprints.pl --DataFieldsMode | |
| 678 Common -r SampleTPAPFP -o Sample.sdf | |
| 679 | |
| 680 To generate topological pharmacophore atom pairs fingerprints of | |
| 681 arbitrary size corresponding to distances from 1 through 10 using | |
| 682 default atom types with no weighting, normalization, or fuzzification | |
| 683 of atom pairs count and create SampleTPAPFP.sdf, SampleTPAPFP.fpf, and | |
| 684 SampleTPAPFP.csv files containing all data fields columns in CSV file | |
| 685 along with fingerprints data, type: | |
| 686 | |
| 687 % TopologicalPharmacophoreAtomPairsFingerprints.pl --DataFieldsMode | |
| 688 All --output all -r SampleTPAPFP -o Sample.sdf | |
| 689 | |
| 690 AUTHOR | |
| 691 Manish Sud <msud@san.rr.com> | |
| 692 | |
| 693 SEE ALSO | |
| 694 InfoFingerprintsFiles.pl, SimilarityMatricesFingerprints.pl, | |
| 695 AtomNeighborhoodsFingerprints.pl, ExtendedConnectivityFingerprints.pl, | |
| 696 MACCSKeysFingerprints.pl, PathLengthFingerprints.pl, | |
| 697 TopologicalAtomPairsFingerprints.pl, | |
| 698 TopologicalAtomTorsionsFingerprints.pl, | |
| 699 TopologicalPharmacophoreAtomTripletsFingerprints.pl | |
| 700 | |
| 701 COPYRIGHT | |
| 702 Copyright (C) 2015 Manish Sud. All rights reserved. | |
| 703 | |
| 704 This file is part of MayaChemTools. | |
| 705 | |
| 706 MayaChemTools is free software; you can redistribute it and/or modify it | |
| 707 under the terms of the GNU Lesser General Public License as published by | |
| 708 the Free Software Foundation; either version 3 of the License, or (at | |
| 709 your option) any later version. | |
| 710 |
