NAME
    TopologicalPharmacophoreAtomPairsFingerprints.pl - Generate topological
    pharmacophore atom pairs fingerprints for SD files

SYNOPSIS
    TopologicalPharmacophoreAtomPairsFingerprints.pl SDFile(s)...

    TopologicalPharmacophoreAtomPairsFingerprints.pl [--AromaticityModel
    *AromaticityModelType*] [--AtomPairsSetSizeToUse *ArbitrarySize |
    FixedSize*] [-a, --AtomTypesToUse *"AtomType1, AtomType2..."*]
    [--AtomTypesWeight *"AtomType1, Weight1, AtomType2, Weight2..."*]
    [--CompoundID *DataFieldName or LabelPrefixString*] [--CompoundIDLabel
    *text*] [--CompoundIDMode *DataField | MolName | LabelPrefix |
    MolNameOrLabelPrefix*] [--DataFields *"FieldLabel1, FieldLabel2,..."*]
    [-d, --DataFieldsMode *All | Common | Specify | CompoundID*]
    [-f, --Filter *Yes | No*] [--FingerprintsLabelMode
    *FingerprintsLabelOnly | FingerprintsLabelWithIDs*] [--FingerprintsLabel
    *text*] [--FuzzifyAtomPairsCount *Yes | No*] [--FuzzificationMode
    *BeforeNormalization | AfterNormalization*] [--FuzzificationMethodology
    *FuzzyBinning | FuzzyBinSmoothing*] [--FuzzFactor *number*] [-h, --help]
    [-k, --KeepLargestComponent *Yes | No*] [--MinDistance *number*]
    [--MaxDistance *number*] [-n, --NormalizationMethodology *None |
    ByHeavyAtomsCount | ByAtomTypesCount*] [--OutDelim *comma | tab |
    semicolon*] [--output *SD | FP | text | all*] [-o, --overwrite]
    [-q, --quote *Yes | No*] [-r, --root *RootName*] [--ValuesPrecision
    *number*] [-v, --VectorStringFormat *ValuesString | IDsAndValuesString |
    IDsAndValuesPairsString | ValuesAndIDsString | ValuesAndIDsPairsString*]
    [-w, --WorkingDir *DirName*] SDFile(s)...

DESCRIPTION
    Generate topological pharmacophore atom pairs fingerprints [ Ref 60-62,
    Ref 65, Ref 68 ] for *SDFile(s)* and create appropriate SD, FP or
    CSV/TSV text file(s) containing fingerprints vector strings
    corresponding to molecular fingerprints.

    Multiple SDFile names are separated by spaces. The valid file extensions
    are *.sdf* and *.sd*. All other file names are ignored. All the SD files
    in the current directory can be specified either by "*.sdf" or the
    current directory name.

    Based on the values specified for --AtomTypesToUse, pharmacophore atom
    types are assigned to all non-hydrogen atoms in a molecule and a
    distance matrix is generated. A pharmacophore atom pairs basis set is
    initialized for all unique possible pairs within the --MinDistance and
    --MaxDistance range.

    Let:

        Px = Pharmacophore atom type x
        Py = Pharmacophore atom type y

        Dmin = Minimum distance corresponding to number of bonds between
               two atoms
        Dmax = Maximum distance corresponding to number of bonds between
               two atoms
        D = Distance corresponding to number of bonds between two atoms

        Px-Dn-Py = Pharmacophore atom pair ID for atom types Px and Py at
                   distance Dn

        P = Number of pharmacophore atom types to consider
        PPD = Number of possible unique pharmacophore atom pairs at any
              single distance Dn

        PPT = Total number of possible pharmacophore atom pairs at all
              distances between Dmin and Dmax

    Then:

        PPD = (P * (P - 1))/2 + P

        PPT = ((Dmax - Dmin) + 1) * ((P * (P - 1))/2 + P)
            = ((Dmax - Dmin) + 1) * PPD

    So for the default values of Dmin = 1, Dmax = 10 and P = 5,

        PPD = (5 * (5 - 1))/2 + 5 = 15
        PPT = ((10 - 1) + 1) * 15 = 150

    The pharmacophore atom pairs basis set includes 150 values.
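    The basis set arithmetic above can be checked with a short Python
    sketch. It is an illustration only, not code from MayaChemTools, and
    the function name basis_set_size is hypothetical.

```python
def basis_set_size(num_atom_types, min_distance, max_distance):
    """Return (PPD, PPT) for P atom types and distances Dmin..Dmax."""
    p = num_atom_types
    # Pairs of distinct types, P * (P - 1) / 2, plus the P same-type pairs
    ppd = (p * (p - 1)) // 2 + p
    # One set of PPD pairs for every distance from Dmin through Dmax
    ppt = ((max_distance - min_distance) + 1) * ppd
    return ppd, ppt


print(basis_set_size(5, 1, 10))  # (15, 150) for the default settings
```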

    The atom pair IDs correspond to:

        Px-Dn-Py = Pharmacophore atom pair ID for atom types Px and Py at
                   distance Dn

    For example: H-D1-H, H-D2-HBA, PI-D5-PI and so on.
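    In the FixedSize example vector strings, the IDs appear to be ordered
    by distance first and then by alphabetically sorted atom type pairs
    with Px <= Py (H-D1-H, H-D1-HBA, ..., PI-D1-PI, H-D2-H, ...). The
    sketch below reproduces that order; the ordering rule is inferred from
    the examples rather than taken from the MayaChemTools source, and
    basis_set_ids is a hypothetical name.

```python
from itertools import combinations_with_replacement


def basis_set_ids(atom_types, min_distance=1, max_distance=10):
    """Build Px-Dn-Py IDs ordered by distance, then by sorted type pair.

    Assumption: this ordering is inferred from the FixedSize examples.
    """
    types = sorted(atom_types)
    ids = []
    for distance in range(min_distance, max_distance + 1):
        # Unordered type pairs including same-type pairs, with Px <= Py
        for px, py in combinations_with_replacement(types, 2):
            ids.append(f"{px}-D{distance}-{py}")
    return ids


ids = basis_set_ids(["HBD", "HBA", "PI", "NI", "H"])
print(len(ids))  # 150
print(ids[:5])   # ['H-D1-H', 'H-D1-HBA', 'H-D1-HBD', 'H-D1-NI', 'H-D1-PI']
```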

    Using the distance matrix and pharmacophore atom types, occurrences of
    unique pharmacophore atom pairs are counted. The contribution of each
    atom type to an atom pair interaction is optionally weighted by the
    values specified for --AtomTypesWeight before its count is assigned to
    the appropriate distance bin. Based on the --NormalizationMethodology
    option, pharmacophore atom pairs counts are optionally normalized.
    Additionally, pharmacophore atom pairs counts are optionally fuzzified
    before or after the normalization, as controlled by the values of the
    --FuzzifyAtomPairsCount, --FuzzificationMode, --FuzzificationMethodology
    and --FuzzFactor options.
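    A minimal sketch of the counting step, assuming the molecule is given
    as a heavy-atom adjacency list and that topological distances are bond
    counts found by breadth-first search. Two further assumptions are made
    for illustration: an atom may carry several pharmacophore types and
    every type combination of an atom pair is counted, and a weighted pair
    contributes the product of its two --AtomTypesWeight values (types
    without a weight count as 1). The function names and the toy molecule
    are hypothetical.

```python
from collections import Counter, deque


def topological_distances(adjacency, start):
    """Bond-count distances from start to every reachable atom (BFS)."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        for neighbor in adjacency[atom]:
            if neighbor not in dist:
                dist[neighbor] = dist[atom] + 1
                queue.append(neighbor)
    return dist


def count_atom_pairs(adjacency, atom_types, min_distance=1, max_distance=10,
                     weights=None):
    """Count Px-Dn-Py occurrences over all unordered heavy-atom pairs.

    adjacency:  dict atom index -> list of bonded heavy-atom indices
    atom_types: dict atom index -> set of pharmacophore types (may be empty)
    weights:    optional dict type -> weight; assumption: a pair contributes
                the product of its two type weights, default weight 1
    """
    weights = weights or {}
    counts = Counter()
    atoms = sorted(adjacency)
    for i, a in enumerate(atoms):
        dist = topological_distances(adjacency, a)
        for b in atoms[i + 1:]:
            d = dist.get(b)
            if d is None or not (min_distance <= d <= max_distance):
                continue  # disconnected, or outside Dmin..Dmax
            # Assumption: every type of atom a is paired with every type of b
            for ta in atom_types.get(a, ()):
                for tb in atom_types.get(b, ()):
                    px, py = sorted((ta, tb))
                    counts[f"{px}-D{d}-{py}"] += weights.get(ta, 1) * weights.get(tb, 1)
    return counts


# Hypothetical 4-atom chain 0-1-2-3 with made-up type assignments
adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
atom_types = {0: {"H"}, 1: {"H"}, 2: set(), 3: {"HBA", "HBD"}}
print(sorted(count_atom_pairs(adjacency, atom_types).items()))
```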

    The final pharmacophore atom pairs counts, along with atom pair
    identifiers involving all non-hydrogen atoms and with optional
    normalization and fuzzification, constitute the topological
    pharmacophore atom pairs fingerprints of the molecule.

    For the *ArbitrarySize* value of the --AtomPairsSetSizeToUse option,
    the fingerprint vector corresponds to only those topological
    pharmacophore atom pairs which are present and have a non-zero count.
    However, for the *FixedSize* value of the --AtomPairsSetSizeToUse
    option, the fingerprint vector contains all possible valid topological
    pharmacophore atom pairs with both zero and non-zero count values.
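    The difference between the two set sizes can be sketched as follows.
    The counts used here are the distance 1 values from the example
    fingerprints in this document (H-D1-H 18, H-D1-NI 1, HBA-D1-NI 2,
    HBD-D1-NI 1), and the basis set is limited to distance 1 to keep the
    output short. The function names are hypothetical.

```python
from itertools import combinations_with_replacement


def fixed_size_vector(counts, basis_ids):
    """FixedSize: every basis-set pair, with 0 for pairs that never occur."""
    return list(basis_ids), [counts.get(pid, 0) for pid in basis_ids]


def arbitrary_size_vector(counts, basis_ids):
    """ArbitrarySize: only pairs with a non-zero count, in basis-set order."""
    ids = [pid for pid in basis_ids if counts.get(pid, 0) != 0]
    return ids, [counts[pid] for pid in ids]


# Distance 1 slice of the basis set for the default types
types = ["H", "HBA", "HBD", "NI", "PI"]
basis_d1 = [f"{px}-D1-{py}" for px, py in combinations_with_replacement(types, 2)]
counts = {"H-D1-H": 18, "H-D1-NI": 1, "HBA-D1-NI": 2, "HBD-D1-NI": 1}

print(arbitrary_size_vector(counts, basis_d1)[1])  # [18, 1, 2, 1]
print(fixed_size_vector(counts, basis_d1)[1])
# [18, 0, 0, 1, 0, 0, 0, 2, 0, 0, 1, 0, 0, 0, 0]
```

    Both lists match the first values of the corresponding ArbitrarySize
    and FixedSize example strings.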

    Example of *SD* file containing topological pharmacophore atom pairs
    fingerprints string data:

        ... ...
        ... ...
        $$$$
        ... ...
        ... ...
        ... ...
        41 44 0 0 0 0 0 0 0 0999 V2000
        -3.3652 1.4499 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
        ... ...
        2 3 1 0 0 0 0
        ... ...
        M END
        > <CmpdID>
        Cmpd1

        > <TopologicalPharmacophoreAtomPairsFingerprints>
        FingerprintsVector;TopologicalPharmacophoreAtomPairs:ArbitrarySize:
        MinDistance1:MaxDistance10;54;NumericalValues;IDsAndValuesString;
        H-D1-H H-D1-NI HBA-D1-NI HBD-D1-NI H-D2-H H-D2-HBA H-D2-HBD
        HBA-D2-HBA HBA-D2-HBD H-D3-H H-D3-HBA H-D3-HBD H-D3-NI HBA-D3-NI
        HBD-D3-NI H-D4-H H-D...;
        18 1 2 1 22 12 8 1 2 18 6 3 1 1 1 22 13 6 5 7 2 28 9 5 1 1 1 36 16 10
        3 4 1 37 10 8 1 35 10 9 3 3 1 28 7 7 4 18 16 12 5 1 2 1

        $$$$
        ... ...
        ... ...

    Example of *FP* file containing topological pharmacophore atom pairs
    fingerprints string data:

        #
        # Package = MayaChemTools 7.4
        # Release Date = Oct 21, 2010
        #
        # TimeStamp = Fri Mar 11 15:32:48 2011
        #
        # FingerprintsStringType = FingerprintsVector
        #
        # Description = TopologicalPharmacophoreAtomPairs:ArbitrarySize:MinDistance1:MaxDistance10
        # VectorStringFormat = IDsAndValuesString
        # VectorValuesType = NumericalValues
        #
        Cmpd1 54;H-D1-H H-D1-NI HBA-D1-NI HBD-D1-NI H-D2-H H-D2-HBA...;18 1 2...
        Cmpd2 61;H-D1-H H-D1-NI HBA-D1-NI HBD-D1-NI H-D2-H H-D2-HBA...;5 1 2 ...
        ... ...
        ... ...

    Example of CSV *Text* file containing topological pharmacophore atom
    pairs fingerprints string data:

        "CompoundID","TopologicalPharmacophoreAtomPairsFingerprints"
        "Cmpd1","FingerprintsVector;TopologicalPharmacophoreAtomPairs:
        ArbitrarySize:MinDistance1:MaxDistance10;54;NumericalValues;
        IDsAndValuesString;H-D1-H H-D1-NI HBA-D1-NI HBD-D1-NI H-D2-H
        H-D2-HBA H-D2-HBD HBA-D2-HBA HBA-D2-HBD H-D3-H H-D3-HBA H-D3-HBD
        H-D3-NI HBA-D3-NI HBD-D3-NI H-D4...;
        18 1 2 1 22 12 8 1 2 18 6 3 1 1 1 22 13 6 5 7 2 28 9 5 1 1 1 36 16 10
        3 4 1 37 10 8 1 35 10 9 3 3 1 28 7 7 4 18 16 12 5 1 2 1"
        ... ...
        ... ...

    The current release of MayaChemTools generates the following types of
    topological pharmacophore atom pairs fingerprints vector strings:

        FingerprintsVector;TopologicalPharmacophoreAtomPairs:ArbitrarySize:
        MinDistance1:MaxDistance10;54;NumericalValues;IDsAndValuesString;
        H-D1-H H-D1-NI HBA-D1-NI HBD-D1-NI H-D2-H H-D2-HBA H-D2-HBD
        HBA-D2-HBA HBA-D2-HBD H-D3-H H-D3-HBA H-D3-HBD H-D3-NI HBA-D3-NI
        HBD-D3-NI H-D4-H H-D4-HBA H-D4-HBD HBA-D4-HBA HBA-D4-HBD HBD-D4-HBD
        H-D5-H H-D5-HBA H-D5-...;
        18 1 2 1 22 12 8 1 2 18 6 3 1 1 1 22 13 6 5 7 2 28 9 5 1 1 1 36 16 10
        3 4 1 37 10 8 1 35 10 9 3 3 1 28 7 7 4 18 16 12 5 1 2 1

        FingerprintsVector;TopologicalPharmacophoreAtomPairs:FixedSize:
        MinDistance1:MaxDistance10;150;OrderedNumericalValues;ValuesString;
        18 0 0 1 0 0 0 2 0 0 1 0 0 0 0 22 12 8 0 0 1 2 0 0 0 0 0 0 0 0 18 6 3
        1 0 0 0 1 0 0 1 0 0 0 0 22 13 6 0 0 5 7 0 0 2 0 0 0 0 0 28 9 5 1 0 0 0
        1 0 0 1 0 0 0 0 36 16 10 0 0 3 4 0 0 1 0 0 0 0 0 37 10 8 0 0 0 0 1 0 0
        0 0 0 0 0 35 10 9 0 0 3 3 0 0 1 0 0 0 0 0 28 7 7 4 0 0 0 0 0 0 0 0 0 0
        0 18...

        FingerprintsVector;TopologicalPharmacophoreAtomPairs:FixedSize:
        MinDistance1:MaxDistance10;150;OrderedNumericalValues;
        IDsAndValuesString;H-D1-H H-D1-HBA H-D1-HBD H-D1-NI H-D1-PI
        HBA-D1-HBA HBA-D1-HBD HBA-D1-NI HBA-D1-PI HBD-D1-HBD HBD-D1-NI
        HBD-D1-PI NI-D1-NI NI-D1-PI PI-D1-PI H-D2-H H-D2-HBA H-D2-HBD
        H-D2-NI H-D2-PI HBA-D2-HBA HBA-D2-HBD HBA-D2...;
        18 0 0 1 0 0 0 2 0 0 1 0 0 0 0 22 12 8 0 0 1 2 0 0 0 0 0 0 0 0 18 6 3
        1 0 0 0 1 0 0 1 0 0 0 0 22 13 6 0 0 5 7 0 0 2 0 0 0 0 0 28 9 5 1 0 0 0
        1 0 0 1 0 0 0 0 36 16 10 0 0 3 4 0 0 1 0 0 0 0
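    The example strings above suggest the layout
    FingerprintsVector;Description;Size;ValuesType;VectorStringFormat
    followed by the IDs (for IDsAndValuesString) and the values. A sketch
    of a parser built on that inferred layout follows; it expects the
    single-line form of a string (the examples are wrapped only for
    display) and handles just the two formats shown. The function name and
    the shortened input string are hypothetical.

```python
def parse_fingerprints_vector(text):
    """Split a FingerprintsVector string into named fields.

    Assumption: the layout is inferred from the examples, i.e.
    FingerprintsVector;Description;Size;ValuesType;VectorStringFormat;[IDs;]Values
    """
    fields = text.split(";")
    if fields[0] != "FingerprintsVector":
        raise ValueError("not a FingerprintsVector string")
    description, size, values_type, vector_format = fields[1:5]
    if vector_format == "ValuesString":
        ids, values = None, fields[5].split()
    elif vector_format == "IDsAndValuesString":
        ids, values = fields[5].split(), fields[6].split()
    else:
        raise NotImplementedError(f"format not handled in this sketch: {vector_format}")
    # Values may be real numbers after normalization or fuzzification
    values = [float(v) for v in values]
    if len(values) != int(size):
        raise ValueError(f"expected {size} values, found {len(values)}")
    return {"description": description, "values_type": values_type,
            "format": vector_format, "ids": ids, "values": values}


# Shortened, illustrative string: only the first four pairs of the example
fp = parse_fingerprints_vector(
    "FingerprintsVector;TopologicalPharmacophoreAtomPairs:ArbitrarySize:"
    "MinDistance1:MaxDistance10;4;NumericalValues;IDsAndValuesString;"
    "H-D1-H H-D1-NI HBA-D1-NI HBD-D1-NI;18 1 2 1"
)
print(dict(zip(fp["ids"], fp["values"])))
# {'H-D1-H': 18.0, 'H-D1-NI': 1.0, 'HBA-D1-NI': 2.0, 'HBD-D1-NI': 1.0}
```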

OPTIONS
    --AromaticityModel *MDLAromaticityModel | TriposAromaticityModel |
    MMFFAromaticityModel | ChemAxonBasicAromaticityModel |
    ChemAxonGeneralAromaticityModel | DaylightAromaticityModel |
    MayaChemToolsAromaticityModel*
        Specify aromaticity model to use during detection of aromaticity.
        Possible values in the current release are: *MDLAromaticityModel,
        TriposAromaticityModel, MMFFAromaticityModel,
        ChemAxonBasicAromaticityModel, ChemAxonGeneralAromaticityModel,
        DaylightAromaticityModel or MayaChemToolsAromaticityModel*. Default
        value: *MayaChemToolsAromaticityModel*.

        The supported aromaticity model names along with model specific
        control parameters are defined in AromaticityModelsData.csv, which
        is distributed with the current release and is available under the
        lib/data directory. The Molecule.pm module retrieves data from this
        file during class instantiation and makes it available to the
        DetectAromaticity method for detecting aromaticity corresponding to
        a specific model.

    --AtomPairsSetSizeToUse *ArbitrarySize | FixedSize*
        Atom pairs set size to use during generation of topological
        pharmacophore atom pairs fingerprints.

        Possible values: *ArbitrarySize | FixedSize*; Default value:
        *ArbitrarySize*.

        For the *ArbitrarySize* value of the --AtomPairsSetSizeToUse option,
        the fingerprint vector corresponds to only those topological
        pharmacophore atom pairs which are present and have a non-zero
        count. However, for the *FixedSize* value of the
        --AtomPairsSetSizeToUse option, the fingerprint vector contains all
        possible valid topological pharmacophore atom pairs with both zero
        and non-zero count values.

    -a, --AtomTypesToUse *"AtomType1,AtomType2,..."*
        Pharmacophore atom types to use during generation of topological
        pharmacophore atom pairs. It's a list of comma separated valid
        pharmacophore atom types.

        Possible values for pharmacophore atom types are: *Ar, CA, H, HBA,
        HBD, Hal, NI, PI, RA*. Default value [ Ref 60-62 ] :
        *HBD,HBA,PI,NI,H*.

        The pharmacophore atom types abbreviations correspond to:

            HBD: HydrogenBondDonor
            HBA: HydrogenBondAcceptor
            PI : PositivelyIonizable
            NI : NegativelyIonizable
            Ar : Aromatic
            Hal: Halogen
            H  : Hydrophobic
            RA : RingAtom
            CA : ChainAtom

        The *AtomTypes::FunctionalClassAtomTypes* module is used to assign
        pharmacophore atom types. It uses the following definitions
        [ Ref 60-61, Ref 65-66 ]:

            HydrogenBondDonor: NH, NH2, OH
            HydrogenBondAcceptor: N[!H], O
            PositivelyIonizable: +, NH2
            NegativelyIonizable: -, C(=O)OH, S(=O)OH, P(=O)OH

    --AtomTypesWeight *"AtomType1,Weight1,AtomType2,Weight2..."*
        Weights of specified pharmacophore atom types to use during
        calculation of their contribution to atom pair count. Default value:
        *None*. Valid values: real numbers greater than or equal to 0. In
        general, it's a comma delimited list of valid atom types and their
        weights.

        The weight values allow increasing the importance of specific
        pharmacophore atom types in the generated fingerprints. A weight
        value of 0 for an atom type eliminates its contribution to atom pair
        count, whereas a weight value of 2 doubles its contribution.
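        A sketch of how an --AtomTypesWeight value could be parsed and
        validated. The function name is hypothetical and the list of valid
        types is copied from --AtomTypesToUse; atom types that are not
        listed are assumed to keep a weight of 1, which is left to the
        caller here.

```python
VALID_TYPES = ("Ar", "CA", "H", "HBA", "HBD", "Hal", "NI", "PI", "RA")


def parse_atom_types_weight(spec, valid_types=VALID_TYPES):
    """Parse "AtomType1,Weight1,AtomType2,Weight2..." into {type: weight}."""
    # Tolerate spaces after commas, e.g. "HBD, 2, HBA, 2"
    tokens = [token.strip() for token in spec.split(",") if token.strip()]
    if len(tokens) % 2 != 0:
        raise ValueError("expected atom type and weight pairs")
    weights = {}
    for atom_type, value in zip(tokens[0::2], tokens[1::2]):
        if atom_type not in valid_types:
            raise ValueError(f"unknown pharmacophore atom type: {atom_type}")
        weight = float(value)  # raises ValueError for non-numeric weights
        if weight < 0:
            raise ValueError(f"weight for {atom_type} must be >= 0: {weight}")
        weights[atom_type] = weight
    return weights


print(parse_atom_types_weight("HBD,2,HBA,2,PI,1,NI,1"))
# {'HBD': 2.0, 'HBA': 2.0, 'PI': 1.0, 'NI': 1.0}
```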

    --CompoundID *DataFieldName or LabelPrefixString*
        This value is --CompoundIDMode specific and indicates how compound
        ID is generated.

        For the *DataField* value of the --CompoundIDMode option, it
        corresponds to the datafield label name whose value is used as
        compound ID; otherwise, it's a prefix string used for generating
        compound IDs like LabelPrefixString<Number>. Default value, *Cmpd*,
        generates compound IDs which look like Cmpd<Number>.

        Examples for *DataField* value of --CompoundIDMode:

            MolID
            ExtReg

        Examples for *LabelPrefix* or *MolNameOrLabelPrefix* value of
        --CompoundIDMode:

            Compound

        The value specified above generates compound IDs which correspond to
        Compound<Number> instead of default value of Cmpd<Number>.

    --CompoundIDLabel *text*
        Specify compound ID column label for CSV/TSV text file(s) used
        during *CompoundID* value of --DataFieldsMode option. Default value:
        *CompoundID*.

    --CompoundIDMode *DataField | MolName | LabelPrefix |
    MolNameOrLabelPrefix*
        Specify how to generate compound IDs and write them to FP or CSV/TSV
        text file(s) along with generated fingerprints for *FP | text | all*
        values of --output option: use an *SDFile(s)* datafield value; use
        the molname line from *SDFile(s)*; generate a sequential ID with a
        specific prefix; or use a combination of MolName and LabelPrefix,
        with LabelPrefix values used for empty molname lines.

        Possible values: *DataField | MolName | LabelPrefix |
        MolNameOrLabelPrefix*. Default value: *LabelPrefix*.

        For *MolNameOrLabelPrefix* value of --CompoundIDMode, the molname
        line in *SDFile(s)* takes precedence over sequential compound IDs
        generated using *LabelPrefix* and only empty molname values are
        replaced with sequential compound IDs.

        This is only used for *CompoundID* value of --DataFieldsMode option.
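        The four --CompoundIDMode choices can be summarized as a small
        selection function. This is a hypothetical sketch, not the script's
        code; the sequence number is supplied by the caller, and how a
        missing data field is reported and whether whitespace-only molname
        lines count as empty are assumptions.

```python
def compound_id(mode, index, mol_name="", data_fields=None, id_option="Cmpd"):
    """Choose the ID for the index-th compound (1-based) per --CompoundIDMode.

    id_option plays the role of --CompoundID: a data field label for
    DataField mode, otherwise a label prefix.
    """
    data_fields = data_fields or {}
    if mode == "DataField":
        # Assumption: a missing data field yields an empty ID
        return data_fields.get(id_option, "")
    if mode == "MolName":
        return mol_name
    if mode == "LabelPrefix":
        return f"{id_option}{index}"
    if mode == "MolNameOrLabelPrefix":
        # Assumption: a whitespace-only molname line counts as empty
        return mol_name if mol_name.strip() else f"{id_option}{index}"
    raise ValueError(f"unknown CompoundIDMode: {mode}")


print(compound_id("MolNameOrLabelPrefix", 2, mol_name=""))  # Cmpd2
```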

    --DataFields *"FieldLabel1,FieldLabel2,..."*
        Comma delimited list of *SDFile(s)* data fields to extract and
        write to CSV/TSV text file(s) along with generated fingerprints for
        *text | all* values of --output option.

        This is only used for *Specify* value of --DataFieldsMode option.

        Examples:

            ExtReg
            MolID,CompoundName

    -d, --DataFieldsMode *All | Common | Specify | CompoundID*
        Specify how data fields in *SDFile(s)* are transferred to output
        CSV/TSV text file(s) along with generated fingerprints for *text |
        all* values of --output option: transfer all SD data fields;
        transfer SD data fields common to all compounds; extract specified
        data fields; or generate a compound ID using the molname line, a
        compound prefix, or a combination of both. Possible values: *All |
        Common | Specify | CompoundID*. Default value: *CompoundID*.
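        The *All* and *Common* modes amount to a union and an intersection
        of the data field labels seen across compounds. A sketch, assuming
        each compound's data fields are given as a dict in file order and
        that output columns keep first-seen order (an assumption about
        column ordering); the function names and data are hypothetical.

```python
def all_data_fields(compounds):
    """--DataFieldsMode All: every label seen in any compound, first-seen order."""
    labels = []
    for fields in compounds:
        for label in fields:
            if label not in labels:
                labels.append(label)
    return labels


def common_data_fields(compounds):
    """--DataFieldsMode Common: labels present in every compound."""
    if not compounds:
        return []
    shared = set(compounds[0])
    for fields in compounds[1:]:
        shared &= set(fields)
    # Assumption: keep the column order of the first compound
    return [label for label in compounds[0] if label in shared]


compounds = [
    {"MolID": "1", "ExtReg": "A-1"},
    {"MolID": "2", "CompoundName": "X"},
]
print(all_data_fields(compounds))     # ['MolID', 'ExtReg', 'CompoundName']
print(common_data_fields(compounds))  # ['MolID']
```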

    -f, --Filter *Yes | No*
        Specify whether to check and filter compound data in SDFile(s).
        Possible values: *Yes or No*. Default value: *Yes*.

        By default, compound data is checked before calculating fingerprints
        and compounds containing atom data corresponding to non-element
        symbols or no atom data are ignored.

    --FingerprintsLabelMode *FingerprintsLabelOnly |
    FingerprintsLabelWithIDs*
        Specify how fingerprints label is generated in conjunction with
        --FingerprintsLabel option value: use fingerprints label generated
        only by --FingerprintsLabel option value or append topological atom
        pair count value IDs to --FingerprintsLabel option value.

        Possible values: *FingerprintsLabelOnly | FingerprintsLabelWithIDs*.
        Default value: *FingerprintsLabelOnly*.

        Topological atom pairs IDs appended to --FingerprintsLabel value
        during *FingerprintsLabelWithIDs* value of --FingerprintsLabelMode
        correspond to atom pair count values in fingerprint vector string.

        *FingerprintsLabelWithIDs* value of --FingerprintsLabelMode is
        ignored during *ArbitrarySize* value of --AtomPairsSetSizeToUse
        option, and topological atom pairs IDs are not appended to the
        label.

    --FingerprintsLabel *text*
        SD data label or text file column label to use for fingerprints
        string in output SD or CSV/TSV text file(s) specified by --output.
        Default value: *TopologicalPharmacophoreAtomPairsFingerprints*.

    --FuzzifyAtomPairsCount *Yes | No*
        To fuzzify or not to fuzzify atom pairs count. Possible values: *Yes
        or No*. Default value: *No*.

    --FuzzificationMode *BeforeNormalization | AfterNormalization*
        When to fuzzify atom pairs count. Possible values:
        *BeforeNormalization | AfterNormalization*. Default value:
        *AfterNormalization*.

    --FuzzificationMethodology *FuzzyBinning | FuzzyBinSmoothing*
        How to fuzzify atom pairs count. Possible values: *FuzzyBinning |
        FuzzyBinSmoothing*. Default value: *FuzzyBinning*.

        In conjunction with values for options --FuzzifyAtomPairsCount,
        --FuzzificationMode and --FuzzFactor, --FuzzificationMethodology
        option is used to fuzzify pharmacophore atom pairs count.

        Let:

            Px = Pharmacophore atom type x
            Py = Pharmacophore atom type y
            PPxy = Pharmacophore atom pair between atom type Px and Py

            PPxyDn = Pharmacophore atom pairs count between atom type Px
                     and Py at distance Dn
            PPxyDn-1 = Pharmacophore atom pairs count between atom type Px
                       and Py at distance Dn - 1
            PPxyDn+1 = Pharmacophore atom pairs count between atom type Px
                       and Py at distance Dn + 1

            FF = FuzzFactor for FuzzyBinning and FuzzyBinSmoothing

        Then:

        For *FuzzyBinning*:

            PPxyDn = PPxyDn (Unchanged)

            PPxyDn-1 = PPxyDn-1 + PPxyDn * FF
            PPxyDn+1 = PPxyDn+1 + PPxyDn * FF

        For *FuzzyBinSmoothing*:

            PPxyDn = PPxyDn - PPxyDn * 2 * FF   for Dmin < Dn < Dmax
            PPxyDn = PPxyDn - PPxyDn * FF       for Dn = Dmin or Dmax

            PPxyDn-1 = PPxyDn-1 + PPxyDn * FF
            PPxyDn+1 = PPxyDn+1 + PPxyDn * FF

        In both fuzzification schemes, a value of 0 for FF implies no
        fuzzification of occurrence counts. A value of 1 during
        *FuzzyBinning* corresponds to maximum fuzzification of occurrence
        counts; however, a value of 0.5 during *FuzzyBinSmoothing* ends up
        completely distributing the value of an interior distance bin over
        the previous and next distance bins.

        So for the default value of --FuzzFactor (FF) 0.15, the occurrence
        count of pharmacophore atom pairs at distance Dn during
        *FuzzyBinning* is left unchanged and the counts at distances Dn - 1
        and Dn + 1 are incremented by PPxyDn * 0.15.

        And during *FuzzyBinSmoothing*, the occurrence count at distance Dn
        is scaled back using a multiplicative factor of (1 - 2 * 0.15) and
        the occurrence counts at distances Dn - 1 and Dn + 1 are incremented
        by PPxyDn * 0.15. In other words, the occurrence bin count is
        smoothed out by distributing it over the previous and next distance
        values.
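        Both schemes can be sketched for the counts of one atom pair PPxy
        listed by distance from Dmin to Dmax. The sketch assumes every bin
        spreads a share of its original (unfuzzified) count to its
        neighbors; the formulas above do not say whether a running value is
        used instead. No bins are created outside the Dmin to Dmax range.
        The function name is hypothetical.

```python
def fuzzify(counts, ff=0.15, method="FuzzyBinning"):
    """Fuzzify one atom pair's counts, listed by distance from Dmin to Dmax.

    Assumption: each bin spreads a share of its original (unfuzzified)
    count, so the order in which bins are visited does not matter.
    """
    if method not in ("FuzzyBinning", "FuzzyBinSmoothing"):
        raise ValueError(f"unknown fuzzification methodology: {method}")
    out = list(counts)
    for i, value in enumerate(counts):
        # Neighboring distance bins that exist within Dmin..Dmax
        neighbors = [j for j in (i - 1, i + 1) if 0 <= j < len(counts)]
        for j in neighbors:
            out[j] += value * ff
        if method == "FuzzyBinSmoothing":
            # Remove 2 * FF for interior bins and FF for the Dmin/Dmax bins
            out[i] -= value * ff * len(neighbors)
    return out


print(fuzzify([0, 10, 0], 0.15, "FuzzyBinSmoothing"))  # [1.5, 7.0, 1.5]
```

        FuzzyBinning increases the total count, while FuzzyBinSmoothing
        only moves counts between bins and so preserves the total.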

    --FuzzFactor *number*
        Specify by how much to fuzzify atom pairs count. Default value:
        *0.15*. Valid values: For *FuzzyBinning* value of
        --FuzzificationMethodology option: *between 0 and 1.0*; For
        *FuzzyBinSmoothing* value of --FuzzificationMethodology option:
        *between 0 and 0.5*.

    -h, --help
        Print this help message.

    -k, --KeepLargestComponent *Yes | No*
        Generate fingerprints for only the largest component in a molecule.
        Possible values: *Yes or No*. Default value: *Yes*.

        For molecules containing multiple connected components, fingerprints
        can be generated in two different ways: use all connected components
        or just the largest connected component. By default, all atoms
        except for the largest connected component are deleted before
        generation of fingerprints.
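        A sketch of keeping only the largest connected component, assuming
        the molecule is an adjacency list and that component size is the
        number of atoms. How MayaChemTools breaks ties between equally large
        components is not stated; this sketch keeps the first one found.
        The function names and the example molecule are hypothetical.

```python
def largest_component(adjacency):
    """Atoms of the largest connected component; ties keep the first found."""
    seen, best = set(), set()
    for start in adjacency:
        if start in seen:
            continue
        # Depth-first search collecting one connected component
        component, stack = {start}, [start]
        while stack:
            atom = stack.pop()
            for neighbor in adjacency[atom]:
                if neighbor not in component:
                    component.add(neighbor)
                    stack.append(neighbor)
        seen |= component
        if len(component) > len(best):  # assumption: size = number of atoms
            best = component
    return best


def keep_largest_component(adjacency):
    """Drop every atom (and its bonds) outside the largest component."""
    keep = largest_component(adjacency)
    return {atom: [n for n in nbrs if n in keep]
            for atom, nbrs in adjacency.items() if atom in keep}


# Hypothetical molecule with two disconnected fragments (3 and 2 atoms)
adjacency = {0: [1], 1: [0, 2], 2: [1], 3: [4], 4: [3]}
print(keep_largest_component(adjacency))  # {0: [1], 1: [0, 2], 2: [1]}
```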

    --MinDistance *number*
        Minimum bond distance between atom pairs for generating topological
        pharmacophore atom pairs. Default value: *1*. Valid values:
        non-negative integers (including 0) less than --MaxDistance.

    --MaxDistance *number*
        Maximum bond distance between atom pairs for generating topological
        pharmacophore atom pairs. Default value: *10*. Valid values:
        positive integers greater than --MinDistance.

    -n, --NormalizationMethodology *None | ByHeavyAtomsCount |
    ByAtomTypesCount*
        Normalization methodology to use for scaling the occurrence count of
        pharmacophore atom pairs within specified distance range. Possible
        values: *None, ByHeavyAtomsCount or ByAtomTypesCount*. Default
        value: *None*.

    --OutDelim *comma | tab | semicolon*
        Delimiter for output CSV/TSV text file(s). Possible values: *comma,
        tab, or semicolon*. Default value: *comma*.

    --output *SD | FP | text | all*
        Type of output files to generate. Possible values: *SD, FP, text, or
        all*. Default value: *text*.

    -o, --overwrite
        Overwrite existing files.

    -q, --quote *Yes | No*
        Put quotes around column values in output CSV/TSV text file(s).
        Possible values: *Yes or No*. Default value: *Yes*.

    -r, --root *RootName*
        New file name is generated using the root: <Root>.<Ext>. Default for
        new file names:
        <SDFileName><TopologicalPharmacophoreAtomPairsFP>.<Ext>. The file
        type determines <Ext> value. The sdf, fpf, csv, and tsv <Ext> values
        are used for SD, FP, comma/semicolon, and tab delimited text files,
        respectively. This option is ignored for multiple input files.

    --ValuesPrecision *number*
        Precision of atom pairs count real values which might be generated
        after normalization or fuzzification. Default value: up to *2*
        decimal places. Valid values: positive integers.

    -v, --VectorStringFormat *ValuesString | IDsAndValuesString |
    IDsAndValuesPairsString | ValuesAndIDsString | ValuesAndIDsPairsString*
        Format of fingerprints vector string data in output SD, FP or
        CSV/TSV text file(s) specified by --output option. Possible values:
        *ValuesString | IDsAndValuesString | IDsAndValuesPairsString |
        ValuesAndIDsString | ValuesAndIDsPairsString*.

        Default value during *FixedSize* value of --AtomPairsSetSizeToUse
        option: *ValuesString*. Default value during *ArbitrarySize* value
        of --AtomPairsSetSizeToUse option: *IDsAndValuesString*.

        *ValuesString* option value is not allowed for *ArbitrarySize* value
        of --AtomPairsSetSizeToUse option.

        Examples:

            FingerprintsVector;TopologicalPharmacophoreAtomPairs:ArbitrarySize:
            MinDistance1:MaxDistance10;54;NumericalValues;IDsAndValuesString;
            H-D1-H H-D1-NI HBA-D1-NI HBD-D1-NI H-D2-H H-D2-HBA H-D2-HBD
            HBA-D2-HBA HBA-D2-HBD H-D3-H H-D3-HBA H-D3-HBD H-D3-NI HBA-D3-NI
            HBD-D3-NI H-D4-H H-D4-HBA H-D4-HBD HBA-D4-HBA HBA-D4-HBD HBD-D4-HBD
            H-D5-H H-D5-HBA H-D5-...;
            18 1 2 1 22 12 8 1 2 18 6 3 1 1 1 22 13 6 5 7 2 28 9 5 1 1 1 36 16 10
            3 4 1 37 10 8 1 35 10 9 3 3 1 28 7 7 4 18 16 12 5 1 2 1

            FingerprintsVector;TopologicalPharmacophoreAtomPairs:FixedSize:
            MinDistance1:MaxDistance10;150;OrderedNumericalValues;ValuesString;
            18 0 0 1 0 0 0 2 0 0 1 0 0 0 0 22 12 8 0 0 1 2 0 0 0 0 0 0 0 0 18 6 3
            1 0 0 0 1 0 0 1 0 0 0 0 22 13 6 0 0 5 7 0 0 2 0 0 0 0 0 28 9 5 1 0 0 0
            1 0 0 1 0 0 0 0 36 16 10 0 0 3 4 0 0 1 0 0 0 0 0 37 10 8 0 0 0 0 1 0 0
            0 0 0 0 0 35 10 9 0 0 3 3 0 0 1 0 0 0 0 0 28 7 7 4 0 0 0 0 0 0 0 0 0 0
            0 18...

            FingerprintsVector;TopologicalPharmacophoreAtomPairs:FixedSize:
            MinDistance1:MaxDistance10;150;OrderedNumericalValues;
            IDsAndValuesString;H-D1-H H-D1-HBA H-D1-HBD H-D1-NI H-D1-PI
            HBA-D1-HBA HBA-D1-HBD HBA-D1-NI HBA-D1-PI HBD-D1-HBD HBD-D1-NI
            HBD-D1-PI NI-D1-NI NI-D1-PI PI-D1-PI H-D2-H H-D2-HBA H-D2-HBD
            H-D2-NI H-D2-PI HBA-D2-HBA HBA-D2-HBD HBA-D2...;
            18 0 0 1 0 0 0 2 0 0 1 0 0 0 0 22 12 8 0 0 1 2 0 0 0 0 0 0 0 0 18 6 3
            1 0 0 0 1 0 0 1 0 0 0 0 22 13 6 0 0 5 7 0 0 2 0 0 0 0 0 28 9 5 1 0 0 0
            1 0 0 1 0 0 0 0 36 16 10 0 0 3 4 0 0 1 0 0 0 0
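        A sketch of how IDs and values could be laid out for each format.
        Only ValuesString and IDsAndValuesString appear in the examples
        above; the other three layouts are inferred from their names and are
        assumptions. The function name is hypothetical.

```python
def format_vector_string(ids, values, fmt, fixed_size):
    """Render the IDs/values portion of a fingerprints vector string.

    Assumption: the *PairsString layouts interleave IDs and values with
    spaces, and ValuesAndIDsString mirrors IDsAndValuesString.
    """
    if fmt == "ValuesString" and not fixed_size:
        # Documented rule: ValuesString is not allowed for ArbitrarySize
        raise ValueError("ValuesString requires FixedSize atom pairs")
    ids_str = " ".join(ids)
    values_str = " ".join(str(v) for v in values)
    if fmt == "ValuesString":
        return values_str
    if fmt == "IDsAndValuesString":
        return f"{ids_str};{values_str}"
    if fmt == "ValuesAndIDsString":
        return f"{values_str};{ids_str}"
    if fmt == "IDsAndValuesPairsString":
        return " ".join(f"{i} {v}" for i, v in zip(ids, values))
    if fmt == "ValuesAndIDsPairsString":
        return " ".join(f"{v} {i}" for i, v in zip(ids, values))
    raise ValueError(f"unknown VectorStringFormat: {fmt}")


ids, values = ["H-D1-H", "H-D1-NI"], [18, 1]
print(format_vector_string(ids, values, "IDsAndValuesPairsString", fixed_size=False))
# H-D1-H 18 H-D1-NI 1
```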

    -w, --WorkingDir *DirName*
        Location of working directory. Default value: current directory.

EXAMPLES
    To generate topological pharmacophore atom pairs fingerprints of
    arbitrary size corresponding to distances from 1 through 10 using
    default atom types with no weighting, normalization, and fuzzification
    of atom pairs count and create a SampleTPAPFP.csv file containing
    sequential compound IDs along with fingerprints vector strings data in
    IDsAndValuesString format, type:

        % TopologicalPharmacophoreAtomPairsFingerprints.pl -r SampleTPAPFP
          -o Sample.sdf

    To generate topological pharmacophore atom pairs fingerprints of fixed
    size corresponding to distances from 1 through 10 using default atom
    types with no weighting, normalization, and fuzzification of atom pairs
    count and create a SampleTPAPFP.csv file containing sequential compound
    IDs along with fingerprints vector strings data in ValuesString format,
    type:

        % TopologicalPharmacophoreAtomPairsFingerprints.pl
          --AtomPairsSetSizeToUse FixedSize -r SampleTPAPFP -o Sample.sdf

    To generate topological pharmacophore atom pairs fingerprints of
    arbitrary size corresponding to distances from 1 through 10 using
    default atom types with no weighting, normalization, and fuzzification
    of atom pairs count and create SampleTPAPFP.sdf, SampleTPAPFP.fpf and
    SampleTPAPFP.csv files containing sequential compound IDs in CSV file
    along with fingerprints vector strings data in IDsAndValuesString
    format, type:

        % TopologicalPharmacophoreAtomPairsFingerprints.pl --output all
          -r SampleTPAPFP -o Sample.sdf

    To generate topological pharmacophore atom pairs fingerprints of
    arbitrary size corresponding to distances from 1 through 10 using
    default atom types with no weighting, normalization, and fuzzification
    of atom pairs count and create a SampleTPAPFP.csv file containing
    sequential compound IDs along with fingerprints vector strings data in
    IDsAndValuesPairsString format, type:

        % TopologicalPharmacophoreAtomPairsFingerprints.pl --VectorStringFormat
          IDsAndValuesPairsString -r SampleTPAPFP -o Sample.sdf

    To generate topological pharmacophore atom pairs fingerprints of
    arbitrary size corresponding to distances from 1 through 6 using default
    atom types with no weighting, normalization, and fuzzification of atom
    pairs count and create a SampleTPAPFP.csv file containing sequential
    compound IDs along with fingerprints vector strings data in
    IDsAndValuesString format, type:

        % TopologicalPharmacophoreAtomPairsFingerprints.pl --MinDistance 1
          --MaxDistance 6 -r SampleTPAPFP -o Sample.sdf

    To generate topological pharmacophore atom pairs fingerprints of
    arbitrary size corresponding to distances from 1 through 10 using
    "HBD,HBA,PI,NI" atom types with double the weighting for "HBD,HBA" and
    normalization by ByHeavyAtomsCount but no fuzzification of atom pairs
    count and create a SampleTPAPFP.csv file containing sequential compound
    IDs along with fingerprints vector strings data in IDsAndValuesString
    format, type:

        % TopologicalPharmacophoreAtomPairsFingerprints.pl --MinDistance 1
          --MaxDistance 10 --AtomTypesToUse "HBD,HBA,PI,NI" --AtomTypesWeight
          "HBD,2,HBA,2,PI,1,NI,1" --NormalizationMethodology ByHeavyAtomsCount
          --FuzzifyAtomPairsCount No -r SampleTPAPFP -o Sample.sdf

    To generate topological pharmacophore atom pairs fingerprints of
    arbitrary size corresponding to distances from 1 through 10 using
    "HBD,HBA,PI,NI,H" atom types with no weighting of atom types and no
    normalization, but with fuzzification of atom pairs count using
    FuzzyBinning methodology with FuzzFactor value 0.15, and create a
    SampleTPAPFP.csv file containing sequential compound IDs along with
    fingerprints vector strings data in IDsAndValuesString format, type:

        % TopologicalPharmacophoreAtomPairsFingerprints.pl --MinDistance 1
          --MaxDistance 10 --AtomTypesToUse "HBD,HBA,PI,NI,H" --AtomTypesWeight
          "HBD,1,HBA,1,PI,1,NI,1,H,1" --NormalizationMethodology None
          --FuzzifyAtomPairsCount Yes --FuzzificationMethodology FuzzyBinning
          --FuzzFactor 0.15 -r SampleTPAPFP -o Sample.sdf

    To generate topological pharmacophore atom pairs fingerprints of
    arbitrary size corresponding to distances from 1 through 10 using
    default atom types with no weighting, normalization, and fuzzification
    of atom pairs count and create a SampleTPAPFP.csv file containing
    compound ID from molecule name line along with fingerprints vector
    strings data, type:

        % TopologicalPharmacophoreAtomPairsFingerprints.pl --DataFieldsMode
          CompoundID --CompoundIDMode MolName -r SampleTPAPFP -o Sample.sdf

    To generate topological pharmacophore atom pairs fingerprints of
    arbitrary size corresponding to distances from 1 through 10 using
    default atom types with no weighting, normalization, and fuzzification
    of atom pairs count and create a SampleTPAPFP.csv file containing
    compound IDs using specified data field along with fingerprints vector
    strings data, type:

        % TopologicalPharmacophoreAtomPairsFingerprints.pl --DataFieldsMode
          CompoundID --CompoundIDMode DataField --CompoundID Mol_ID
          -r SampleTPAPFP -o Sample.sdf

    To generate topological pharmacophore atom pairs fingerprints of
    arbitrary size corresponding to distances from 1 through 10 using
    default atom types with no weighting, normalization, and fuzzification
    of atom pairs count and create a SampleTPAPFP.csv file containing
    compound ID using combination of molecule name line and an explicit
    compound prefix along with fingerprints vector strings data, type:

        % TopologicalPharmacophoreAtomPairsFingerprints.pl --DataFieldsMode
          CompoundID --CompoundIDMode MolNameOrLabelPrefix
          --CompoundID Cmpd --CompoundIDLabel MolID -r SampleTPAPFP -o Sample.sdf

    To generate topological pharmacophore atom pairs fingerprints of
    arbitrary size corresponding to distances from 1 through 10 using
    default atom types with no weighting, normalization, and fuzzification
    of atom pairs count and create a SampleTPAPFP.csv file containing
    specific data fields columns along with fingerprints vector strings
    data, type:

        % TopologicalPharmacophoreAtomPairsFingerprints.pl --DataFieldsMode
          Specify --DataFields Mol_ID -r SampleTPAPFP -o Sample.sdf

    To generate topological pharmacophore atom pairs fingerprints of
    arbitrary size corresponding to distances from 1 through 10 using
    default atom types with no weighting, normalization, and fuzzification
    of atom pairs count and create a SampleTPAPFP.csv file containing common
    data fields columns along with fingerprints vector strings data, type:

        % TopologicalPharmacophoreAtomPairsFingerprints.pl --DataFieldsMode
          Common -r SampleTPAPFP -o Sample.sdf

    To generate topological pharmacophore atom pairs fingerprints of
    arbitrary size corresponding to distances from 1 through 10 using
    default atom types with no weighting, normalization, and fuzzification
    of atom pairs count and create SampleTPAPFP.sdf, SampleTPAPFP.fpf, and
    SampleTPAPFP.csv files containing all data fields columns in CSV file
    along with fingerprints data, type:

        % TopologicalPharmacophoreAtomPairsFingerprints.pl --DataFieldsMode
          All --output all -r SampleTPAPFP -o Sample.sdf

AUTHOR
    Manish Sud <msud@san.rr.com>

SEE ALSO
    InfoFingerprintsFiles.pl, SimilarityMatricesFingerprints.pl,
    AtomNeighborhoodsFingerprints.pl, ExtendedConnectivityFingerprints.pl,
    MACCSKeysFingerprints.pl, PathLengthFingerprints.pl,
    TopologicalAtomPairsFingerprints.pl,
    TopologicalAtomTorsionsFingerprints.pl,
    TopologicalPharmacophoreAtomTripletsFingerprints.pl

COPYRIGHT
    Copyright (C) 2015 Manish Sud. All rights reserved.

    This file is part of MayaChemTools.

    MayaChemTools is free software; you can redistribute it and/or modify it
    under the terms of the GNU Lesser General Public License as published by
    the Free Software Foundation; either version 3 of the License, or (at
    your option) any later version.