comparison docs/scripts/txt/TopologicalPharmacophoreAtomPairsFingerprints.txt @ 0:4816e4a8ae95 draft default tip

Uploaded
author deepakjadmin
date Wed, 20 Jan 2016 09:23:18 -0500
1 NAME
2 TopologicalPharmacophoreAtomPairsFingerprints.pl - Generate topological
3 pharmacophore atom pairs fingerprints for SD files
4
5 SYNOPSIS
6 TopologicalPharmacophoreAtomPairsFingerprints.pl SDFile(s)...
7
8 TopologicalPharmacophoreAtomPairsFingerprints.pl [--AromaticityModel
9 *AromaticityModelType*] [--AtomPairsSetSizeToUse *ArbitrarySize |
10 FixedSize*] [-a, --AtomTypesToUse *"AtomType1, AtomType2..."*]
11 [--AtomTypesWeight *"AtomType1, Weight1, AtomType2, Weight2..."*]
12 [--CompoundID *DataFieldName or LabelPrefixString*] [--CompoundIDLabel
13 *text*] [--CompoundIDMode *DataField | MolName | LabelPrefix | MolNameOrLabelPrefix*] [--DataFields *"FieldLabel1,
14 FieldLabel2,..."*] [-d, --DataFieldsMode *All | Common | Specify |
15 CompoundID*] [-f, --Filter *Yes | No*] [--FingerprintsLabelMode
16 *FingerprintsLabelOnly | FingerprintsLabelWithIDs*] [--FingerprintsLabel
17 *text*] [--FuzzifyAtomPairsCount *Yes | No*] [--FuzzificationMode
18 *BeforeNormalization | AfterNormalization*] [--FuzzificationMethodology
19 *FuzzyBinning | FuzzyBinSmoothing*] [--FuzzFactor *number*] [-h, --help]
20 [-k, --KeepLargestComponent *Yes | No*] [--MinDistance *number*]
21 [--MaxDistance *number*] [-n, --NormalizationMethodology *None |
22 ByHeavyAtomsCount | ByAtomTypesCount*] [--OutDelim *comma | tab |
23 semicolon*] [--output *SD | FP | text | all*] [-o, --overwrite] [-q,
24 --quote *Yes | No*] [-r, --root *RootName*] [--ValuesPrecision *number*]
25 [-v, --VectorStringFormat *ValuesString | IDsAndValuesString |
26 IDsAndValuesPairsString | ValuesAndIDsString | ValuesAndIDsPairsString*]
27 [-w, --WorkingDir *DirName*] SDFile(s)...
28
29 DESCRIPTION
30 Generate topological pharmacophore atom pairs fingerprints [ Ref 60-62,
31 Ref 65, Ref 68 ] for *SDFile(s)* and create appropriate SD, FP or
32 CSV/TSV text file(s) containing fingerprints vector strings
33 corresponding to molecular fingerprints.
34
35 Multiple SDFile names are separated by spaces. The valid file extensions
36 are *.sdf* and *.sd*. All other file names are ignored. All the SD files
37 in the current directory can be specified either by **.sdf* or the current
38 directory name.
39
40 Based on the values specified for --AtomTypesToUse, pharmacophore atom
41 types are assigned to all non-hydrogen atoms in a molecule and a
42 distance matrix is generated. A pharmacophore atom pairs basis set is
43 initialized for all unique possible pairs within --MinDistance and
44 --MaxDistance range.
45
46 Let:
47
48 P = Valid pharmacophore atom type
49
50 Px = Pharmacophore atom type x
51 Py = Pharmacophore atom type y
52
53 Dmin = Minimum distance corresponding to number of bonds between
54 two atoms
55 Dmax = Maximum distance corresponding to number of bonds between
56 two atoms
57 D = Distance corresponding to number of bonds between two atoms
58
59 Px-Dn-Py = Pharmacophore atom pair ID for atom types Px and Py at
60 distance Dn
61
62 P = Number of pharmacophore atom types to consider
63 PPD = Number of possible unique pharmacophore atom pairs at any single distance Dn
64
65 PPT = Total number of possible pharmacophore atom pairs at all distances
66 between Dmin and Dmax
67
68 Then:
69
70 PPD = (P * (P - 1))/2 + P
71
72 PPT = ((Dmax - Dmin) + 1) * ((P * (P - 1))/2 + P)
73 = ((Dmax - Dmin) + 1) * PPD
74
75 So for default values of Dmin = 1, Dmax = 10 and P = 5,
76
77 PPD = (5 * (5 - 1))/2 + 5 = 15
78 PPT = ((10 - 1) + 1) * 15 = 150
79
80 The pharmacophore atom pairs basis set includes 150 values.
81
82 The atom pair IDs correspond to:
83
84 Px-Dn-Py = Pharmacophore atom pair ID for atom types Px and Py at
85 distance Dn
86
87 For example: H-D1-H, H-D2-HBA, PI-D5-PI and so on
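
The basis set arithmetic above can be checked with a short sketch. This is an illustration in Python, not the script's own Perl code; the function name atom_pair_basis_ids is hypothetical, and the ordering (atom types sorted alphabetically within each distance) is an assumption inferred from the FixedSize IDsAndValuesString example shown in this document.

```python
from itertools import combinations_with_replacement

def atom_pair_basis_ids(atom_types, min_distance=1, max_distance=10):
    """Enumerate Px-Dn-Py IDs for every distance and unordered type pair."""
    types = sorted(atom_types)
    ids = []
    for distance in range(min_distance, max_distance + 1):
        # Unordered pairs including Px == Py: (P * (P - 1))/2 + P per distance
        for px, py in combinations_with_replacement(types, 2):
            ids.append(f"{px}-D{distance}-{py}")
    return ids

# Default atom types, Dmin = 1, Dmax = 10
basis = atom_pair_basis_ids(["HBD", "HBA", "PI", "NI", "H"])
print(len(basis))
print(basis[:5])
```

With the default five atom types this yields 15 IDs per distance and 150 IDs in total, in agreement with PPD and PPT computed above.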
88
89 Using the distance matrix and pharmacophore atom types, occurrences of unique
90 pharmacophore atom pairs are counted. The contribution of each atom type
91 to atom pair interaction is optionally weighted by specified
92 --AtomTypesWeight before assigning its count to the appropriate distance
93 bin. Based on the --NormalizationMethodology option, the pharmacophore atom
94 pairs count is optionally normalized. Additionally, the pharmacophore atom
95 pairs count is optionally fuzzified before or after the normalization,
96 as controlled by values of --FuzzifyAtomPairsCount, --FuzzificationMode,
97 --FuzzificationMethodology and --FuzzFactor options.
98
99 The final pharmacophore atom pairs count along with atom pair
100 identifiers involving all non-hydrogen atoms, with optional
101 normalization and fuzzification, constitute pharmacophore topological
102 atom pairs fingerprints of the molecule.
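
The counting step can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the script's implementation: the molecule is a hypothetical four-atom chain given as an adjacency list, atom types are assigned by hand rather than by AtomTypes::FunctionalClassAtomTypes, an atom carrying several types is assumed to contribute one pair per type combination, and weighting, normalization and fuzzification are left out.

```python
from collections import Counter, deque
from itertools import product

def bond_distances(adjacency):
    """All-pairs bond-count distances via breadth-first search."""
    dist = {}
    for start in adjacency:
        seen = {start: 0}
        queue = deque([start])
        while queue:
            atom = queue.popleft()
            for nbr in adjacency[atom]:
                if nbr not in seen:
                    seen[nbr] = seen[atom] + 1
                    queue.append(nbr)
        dist[start] = seen
    return dist

def count_atom_pairs(adjacency, atom_types, min_distance=1, max_distance=10):
    """Count Px-Dn-Py occurrences over distinct atom pairs within the range."""
    dist = bond_distances(adjacency)
    counts = Counter()
    atoms = sorted(adjacency)
    for i, a in enumerate(atoms):
        for b in atoms[i + 1:]:
            d = dist[a].get(b)  # None when a and b are not connected
            if d is None or not (min_distance <= d <= max_distance):
                continue
            for ta, tb in product(atom_types.get(a, ()), atom_types.get(b, ())):
                px, py = sorted((ta, tb))  # canonical order for the pair ID
                counts[f"{px}-D{d}-{py}"] += 1
    return counts

# Hypothetical chain 0-1-2-3 with hand-assigned pharmacophore types
adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
atom_types = {0: ["HBD"], 1: ["H"], 2: ["H"], 3: ["HBA"]}
counts = count_atom_pairs(adjacency, atom_types)
for pair_id in sorted(counts):
    print(pair_id, counts[pair_id])
```

Only distinct atom pairs are visited here, so a --MinDistance of 0 would need separate handling in this sketch.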
103
104 For *ArbitrarySize* value of --AtomPairsSetSizeToUse option, the
105 fingerprint vector corresponds to only those topological pharmacophore
106 atom pairs which are present and have non-zero count. However, for
107 *FixedSize* value of --AtomPairsSetSizeToUse option, the fingerprint
108 vector contains all possible valid topological pharmacophore atom pairs
109 with both zero and non-zero count values.
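
The relation between the two set sizes can be illustrated by expanding a sparse, ArbitrarySize-style set of counts into a FixedSize-style vector. The sketch below is in Python and uses a hypothetical helper to_fixed_size; the ID ordering is assumed to follow the FixedSize example in this document, and the counts are made up for illustration.

```python
from itertools import combinations_with_replacement

def to_fixed_size(sparse_counts, atom_types, min_distance, max_distance):
    """Return (ids, values) covering every possible pair, zeros included."""
    types = sorted(atom_types)
    ids = [f"{px}-D{d}-{py}"
           for d in range(min_distance, max_distance + 1)
           for px, py in combinations_with_replacement(types, 2)]
    # Pairs absent from the sparse counts get an explicit zero
    return ids, [sparse_counts.get(pair_id, 0) for pair_id in ids]

# Made-up sparse counts for two atom types over distances 1 and 2
sparse = {"H-D1-H": 18, "H-D1-NI": 1}
ids, values = to_fixed_size(sparse, ["H", "NI"], 1, 2)
print(ids)
print(values)
```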
110
111 Example of *SD* file containing topological pharmacophore atom pairs
112 fingerprints string data:
113
114 ... ...
115 ... ...
116 $$$$
117 ... ...
118 ... ...
119 ... ...
120 41 44 0 0 0 0 0 0 0 0999 V2000
121 -3.3652 1.4499 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
122 ... ...
123 2 3 1 0 0 0 0
124 ... ...
125 M END
126 > <CmpdID>
127 Cmpd1
128
129 > <TopologicalPharmacophoreAtomPairsFingerprints>
130 FingerprintsVector;TopologicalPharmacophoreAtomPairs:ArbitrarySize:Min
131 Distance1:MaxDistance10;54;NumericalValues;IDsAndValuesString;H-D1-H H
132 -D1-NI HBA-D1-NI HBD-D1-NI H-D2-H H-D2-HBA H-D2-HBD HBA-D2-HBA HBA-D2-
133 HBD H-D3-H H-D3-HBA H-D3-HBD H-D3-NI HBA-D3-NI HBD-D3-NI H-D4-H H-D...;
134 18 1 2 1 22 12 8 1 2 18 6 3 1 1 1 22 13 6 5 7 2 28 9 5 1 1 1 36 16 10 3
135 4 1 37 10 8 1 35 10 9 3 3 1 28 7 7 4 18 16 12 5 1 2 1
136
137 $$$$
138 ... ...
139 ... ...
140
141 Example of *FP* file containing topological pharmacophore atom pairs
142 fingerprints string data:
143
144 #
145 # Package = MayaChemTools 7.4
146 # Release Date = Oct 21, 2010
147 #
148 # TimeStamp = Fri Mar 11 15:32:48 2011
149 #
150 # FingerprintsStringType = FingerprintsVector
151 #
152 # Description = TopologicalPharmacophoreAtomPairs:ArbitrarySize:MinDistance1:MaxDistance10
153 # VectorStringFormat = IDsAndValuesString
154 # VectorValuesType = NumericalValues
155 #
156 Cmpd1 54;H-D1-H H-D1-NI HBA-D1-NI HBD-D1-NI H-D2-H H-D2-HBA...;18 1 2...
157 Cmpd2 61;H-D1-H H-D1-NI HBA-D1-NI HBD-D1-NI H-D2-H H-D2-HBA...;5 1 2 ...
158 ... ...
159 ... ..
160
161 Example of CSV *Text* file containing topological pharmacophore atom
162 pairs fingerprints string data:
163
164 "CompoundID","TopologicalPharmacophoreAtomPairsFingerprints"
165 "Cmpd1","FingerprintsVector;TopologicalPharmacophoreAtomPairs:Arbitrary
166 Size:MinDistance1:MaxDistance10;54;NumericalValues;IDsAndValuesString;H
167 -D1-H H-D1-NI HBA-D1-NI HBD-D1-NI H-D2-H H-D2-HBA H-D2-HBD HBA-D2-HBA H
168 BA-D2-HBD H-D3-H H-D3-HBA H-D3-HBD H-D3-NI HBA-D3-NI HBD-D3-NI H-D4...;
169 18 1 2 1 22 12 8 1 2 18 6 3 1 1 1 22 13 6 5 7 2 28 9 5 1 1 1 36 16 10 3
170 4 1 37 10 8 1 35 10 9 3 3 1 28 7 7 4 18 16 12 5 1 2 1"
171 ... ...
172 ... ...
173
174 The current release of MayaChemTools generates the following types of
175 topological pharmacophore atom pairs fingerprints vector strings:
176
177 FingerprintsVector;TopologicalPharmacophoreAtomPairs:ArbitrarySize:Min
178 Distance1:MaxDistance10;54;NumericalValues;IDsAndValuesString;H-D1-H H
179 -D1-NI HBA-D1-NI HBD-D1-NI H-D2-H H-D2-HBA H-D2-HBD HBA-D2-HBA HBA-D2-
180 HBD H-D3-H H-D3-HBA H-D3-HBD H-D3-NI HBA-D3-NI HBD-D3-NI H-D4-H H-D4-H
181 BA H-D4-HBD HBA-D4-HBA HBA-D4-HBD HBD-D4-HBD H-D5-H H-D5-HBA H-D5-...;
182 18 1 2 1 22 12 8 1 2 18 6 3 1 1 1 22 13 6 5 7 2 28 9 5 1 1 1 36 16 10
183 3 4 1 37 10 8 1 35 10 9 3 3 1 28 7 7 4 18 16 12 5 1 2 1
184
185 FingerprintsVector;TopologicalPharmacophoreAtomPairs:FixedSize:MinDist
186 ance1:MaxDistance10;150;OrderedNumericalValues;ValuesString;18 0 0 1 0
187 0 0 2 0 0 1 0 0 0 0 22 12 8 0 0 1 2 0 0 0 0 0 0 0 0 18 6 3 1 0 0 0 1
188 0 0 1 0 0 0 0 22 13 6 0 0 5 7 0 0 2 0 0 0 0 0 28 9 5 1 0 0 0 1 0 0 1 0
189 0 0 0 36 16 10 0 0 3 4 0 0 1 0 0 0 0 0 37 10 8 0 0 0 0 1 0 0 0 0 0 0
190 0 35 10 9 0 0 3 3 0 0 1 0 0 0 0 0 28 7 7 4 0 0 0 0 0 0 0 0 0 0 0 18...
191
192 FingerprintsVector;TopologicalPharmacophoreAtomPairs:FixedSize:MinDist
193 ance1:MaxDistance10;150;OrderedNumericalValues;IDsAndValuesString;H-D1
194 -H H-D1-HBA H-D1-HBD H-D1-NI H-D1-PI HBA-D1-HBA HBA-D1-HBD HBA-D1-NI H
195 BA-D1-PI HBD-D1-HBD HBD-D1-NI HBD-D1-PI NI-D1-NI NI-D1-PI PI-D1-PI H-D
196 2-H H-D2-HBA H-D2-HBD H-D2-NI H-D2-PI HBA-D2-HBA HBA-D2-HBD HBA-D2...;
197 18 0 0 1 0 0 0 2 0 0 1 0 0 0 0 22 12 8 0 0 1 2 0 0 0 0 0 0 0 0 18 6 3
198 1 0 0 0 1 0 0 1 0 0 0 0 22 13 6 0 0 5 7 0 0 2 0 0 0 0 0 28 9 5 1 0 0 0
199 1 0 0 1 0 0 0 0 36 16 10 0 0 3 4 0 0 1 0 0 0 0
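
For downstream processing, the vector strings above can be split on semicolons. The following Python sketch handles only the ValuesString and IDsAndValuesString layouts shown in this document and assumes the string is on a single line (the examples here are wrapped for display). The function name parse_fingerprints_vector and the returned dictionary keys are hypothetical, and the test string is shortened by hand rather than taken from real output.

```python
def parse_fingerprints_vector(text):
    """Split a FingerprintsVector string into its semicolon-delimited parts."""
    fields = text.strip().split(";")
    if len(fields) < 6 or fields[0] != "FingerprintsVector":
        raise ValueError("not a FingerprintsVector string")
    description, size, values_type, fmt = fields[1:5]
    if fmt == "ValuesString":
        ids, values = None, fields[5].split()
    elif fmt == "IDsAndValuesString":
        if len(fields) < 7:
            raise ValueError("missing values field")
        ids, values = fields[5].split(), fields[6].split()
    else:
        # Paired formats are not covered by this sketch
        raise NotImplementedError(fmt)
    if len(values) != int(size) or (ids is not None and len(ids) != len(values)):
        raise ValueError("declared size does not match the data")
    return {
        "description": description,
        "values_type": values_type,
        "format": fmt,
        "ids": ids,
        "values": [float(v) for v in values],
    }

# Hand-shortened illustration string, not actual program output
line = ("FingerprintsVector;TopologicalPharmacophoreAtomPairs:ArbitrarySize:"
        "MinDistance1:MaxDistance10;3;NumericalValues;IDsAndValuesString;"
        "H-D1-H H-D1-NI HBA-D1-NI;18 1 2")
fp = parse_fingerprints_vector(line)
print(dict(zip(fp["ids"], fp["values"])))
```

The size check means that strings truncated with "..." for display, as in the examples above, are rejected rather than partially parsed.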
200
201 OPTIONS
202 --AromaticityModel *MDLAromaticityModel | TriposAromaticityModel |
203 MMFFAromaticityModel | ChemAxonBasicAromaticityModel |
204 ChemAxonGeneralAromaticityModel | DaylightAromaticityModel |
205 MayaChemToolsAromaticityModel*
206 Specify aromaticity model to use during detection of aromaticity.
207 Possible values in the current release are: *MDLAromaticityModel,
208 TriposAromaticityModel, MMFFAromaticityModel,
209 ChemAxonBasicAromaticityModel, ChemAxonGeneralAromaticityModel,
210 DaylightAromaticityModel or MayaChemToolsAromaticityModel*. Default
211 value: *MayaChemToolsAromaticityModel*.
212
213 The supported aromaticity model names along with model specific
214 control parameters are defined in AromaticityModelsData.csv, which
215 is distributed with the current release and is available under
216 lib/data directory. Molecule.pm module retrieves data from this file
217 during class instantiation and makes it available to method
218 DetectAromaticity for detecting aromaticity corresponding to a
219 specific model.
220
221 --AtomPairsSetSizeToUse *ArbitrarySize | FixedSize*
222 Atom pairs set size to use during generation of topological
223 pharmacophore atom pairs fingerprints.
224
225 Possible values: *ArbitrarySize | FixedSize*; Default value:
226 *ArbitrarySize*.
227
228 For *ArbitrarySize* value of --AtomPairsSetSizeToUse option, the
229 fingerprint vector corresponds to only those topological
230 pharmacophore atom pairs which are present and have non-zero count.
231 However, for *FixedSize* value of --AtomPairsSetSizeToUse option,
232 the fingerprint vector contains all possible valid topological
233 pharmacophore atom pairs with both zero and non-zero count values.
234
235 -a, --AtomTypesToUse *"AtomType1,AtomType2,..."*
236 Pharmacophore atom types to use during generation of topological
237 pharmacophore atom pairs. It's a list of comma separated valid
238 pharmacophore atom types.
239
240 Possible values for pharmacophore atom types are: *Ar, CA, H, HBA,
241 HBD, Hal, NI, PI, RA*. Default value [ Ref 60-62 ] :
242 *HBD,HBA,PI,NI,H*.
243
244 The pharmacophore atom types abbreviations correspond to:
245
246 HBD: HydrogenBondDonor
247 HBA: HydrogenBondAcceptor
248 PI : PositivelyIonizable
249 NI : NegativelyIonizable
250 Ar : Aromatic
251 Hal : Halogen
252 H : Hydrophobic
253 RA : RingAtom
254 CA : ChainAtom
255
256 *AtomTypes::FunctionalClassAtomTypes* module is used to assign
257 pharmacophore atom types. It uses the following definitions [ Ref 60-61,
258 Ref 65-66 ]:
259
260 HydrogenBondDonor: NH, NH2, OH
261 HydrogenBondAcceptor: N[!H], O
262 PositivelyIonizable: +, NH2
263 NegativelyIonizable: -, C(=O)OH, S(=O)OH, P(=O)OH
264
265 --AtomTypesWeight *"AtomType1,Weight1,AtomType2,Weight2..."*
266 Weights of specified pharmacophore atom types to use during
267 calculation of their contribution to atom pair count. Default value:
268 *None*. Valid values: real numbers greater than 0. In general, it's a
269 comma delimited list of valid atom types and their weights.
270
271 The weight values allow increasing the importance of a specific
272 pharmacophore atom type in the generated fingerprints. A weight
273 value of 0 for an atom type eliminates its contribution to atom pair
274 count, whereas a weight value of 2 doubles its contribution.
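
The "AtomType1,Weight1,AtomType2,Weight2..." layout can be read into a lookup table as sketched below. This is a Python illustration with a hypothetical function name; it only parses the list and makes no claim about how the script combines the two weights of a pair.

```python
def parse_atom_type_weights(spec):
    """Turn 'HBD,2,HBA,2' into {'HBD': 2.0, 'HBA': 2.0}."""
    tokens = [t.strip() for t in spec.split(",")]
    if len(tokens) % 2 != 0:
        raise ValueError("expected AtomType,Weight pairs")
    # Even positions hold atom types, odd positions hold their weights
    return {tokens[i]: float(tokens[i + 1]) for i in range(0, len(tokens), 2)}

weights = parse_atom_type_weights("HBD,2,HBA,2,PI,1,NI,1")
print(weights)
```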
275
276 --CompoundID *DataFieldName or LabelPrefixString*
277 This value is --CompoundIDMode specific and indicates how compound
278 ID is generated.
279
280 For *DataField* value of --CompoundIDMode option, it corresponds to
281 datafield label name whose value is used as compound ID; otherwise,
282 it's a prefix string used for generating compound IDs like
283 LabelPrefixString<Number>. Default value, *Cmpd*, generates compound
284 IDs which look like Cmpd<Number>.
285
286 Examples for *DataField* value of --CompoundIDMode:
287
288 MolID
289 ExtReg
290
291 Examples for *LabelPrefix* or *MolNameOrLabelPrefix* value of
292 --CompoundIDMode:
293
294 Compound
295
296 The value specified above generates compound IDs which correspond to
297 Compound<Number> instead of default value of Cmpd<Number>.
298
299 --CompoundIDLabel *text*
300 Specify compound ID column label for CSV/TSV text file(s) used
301 during *CompoundID* value of --DataFieldsMode option. Default value:
302 *CompoundID*.
303
304 --CompoundIDMode *DataField | MolName | LabelPrefix |
305 MolNameOrLabelPrefix*
306 Specify how to generate compound IDs and write to FP or CSV/TSV text
307 file(s) along with generated fingerprints for *FP | text | all*
308 values of --output option: use a *SDFile(s)* datafield value; use
309 molname line from *SDFile(s)*; generate a sequential ID with
310 specific prefix; use combination of both MolName and LabelPrefix
311 with usage of LabelPrefix values for empty molname lines.
312
313 Possible values: *DataField | MolName | LabelPrefix |
314 MolNameOrLabelPrefix*. Default value: *LabelPrefix*.
315
316 For *MolNameOrLabelPrefix* value of --CompoundIDMode, molname line
317 in *SDFile(s)* takes precedence over sequential compound IDs
318 generated using *LabelPrefix* and only empty molname values are
319 replaced with sequential compound IDs.
320
321 This is only used for *CompoundID* value of --DataFieldsMode option.
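
The four modes can be summarized by a small selection function. The sketch below is in Python; the function make_compound_id and its parameters are hypothetical stand-ins for the --CompoundIDMode and --CompoundID values, and the handling of a missing data field (an empty string) is an assumption.

```python
def make_compound_id(mode, number, molname="", data_fields=None, compound_id="Cmpd"):
    """Pick a compound ID according to a --CompoundIDMode style value."""
    data_fields = data_fields or {}
    if mode == "DataField":
        # compound_id names the SD data field holding the ID
        return data_fields.get(compound_id, "")
    if mode == "MolName":
        return molname
    if mode == "LabelPrefix":
        # compound_id is a prefix for sequential IDs such as Cmpd1, Cmpd2
        return f"{compound_id}{number}"
    if mode == "MolNameOrLabelPrefix":
        # molname wins; only empty molname lines fall back to the prefix
        return molname if molname.strip() else f"{compound_id}{number}"
    raise ValueError(f"unknown mode: {mode}")

print(make_compound_id("LabelPrefix", 1))
print(make_compound_id("MolNameOrLabelPrefix", 2, molname="Aspirin"))
print(make_compound_id("DataField", 3, data_fields={"MolID": "M-42"}, compound_id="MolID"))
```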
322
323 --DataFields *"FieldLabel1,FieldLabel2,..."*
324 Comma delimited list of *SDFile(s)* data fields to extract and
325 write to CSV/TSV text file(s) along with generated fingerprints for
326 *text | all* values of --output option.
327
328 This is only used for *Specify* value of --DataFieldsMode option.
329
330 Examples:
331
332 Extreg
333 MolID,CompoundName
334
335 -d, --DataFieldsMode *All | Common | Specify | CompoundID*
336 Specify how data fields in *SDFile(s)* are transferred to output
337 CSV/TSV text file(s) along with generated fingerprints for *text |
338 all* values of --output option: transfer all SD data fields; transfer
339 SD data fields common to all compounds; extract specified data
340 fields; generate a compound ID using molname line, a compound
341 prefix, or a combination of both. Possible values: *All | Common |
342 Specify | CompoundID*. Default value: *CompoundID*.
343
344 -f, --Filter *Yes | No*
345 Specify whether to check and filter compound data in SDFile(s).
346 Possible values: *Yes or No*. Default value: *Yes*.
347
348 By default, compound data is checked before calculating fingerprints
349 and compounds containing atom data corresponding to non-element
350 symbols or no atom data are ignored.
351
352 --FingerprintsLabelMode *FingerprintsLabelOnly |
353 FingerprintsLabelWithIDs*
354 Specify how fingerprints label is generated in conjunction with
355 --FingerprintsLabel option value: use fingerprints label generated
356 only by --FingerprintsLabel option value or append topological atom
357 pair count value IDs to --FingerprintsLabel option value.
358
359 Possible values: *FingerprintsLabelOnly | FingerprintsLabelWithIDs*.
360 Default value: *FingerprintsLabelOnly*.
361
362 Topological atom pairs IDs appended to --FingerprintsLabel value
363 during *FingerprintsLabelWithIDs* values of --FingerprintsLabelMode
364 correspond to atom pair count values in fingerprint vector string.
365
366 *FingerprintsLabelWithIDs* value of --FingerprintsLabelMode is
367 ignored during *ArbitrarySize* value of --AtomPairsSetSizeToUse
368 option and topological atom pairs IDs are not appended to the label.
369
370 --FingerprintsLabel *text*
371 SD data label or text file column label to use for fingerprints
372 string in output SD or CSV/TSV text file(s) specified by --output.
373 Default value: *TopologicalPharmacophoreAtomPairsFingerprints*.
374
375 --FuzzifyAtomPairsCount *Yes | No*
376 To fuzzify or not to fuzzify atom pairs count. Possible values: *Yes
377 or No*. Default value: *No*.
378
379 --FuzzificationMode *BeforeNormalization | AfterNormalization*
380 When to fuzzify atom pairs count. Possible values:
381 *BeforeNormalization | AfterNormalization*. Default value:
382 *AfterNormalization*.
383
384 --FuzzificationMethodology *FuzzyBinning | FuzzyBinSmoothing*
385 How to fuzzify atom pairs count. Possible values: *FuzzyBinning |
386 FuzzyBinSmoothing*. Default value: *FuzzyBinning*.
387
388 In conjunction with values for options --FuzzifyAtomPairsCount,
389 --FuzzificationMode and --FuzzFactor, --FuzzificationMethodology
390 option is used to fuzzify pharmacophore atom pairs count.
391
392 Let:
393
394 Px = Pharmacophore atom type x
395 Py = Pharmacophore atom type y
396 PPxy = Pharmacophore atom pair between atom type Px and Py
397
398 PPxyDn = Pharmacophore atom pairs count between atom type Px and Py
399 at distance Dn
400 PPxyDn-1 = Pharmacophore atom pairs count between atom type Px and Py
401 at distance Dn - 1
402 PPxyDn+1 = Pharmacophore atom pairs count between atom type Px and Py
403 at distance Dn + 1
404
405 FF = FuzzFactor for FuzzyBinning and FuzzyBinSmoothing
406
407 Then:
408
409 For *FuzzyBinning*:
410
411 PPxyDn = PPxyDn (Unchanged)
412
413 PPxyDn-1 = PPxyDn-1 + PPxyDn * FF
414 PPxyDn+1 = PPxyDn+1 + PPxyDn * FF
415
416 For *FuzzyBinSmoothing*:
417
418 PPxyDn = PPxyDn - PPxyDn * 2 * FF for Dmin < Dn < Dmax
419 PPxyDn = PPxyDn - PPxyDn * FF for Dn = Dmin or Dmax
420
421 PPxyDn-1 = PPxyDn-1 + PPxyDn * FF
422 PPxyDn+1 = PPxyDn+1 + PPxyDn * FF
423
424 In both fuzzification schemes, a value of 0 for FF implies no
425 fuzzification of occurrence counts. A value of 1 during
426 *FuzzyBinning* corresponds to maximum fuzzification of occurrence
427 counts; however, a value of 1 during *FuzzyBinSmoothing* ends up
428 completely distributing the value over the previous and next
429 distance bins.
430
431 So for the default value of --FuzzFactor (FF) 0.15, the occurrence count
432 of pharmacophore atom pairs at distance Dn during *FuzzyBinning* is
433 left unchanged and the counts at distances Dn - 1 and Dn + 1 are
434 incremented by PPxyDn * 0.15.
435
436 And during *FuzzyBinSmoothing* the occurrence count at distance Dn
437 is scaled back using a multiplicative factor of (1 - 2*0.15) and the
438 occurrence counts at distances Dn - 1 and Dn + 1 are incremented by
439 PPxyDn * 0.15. In other words, the occurrence bin count is smoothed out
440 by distributing it over the previous and next distance values.
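
Both fuzzification schemes can be sketched for a single atom pair type as below. This Python illustration makes two assumptions that the formulas above do not spell out: the amount moved to neighboring bins is computed from the original, unfuzzified counts, and contributions that would fall outside the Dmin to Dmax range are dropped. The function name fuzzify is hypothetical.

```python
def fuzzify(counts, fuzz_factor=0.15, methodology="FuzzyBinning"):
    """Fuzzify one pair type's counts, listed by distance from Dmin to Dmax."""
    if methodology not in ("FuzzyBinning", "FuzzyBinSmoothing"):
        raise ValueError(f"unknown methodology: {methodology}")
    last = len(counts) - 1
    result = [float(c) for c in counts]
    for n, original in enumerate(counts):
        spill = original * fuzz_factor  # PPxyDn * FF
        if n > 0:
            result[n - 1] += spill      # bin at Dn - 1
        if n < last:
            result[n + 1] += spill      # bin at Dn + 1
        if methodology == "FuzzyBinSmoothing":
            # End bins give up FF once, interior bins give up 2 * FF
            result[n] -= spill if n in (0, last) else 2 * spill
    return result

binned = fuzzify([0, 10, 0])
smoothed = fuzzify([0, 10, 0], methodology="FuzzyBinSmoothing")
print(binned)
print(smoothed)
```

Under these assumptions, FuzzyBinning raises the total count while FuzzyBinSmoothing preserves it whenever the range spans at least two distances.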
441
442 --FuzzFactor *number*
443 Specify by how much to fuzzify atom pairs count. Default value:
444 *0.15*. Valid values: For *FuzzyBinning* value of
445 --FuzzificationMethodology option: *between 0 and 1.0*; For
446 *FuzzyBinSmoothing* value of --FuzzificationMethodology option:
447 *between 0 and 0.5*.
448
449 -h, --help
450 Print this help message.
451
452 -k, --KeepLargestComponent *Yes | No*
453 Generate fingerprints for only the largest component in molecule.
454 Possible values: *Yes or No*. Default value: *Yes*.
455
456 For molecules containing multiple connected components, fingerprints
457 can be generated in two different ways: use all connected components
458 or just the largest connected component. By default, all atoms
459 except for the largest connected component are deleted before
460 generation of fingerprints.
461
462 --MinDistance *number*
463 Minimum bond distance between atom pairs for generating topological
464 pharmacophore atom pairs. Default value: *1*. Valid values: non-negative
465 integers less than --MaxDistance.
466
467 --MaxDistance *number*
468 Maximum bond distance between atom pairs for generating topological
469 pharmacophore atom pairs. Default value: *10*. Valid values:
470 positive integers greater than --MinDistance.
471
472 -n, --NormalizationMethodology *None | ByHeavyAtomsCount |
473 ByAtomTypesCount*
474 Normalization methodology to use for scaling the occurrence count of
475 pharmacophore atom pairs within specified distance range. Possible
476 values: *None, ByHeavyAtomsCount or ByAtomTypesCount*. Default
477 value: *None*.
478
479 --OutDelim *comma | tab | semicolon*
480 Delimiter for output CSV/TSV text file(s). Possible values: *comma,
481 tab, or semicolon*. Default value: *comma*.
482
483 --output *SD | FP | text | all*
484 Type of output files to generate. Possible values: *SD, FP, text, or
485 all*. Default value: *text*.
486
487 -o, --overwrite
488 Overwrite existing files.
489
490 -q, --quote *Yes | No*
491 Put quote around column values in output CSV/TSV text file(s).
492 Possible values: *Yes or No*. Default value: *Yes*.
493
494 -r, --root *RootName*
495 New file name is generated using the root: <Root>.<Ext>. Default for
496 new file names:
497 <SDFileName><TopologicalPharmacophoreAtomPairsFP>.<Ext>. The file
498 type determines <Ext> value. The sdf, fpf, csv, and tsv <Ext> values
499 are used for SD, FP, comma/semicolon, and tab delimited text files,
500 respectively. This option is ignored for multiple input files.
501
502 --ValuesPrecision *number*
503 Precision of atom pairs count real values which might be generated
504 after normalization or fuzzification. Default value: up to *2*
505 decimal places. Valid values: positive integers.
506
507 -v, --VectorStringFormat *ValuesString | IDsAndValuesString |
508 IDsAndValuesPairsString | ValuesAndIDsString | ValuesAndIDsPairsString*
509 Format of fingerprints vector string data in output SD, FP or
510 CSV/TSV text file(s) specified by --output option. Possible values:
511 *ValuesString | IDsAndValuesString | IDsAndValuesPairsString |
512 ValuesAndIDsString | ValuesAndIDsPairsString*.
513
514 Default value during *FixedSize* value of --AtomPairsSetSizeToUse
515 option: *ValuesString*. Default value during *ArbitrarySize* value
516 of --AtomPairsSetSizeToUse option: *IDsAndValuesString*.
517
518 *ValuesString* option value is not allowed for *ArbitrarySize* value
519 of --AtomPairsSetSizeToUse option.
520
521 Examples:
522
523 FingerprintsVector;TopologicalPharmacophoreAtomPairs:ArbitrarySize:Min
524 Distance1:MaxDistance10;54;NumericalValues;IDsAndValuesString;H-D1-H H
525 -D1-NI HBA-D1-NI HBD-D1-NI H-D2-H H-D2-HBA H-D2-HBD HBA-D2-HBA HBA-D2-
526 HBD H-D3-H H-D3-HBA H-D3-HBD H-D3-NI HBA-D3-NI HBD-D3-NI H-D4-H H-D4-H
527 BA H-D4-HBD HBA-D4-HBA HBA-D4-HBD HBD-D4-HBD H-D5-H H-D5-HBA H-D5-...;
528 18 1 2 1 22 12 8 1 2 18 6 3 1 1 1 22 13 6 5 7 2 28 9 5 1 1 1 36 16 10
529 3 4 1 37 10 8 1 35 10 9 3 3 1 28 7 7 4 18 16 12 5 1 2 1
530
531 FingerprintsVector;TopologicalPharmacophoreAtomPairs:FixedSize:MinDist
532 ance1:MaxDistance10;150;OrderedNumericalValues;ValuesString;18 0 0 1 0
533 0 0 2 0 0 1 0 0 0 0 22 12 8 0 0 1 2 0 0 0 0 0 0 0 0 18 6 3 1 0 0 0 1
534 0 0 1 0 0 0 0 22 13 6 0 0 5 7 0 0 2 0 0 0 0 0 28 9 5 1 0 0 0 1 0 0 1 0
535 0 0 0 36 16 10 0 0 3 4 0 0 1 0 0 0 0 0 37 10 8 0 0 0 0 1 0 0 0 0 0 0
536 0 35 10 9 0 0 3 3 0 0 1 0 0 0 0 0 28 7 7 4 0 0 0 0 0 0 0 0 0 0 0 18...
537
538 FingerprintsVector;TopologicalPharmacophoreAtomPairs:FixedSize:MinDist
539 ance1:MaxDistance10;150;OrderedNumericalValues;IDsAndValuesString;H-D1
540 -H H-D1-HBA H-D1-HBD H-D1-NI H-D1-PI HBA-D1-HBA HBA-D1-HBD HBA-D1-NI H
541 BA-D1-PI HBD-D1-HBD HBD-D1-NI HBD-D1-PI NI-D1-NI NI-D1-PI PI-D1-PI H-D
542 2-H H-D2-HBA H-D2-HBD H-D2-NI H-D2-PI HBA-D2-HBA HBA-D2-HBD HBA-D2...;
543 18 0 0 1 0 0 0 2 0 0 1 0 0 0 0 22 12 8 0 0 1 2 0 0 0 0 0 0 0 0 18 6 3
544 1 0 0 0 1 0 0 1 0 0 0 0 22 13 6 0 0 5 7 0 0 2 0 0 0 0 0 28 9 5 1 0 0 0
545 1 0 0 1 0 0 0 0 36 16 10 0 0 3 4 0 0 1 0 0 0 0
546
547 -w, --WorkingDir *DirName*
548 Location of working directory. Default value: current directory.
549
550 EXAMPLES
551 To generate topological pharmacophore atom pairs fingerprints of
552 arbitrary size corresponding to distances from 1 through 10 using
553 default atom types with no weighting, normalization, and fuzzification
554 of atom pairs count and create a SampleTPAPFP.csv file containing
555 sequential compound IDs along with fingerprints vector strings data in
556 IDsAndValuesString format, type:
557
558 % TopologicalPharmacophoreAtomPairsFingerprints.pl -r SampleTPAPFP
559 -o Sample.sdf
560
561 To generate topological pharmacophore atom pairs fingerprints of fixed
562 size corresponding to distances from 1 through 10 using default atom
563 types with no weighting, normalization, and fuzzification of atom pairs
564 count and create a SampleTPAPFP.csv file containing sequential compound
565 IDs along with fingerprints vector strings data in ValuesString format,
566 type:
567
568 % TopologicalPharmacophoreAtomPairsFingerprints.pl
569 --AtomPairsSetSizeToUse FixedSize -r SampleTPAPFP -o Sample.sdf
570
571 To generate topological pharmacophore atom pairs fingerprints of
572 arbitrary size corresponding to distances from 1 through 10 using
573 default atom types with no weighting, normalization, and fuzzification
574 of atom pairs count and create SampleTPAPFP.sdf, SampleTPAPFP.fpf and
575 SampleTPAPFP.csv files containing sequential compound IDs in CSV file
576 along with fingerprints vector strings data in IDsAndValuesString format,
577 type:
578
579 % TopologicalPharmacophoreAtomPairsFingerprints.pl --output all
580 -r SampleTPAPFP -o Sample.sdf
581
582 To generate topological pharmacophore atom pairs fingerprints of
583 arbitrary size corresponding to distances from 1 through 10 using
584 default atom types with no weighting, normalization, and fuzzification
585 of atom pairs count and create a SampleTPAPFP.csv file containing
586 sequential compound IDs along with fingerprints vector strings data in
587 IDsAndValuesPairsString format, type:
588
589 % TopologicalPharmacophoreAtomPairsFingerprints.pl --VectorStringFormat
590 IDsAndValuesPairsString -r SampleTPAPFP -o Sample.sdf
591
592 To generate topological pharmacophore atom pairs fingerprints of
593 arbitrary size corresponding to distances from 1 through 6 using default
594 atom types with no weighting, normalization, and fuzzification of atom
595 pairs count and create a SampleTPAPFP.csv file containing sequential
596 compound IDs along with fingerprints vector strings data in IDsAndValuesString
597 format, type:
598
599 % TopologicalPharmacophoreAtomPairsFingerprints.pl --MinDistance 1
600 --MaxDistance 6 -r SampleTPAPFP -o Sample.sdf
601
602 To generate topological pharmacophore atom pairs fingerprints of
603 arbitrary size corresponding to distances from 1 through 10 using
604 "HBD,HBA,PI,NI" atom types with double the weighting for "HBD,HBA" and
605 normalization by heavy atoms count but no fuzzification of atom pairs count
606 and create a SampleTPAPFP.csv file containing sequential compound IDs
607 along with fingerprints vector strings data in IDsAndValuesString format,
608 type:
609
610 % TopologicalPharmacophoreAtomPairsFingerprints.pl --MinDistance 1
611 --MaxDistance 10 --AtomTypesToUse "HBD,HBA,PI,NI" --AtomTypesWeight
612 "HBD,2,HBA,2,PI,1,NI,1" --NormalizationMethodology ByHeavyAtomsCount
613 --FuzzifyAtomPairsCount No -r SampleTPAPFP -o Sample.sdf
614
615 To generate topological pharmacophore atom pairs fingerprints of
616 arbitrary size corresponding to distances from 1 through 10 using
617 "HBD,HBA,PI,NI,H" atom types with no weighting of atom types and
618 normalization but with fuzzification of atom pairs count using
619 FuzzyBinning methodology with FuzzFactor value 0.15 and create a
620 SampleTPAPFP.csv file containing sequential compound IDs along with
621 fingerprints vector strings data in IDsAndValuesString format, type:
622
623 % TopologicalPharmacophoreAtomPairsFingerprints.pl --MinDistance 1
624 --MaxDistance 10 --AtomTypesToUse "HBD,HBA,PI,NI,H" --AtomTypesWeight
625 "HBD,1,HBA,1,PI,1,NI,1,H,1" --NormalizationMethodology None
626 --FuzzifyAtomPairsCount Yes --FuzzificationMethodology FuzzyBinning
627 --FuzzFactor 0.15 -r SampleTPAPFP -o Sample.sdf
628
629 To generate topological pharmacophore atom pairs fingerprints of
630 arbitrary size corresponding to distances from 1 through 10
631 using default atom types with no weighting, normalization, and
632 fuzzification of atom pairs count and create a SampleTPAPFP.csv file
633 containing compound ID from molecule name line along with fingerprints
634 vector strings data, type:
635
636 % TopologicalPharmacophoreAtomPairsFingerprints.pl --DataFieldsMode
637 CompoundID --CompoundIDMode MolName -r SampleTPAPFP -o Sample.sdf
638
639 To generate topological pharmacophore atom pairs fingerprints of
640 arbitrary size corresponding to distances from 1 through 10 using
641 default atom types with no weighting, normalization, and fuzzification
642 of atom pairs count and create a SampleTPAPFP.csv file containing
643 compound IDs using specified data field along with fingerprints vector
644 strings data, type:
645
646 % TopologicalPharmacophoreAtomPairsFingerprints.pl --DataFieldsMode
647 CompoundID --CompoundIDMode DataField --CompoundID Mol_ID
648 -r SampleTPAPFP -o Sample.sdf
649
650 To generate topological pharmacophore atom pairs fingerprints of
651 arbitrary size corresponding to distances from 1 through 10 using
652 default atom types with no weighting, normalization, and fuzzification
653 of atom pairs count and create a SampleTPAPFP.csv file containing
654 compound ID using combination of molecule name line and an explicit
655 compound prefix along with fingerprints vector strings data, type:
656
657 % TopologicalPharmacophoreAtomPairsFingerprints.pl --DataFieldsMode
658 CompoundID --CompoundIDMode MolNameOrLabelPrefix
659 --CompoundID Cmpd --CompoundIDLabel MolID -r SampleTPAPFP -o Sample.sdf
660
661 To generate topological pharmacophore atom pairs fingerprints of
662 arbitrary size corresponding to distances from 1 through 10 using
663 default atom types with no weighting, normalization, and fuzzification
664 of atom pairs count and create a SampleTPAPFP.csv file containing
665 specific data fields columns along with fingerprints vector strings
666 data, type:
667
668 % TopologicalPharmacophoreAtomPairsFingerprints.pl --DataFieldsMode
669 Specify --DataFields Mol_ID -r SampleTPAPFP -o Sample.sdf
670
671 To generate topological pharmacophore atom pairs fingerprints of
672 arbitrary size corresponding to distances from 1 through 10 using
673 default atom types with no weighting, normalization, and fuzzification
674 of atom pairs count and create a SampleTPAPFP.csv file containing common
675 data fields columns along with fingerprints vector strings data, type:
676
677 % TopologicalPharmacophoreAtomPairsFingerprints.pl --DataFieldsMode
678 Common -r SampleTPAPFP -o Sample.sdf
679
680 To generate topological pharmacophore atom pairs fingerprints of
681 arbitrary size corresponding to distances from 1 through 10 using
682 default atom types with no weighting, normalization, and fuzzification
683 of atom pairs count and create SampleTPAPFP.sdf, SampleTPAPFP.fpf, and
684 SampleTPAPFP.csv files containing all data fields columns in CSV file
685 along with fingerprints data, type:
686
687 % TopologicalPharmacophoreAtomPairsFingerprints.pl --DataFieldsMode
688 All --output all -r SampleTPAPFP -o Sample.sdf
689
690 AUTHOR
691 Manish Sud <msud@san.rr.com>
692
693 SEE ALSO
694 InfoFingerprintsFiles.pl, SimilarityMatricesFingerprints.pl,
695 AtomNeighborhoodsFingerprints.pl, ExtendedConnectivityFingerprints.pl,
696 MACCSKeysFingerprints.pl, PathLengthFingerprints.pl,
697 TopologicalAtomPairsFingerprints.pl,
698 TopologicalAtomTorsionsFingerprints.pl,
699 TopologicalPharmacophoreAtomTripletsFingerprints.pl
700
701 COPYRIGHT
702 Copyright (C) 2015 Manish Sud. All rights reserved.
703
704 This file is part of MayaChemTools.
705
706 MayaChemTools is free software; you can redistribute it and/or modify it
707 under the terms of the GNU Lesser General Public License as published by
708 the Free Software Foundation; either version 3 of the License, or (at
709 your option) any later version.
710