0
|
1 NAME
|
|
2 TopologicalPharmacophoreAtomTripletsFingerprints.pl - Generate
|
|
3 topological pharmacophore atom triplets fingerprints for SD files
|
|
4
|
|
5 SYNOPSIS
|
|
6 TopologicalPharmacophoreAtomTripletsFingerprints.pl SDFile(s)...
|
|
7
|
|
8 TopologicalPharmacophoreAtomTripletsFingerprints.pl [--AromaticityModel
|
|
9 *AromaticityModelType*] [--AtomTripletsSetSizeToUse *ArbitrarySize |
|
|
10 FixedSize*] [-a, --AtomTypesToUse *"AtomType1, AtomType2..."*]
|
|
11 [--AtomTypesWeight *"AtomType1, Weight1, AtomType2, Weight2..."*]
|
|
12 [--CompoundID *DataFieldName or LabelPrefixString*] [--CompoundIDLabel
|
|
13 *text*] [--CompoundIDMode] [--DataFields *"FieldLabel1,
|
|
14 FieldLabel2,..."*] [-d, --DataFieldsMode *All | Common | Specify |
|
|
15 CompoundID*] [--DistanceBinSize *number*] [-f, --Filter *Yes | No*]
|
|
16 [--FingerprintsLabelMode *FingerprintsLabelOnly |
|
|
17 FingerprintsLabelWithIDs*] [--FingerprintsLabel *text*] [-h, --help]
|
|
18 [-k, --KeepLargestComponent *Yes | No*] [--MinDistance *number*]
|
|
19 [--MaxDistance *number*] [--OutDelim *comma | tab | semicolon*]
|
|
20 [--output *SD | FP | text | all*] [-o, --overwrite] [-q, --quote *Yes |
|
|
21 No*] [-r, --root *RootName*] [-u, --UseTriangleInequality *Yes | No*]
|
|
22 [-v, --VectorStringFormat *ValuesString | IDsAndValuesString |
|
|
23 IDsAndValuesPairsString | ValuesAndIDsString | ValuesAndIDsPairsString*]
|
|
24 [-w, --WorkingDir *DirName*] SDFile(s)...
|
|
25
|
|
26 DESCRIPTION
|
|
27 Generate topological pharmacophore atom triplets fingerprints [ Ref 66,
|
|
28 Ref 68-71 ] for *SDFile(s)* and create appropriate SD, FP or CSV/TSV
|
|
29 text file(s) containing fingerprints vector strings corresponding to
|
|
30 molecular fingerprints.
|
|
31
|
|
32 Multiple SDFile names are separated by spaces. The valid file extensions
|
|
33 are *.sdf* and *.sd*. All other file names are ignored. All the SD files
|
|
34 in the current directory can be specified either by **.sdf* or the current
|
|
35 directory name.
|
|
36
|
|
37 Based on the values specified for --AtomTypesToUse, pharmacophore atom
|
|
38 types are assigned to all non-hydrogen atoms in a molecule and a
|
|
39 distance matrix is generated. Using --MinDistance, --MaxDistance, and
|
|
40 --DistanceBinSize values, a binned distance matrix is generated with
|
|
41 lower bound on the distance bin as the distance in distance matrix; the
|
|
42 lower bound on the distance bin is also used as the distance between
|
|
43 atom pairs for generation of atom triplet identifiers.
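The binning step described above can be sketched in a few lines of Python. This is only an illustration, not the script's own code: the function name `bin_lower_bound` is hypothetical, and rejecting distances outside the MinDistance..MaxDistance range is an assumption made here for simplicity.

```python
def bin_lower_bound(distance, min_distance=1, max_distance=10, bin_size=2):
    """Return the lower bound of the bin holding `distance`, or None if out of range.

    Assumption: distances below min_distance or above max_distance are rejected.
    """
    if distance < min_distance or distance > max_distance:
        return None
    # Index of the bin counted from min_distance, then back to its lower bound.
    index = (distance - min_distance) // bin_size
    return min_distance + index * bin_size
```

With the default values, bond distances 1 and 2 both map to 1, distances 3 and 4 map to 3, and distances 9 and 10 map to 9.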
|
|
44
|
|
45 A pharmacophore atom triplets basis set is generated for all unique atom
|
|
46 triplets constituting atom pairs binned distances between --MinDistance
|
|
47 and --MaxDistance. The value of --UseTriangleInequality determines
|
|
48 whether the triangle inequality test is applied during generation of
|
|
49 atom triplets basis set. The lower distance bound, along with specified
|
|
50 pharmacophore types, is used during generation of atom triplet IDs.
|
|
51
|
|
52 Let:
|
|
53
|
|
54 P = Valid pharmacophore atom type
|
|
55
|
|
56 Px = Pharmacophore atom x
|
|
57 Py = Pharmacophore atom y
|
|
58 Pz = Pharmacophore atom z
|
|
59
|
|
60 Dmin = Minimum distance corresponding to number of bonds between two atoms
|
|
61 Dmax = Maximum distance corresponding to number of bonds between two atoms
|
|
62 D = Distance corresponding to number of bonds between two atoms
|
|
63
|
|
64 Bsize = Distance bin size
|
|
65 Nbins = Number of distance bins
|
|
66
|
|
67 Dxy = Distance or lower bound of binned distance between Px and Py
|
|
68 Dxz = Distance or lower bound of binned distance between Px and Pz
|
|
69 Dyz = Distance or lower bound of binned distance between Py and Pz
|
|
70
|
|
71 Then:
|
|
72
|
|
73 PxDyz-PyDxz-PzDxy = Pharmacophore atom triplet IDs for atom types Px,
|
|
74 Py, and Pz
|
|
75
|
|
76 For example: H1-H1-H1, H2-HBA-H2 and so on
|
|
77
|
|
78 For default values of Dmin = 1 , Dmax = 10 and Bsize = 2:
|
|
79
|
|
80 the number of distance bins, Nbins = 5, are:
|
|
81
|
|
82 [1, 2] [3, 4] [5, 6] [7, 8] [9, 10]
|
|
83
|
|
84 and atom triplet basis set size is 2692.
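As an illustration of the PxDyz-PyDxz-PzDxy naming scheme, the following Python sketch builds an ID string from three atom types and the three binned distances. The function name `atom_triplet_id` is hypothetical. Sorting the three parts as strings is an assumption; it is consistent with sample IDs such as Ar1-Ar1-H1 in this document, but the script may order parts differently.

```python
def atom_triplet_id(px, py, pz, dxy, dxz, dyz):
    """Build a triplet ID: each atom type is paired with the distance of the opposite edge."""
    parts = [f"{px}{dyz}", f"{py}{dxz}", f"{pz}{dxy}"]
    # Assumption: a canonical ID is obtained by sorting the three parts as strings.
    return "-".join(sorted(parts))
```

For example, `atom_triplet_id("H", "Ar", "Ar", 1, 1, 1)` yields `Ar1-Ar1-H1` regardless of the order in which the three atoms are supplied.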
|
|
85
|
|
86 Atom triplet basis set size for various values of Dmin, Dmax and Bsize in
|
|
87 conjunction with usage of triangle inequality is:
|
|
88
|
|
89 Dmin Dmax Bsize UseTriangleInequality TripletBasisSetSize
|
|
90 1 10 2 No 4960
|
|
91 1 10 2 Yes 2692 [ Default ]
|
|
92 2 12 2 No 8436
|
|
93 2 12 2 Yes 4494
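The basis set sizes in the table above can be reproduced by brute-force enumeration. The Python sketch below is not taken from the script. It assumes the six default atom types, treats a triplet as an unordered set of three (atom type, opposite-edge distance) labels, and uses bin lower bounds as distances. The function name `basis_set_size` is hypothetical.

```python
from itertools import combinations_with_replacement


def basis_set_size(dmin, dmax, bsize, use_triangle_inequality, n_types=6):
    """Count unique atom triplets over all atom types and distance-bin lower bounds."""
    lower_bounds = range(dmin, dmax + 1, bsize)
    # One label per (atom type, lower bound of the opposite edge's distance bin).
    labels = [(t, d) for t in range(n_types) for d in lower_bounds]
    count = 0
    for (_, d1), (_, d2), (_, d3) in combinations_with_replacement(labels, 3):
        if use_triangle_inequality:
            a, b, c = sorted((d1, d2, d3))
            # For positive distances, the strict test reduces to longest < sum of the others.
            if not c < a + b:
                continue
        count += 1
    return count
```

Under these assumptions the function returns 4960 and 2692 for Dmin = 1, Dmax = 10, Bsize = 2, and 8436 and 4494 for Dmin = 2, Dmax = 12, Bsize = 2, matching the four table rows.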
|
|
94
|
|
95 Using binned distance matrix and pharmacophore atom types, occurrence of
|
|
96 unique pharmacophore atom triplets is counted.
|
|
97
|
|
98 The final pharmacophore atom triplets count along with atom triplet
|
|
99 identifiers involving all non-hydrogen atoms constitute pharmacophore
|
|
100 topological atom triplets fingerprints of the molecule.
|
|
101
|
|
102 For *ArbitrarySize* value of --AtomTripletsSetSizeToUse option, the
|
|
103 fingerprint vector corresponds to only those topological pharmacophore
|
|
104 atom triplets which are present and have non-zero count. However, for
|
|
105 *FixedSize* value of --AtomTripletsSetSizeToUse option, the fingerprint
|
|
106 vector contains all possible valid topological pharmacophore atom
|
|
107 triplets with both zero and non-zero count values.
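The difference between the two set sizes can be shown with a small Python sketch. The function name `to_vector` and the dictionary input are hypothetical; the ordering of IDs in the arbitrary-size case is an assumption.

```python
def to_vector(counts, basis_ids=None):
    """Return (ids, values) for ArbitrarySize (basis_ids=None) or FixedSize vectors."""
    if basis_ids is None:
        # ArbitrarySize: keep only triplets that are present with a non-zero count.
        ids = sorted(i for i, c in counts.items() if c > 0)
    else:
        # FixedSize: one slot per basis set triplet, zero when the triplet is absent.
        ids = list(basis_ids)
    return ids, [counts.get(i, 0) for i in ids]
```

With `counts = {"Ar1-Ar1-H1": 2}`, the arbitrary-size vector has a single entry, while a fixed-size vector over the basis `["Ar1-Ar1-Ar1", "Ar1-Ar1-H1"]` has the values `[0, 2]`.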
|
|
108
|
|
109 Example of *SD* file containing topological pharmacophore atom triplets
|
|
110 fingerprints string data:
|
|
111
|
|
112 ... ...
|
|
113 ... ...
|
|
114 $$$$
|
|
115 ... ...
|
|
116 ... ...
|
|
117 ... ...
|
|
118 41 44 0 0 0 0 0 0 0 0999 V2000
|
|
119 -3.3652 1.4499 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
|
|
120 ... ...
|
|
121 2 3 1 0 0 0 0
|
|
122 ... ...
|
|
123 M END
|
|
124 > <CmpdID>
|
|
125 Cmpd1
|
|
126
|
|
127 > <TopologicalPharmacophoreAtomTripletsFingerprints>
|
|
128 FingerprintsVector;TopologicalPharmacophoreAtomTriplets:ArbitrarySize:
|
|
129 MinDistance1:MaxDistance10;696;NumericalValues;IDsAndValuesString;Ar1-
|
|
130 Ar1-Ar1 Ar1-Ar1-H1 Ar1-Ar1-HBA1 Ar1-Ar1-HBD1 Ar1-H1-H1 Ar1-H1-HBA1 Ar1
|
|
131 -H1-HBD1 Ar1-HBA1-HBD1 H1-H1-H1 H1-H1-HBA1 H1-H1-HBD1 H1-HBA1-HBA1 H1-
|
|
132 HBA1-HBD1 H1-HBA1-NI1 H1-HBD1-NI1 HBA1-HBA1-NI1 HBA1-HBD1-NI1 Ar1-...;
|
|
133 46 106 8 3 83 11 4 1 21 5 3 1 2 2 1 1 1 100 101 18 11 145 132 26 14 23
|
|
134 28 3 3 5 4 61 45 10 4 16 20 7 5 1 3 4 5 3 1 1 1 1 5 4 2 1 2 2 2 1 1 1
|
|
135 119 123 24 15 185 202 41 25 22 17 3 5 85 95 18 11 23 17 3 1 1 6 4 ...
|
|
136
|
|
137 $$$$
|
|
138 ... ...
|
|
139 ... ...
|
|
140
|
|
141 Example of *FP* file containing topological pharmacophore atom triplets
|
|
142 fingerprints string data:
|
|
143
|
|
144 #
|
|
145 # Package = MayaChemTools 7.4
|
|
146 # Release Date = Oct 21, 2010
|
|
147 #
|
|
148 # TimeStamp = Fri Mar 11 15:38:58 2011
|
|
149 #
|
|
150 # FingerprintsStringType = FingerprintsVector
|
|
151 #
|
|
152 # Description = TopologicalPharmacophoreAtomTriplets:ArbitrarySize:M...
|
|
153 # VectorStringFormat = IDsAndValuesString
|
|
154 # VectorValuesType = NumericalValues
|
|
155 #
|
|
156 Cmpd1 696;Ar1-Ar1-Ar1 Ar1-Ar1-H1 Ar1-Ar1-HBA1 Ar1-Ar1-HBD1...;;46 106...
|
|
157 Cmpd2 251;H1-H1-H1 H1-H1-HBA1 H1-H1-HBD1 H1-H1-NI1...;4 1 3 1 1 2 2...
|
|
158 ... ...
|
|
159 ... ...
|
|
160
|
|
161 Example of CSV *Text* file containing topological pharmacophore atom
|
|
162 triplets fingerprints string data:
|
|
163
|
|
164 "CompoundID","TopologicalPharmacophoreAtomTripletsFingerprints"
|
|
165 "Cmpd1","FingerprintsVector;TopologicalPharmacophoreAtomTriplets:Arbitr
|
|
166 arySize:MinDistance1:MaxDistance10;696;NumericalValues;IDsAndValuesStri
|
|
167 ng;Ar1-Ar1-Ar1 Ar1-Ar1-H1 Ar1-Ar1-HBA1 Ar1-Ar1-HBD1 Ar1-H1-H1 Ar1-H1-HB
|
|
168 A1 Ar1-H1-HBD1 Ar1-HBA1-HBD1 H1-H1-H1 H1-H1-HBA1 H1-H1-HBD1 H1-HBA1-HBA
|
|
169 1 H1-HBA1-HBD1 H1-HBA1-NI1 H1-HBD1-NI1 HBA1-HBA1-NI1 HBA1-HBD1-NI1 A...;
|
|
170 46 106 8 3 83 11 4 1 21 5 3 1 2 2 1 1 1 100 101 18 11 145 132 26 14 23
|
|
171 28 3 3 5 4 61 45 10 4 16 20 7 5 1 3 4 5 3 1 1 1 1 5 4 2 1 2 2 2 1 1 1
|
|
172 119 123 24 15 185 202 41 25 22 17 3 5 85 95 18 11 23 17 3 1 1 6 4 ...
|
|
173 ... ...
|
|
174 ... ...
|
|
175
|
|
176 The current release of MayaChemTools generates the following types of
|
|
177 topological pharmacophore atom triplets fingerprints vector strings:
|
|
178
|
|
179 FingerprintsVector;TopologicalPharmacophoreAtomTriplets:ArbitrarySize:
|
|
180 MinDistance1:MaxDistance10;696;NumericalValues;IDsAndValuesString;Ar1-
|
|
181 Ar1-Ar1 Ar1-Ar1-H1 Ar1-Ar1-HBA1 Ar1-Ar1-HBD1 Ar1-H1-H1 Ar1-H1-HBA1 Ar1
|
|
182 -H1-HBD1 Ar1-HBA1-HBD1 H1-H1-H1 H1-H1-HBA1 H1-H1-HBD1 H1-HBA1-HBA1 H1-
|
|
183 HBA1-HBD1 H1-HBA1-NI1 H1-HBD1-NI1 HBA1-HBA1-NI1 HBA1-HBD1-NI1 Ar1-...;
|
|
184 46 106 8 3 83 11 4 1 21 5 3 1 2 2 1 1 1 100 101 18 11 145 132 26 14 23
|
|
185 28 3 3 5 4 61 45 10 4 16 20 7 5 1 3 4 5 3 1 1 1 1 5 4 2 1 2 2 2 1 1 1
|
|
186 119 123 24 15 185 202 41 25 22 17 3 5 85 95 18 11 23 17 3 1 1 6 4 ...
|
|
187
|
|
188 FingerprintsVector;TopologicalPharmacophoreAtomTriplets:FixedSize:MinD
|
|
189 istance1:MaxDistance10;2692;OrderedNumericalValues;ValuesString;46 106
|
|
190 8 3 0 0 83 11 4 0 0 0 1 0 0 0 0 0 0 0 0 21 5 3 0 0 1 2 2 0 0 1 0 0 0
|
|
191 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 100 101 18 11 0 0 145 132 26
|
|
192 14 0 0 23 28 3 3 0 0 5 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 61 45 10 4 0
|
|
193 0 16 20 7 5 1 0 3 4 5 3 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 5 ...
|
|
194
|
|
195 FingerprintsVector;TopologicalPharmacophoreAtomTriplets:FixedSize:MinD
|
|
196 istance1:MaxDistance10;2692;OrderedNumericalValues;IDsAndValuesString;
|
|
197 Ar1-Ar1-Ar1 Ar1-Ar1-H1 Ar1-Ar1-HBA1 Ar1-Ar1-HBD1 Ar1-Ar1-NI1 Ar1-Ar1-P
|
|
198 I1 Ar1-H1-H1 Ar1-H1-HBA1 Ar1-H1-HBD1 Ar1-H1-NI1 Ar1-H1-PI1 Ar1-HBA1-HB
|
|
199 A1 Ar1-HBA1-HBD1 Ar1-HBA1-NI1 Ar1-HBA1-PI1 Ar1-HBD1-HBD1 Ar1-HBD1-...;
|
|
200 46 106 8 3 0 0 83 11 4 0 0 0 1 0 0 0 0 0 0 0 0 21 5 3 0 0 1 2 2 0 0 1
|
|
201 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 100 101 18 11 0 0 145
|
|
202 132 26 14 0 0 23 28 3 3 0 0 5 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 61 ...
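A fingerprints vector string of the kinds listed above can be split into its semicolon-delimited fields with a short Python function. This is a minimal sketch that handles only the *ValuesString* and *IDsAndValuesString* layouts shown in this section. The function name `parse_fingerprints_vector` and the returned dictionary keys are hypothetical.

```python
def parse_fingerprints_vector(text):
    """Split a FingerprintsVector string into its fields."""
    fields = text.split(";")
    kind, description, size, values_type, fmt = fields[:5]
    if kind != "FingerprintsVector":
        raise ValueError("not a FingerprintsVector string")
    if fmt == "ValuesString":
        ids, values = None, fields[5].split()
    elif fmt == "IDsAndValuesString":
        ids, values = fields[5].split(), fields[6].split()
    else:
        raise ValueError(f"format not handled in this sketch: {fmt}")
    return {
        "description": description,
        "size": int(size),
        "values_type": values_type,
        "format": fmt,
        "ids": ids,
        "values": [int(v) for v in values],
    }
```

The size field can then be compared against `len(values)` as a quick consistency check on complete (non-truncated) strings.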
|
|
203
|
|
204 OPTIONS
|
|
205 --AromaticityModel *MDLAromaticityModel | TriposAromaticityModel |
|
|
206 MMFFAromaticityModel | ChemAxonBasicAromaticityModel |
|
|
207 ChemAxonGeneralAromaticityModel | DaylightAromaticityModel |
|
|
208 MayaChemToolsAromaticityModel*
|
|
209 Specify aromaticity model to use during detection of aromaticity.
|
|
210 Possible values in the current release are: *MDLAromaticityModel,
|
|
211 TriposAromaticityModel, MMFFAromaticityModel,
|
|
212 ChemAxonBasicAromaticityModel, ChemAxonGeneralAromaticityModel,
|
|
213 DaylightAromaticityModel or MayaChemToolsAromaticityModel*. Default
|
|
214 value: *MayaChemToolsAromaticityModel*.
|
|
215
|
|
216 The supported aromaticity model names along with model specific
|
|
217 control parameters are defined in AromaticityModelsData.csv, which
|
|
218 is distributed with the current release and is available under
|
|
219 lib/data directory. Molecule.pm module retrieves data from this file
|
|
220 during class instantiation and makes it available to method
|
|
221 DetectAromaticity for detecting aromaticity corresponding to a
|
|
222 specific model.
|
|
223
|
|
224 --AtomTripletsSetSizeToUse *ArbitrarySize | FixedSize*
|
|
225 Atom triplets set size to use during generation of topological
|
|
226 pharmacophore atom triplets fingerprints.
|
|
227
|
|
228 Possible values: *ArbitrarySize | FixedSize*; Default value:
|
|
229 *ArbitrarySize*.
|
|
230
|
|
231 For *ArbitrarySize* value of --AtomTripletsSetSizeToUse option, the
|
|
232 fingerprint vector corresponds to only those topological
|
|
233 pharmacophore atom triplets which are present and have non-zero
|
|
234 count. However, for *FixedSize* value of --AtomTripletsSetSizeToUse
|
|
235 option, the fingerprint vector contains all possible valid
|
|
236 topological pharmacophore atom triplets with both zero and non-zero
|
|
237 count values.
|
|
238
|
|
239 -a, --AtomTypesToUse *"AtomType1,AtomType2,..."*
|
|
240 Pharmacophore atom types to use during generation of topological
|
|
241 pharmacophore atom triplets. It's a list of comma separated valid
|
|
242 pharmacophore atom types.
|
|
243
|
|
244 Possible values for pharmacophore atom types are: *Ar, CA, H, HBA,
|
|
245 HBD, Hal, NI, PI, RA*. Default value [ Ref 71 ] :
|
|
246 *HBD,HBA,PI,NI,H,Ar*.
|
|
247
|
|
248 The pharmacophore atom types abbreviations correspond to:
|
|
249
|
|
250 HBD: HydrogenBondDonor
|
|
251 HBA: HydrogenBondAcceptor
|
|
252 PI : PositivelyIonizable
|
|
253 NI : NegativelyIonizable
|
|
254 Ar : Aromatic
|
|
255 Hal : Halogen
|
|
256 H : Hydrophobic
|
|
257 RA : RingAtom
|
|
258 CA : ChainAtom
|
|
259
|
|
260 *AtomTypes::FunctionalClassAtomTypes* module is used to assign
|
|
261 pharmacophore atom types. It uses following definitions [ Ref 60-61,
|
|
262 Ref 65-66 ]:
|
|
263
|
|
264 HydrogenBondDonor: NH, NH2, OH
|
|
265 HydrogenBondAcceptor: N[!H], O
|
|
266 PositivelyIonizable: +, NH2
|
|
267 NegativelyIonizable: -, C(=O)OH, S(=O)OH, P(=O)OH
|
|
268
|
|
269 --CompoundID *DataFieldName or LabelPrefixString*
|
|
270 This value is --CompoundIDMode specific and indicates how compound
|
|
271 ID is generated.
|
|
272
|
|
273 For *DataField* value of --CompoundIDMode option, it corresponds to
|
|
274 datafield label name whose value is used as compound ID; otherwise,
|
|
275 it's a prefix string used for generating compound IDs like
|
|
276 LabelPrefixString<Number>. Default value, *Cmpd*, generates compound
|
|
277 IDs which look like Cmpd<Number>.
|
|
278
|
|
279 Examples for *DataField* value of --CompoundIDMode:
|
|
280
|
|
281 MolID
|
|
282 ExtReg
|
|
283
|
|
284 Examples for *LabelPrefix* or *MolNameOrLabelPrefix* value of
|
|
285 --CompoundIDMode:
|
|
286
|
|
287 Compound
|
|
288
|
|
289 The value specified above generates compound IDs which correspond to
|
|
290 Compound<Number> instead of default value of Cmpd<Number>.
|
|
291
|
|
292 --CompoundIDLabel *text*
|
|
293 Specify compound ID column label for CSV/TSV text file(s) used
|
|
294 during *CompoundID* value of --DataFieldsMode option. Default value:
|
|
295 *CompoundID*.
|
|
296
|
|
297 --CompoundIDMode *DataField | MolName | LabelPrefix |
|
|
298 MolNameOrLabelPrefix*
|
|
299 Specify how to generate compound IDs and write to FP or CSV/TSV text
|
|
300 file(s) along with generated fingerprints for *FP | text | all*
|
|
301 values of --output option: use a *SDFile(s)* datafield value; use
|
|
302 molname line from *SDFile(s)*; generate a sequential ID with
|
|
303 specific prefix; use combination of both MolName and LabelPrefix
|
|
304 with usage of LabelPrefix values for empty molname lines.
|
|
305
|
|
306 Possible values: *DataField | MolName | LabelPrefix |
|
|
307 MolNameOrLabelPrefix*. Default value: *LabelPrefix*.
|
|
308
|
|
309 For *MolNameOrLabelPrefix* value of --CompoundIDMode, molname line
|
|
310 in *SDFile(s)* takes precedence over sequential compound IDs
|
|
311 generated using *LabelPrefix* and only empty molname values are
|
|
312 replaced with sequential compound IDs.
|
|
313
|
|
314 This is only used for *CompoundID* value of --DataFieldsMode option.
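The four compound ID modes described above can be summarized in a short Python sketch. It illustrates the documented behavior and is not the script's code: the function name `compound_id` and its parameters are hypothetical, and returning an empty string for a missing data field is an assumption.

```python
def compound_id(mode, index, mol_name="", data_fields=None, compound_id_value="Cmpd"):
    """Pick a compound ID following the --CompoundIDMode descriptions."""
    label = f"{compound_id_value}{index}"  # e.g. Cmpd1, Cmpd2, ...
    if mode == "DataField":
        # --CompoundID names the data field whose value is used as the ID.
        return (data_fields or {}).get(compound_id_value, "")
    if mode == "MolName":
        return mol_name
    if mode == "LabelPrefix":
        return label
    if mode == "MolNameOrLabelPrefix":
        # The molname line takes precedence; only empty names get a sequential ID.
        return mol_name.strip() or label
    raise ValueError(f"unknown mode: {mode}")
```

For example, `compound_id("MolNameOrLabelPrefix", 3, mol_name="")` returns `Cmpd3`, while a non-empty molname line is returned unchanged.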
|
|
315
|
|
316 --DataFields *"FieldLabel1,FieldLabel2,..."*
|
|
317 Comma delimited list of *SDFile(s)* data fields to extract and
|
|
318 write to CSV/TSV text file(s) along with generated fingerprints for
|
|
319 *text | all* values of --output option.
|
|
320
|
|
321 This is only used for *Specify* value of --DataFieldsMode option.
|
|
322
|
|
323 Examples:
|
|
324
|
|
325 Extreg
|
|
326 MolID,CompoundName
|
|
327
|
|
328 -d, --DataFieldsMode *All | Common | Specify | CompoundID*
|
|
329 Specify how data fields in *SDFile(s)* are transferred to output
|
|
330 CSV/TSV text file(s) along with generated fingerprints for *text |
|
|
331 all* values of --output option: transfer all SD data fields; transfer
|
|
332 SD data fields common to all compounds; extract specified data
|
|
333 fields; generate a compound ID using molname line, a compound
|
|
334 prefix, or a combination of both. Possible values: *All | Common |
|
|
335 Specify | CompoundID*. Default value: *CompoundID*.
|
|
336
|
|
337 --DistanceBinSize *number*
|
|
338 Distance bin size used to bin distances between atom pairs in atom
|
|
339 triplets. Default value: *2*. Valid values: positive integers.
|
|
340
|
|
341 For default --MinDistance and --MaxDistance values of 1 and 10 with
|
|
342 --DistanceBinSize of 2 [ Ref 70 ], the following 5 distance bins are
|
|
343 generated:
|
|
344
|
|
345 [1, 2] [3, 4] [5, 6] [7, 8] [9, 10]
|
|
346
|
|
347 The lower distance bound on the distance bin is used to bin the
|
|
348 distance between atom pairs in atom triplets. So in the previous
|
|
349 example, atom pairs with distances 1 and 2 fall in first distance
|
|
350 bin, atom pairs with distances 3 and 4 fall in second distance bin
|
|
351 and so on.
|
|
352
|
|
353 In order to distribute distance bins of equal size, the last bin is
|
|
354 allowed to go past --MaxDistance by up to distance bin size. For
|
|
355 example, --MinDistance and --MaxDistance values of 2 and 10 with
|
|
356 --DistanceBinSize of 2 generates the following 5 distance bins:
|
|
357
|
|
358 [2, 3] [4, 5] [6, 7] [8, 9] [10, 11]
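The bin lists shown for this option can be generated with a one-line Python function. This is a sketch of the rule as described here; the function name `distance_bins` is hypothetical.

```python
def distance_bins(min_distance, max_distance, bin_size):
    """List (lower, upper) bounds of equal-size bins starting at min_distance.

    A bin is created for every lower bound not exceeding max_distance, so the
    last bin may extend past max_distance.
    """
    return [(lower, lower + bin_size - 1)
            for lower in range(min_distance, max_distance + 1, bin_size)]
```

`distance_bins(1, 10, 2)` gives five bins ending at (9, 10), and `distance_bins(2, 10, 2)` gives five bins ending at (10, 11), where the last bin runs one past --MaxDistance.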
|
|
359
|
|
360 -f, --Filter *Yes | No*
|
|
361 Specify whether to check and filter compound data in SDFile(s).
|
|
362 Possible values: *Yes or No*. Default value: *Yes*.
|
|
363
|
|
364 By default, compound data is checked before calculating fingerprints
|
|
365 and compounds containing atom data corresponding to non-element
|
|
366 symbols or no atom data are ignored.
|
|
367
|
|
368 --FingerprintsLabelMode *FingerprintsLabelOnly |
|
|
369 FingerprintsLabelWithIDs*
|
|
370 Specify how fingerprints label is generated in conjunction with
|
|
371 --FingerprintsLabel option value: use fingerprints label generated
|
|
372 only by --FingerprintsLabel option value or append topological atom
|
|
373 triplet count value IDs to --FingerprintsLabel option value.
|
|
374
|
|
375 Possible values: *FingerprintsLabelOnly | FingerprintsLabelWithIDs*.
|
|
376 Default value: *FingerprintsLabelOnly*.
|
|
377
|
|
378 Topological atom triplets IDs appended to --FingerprintsLabel value
|
|
379 during *FingerprintsLabelWithIDs* values of --FingerprintsLabelMode
|
|
380 correspond to atom triplet count values in fingerprint vector string.
|
|
381
|
|
382 *FingerprintsLabelWithIDs* value of --FingerprintsLabelMode is
|
|
383 ignored during *ArbitrarySize* value of --AtomTripletsSetSizeToUse
|
|
384 option and topological atom triplets IDs are not appended to the label.
|
|
385
|
|
386 --FingerprintsLabel *text*
|
|
387 SD data label or text file column label to use for fingerprints
|
|
388 string in output SD or CSV/TSV text file(s) specified by --output.
|
|
389 Default value: *TopologicalPharmacophoreAtomTripletsFingerprints*.
|
|
390
|
|
391 -h, --help
|
|
392 Print this help message.
|
|
393
|
|
394 -k, --KeepLargestComponent *Yes | No*
|
|
395 Generate fingerprints for only the largest component in molecule.
|
|
396 Possible values: *Yes or No*. Default value: *Yes*.
|
|
397
|
|
398 For molecules containing multiple connected components, fingerprints
|
|
399 can be generated in two different ways: use all connected components
|
|
400 or just the largest connected component. By default, all atoms
|
|
401 except for the largest connected component are deleted before
|
|
402 generation of fingerprints.
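Selecting the largest connected component amounts to a graph traversal over the bond list. The Python sketch below shows one way to do it with a depth-first search. It does not reproduce the script's implementation, and the function name `largest_component` and the zero-based atom indices are assumptions.

```python
def largest_component(n_atoms, bonds):
    """Return sorted atom indices of the largest connected component."""
    adjacency = {atom: set() for atom in range(n_atoms)}
    for a, b in bonds:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen = set()
    best = []
    for start in range(n_atoms):
        if start in seen:
            continue
        # Depth-first traversal collecting every atom reachable from `start`.
        stack = [start]
        seen.add(start)
        component = []
        while stack:
            atom = stack.pop()
            component.append(atom)
            for neighbor in adjacency[atom]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        if len(component) > len(best):
            best = component
    return sorted(best)
```

For five atoms with bonds (0,1), (1,2) and (3,4), the function returns atoms 0, 1 and 2; atoms 3 and 4 would be dropped before fingerprint generation.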
|
|
403
|
|
404 --MinDistance *number*
|
|
405 Minimum bond distance between atom pairs corresponding to atom
|
|
406 triplets for generating topological pharmacophore atom triplets.
|
|
407 Default value: *1*. Valid values: positive integers and less than
|
|
408 --MaxDistance.
|
|
409
|
|
410 --MaxDistance *number*
|
|
411 Maximum bond distance between atom pairs corresponding to atom
|
|
412 triplets for generating topological pharmacophore atom triplets.
|
|
413 Default value: *10*. Valid values: positive integers and greater
|
|
414 than --MinDistance.
|
|
415
|
|
416 --OutDelim *comma | tab | semicolon*
|
|
417 Delimiter for output CSV/TSV text file(s). Possible values: *comma,
|
|
418 tab, or semicolon*. Default value: *comma*.
|
|
419
|
|
420 --output *SD | FP | text | all*
|
|
421 Type of output files to generate. Possible values: *SD, FP, text, or
|
|
422 all*. Default value: *text*.
|
|
423
|
|
424 -o, --overwrite
|
|
425 Overwrite existing files.
|
|
426
|
|
427 -q, --quote *Yes | No*
|
|
428 Put quotes around column values in output CSV/TSV text file(s).
|
|
429 Possible values: *Yes or No*. Default value: *Yes*.
|
|
430
|
|
431 -r, --root *RootName*
|
|
432 New file name is generated using the root: <Root>.<Ext>. Default for
|
|
433 new file names:
|
|
434 <SDFileName><TopologicalPharmacophoreAtomTripletsFP>.<Ext>. The file
|
|
435 type determines <Ext> value. The sdf, fpf, csv, and tsv <Ext> values
|
|
436 are used for SD, FP, comma/semicolon, and tab delimited text files,
|
|
437 respectively. This option is ignored for multiple input files.
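The naming rule can be illustrated with a small Python sketch. It follows a literal reading of the description above, with the default root taken as the SD file name without its extension followed by TopologicalPharmacophoreAtomTripletsFP; that reading is an assumption. The function name `output_file_name` is hypothetical.

```python
import os


def output_file_name(sd_file, kind, root=None, out_delim="comma"):
    """Build an output file name for kind 'SD', 'FP' or 'text'."""
    if kind == "SD":
        ext = "sdf"
    elif kind == "FP":
        ext = "fpf"
    elif kind == "text":
        # csv for comma/semicolon delimited files, tsv for tab delimited files.
        ext = "tsv" if out_delim == "tab" else "csv"
    else:
        raise ValueError(f"unknown output kind: {kind}")
    if root is None:
        base = os.path.splitext(os.path.basename(sd_file))[0]
        root = base + "TopologicalPharmacophoreAtomTripletsFP"
    return f"{root}.{ext}"
```

With `-r SampleTPATFP`, a text output with the default delimiter is named `SampleTPATFP.csv`.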
|
|
438
|
|
439 -u, --UseTriangleInequality *Yes | No*
|
|
440 Specify whether to apply triangle distance inequality test to
|
|
441 distances between atom pairs in atom triplets during generation of
|
|
442 atom triplets basis set. Possible values: *Yes or No*.
|
|
443 Default value: *Yes*.
|
|
444
|
|
445 Triangle distance inequality test implies that distance or binned
|
|
446 distance between any two atom pairs in an atom triplet must be less
|
|
447 than the sum of distances or binned distances between other two
|
|
448 atom pairs and greater than the difference of their distances.
|
|
449
|
|
450 For atom triplet PxDyz-PyDxz-PzDxy to satisfy triangle inequality:
|
|
451
|
|
452 Dyz > |Dxz - Dxy| and Dyz < Dxz + Dxy
|
|
453 Dxz > |Dyz - Dxy| and Dxz < Dyz + Dxy
|
|
454 Dxy > |Dyz - Dxz| and Dxy < Dyz + Dxz
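The test can be written directly from the three conditions, each distance being strictly between the difference and the sum of the other two. The Python sketch below is an illustration; the function name `satisfies_triangle_inequality` is hypothetical.

```python
def satisfies_triangle_inequality(dxy, dxz, dyz):
    """Strict triangle inequality test on the three (binned) distances."""
    return (abs(dxz - dxy) < dyz < dxz + dxy
            and abs(dyz - dxy) < dxz < dyz + dxy
            and abs(dyz - dxz) < dxy < dyz + dxz)
```

With the default bin lower bounds, the distances (1, 3, 3) pass the test, while (1, 1, 3) and (1, 3, 5) fail because the longest distance is not smaller than the sum of the other two.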
|
|
455
|
|
456 -v, --VectorStringFormat *ValuesString | IDsAndValuesString |
|
|
457 IDsAndValuesPairsString | ValuesAndIDsString | ValuesAndIDsPairsString*
|
|
458 Format of fingerprints vector string data in output SD, FP or
|
|
459 CSV/TSV text file(s) specified by --output option. Possible values:
|
|
460 *ValuesString | IDsAndValuesString | IDsAndValuesPairsString |
|
|
461 ValuesAndIDsString | ValuesAndIDsPairsString*. Default value:
|
|
462 *ValuesString*.
|
|
463
|
|
464 Default value during *FixedSize* value of --AtomTripletsSetSizeToUse
|
|
465 option: *ValuesString*. Default value during *ArbitrarySize* value
|
|
466 of --AtomTripletsSetSizeToUse option: *IDsAndValuesString*.
|
|
467
|
|
468 *ValuesString* option value is not allowed for *ArbitrarySize* value
|
|
469 of --AtomTripletsSetSizeToUse option.
|
|
470
|
|
471 Examples:
|
|
472
|
|
473 FingerprintsVector;TopologicalPharmacophoreAtomTriplets:ArbitrarySize:
|
|
474 MinDistance1:MaxDistance10;696;NumericalValues;IDsAndValuesString;Ar1-
|
|
475 Ar1-Ar1 Ar1-Ar1-H1 Ar1-Ar1-HBA1 Ar1-Ar1-HBD1 Ar1-H1-H1 Ar1-H1-HBA1 Ar1
|
|
476 -H1-HBD1 Ar1-HBA1-HBD1 H1-H1-H1 H1-H1-HBA1 H1-H1-HBD1 H1-HBA1-HBA1 H1-
|
|
477 HBA1-HBD1 H1-HBA1-NI1 H1-HBD1-NI1 HBA1-HBA1-NI1 HBA1-HBD1-NI1 Ar1-...;
|
|
478 46 106 8 3 83 11 4 1 21 5 3 1 2 2 1 1 1 100 101 18 11 145 132 26 14 23
|
|
479 28 3 3 5 4 61 45 10 4 16 20 7 5 1 3 4 5 3 1 1 1 1 5 4 2 1 2 2 2 1 1 1
|
|
480 119 123 24 15 185 202 41 25 22 17 3 5 85 95 18 11 23 17 3 1 1 6 4 ...
|
|
481
|
|
482 FingerprintsVector;TopologicalPharmacophoreAtomTriplets:FixedSize:MinD
|
|
483 istance1:MaxDistance10;2692;OrderedNumericalValues;ValuesString;46 106
|
|
484 8 3 0 0 83 11 4 0 0 0 1 0 0 0 0 0 0 0 0 21 5 3 0 0 1 2 2 0 0 1 0 0 0
|
|
485 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 100 101 18 11 0 0 145 132 26
|
|
486 14 0 0 23 28 3 3 0 0 5 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 61 45 10 4 0
|
|
487 0 16 20 7 5 1 0 3 4 5 3 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 5 ...
|
|
488
|
|
489 FingerprintsVector;TopologicalPharmacophoreAtomTriplets:FixedSize:MinD
|
|
490 istance1:MaxDistance10;2692;OrderedNumericalValues;ValuesAndIDsPairsSt
|
|
491 ring;46 Ar1-Ar1-Ar1 106 Ar1-Ar1-H1 8 Ar1-Ar1-HBA1 3 Ar1-Ar1-HBD1 0 Ar1
|
|
492 -Ar1-NI1 0 Ar1-Ar1-PI1 83 Ar1-H1-H1 11 Ar1-H1-HBA1 4 Ar1-H1-HBD1 0 Ar1
|
|
493 -H1-NI1 0 Ar1-H1-PI1 0 Ar1-HBA1-HBA1 1 Ar1-HBA1-HBD1 0 Ar1-HBA1-NI1 0
|
|
494 Ar1-HBA1-PI1 0 Ar1-HBD1-HBD1 0 Ar1-HBD1-NI1 0 Ar1-HBD1-PI1 0 Ar1-NI...
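The relationship between the vector string formats can be sketched in Python. The *ValuesString*, *IDsAndValuesString* and *ValuesAndIDsPairsString* layouts follow the examples above. The *IDsAndValuesPairsString* and *ValuesAndIDsString* layouts are inferred from their names and are assumptions. The function name `format_vector` is hypothetical, and only the trailing IDs/values portion of the string is produced.

```python
def format_vector(ids, values, fmt):
    """Render the IDs/values portion of a fingerprints vector string."""
    vals = [str(v) for v in values]
    if fmt == "ValuesString":
        return " ".join(vals)
    if fmt == "IDsAndValuesString":
        return " ".join(ids) + ";" + " ".join(vals)
    if fmt == "ValuesAndIDsPairsString":
        return " ".join(f"{v} {i}" for i, v in zip(ids, vals))
    # The two layouts below are assumed from the option names.
    if fmt == "ValuesAndIDsString":
        return " ".join(vals) + ";" + " ".join(ids)
    if fmt == "IDsAndValuesPairsString":
        return " ".join(f"{i} {v}" for i, v in zip(ids, vals))
    raise ValueError(f"unknown format: {fmt}")
```

For IDs `Ar1-Ar1-Ar1` and `Ar1-Ar1-H1` with counts 46 and 106, *ValuesAndIDsPairsString* gives `46 Ar1-Ar1-Ar1 106 Ar1-Ar1-H1`, as in the last example above.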
|
|
495
|
|
496 -w, --WorkingDir *DirName*
|
|
497 Location of working directory. Default value: current directory.
|
|
498
|
|
499 EXAMPLES
|
|
500 To generate topological pharmacophore atom triplets fingerprints of
|
|
501 arbitrary size corresponding to 5 distance bins spanning distances from
|
|
502 1 through 10 using default atoms with distances satisfying triangle
|
|
503 inequality and create a SampleTPATFP.csv file containing sequential
|
|
504 compound IDs along with fingerprints vector strings data in IDsAndValuesString
|
|
505 format, type:
|
|
506
|
|
507 % TopologicalPharmacophoreAtomTripletsFingerprints.pl -r SampleTPATFP
|
|
508 -o Sample.sdf
|
|
509
|
|
510 To generate topological pharmacophore atom triplets fingerprints of
|
|
511 fixed size corresponding to 5 distance bins spanning distances from 1
|
|
512 through 10 using default atoms with distances satisfying triangle
|
|
513 inequality and create a SampleTPATFP.csv file containing sequential
|
|
514 compound IDs along with fingerprints vector strings data in ValuesString
|
|
515 format, type:
|
|
516
|
|
517 % TopologicalPharmacophoreAtomTripletsFingerprints.pl
|
|
518 --AtomTripletsSetSizeToUse FixedSize -r SampleTPATFP -o Sample.sdf
|
|
519
|
|
520 To generate topological pharmacophore atom triplets fingerprints of
|
|
521 arbitrary size corresponding to 5 distance bins spanning distances from
|
|
522 1 through 10 using default atoms with distances satisfying triangle
|
|
523 inequality and create SampleTPATFP.sdf, SampleTPATFP.fpf and
|
|
524 SampleTPATFP.csv files with CSV file containing sequential compound IDs
|
|
525 along with fingerprints vector strings data in IDsAndValuesString format,
|
|
526 type:
|
|
527
|
|
528 % TopologicalPharmacophoreAtomTripletsFingerprints.pl --output all
|
|
529 -r SampleTPATFP -o Sample.sdf
|
|
530
|
|
531 To generate topological pharmacophore atom triplets fingerprints of
|
|
532 arbitrary size corresponding to 5 distance bins spanning distances from
|
|
533 1 through 10 using default atoms with distances satisfying triangle
|
|
534 inequality and create a SampleTPATFP.csv file containing sequential
|
|
535 compound IDs along with fingerprints vector strings data in IDsAndValuesString
|
|
536 format and atom triplets IDs in the fingerprint data column label
|
|
537 starting with Fingerprints, type:
|
|
538
|
|
539 % TopologicalPharmacophoreAtomTripletsFingerprints.pl
|
|
540 --FingerprintsLabelMode FingerprintsLabelWithIDs --FingerprintsLabel
|
|
541 Fingerprints -r SampleTPATFP -o Sample.sdf
|
|
542
|
|
543 To generate topological pharmacophore atom triplets fingerprints of
|
|
544 arbitrary size corresponding to 5 distance bins spanning distances from
|
|
545 1 through 10 using default atoms with distances not satisfying triangle
|
|
546 inequality and create a SampleTPATFP.csv file containing sequential
|
|
547 compound IDs along with fingerprints vector strings data in IDsAndValuesString
|
|
548 format, type:
|
|
549
|
|
550 % TopologicalPharmacophoreAtomTripletsFingerprints.pl
|
|
551 --UseTriangleInequality No -r SampleTPATFP -o Sample.sdf
|
|
552
|
|
553 To generate topological pharmacophore atom triplets fingerprints of
|
|
554 arbitrary size corresponding to 6 distance bins spanning distances from
|
|
555 1 through 12 using default atoms with distances satisfying triangle
|
|
556 inequality and create a SampleTPATFP.csv file containing sequential
|
|
557 compound IDs along with fingerprints vector strings data in IDsAndValuesString
|
|
558 format, type:
|
|
559
|
|
560 % TopologicalPharmacophoreAtomTripletsFingerprints.pl
|
|
561 --UseTriangleInequality Yes --MinDistance 1 --MaxDistance 12
|
|
562 --DistanceBinSize 2 -r SampleTPATFP -o Sample.sdf
|
|
563
|
|
564 To generate topological pharmacophore atom triplets fingerprints of
|
|
565 arbitrary size corresponding to 6 distance bins spanning distances from
|
|
566 1 through 12 using "HBD,HBA,PI, NI, H, Ar" atoms with distances
|
|
567 satisfying triangle inequality and create a SampleTPATFP.csv file
|
|
568 containing sequential compound IDs along with fingerprints vector
|
|
569 strings data in ValuesString format, type:
|
|
570
|
|
571 % TopologicalPharmacophoreAtomTripletsFingerprints.pl
|
|
572 --AtomTypesToUse "HBD,HBA,PI,NI,H,Ar" --UseTriangleInequality Yes
|
|
573 --MinDistance 1 --MaxDistance 12 --DistanceBinSize 2
|
|
574 --VectorStringFormat ValuesString -r SampleTPATFP -o Sample.sdf
|
|
575
|
|
576 To generate topological pharmacophore atom triplets fingerprints of
|
|
577 arbitrary size corresponding to 5 distance bins spanning distances from
|
|
578 1 through 10 using default atoms with distances satisfying triangle
|
|
579 inequality and create a SampleTPATFP.csv file containing sequential
|
|
580 compound IDs from molecule name line along with fingerprints vector
|
|
581 strings data in IDsAndValuesString format, type:
|
|
582
|
|
583 % TopologicalPharmacophoreAtomTripletsFingerprints.pl --DataFieldsMode
|
|
584 CompoundID --CompoundIDMode MolName -r SampleTPATFP -o Sample.sdf
|
|
585
|
|
586 To generate topological pharmacophore atom triplets fingerprints of
|
|
587 arbitrary size corresponding to 5 distance bins spanning distances from
|
|
588 1 through 10 using default atoms with distances satisfying triangle
|
|
589 inequality and create a SampleTPATFP.csv file containing sequential
|
|
590 compound IDs using specified data field along with fingerprints vector
|
|
591 strings data in IDsAndValuesString format, type:
|
|
592
|
|
593 % TopologicalPharmacophoreAtomTripletsFingerprints.pl --DataFieldsMode
|
|
594 CompoundID --CompoundIDMode DataField --CompoundID Mol_ID
|
|
595 -r SampleTPATFP -o Sample.sdf
|
|
596
|
|
597 To generate topological pharmacophore atom triplets fingerprints of
|
|
598 arbitrary size corresponding to 5 distance bins spanning distances from
|
|
599 1 through 10 using default atoms with distances satisfying triangle
|
|
600 inequality and create a SampleTPATFP.csv file containing sequential
|
|
601 compound IDs using combination of molecule name line and an explicit
|
|
602 compound prefix along with fingerprints vector strings data, type:
|
|
603
|
|
604 % TopologicalPharmacophoreAtomTripletsFingerprints.pl --DataFieldsMode
|
|
605 CompoundID --CompoundIDMode MolNameOrLabelPrefix
|
|
606 --CompoundID Cmpd --CompoundIDLabel MolID -r SampleTPATFP
|
|
607 -o Sample.sdf
|
|
608
|
|
609 To generate topological pharmacophore atom triplets fingerprints of
|
|
610 arbitrary size corresponding to 5 distance bins spanning distances from
|
|
611 1 through 10 using default atoms with distances satisfying triangle
|
|
612 inequality and create a SampleTPATFP.csv file containing specific data
|
|
613 fields columns along with fingerprints vector strings data, type:
|
|
614
|
|
615 % TopologicalPharmacophoreAtomTripletsFingerprints.pl --DataFieldsMode
|
|
616 Specify --DataFields Mol_ID -r SampleTPATFP -o Sample.sdf
|
|
617
|
|
618 To generate topological pharmacophore atom triplets fingerprints of
|
|
619 arbitrary size corresponding to 5 distance bins spanning distances from
|
|
620 1 through 10 using default atoms with distances satisfying triangle
|
|
621 inequality and create a SampleTPATFP.csv file containing common data
|
|
622 fields columns along with fingerprints vector strings data, type:
|
|
623
|
|
624 % TopologicalPharmacophoreAtomTripletsFingerprints.pl --DataFieldsMode
|
|
625 Common -r SampleTPATFP -o Sample.sdf
|
|
626
|
|
627 To generate topological pharmacophore atom triplets fingerprints of
|
|
628 arbitrary size corresponding to 5 distance bins spanning distances from
|
|
629 1 through 10 using default atoms with distances satisfying triangle
|
|
630 inequality and create SampleTPATFP.sdf, SampleTPATFP.fpf and
|
|
631 SampleTPATFP.csv files containing all data fields columns in CSV file
|
|
632 along with fingerprints data, type:
|
|
633
|
|
634 % TopologicalPharmacophoreAtomTripletsFingerprints.pl --DataFieldsMode
|
|
635 All --output all -r SampleTPATFP -o Sample.sdf
|
|
636
|
|
637 AUTHOR
|
|
638 Manish Sud <msud@san.rr.com>
|
|
639
|
|
640 SEE ALSO
|
|
641 InfoFingerprintsFiles.pl, SimilarityMatricesFingerprints.pl,
|
|
642 AtomNeighborhoodsFingerprints.pl, ExtendedConnectivityFingerprints.pl,
|
|
643 MACCSKeysFingerprints.pl, PathLengthFingerprints.pl,
|
|
644 TopologicalAtomPairsFingerprints.pl,
|
|
645 TopologicalAtomTorsionsFingerprints.pl,
|
|
646 TopologicalPharmacophoreAtomPairsFingerprints.pl
|
|
647
|
|
648 COPYRIGHT
|
|
649 Copyright (C) 2015 Manish Sud. All rights reserved.
|
|
650
|
|
651 This file is part of MayaChemTools.
|
|
652
|
|
653 MayaChemTools is free software; you can redistribute it and/or modify it
|
|
654 under the terms of the GNU Lesser General Public License as published by
|
|
655 the Free Software Foundation; either version 3 of the License, or (at
|
|
656 your option) any later version.
|
|
657
|