docs/scripts/txt/EStateIndiciesFingerprints.txt @ 0:4816e4a8ae95 (uploaded by deepakjadmin, Wed, 20 Jan 2016 09:23:18 -0500)

NAME
    EStateIndiciesFingerprints.pl - Generate E-state indices fingerprints
    for SD files

SYNOPSIS
    EStateIndiciesFingerprints.pl SDFile(s)...

    EStateIndiciesFingerprints.pl [--AromaticityModel
    *AromaticityModelType*] [--CompoundID *DataFieldName or
    LabelPrefixString*] [--CompoundIDLabel *text*] [--CompoundIDMode
    *DataField | MolName | LabelPrefix | MolNameOrLabelPrefix*]
    [--DataFields *"FieldLabel1,FieldLabel2,..."*] [-d, --DataFieldsMode
    *All | Common | Specify | CompoundID*] [-e, --EStateAtomTypesSetToUse
    *ArbitrarySize | FixedSize*] [-f, --Filter *Yes | No*]
    [--FingerprintsLabelMode *FingerprintsLabelOnly |
    FingerprintsLabelWithIDs*] [--FingerprintsLabel *text*] [-h, --help]
    [-k, --KeepLargestComponent *Yes | No*] [--OutDelim *comma | tab |
    semicolon*] [--output *SD | FP | text | all*] [-o, --overwrite] [-q,
    --quote *Yes | No*] [-r, --root *RootName*] [-s, --size *number*]
    [--ValuesPrecision *number*] [-v, --VectorStringFormat
    *ValuesString | IDsAndValuesString | IDsAndValuesPairsString |
    ValuesAndIDsString | ValuesAndIDsPairsString*] [-w, --WorkingDir
    *DirName*]

DESCRIPTION
    Generate E-state indices fingerprints [ Ref 75-78 ] for *SDFile(s)* and
    create the appropriate SD, FP, or CSV/TSV text file(s) containing
    fingerprints vector strings corresponding to molecular fingerprints.

    Multiple SDFile names are separated by spaces. The valid file extensions
    are *.sdf* and *.sd*. All other file names are ignored. All the SD files
    in the current directory can be specified either by *.sdf or the current
    directory name.

    E-state atom types are assigned to all non-hydrogen atoms in a molecule
    using module AtomTypes::EStateAtomTypes.pm, and E-state values are
    calculated using module AtomicDescriptors::EStateValues.pm. The E-state
    atom types and E-state values are then combined to generate
    EStateIndiciesFingerprints, which contain the sum of the E-state values
    for each E-state atom type.
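
    The following Python sketch illustrates only the summation step. It
    groups per-atom E-state values by E-state atom type and sums them. It
    assumes that atom types and E-state values have already been assigned,
    which AtomTypes::EStateAtomTypes.pm and AtomicDescriptors::EStateValues.pm
    do in MayaChemTools. The function name estate_indices and the input
    values are hypothetical.

```python
from collections import defaultdict


def estate_indices(atoms):
    """Sum E-state values per E-state atom type.

    `atoms` is an iterable of (estate_atom_type, estate_value) pairs, one
    per non-hydrogen atom. Type and value assignment is assumed to happen
    elsewhere; this sketch only performs the summation.
    """
    sums = defaultdict(float)
    for atom_type, value in atoms:
        sums[atom_type] += value
    # Return IDs in sorted order, matching the ordering seen in the
    # example vector strings in this document.
    return {atom_type: sums[atom_type] for atom_type in sorted(sums)}


# Hypothetical per-atom values for a small fragment.
atoms = [("SsCH3", 1.5), ("SssCH2", 0.5), ("SsCH3", 2.25), ("SsOH", 8.75)]
print(estate_indices(atoms))
# {'SsCH3': 3.75, 'SsOH': 8.75, 'SssCH2': 0.5}
```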

    Two E-state atom type set sizes are supported:

        ArbitrarySize - Corresponds to only the E-state atom types detected
                        in the molecule
        FixedSize     - Corresponds to a fixed number of previously defined
                        E-state atom types

    Module AtomTypes::EStateAtomTypes.pm, used to assign E-state atom types
    to non-hydrogen atoms in the molecule, is able to assign atom types to
    any valid atom group. However, for the *FixedSize* value of
    EStateAtomTypesSetToUse, only a fixed set of E-state atom types
    corresponding to specific atom groups [ Appendix III in Ref 77 ] is
    used for fingerprints.

    The fixed-size E-state atom type set used during fingerprint generation
    contains the 87 E-state non-hydrogen atom types listed in the
    EStateAtomTypes.csv data file distributed with MayaChemTools.

    The combination of fingerprint type and EStateAtomTypesSetToUse value
    allows generation of two different types of E-state indices
    fingerprints:

        Type              EStateAtomTypesSetToUse

        EStateIndicies    ArbitrarySize      [ default fingerprints ]
        EStateIndicies    FixedSize
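
    The following sketch shows how the two set sizes differ. It assumes the
    per-type sums have already been computed. Types absent from the
    molecule become 0, and types outside the fixed set are dropped.
    FIXED_TYPES is a hypothetical six-entry stand-in for the 87 types in
    EStateAtomTypes.csv, and its order is illustrative only.

```python
def fixed_size_vector(type_sums, fixed_types):
    """Map per-type E-state sums onto a fixed, ordered list of atom types.

    Types missing from `type_sums` contribute 0; types not listed in
    `fixed_types` are ignored.
    """
    return [type_sums.get(atom_type, 0) for atom_type in fixed_types]


# Hypothetical stand-in for the fixed set; not the real 87-type list.
FIXED_TYPES = ["SsCH3", "SdCH2", "SssCH2", "SaaCH", "SsOH", "SdO"]

# ArbitrarySize-style result: only the types detected in the molecule.
arbitrary = {"SaaCH": 24.778, "SaasC": 4.387, "SsCH3": 3.975, "SsOH": 29.759}

print(fixed_size_vector(arbitrary, FIXED_TYPES))
# [3.975, 0, 0, 24.778, 29.759, 0]
```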

    Example of an *SD* file containing E-state indices fingerprints string
    data:

        ... ...
        ... ...
        $$$$
        ... ...
        ... ...
        ... ...
         41 44  0  0  0  0  0  0  0  0999 V2000
           -3.3652    1.4499    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
        ... ...
          2  3  1  0  0  0  0
        ... ...
        M  END
        > <CmpdID>
        Cmpd1

        > <EStateIndiciesFingerprints>
        FingerprintsVector;EStateIndicies:ArbitrarySize;11;NumericalValues;
        IDsAndValuesString;SaaCH SaasC SaasN SdO SdssC SsCH3 SsF SsOH SssCH2
        SssNH SsssCH;24.778 4.387 1.993 25.023 -1.435 3.975 14.006 29.759
        -0.073 3.024 -2.270

        $$$$
        ... ...
        ... ...
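
    The data items in an SD record follow the "M  END" line. Each one begins
    with a "> <Label>" header, and its value runs to the next blank line.
    The following Python sketch pulls the fingerprints string out of one
    record under those assumptions. The function sd_data_fields is
    hypothetical, and multi-line values are joined with newlines.

```python
import re

# Matches data header lines such as "> <CmpdID>" and captures the label.
HEADER = re.compile(r"^>.*?<([^>]+)>")


def sd_data_fields(record):
    """Return {label: value} for the data items of a single SD record."""
    fields = {}
    in_data = False
    label = None
    for line in record.splitlines():
        if line.startswith("M  END"):
            in_data = True  # data items start after the molecule block
            continue
        if not in_data or line.strip() == "$$$$":
            continue
        match = HEADER.match(line)
        if match:
            label = match.group(1)
            fields[label] = []
        elif not line.strip():
            label = None  # a blank line ends the current value
        elif label is not None:
            fields[label].append(line)
    return {name: "\n".join(lines) for name, lines in fields.items()}


fp = ("FingerprintsVector;EStateIndicies:ArbitrarySize;11;NumericalValues;"
      "IDsAndValuesString;SaaCH SaasC SaasN SdO SdssC SsCH3 SsF SsOH SssCH2 "
      "SssNH SsssCH;24.778 4.387 1.993 25.023 -1.435 3.975 14.006 29.759 "
      "-0.073 3.024 -2.270")
record = "\n".join([
    "", "  (header lines and molecule block omitted)", "",
    "M  END",
    "> <CmpdID>", "Cmpd1", "",
    "> <EStateIndiciesFingerprints>", fp, "",
    "$$$$",
])
fields = sd_data_fields(record)
print(fields["CmpdID"])  # Cmpd1
```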

    Example of an *FP* file containing E-state indices fingerprints string
    data:

        #
        # Package = MayaChemTools 7.4
        # Release Date = Oct 21, 2010
        #
        # TimeStamp = Fri Mar 11 14:35:11 2011
        #
        # FingerprintsStringType = FingerprintsVector
        #
        # Description = EStateIndicies:ArbitrarySize
        # VectorStringFormat = IDsAndValuesString
        # VectorValuesType = NumericalValues
        #
        Cmpd1 11;SaaCH SaasC SaasN SdO SdssC...;24.778 4.387 1.993 25.023 -1...
        Cmpd2 9;SdNH SdO SdssC SsCH3 SsNH...;7.418 22.984 -1.583 5.387 5.400...
        ... ...
        ... ...
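
    In the example above, an FP file stores its shared settings as
    "# Key = Value" header lines. Each data line holds a compound ID
    followed by the remaining vector string fields: count, IDs, and values.
    The following Python sketch reads such a file. The function
    read_fp_file is hypothetical, and the sample data line is built from
    the Cmpd2 values shown in the CSV example of this document.

```python
def read_fp_file(text):
    """Parse FP file text into (header, records).

    header:  dict built from "# Key = Value" comment lines
    records: list of (compound_id, count, ids, values) tuples, assuming the
             IDsAndValuesString layout "count;IDs;values" on data lines
    """
    header = {}
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            body = line.lstrip("#").strip()
            if "=" in body:
                key, value = body.split("=", 1)
                header[key.strip()] = value.strip()
            continue
        compound_id, data = line.split(None, 1)
        count, ids, values = data.split(";")
        records.append((compound_id, int(count), ids.split(),
                        [float(v) for v in values.split()]))
    return header, records


sample = """#
# Package = MayaChemTools 7.4
#
# FingerprintsStringType = FingerprintsVector
#
# Description = EStateIndicies:ArbitrarySize
# VectorStringFormat = IDsAndValuesString
# VectorValuesType = NumericalValues
#
Cmpd2 9;SdNH SdO SdssC SsCH3 SsNH2 SsOH SssCH2 SssNH SsssCH;7.418 22.984 -1.583 5.387 5.400 19.852 1.737 5.624 -3.319
"""
header, records = read_fp_file(sample)
print(header["VectorStringFormat"], records[0][0], records[0][1])
# IDsAndValuesString Cmpd2 9
```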

    Example of a CSV *Text* file containing E-state indices fingerprints
    string data:

        "CompoundID","EStateIndiciesFingerprints"
        "Cmpd1","FingerprintsVector;EStateIndicies:ArbitrarySize;11;
        NumericalValues;IDsAndValuesString;SaaCH SaasC SaasN SdO SdssC SsCH3
        SsF SsOH SssCH2 SssNH SsssCH;24.778 4.387 1.993 25.023 -1.435 3.975
        14.006 29.759 -0.073 3.024 -2.270"
        "Cmpd2","FingerprintsVector;EStateIndicies:ArbitrarySize;9;
        NumericalValues;IDsAndValuesString;SdNH SdO SdssC SsCH3 SsNH2 SsOH
        SssCH2 SssNH SsssCH;7.418 22.984 -1.583 5.387 5.400 19.852 1.737 5.624
        -3.319"
        ... ...
        ... ...
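
    The vector string itself contains semicolons and spaces. The text file
    should therefore be read with a CSV parser that honors the quotes,
    rather than by splitting lines by hand. This matters most with the
    *semicolon* output delimiter. The following Python sketch uses the
    standard csv module and assumes the default *CompoundID* and
    *EStateIndiciesFingerprints* column labels. read_fingerprints_csv is a
    hypothetical helper name.

```python
import csv
import io


def read_fingerprints_csv(text, label="EStateIndiciesFingerprints",
                          delimiter=","):
    """Return {compound_id: vector_string} from CSV/TSV fingerprints text."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return {row["CompoundID"]: row[label] for row in reader}


text = (
    '"CompoundID","EStateIndiciesFingerprints"\n'
    '"Cmpd2","FingerprintsVector;EStateIndicies:ArbitrarySize;9;'
    'NumericalValues;IDsAndValuesString;SdNH SdO SdssC SsCH3 SsNH2 SsOH '
    'SssCH2 SssNH SsssCH;7.418 22.984 -1.583 5.387 5.400 19.852 1.737 '
    '5.624 -3.319"\n'
)
vectors = read_fingerprints_csv(text)
fields = vectors["Cmpd2"].split(";")
print(fields[2], fields[4])  # 9 IDsAndValuesString
```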

    The current release of MayaChemTools generates the following types of
    E-state fingerprints vector strings:

        FingerprintsVector;EStateIndicies:ArbitrarySize;11;NumericalValues;
        IDsAndValuesString;SaaCH SaasC SaasN SdO SdssC SsCH3 SsF SsOH SssCH2
        SssNH SsssCH;24.778 4.387 1.993 25.023 -1.435 3.975 14.006 29.759
        -0.073 3.024 -2.270

        FingerprintsVector;EStateIndicies:FixedSize;87;OrderedNumericalValues;
        ValuesString;0 0 0 0 0 0 0 3.975 0 -0.073 0 0 24.778 -2.270 0 0 -1.435
        4.387 0 0 0 0 0 0 3.024 0 0 0 0 0 0 0 1.993 0 29.759 25.023 0 0 0 0
        14.006 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
        0 0 0 0 0 0 0 0 0 0 0 0 0 0

        FingerprintsVector;EStateIndicies:FixedSize;87;OrderedNumericalValues;
        IDsAndValuesString;SsLi SssBe SssssBem SsBH2 SssBH SsssB SssssBm SsCH3
        SdCH2 SssCH2 StCH SdsCH SaaCH SsssCH SddC StsC SdssC SaasC SaaaC
        SssssC SsNH3p SsNH2 SssNH2p SdNH SssNH SaaNH StN SsssNHp SdsN SaaN SsssN Sd
        0 0 0 0 0 0 0 3.975 0 -0.073 0 0 24.778 -2.270 0 0 -1.435 4.387 0 0 0
        0 0 0 3.024 0 0 0 0 0 0 0 1.993 0 29.759 25.023 0 0 0 0 14.006 0 0 0 0
        0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0...
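
    The fields of these vector strings are separated by semicolons, in this
    order:

        string type, description, vector size, values type, and vector
        string format, followed by either the values alone (ValuesString)
        or the IDs and then the values (IDsAndValuesString)

    The following Python sketch splits both layouts shown above. The
    function name is hypothetical. Other formats are rejected rather than
    guessed.

```python
def parse_fingerprints_vector(vector):
    """Split a FingerprintsVector string into a dict of its fields."""
    fields = vector.split(";")
    kind, description, size, values_type, fmt = fields[:5]
    if kind != "FingerprintsVector":
        raise ValueError(f"not a FingerprintsVector string: {kind!r}")
    if fmt == "ValuesString":
        ids, raw_values = None, fields[5].split()
    elif fmt == "IDsAndValuesString":
        ids, raw_values = fields[5].split(), fields[6].split()
    else:
        raise ValueError(f"format not handled in this sketch: {fmt!r}")
    values = [float(v) for v in raw_values]
    # The declared size should match the number of values (and IDs).
    if len(values) != int(size) or (ids is not None and len(ids) != len(values)):
        raise ValueError("vector size does not match the number of values")
    return {"description": description, "values_type": values_type,
            "format": fmt, "ids": ids, "values": values}


arbitrary = parse_fingerprints_vector(
    "FingerprintsVector;EStateIndicies:ArbitrarySize;11;NumericalValues;"
    "IDsAndValuesString;SaaCH SaasC SaasN SdO SdssC SsCH3 SsF SsOH SssCH2 "
    "SssNH SsssCH;24.778 4.387 1.993 25.023 -1.435 3.975 14.006 29.759 "
    "-0.073 3.024 -2.270")
print(dict(zip(arbitrary["ids"], arbitrary["values"]))["SsOH"])  # 29.759
```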

OPTIONS
    --AromaticityModel *MDLAromaticityModel | TriposAromaticityModel |
    MMFFAromaticityModel | ChemAxonBasicAromaticityModel |
    ChemAxonGeneralAromaticityModel | DaylightAromaticityModel |
    MayaChemToolsAromaticityModel*
        Specify the aromaticity model to use during detection of aromaticity.
        Possible values in the current release are: *MDLAromaticityModel,
        TriposAromaticityModel, MMFFAromaticityModel,
        ChemAxonBasicAromaticityModel, ChemAxonGeneralAromaticityModel,
        DaylightAromaticityModel or MayaChemToolsAromaticityModel*. Default
        value: *MayaChemToolsAromaticityModel*.

        The supported aromaticity model names, along with model-specific
        control parameters, are defined in AromaticityModelsData.csv. This
        file is distributed with the current release and is available under
        the lib/data directory. The Molecule.pm module retrieves data from
        this file during class instantiation and makes it available to the
        DetectAromaticity method for detecting aromaticity corresponding to a
        specific model.

    --CompoundID *DataFieldName or LabelPrefixString*
        The meaning of this value depends on --CompoundIDMode; it indicates
        how the compound ID is generated.

        For the *DataField* value of the --CompoundIDMode option, it is the
        data field label whose value is used as the compound ID. Otherwise,
        it is a prefix string used for generating compound IDs like
        LabelPrefixString<Number>. The default value, *Cmpd*, generates
        compound IDs which look like Cmpd<Number>.

        Examples for *DataField* value of --CompoundIDMode:

            MolID
            ExtReg

        Examples for *LabelPrefix* or *MolNameOrLabelPrefix* value of
        --CompoundIDMode:

            Compound

        The value specified above generates compound IDs which correspond to
        Compound<Number> instead of the default value of Cmpd<Number>.

    --CompoundIDLabel *text*
        Specify the compound ID column label for FP or CSV/TSV text file(s)
        used during the *CompoundID* value of the --DataFieldsMode option.
        Default: *CompoundID*.

    --CompoundIDMode *DataField | MolName | LabelPrefix |
    MolNameOrLabelPrefix*
        Specify how to generate compound IDs and write them to FP or CSV/TSV
        text file(s) along with generated fingerprints for *FP | text | all*
        values of the --output option. The choices are:

            DataField            - use an *SDFile(s)* data field value
            MolName              - use the molname line from *SDFile(s)*
            LabelPrefix          - generate a sequential ID with a specific
                                   prefix
            MolNameOrLabelPrefix - use a combination of both MolName and
                                   LabelPrefix, with LabelPrefix values used
                                   for empty molname lines

        Possible values: *DataField | MolName | LabelPrefix |
        MolNameOrLabelPrefix*. Default: *LabelPrefix*.

        For the *MolNameOrLabelPrefix* value of --CompoundIDMode, the molname
        line in *SDFile(s)* takes precedence over sequential compound IDs
        generated using *LabelPrefix*, and only empty molname values are
        replaced with sequential compound IDs.

        This is only used for the *CompoundID* value of the --DataFieldsMode
        option.
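
        The following short Python sketch summarizes the four modes. It
        assumes that sequential IDs are numbered by the compound's 1-based
        position in the SD file, and that a missing data field yields an
        empty ID; both are assumptions. make_compound_id is a hypothetical
        helper, not part of MayaChemTools.

```python
def make_compound_id(mode, index, mol_name="", data_fields=None,
                     compound_id="Cmpd"):
    """Sketch of --CompoundIDMode behaviour.

    index:       1-based position of the compound in the SD file (assumed)
    compound_id: the --CompoundID value, i.e. a data field label for
                 DataField mode and a label prefix otherwise
    """
    data_fields = data_fields or {}
    if mode == "DataField":
        return data_fields.get(compound_id, "")
    if mode == "MolName":
        return mol_name
    if mode == "LabelPrefix":
        return f"{compound_id}{index}"
    if mode == "MolNameOrLabelPrefix":
        # A non-empty molname line wins; otherwise fall back to the prefix.
        return mol_name.strip() or f"{compound_id}{index}"
    raise ValueError(f"unknown --CompoundIDMode value: {mode!r}")


print(make_compound_id("LabelPrefix", 3))  # Cmpd3
print(make_compound_id("MolNameOrLabelPrefix", 4, compound_id="Compound"))
# Compound4
```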

    --DataFields *"FieldLabel1,FieldLabel2,..."*
        Comma-delimited list of *SDFile(s)* data fields to extract and write
        to CSV/TSV text file(s) along with generated fingerprints for *text |
        all* values of the --output option.

        This is only used for the *Specify* value of the --DataFieldsMode
        option.

        Examples:

            Extreg
            MolID,CompoundName

    -d, --DataFieldsMode *All | Common | Specify | CompoundID*
        Specify how data fields in *SDFile(s)* are transferred to output
        CSV/TSV text file(s) along with generated fingerprints for *text |
        all* values of the --output option:

            All        - transfer all SD data fields
            Common     - transfer SD data fields common to all compounds
            Specify    - extract specified data fields
            CompoundID - generate a compound ID using the molname line, a
                         compound prefix, or a combination of both

        Possible values: *All | Common | Specify | CompoundID*. Default
        value: *CompoundID*.
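
        One way to read the *All* and *Common* choices is as a union and an
        intersection of the data field labels across compounds. The
        following Python sketch assumes that reading. It also assumes that
        labels keep their first-seen order, which the text above does not
        specify. fields_to_transfer is a hypothetical name.

```python
def fields_to_transfer(records, mode, specified=()):
    """Choose the SD data field labels written for --DataFieldsMode.

    records: list of {label: value} dicts, one per compound.
    """
    if mode == "All":
        labels = []
        for record in records:
            for label in record:
                if label not in labels:
                    labels.append(label)  # union, in first-seen order
        return labels
    if mode == "Common":
        if not records:
            return []
        common = set(records[0]).intersection(*records[1:])
        return [label for label in records[0] if label in common]
    if mode == "Specify":
        return list(specified)  # from --DataFields
    if mode == "CompoundID":
        return []  # no SD data fields; only a generated compound ID column
    raise ValueError(f"unknown --DataFieldsMode value: {mode!r}")


records = [{"Mol_ID": "1", "Name": "A", "MW": "180.2"},
           {"Mol_ID": "2", "MW": "151.2"}]
print(fields_to_transfer(records, "All"))     # ['Mol_ID', 'Name', 'MW']
print(fields_to_transfer(records, "Common"))  # ['Mol_ID', 'MW']
```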

    -e, --EStateAtomTypesSetToUse *ArbitrarySize | FixedSize*
        E-state atom types set size to use during generation of E-state
        indices fingerprints. Possible values: *ArbitrarySize | FixedSize*.
        Default value: *ArbitrarySize*.

        *ArbitrarySize* corresponds to only the E-state atom types detected
        in the molecule; *FixedSize* corresponds to a fixed number of
        previously defined E-state atom types.

        For *EStateIndicies*, a fingerprint vector string is generated. The
        vector string corresponding to *EStateIndicies* contains the sum of
        E-state values for each E-state atom type.

        Module AtomTypes::EStateAtomTypes.pm, which is used to assign E-state
        atom types to non-hydrogen atoms in the molecule, is able to assign
        atom types to any valid atom group. However, for the *FixedSize*
        value of EStateAtomTypesSetToUse, only a fixed set of E-state atom
        types corresponding to specific atom groups
        [ Appendix III in Ref 77 ] is used for fingerprints.

        The fixed-size E-state atom type set used during fingerprint
        generation contains the 87 E-state non-hydrogen atom types listed in
        the EStateAtomTypes.csv data file distributed with MayaChemTools.

    -f, --Filter *Yes | No*
        Specify whether to check and filter compound data in SDFile(s).
        Possible values: *Yes or No*. Default value: *Yes*.

        By default, compound data is checked before calculating fingerprints,
        and compounds containing atom data corresponding to non-element
        symbols or no atom data are ignored.
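
        The following minimal Python sketch shows that check. It assumes a
        compound is represented by its list of atom symbols. ELEMENTS is
        deliberately a small illustrative subset; a real filter would use
        the full periodic table. passes_filter is a hypothetical name.

```python
# Illustrative subset only; not a complete list of element symbols.
ELEMENTS = {"H", "B", "C", "N", "O", "F", "P", "S", "Cl", "Br", "I", "Na"}


def passes_filter(atom_symbols, elements=ELEMENTS):
    """Keep a compound only if it has atoms and all symbols are elements."""
    return bool(atom_symbols) and all(s in elements for s in atom_symbols)


print(passes_filter(["C", "C", "O"]))  # True
print(passes_filter(["C", "R"]))       # False: "R" is not an element symbol
print(passes_filter([]))               # False: no atom data
```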

    --FingerprintsLabelMode *FingerprintsLabelOnly |
    FingerprintsLabelWithIDs*
        Specify how the fingerprints label is generated in conjunction with
        the --FingerprintsLabel option value. The label is either generated
        only from the --FingerprintsLabel option value, or built by
        appending E-state atom type IDs to the --FingerprintsLabel option
        value.

        Possible values: *FingerprintsLabelOnly | FingerprintsLabelWithIDs*.
        Default value: *FingerprintsLabelOnly*.

        This option is only used for the *FixedSize* value of the -e,
        --EStateAtomTypesSetToUse option during generation of
        *EStateIndicies* E-state fingerprints.

        E-state atom type IDs appended to the --FingerprintsLabel value for
        the *FingerprintsLabelWithIDs* value of --FingerprintsLabelMode
        correspond to the fixed number of previously defined E-state atom
        types.

    --FingerprintsLabel *text*
        SD data label or text file column label to use for the fingerprints
        string in output SD or CSV/TSV text file(s) specified by --output.
        Default value: *EStateIndiciesFingerprints*.

    -h, --help
        Print this help message.

    -k, --KeepLargestComponent *Yes | No*
        Generate fingerprints for only the largest component in the
        molecule. Possible values: *Yes or No*. Default value: *Yes*.

        For molecules containing multiple connected components, fingerprints
        can be generated in two different ways: use all connected components
        or just the largest connected component. By default, all atoms
        except for the largest connected component are deleted before
        generation of fingerprints.
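
        Keeping only the largest connected component amounts to a graph
        traversal over the bonds. The following Python sketch uses an
        iterative depth-first search over atom indices. It assumes a
        molecule is represented as an atom count plus (i, j) bond pairs,
        purely for illustration, and its tie-breaking rule is arbitrary.
        largest_component is a hypothetical name.

```python
def largest_component(num_atoms, bonds):
    """Return sorted atom indices of the largest connected component.

    Atoms are numbered 0..num_atoms-1 and bonds are (i, j) index pairs.
    On a tie, the component found first (the one containing the lowest
    atom index) is kept.
    """
    neighbors = [[] for _ in range(num_atoms)]
    for i, j in bonds:
        neighbors[i].append(j)
        neighbors[j].append(i)
    seen = [False] * num_atoms
    best = []
    for start in range(num_atoms):
        if seen[start]:
            continue
        # Iterative depth-first search from an unvisited atom.
        component, stack = [], [start]
        seen[start] = True
        while stack:
            atom = stack.pop()
            component.append(atom)
            for neighbor in neighbors[atom]:
                if not seen[neighbor]:
                    seen[neighbor] = True
                    stack.append(neighbor)
        if len(component) > len(best):
            best = component
    return sorted(best)


# A 4-atom chain, an isolated counter-ion (atom 4), and a 2-atom fragment.
print(largest_component(7, [(0, 1), (1, 2), (2, 3), (5, 6)]))  # [0, 1, 2, 3]
```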

    --OutDelim *comma | tab | semicolon*
        Delimiter for output CSV/TSV text file(s). Possible values: *comma,
        tab, or semicolon*. Default value: *comma*.

    --output *SD | FP | text | all*
        Type of output files to generate. Possible values: *SD, FP, text, or
        all*. Default value: *text*.

    -o, --overwrite
        Overwrite existing files.

    -q, --quote *Yes | No*
        Put quotes around column values in output CSV/TSV text file(s).
        Possible values: *Yes or No*. Default value: *Yes*.
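
        The following Python sketch shows how --OutDelim and --quote
        combine. Doubling embedded quote characters follows the usual CSV
        convention and is an assumption here, as is the helper name
        format_row. With *semicolon* delimiters and *No* quoting, the
        semicolons inside the vector string cannot be told apart from
        column delimiters, so quoting matters in that case.

```python
# Map --OutDelim values to delimiter characters.
DELIMITERS = {"comma": ",", "tab": "\t", "semicolon": ";"}


def format_row(values, out_delim="comma", quote="Yes"):
    """Join column values for one output line of a CSV/TSV text file."""
    delimiter = DELIMITERS[out_delim]
    if quote == "Yes":
        # Wrap each value in quotes, doubling any embedded quote characters.
        values = ['"' + v.replace('"', '""') + '"' for v in values]
    return delimiter.join(values)


row = ["Cmpd1", "FingerprintsVector;EStateIndicies:ArbitrarySize;11"]
print(format_row(row))
print(format_row(row, "semicolon", quote="No"))  # ambiguous without quotes
```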

    -r, --root *RootName*
        New file names are generated using the root: <Root>.<Ext>. Default
        for new file names: <SDFileName><EStateIndiciesFP>.<Ext>. The file
        type determines the <Ext> value. The sdf, fpf, csv, and tsv <Ext>
        values are used for SD, FP, comma/semicolon, and tab delimited text
        files, respectively. This option is ignored for multiple input files.
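
        The following Python sketch shows the naming rule above. It assumes
        that <SDFileName> means the input file name without its directory
        and extension. output_files is a hypothetical helper. The extension
        mapping (sdf, fpf, csv, tsv) follows the text.

```python
import os


def output_files(sd_file, output="text", root=None, out_delim="comma"):
    """Return the output file names implied by --root, --output, --OutDelim."""
    if root is None:
        # Assumed default: <SDFileName>EStateIndiciesFP, without extension.
        base = os.path.splitext(os.path.basename(sd_file))[0]
        root = base + "EStateIndiciesFP"
    text_ext = "tsv" if out_delim == "tab" else "csv"
    extensions = {
        "SD": ["sdf"],
        "FP": ["fpf"],
        "text": [text_ext],
        "all": ["sdf", "fpf", text_ext],
    }[output]
    return [f"{root}.{ext}" for ext in extensions]


print(output_files("Sample.sdf", output="all", root="SampleESFP"))
# ['SampleESFP.sdf', 'SampleESFP.fpf', 'SampleESFP.csv']
```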

    --ValuesPrecision *number*
        Precision of E-state indices values. Default value: up to *3*
        decimal places. Valid values: positive integers.

    -v, --VectorStringFormat *ValuesString | IDsAndValuesString |
    IDsAndValuesPairsString | ValuesAndIDsString | ValuesAndIDsPairsString*
        Format of fingerprints vector string data in output SD, FP or
        CSV/TSV text file(s) specified by --output used for
        *EStateIndicies*. Possible values: *ValuesString,
        IDsAndValuesString, IDsAndValuesPairsString, ValuesAndIDsString,
        ValuesAndIDsPairsString*.

        Default value for the *ArbitrarySize* value of the -e,
        --EStateAtomTypesSetToUse option: *IDsAndValuesString*. Default
        value for the *FixedSize* value of the -e, --EStateAtomTypesSetToUse
        option: *ValuesString*.

        Examples:

            FingerprintsVector;EStateIndicies:ArbitrarySize;11;NumericalValues;
            IDsAndValuesString;SaaCH SaasC SaasN SdO SdssC SsCH3 SsF SsOH SssCH2
            SssNH SsssCH;24.778 4.387 1.993 25.023 -1.435 3.975 14.006 29.759
            -0.073 3.024 -2.270
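
        This document shows only the *IDsAndValuesString* and *ValuesString*
        layouts. The following Python sketch renders the ID/value portion
        for all five formats. The layouts of the other three formats are
        inferred from their names (IDs and values as two
        semicolon-separated lists, or as interleaved pairs) and should be
        treated as assumptions. format_vector is a hypothetical helper.

```python
def format_vector(ids, values, fmt):
    """Render the ID/value portion of a vector string in the given format.

    Values are passed as strings so that their precision is kept as-is.
    """
    ids_str = " ".join(ids)
    values_str = " ".join(values)
    if fmt == "ValuesString":
        return values_str
    if fmt == "IDsAndValuesString":
        return f"{ids_str};{values_str}"
    if fmt == "IDsAndValuesPairsString":
        return " ".join(f"{i} {v}" for i, v in zip(ids, values))
    if fmt == "ValuesAndIDsString":
        return f"{values_str};{ids_str}"
    if fmt == "ValuesAndIDsPairsString":
        return " ".join(f"{v} {i}" for i, v in zip(ids, values))
    raise ValueError(f"unknown vector string format: {fmt!r}")


ids, values = ["SaaCH", "SaasC"], ["24.778", "4.387"]
for fmt in ("IDsAndValuesString", "IDsAndValuesPairsString"):
    print(format_vector(ids, values, fmt))
# SaaCH SaasC;24.778 4.387
# SaaCH 24.778 SaasC 4.387
```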

    -w, --WorkingDir *DirName*
        Location of working directory. Default: current directory.

EXAMPLES
    To generate E-state fingerprints of arbitrary size in vector string
    format and create a SampleESFP.csv file containing sequential compound
    IDs along with fingerprints vector strings data, type:

        % EStateIndiciesFingerprints.pl -r SampleESFP -o Sample.sdf

    To generate E-state fingerprints of fixed size in vector string format
    and create a SampleESFP.csv file containing sequential compound IDs
    along with fingerprints vector strings data, type:

        % EStateIndiciesFingerprints.pl -e FixedSize -r SampleESFP \
          -o Sample.sdf

    To generate E-state fingerprints of fixed size in IDsAndValuesString
    vector string format and create a SampleESFP.csv file containing
    sequential compound IDs along with fingerprints vector strings data,
    type:

        % EStateIndiciesFingerprints.pl -e FixedSize -v IDsAndValuesString \
          -r SampleESFP -o Sample.sdf

    To generate E-state fingerprints of fixed size in vector string format
    and create a SampleESFP.csv file containing compound IDs from the
    molecule name line along with fingerprints vector strings data, type:

        % EStateIndiciesFingerprints.pl -e FixedSize \
          --DataFieldsMode CompoundID --CompoundIDMode MolName \
          -r SampleESFP -o Sample.sdf

    To generate E-state fingerprints of fixed size in vector string format
    and create a SampleESFP.csv file containing compound IDs from a
    specified data field along with fingerprints vector strings data,
    type:

        % EStateIndiciesFingerprints.pl -e FixedSize \
          --DataFieldsMode CompoundID --CompoundIDMode DataField \
          --CompoundID Mol_ID -r SampleESFP -o Sample.sdf

    To generate E-state fingerprints of fixed size in vector string format
    and create a SampleESFP.csv file containing compound IDs generated from
    a combination of the molecule name line and an explicit compound
    prefix along with fingerprints vector strings data, type:

        % EStateIndiciesFingerprints.pl -e FixedSize \
          --DataFieldsMode CompoundID --CompoundIDMode MolNameOrLabelPrefix \
          --CompoundID Cmpd --CompoundIDLabel MolID -r SampleESFP -o Sample.sdf

    To generate E-state fingerprints of fixed size in vector string format
    and create a SampleESFP.csv file containing specific data field columns
    along with fingerprints vector strings data, type:

        % EStateIndiciesFingerprints.pl -e FixedSize \
          --DataFieldsMode Specify --DataFields Mol_ID -r SampleESFP \
          -o Sample.sdf

    To generate E-state fingerprints of fixed size in vector string format
    and create a SampleESFP.csv file containing common data field columns
    along with fingerprints vector strings data, type:

        % EStateIndiciesFingerprints.pl -e FixedSize \
          --DataFieldsMode Common -r SampleESFP -o Sample.sdf

    To generate E-state fingerprints of fixed size in vector string format
    and create SampleESFP.sdf, SampleESFP.fpf, and SampleESFP.csv files,
    with all data field columns in the CSV file, along with fingerprints
    vector strings data, type:

        % EStateIndiciesFingerprints.pl -e FixedSize \
          --DataFieldsMode All --output all -r SampleESFP -o Sample.sdf

AUTHOR
    Manish Sud <msud@san.rr.com>

SEE ALSO
    InfoFingerprintsFiles.pl, SimilarityMatricesFingerprints.pl,
    AtomNeighborhoodsFingerprints.pl, ExtendedConnectivityFingerprints.pl,
    MACCSKeysFingerprints.pl, PathLengthFingerprints.pl,
    TopologicalAtomPairsFingerprints.pl,
    TopologicalAtomTorsionsFingerprints.pl,
    TopologicalPharmacophoreAtomPairsFingerprints.pl,
    TopologicalPharmacophoreAtomTripletsFingerprints.pl

COPYRIGHT
    Copyright (C) 2015 Manish Sud. All rights reserved.

    This file is part of MayaChemTools.

    MayaChemTools is free software; you can redistribute it and/or modify it
    under the terms of the GNU Lesser General Public License as published by
    the Free Software Foundation; either version 3 of the License, or (at
    your option) any later version.