TopologicalAtomTripletsFingerprints.txt (docs/scripts/txt/ in the mayatool3_test2 repository, changeset 0:4816e4a8ae95), uploaded by deepakjadmin on Wed, 20 Jan 2016 09:23:18 -0500

NAME
    TopologicalAtomTripletsFingerprints.pl - Generate topological atom
    triplets fingerprints for SD files

SYNOPSIS
    TopologicalAtomTripletsFingerprints.pl SDFile(s)...

    TopologicalAtomTripletsFingerprints.pl [--AromaticityModel
    *AromaticityModelType*] [-a, --AtomIdentifierType
    *AtomicInvariantsAtomTypes*] [--AtomicInvariantsToUse
    *"AtomicInvariant,AtomicInvariant..."*] [--FunctionalClassesToUse
    *"FunctionalClass1,FunctionalClass2..."*] [--CompoundID *DataFieldName
    or LabelPrefixString*] [--CompoundIDLabel *text*] [--CompoundIDMode
    *DataField | MolName | LabelPrefix | MolNameOrLabelPrefix*]
    [--DataFields *"FieldLabel1,FieldLabel2,..."*] [-d, --DataFieldsMode
    *All | Common | Specify | CompoundID*] [-f, --Filter *Yes | No*]
    [--FingerprintsLabel *text*] [-h, --help] [-k, --KeepLargestComponent
    *Yes | No*] [--MinDistance *number*] [--MaxDistance *number*]
    [--OutDelim *comma | tab | semicolon*] [--output *SD | FP | text | all*]
    [-o, --overwrite] [-q, --quote *Yes | No*] [-r, --root *RootName*] [-u,
    --UseTriangleInequality *Yes | No*] [-v, --VectorStringFormat
    *IDsAndValuesString | IDsAndValuesPairsString | ValuesAndIDsString |
    ValuesAndIDsPairsString*] [-w, --WorkingDir *DirName*] SDFile(s)...

DESCRIPTION
    Generate topological atom triplets fingerprints for *SDFile(s)* and
    create appropriate SD, FP or CSV/TSV text file(s) containing
    fingerprints vector strings corresponding to molecular fingerprints.

    Multiple SDFile names are separated by spaces. The valid file extensions
    are *.sdf* and *.sd*. All other file names are ignored. All the SD files
    in the current directory can be specified either by "*.sdf" or by the
    current directory name.

    The current release of MayaChemTools supports generation of topological
    atom triplets fingerprints corresponding to the following -a,
    --AtomIdentifierType values:

        AtomicInvariantsAtomTypes, DREIDINGAtomTypes, EStateAtomTypes,
        FunctionalClassAtomTypes, MMFF94AtomTypes, SLogPAtomTypes,
        SYBYLAtomTypes, TPSAAtomTypes, UFFAtomTypes

    Based on the values specified for -a, --AtomIdentifierType and
    --AtomicInvariantsToUse, initial atom types are assigned to all
    non-hydrogen atoms in a molecule. Using the distance matrix for the
    molecule and the initial atom types assigned to non-hydrogen atoms, all
    unique atom triplets within --MinDistance and --MaxDistance are
    identified and counted. An atom triplet identifier is generated for
    each unique atom triplet; the format of the atom triplet identifier is:

        <ATx>-Dyz-<ATy>-Dxz-<ATz>-Dxy

        ATx, ATy, ATz: Atom types assigned to atom x, atom y, and atom z
        Dxy: Distance between atom x and atom y
        Dxz: Distance between atom x and atom z
        Dyz: Distance between atom y and atom z

        where <ATx>-Dyz <= <ATy>-Dxz <= <ATz>-Dxy

    Each atom type is thus paired with the distance between the other two
    atoms, and the three pairs are listed in ascending order, so the
    identifier does not depend on how the three atoms are labeled.

    The atom triplet identifiers for all unique atom triplets corresponding
    to non-hydrogen atoms constitute topological atom triplets fingerprints
    of the molecule.
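
    The following Python sketch illustrates the steps described above on a
    toy molecular graph: topological distances are computed by breadth-first
    search, every combination of three atoms whose pairwise distances fall
    within the distance range is turned into an identifier, and identical
    identifiers are counted. It is not the MayaChemTools implementation.
    Assumptions: atoms are given as precomputed atom type strings and bonds
    as index pairs; the range test is applied to all three pairwise
    distances; the three type-distance pairs are ordered by plain string
    comparison, which is consistent with the identifiers in the examples
    (for instance C.X1.BO1.H3-D1-C.X2.BO2.H2-D10-C.X3.BO4-D9, where D10 sorts
    before D9); and each unordered triplet is counted once, so the counts may
    differ from those MayaChemTools reports.

```python
from collections import deque
from itertools import combinations


def distance_matrix(n_atoms, bonds):
    """All-pairs topological distances (number of bonds) via BFS from each atom."""
    adjacency = [[] for _ in range(n_atoms)]
    for a, b in bonds:
        adjacency[a].append(b)
        adjacency[b].append(a)
    dist = [[None] * n_atoms for _ in range(n_atoms)]  # None = not connected
    for start in range(n_atoms):
        dist[start][start] = 0
        queue = deque([start])
        while queue:
            atom = queue.popleft()
            for nbr in adjacency[atom]:
                if dist[start][nbr] is None:
                    dist[start][nbr] = dist[start][atom] + 1
                    queue.append(nbr)
    return dist


def triplet_id(types, dists):
    """Build <ATx>-Dyz-<ATy>-Dxz-<ATz>-Dxy from (ATx, ATy, ATz) and (Dyz, Dxz, Dxy).

    Each atom type is paired with the distance opposite to it; the pairs are
    then sorted as strings (an assumption consistent with the documented examples).
    """
    pairs = sorted(f"{t}-D{d}" for t, d in zip(types, dists))
    return "-".join(pairs)


def atom_triplets(atom_types, bonds, min_distance=1, max_distance=10):
    """Count atom triplet identifiers for a molecule given as atom types plus bonds."""
    dist = distance_matrix(len(atom_types), bonds)
    counts = {}
    for x, y, z in combinations(range(len(atom_types)), 3):
        dyz, dxz, dxy = dist[y][z], dist[x][z], dist[x][y]
        # Assumption: skip the triplet unless all three distances are in range.
        if any(d is None or not min_distance <= d <= max_distance for d in (dyz, dxz, dxy)):
            continue
        key = triplet_id((atom_types[x], atom_types[y], atom_types[z]), (dyz, dxz, dxy))
        counts[key] = counts.get(key, 0) + 1
    return counts


# Propane: two methyl carbons (0, 1) bonded to one methylene carbon (2).
print(atom_triplets(["C.X1.BO1.H3", "C.X1.BO1.H3", "C.X2.BO2.H2"], [(0, 2), (1, 2)]))
# {'C.X1.BO1.H3-D1-C.X1.BO1.H3-D1-C.X2.BO2.H2-D2': 1}
```

    For propane the sketch yields a single identifier with a count of 1; the
    two methyl carbons are two bonds apart, so the methylene atom type is
    paired with D2.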

    Example of *SD* file containing topological atom triplets fingerprints
    string data:

        ... ...
        ... ...
        $$$$
        ... ...
        ... ...
        ... ...
        41 44 0 0 0 0 0 0 0 0999 V2000
        -3.3652 1.4499 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
        ... ...
        2 3 1 0 0 0 0
        ... ...
        M END
        > <CmpdID>
        Cmpd1

        > <TopologicalAtomTripletsFingerprints>
        FingerprintsVector;TopologicalAtomTriplets:AtomicInvariantsAtomTypes:Mi
        nDistance1:MaxDistance10;3096;NumericalValues;IDsAndValuesString;C.X1.B
        O1.H3-D1-C.X1.BO1.H3-D1-C.X3.BO3.H1-D2 C.X1.BO1.H3-D1-C.X2.BO2.H2-D10-C
        .X3.BO4-D9 C.X1.BO1.H3-D1-C.X2.BO2.H2-D3-N.X3.BO3-D4 C.X1.BO1.H3-D1...;
        1 2 2 2 2 2 2 2 8 8 4 8 4 4 2 2 2 2 4 2 2 2 4 2 2 2 2 1 2 2 4 4 4 2 2 2
        4 4 4 8 4 4 2 4 4 4 2 4 4 2 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2 2 8 8 ...

        $$$$
        ... ...
        ... ...

    Example of *FP* file containing topological atom triplets fingerprints
    string data:

        #
        # Package = MayaChemTools 7.4
        # Release Date = Oct 21, 2010
        #
        # TimeStamp = Fri Mar 11 15:24:01 2011
        #
        # FingerprintsStringType = FingerprintsVector
        #
        # Description = TopologicalAtomTriplets:AtomicInvariantsAtomTypes:Mi...
        # VectorStringFormat = IDsAndValuesString
        # VectorValuesType = NumericalValues
        #
        Cmpd1 3096;C.X1.BO1.H3-D1-C.X1.BO1.H3-D1-C.X3.BO3.H1-D2...;1 2 2 2 2...
        Cmpd2 1093;C.X1.BO1.H3-D1-C.X1.BO1.H3-D3-C.X2.BO2.H2-D4...;2 2 2 2 2...
        ... ...
        ... ...
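
    As a rough illustration of the FP layout shown above, the Python sketch
    below separates the "#" header lines into key/value pairs and splits each
    data line into a compound ID, the count, the IDs, and the values. It
    assumes the IDsAndValuesString format, compound IDs without whitespace,
    whitespace-separated IDs and values, and integer values; the function
    name parse_fp_lines and the toy IDs are hypothetical and not part of
    MayaChemTools.

```python
def parse_fp_lines(lines):
    """Split FP file lines into header key/value pairs and per-compound records.

    Assumes IDsAndValuesString data lines: "<CompoundID> <Count>;<IDs>;<Values>".
    """
    header = {}
    records = []
    for raw in lines:
        line = raw.strip()
        if line.startswith("#"):
            body = line[1:]
            if "=" in body:  # bare "#" lines carry no key/value pair
                key, value = body.split("=", 1)
                header[key.strip()] = value.strip()
            continue
        if not line:
            continue
        compound_id, vector = line.split(None, 1)
        count, ids, values = vector.split(";")
        records.append((compound_id, int(count), ids.split(), [int(v) for v in values.split()]))
    return header, records


# Toy input with hypothetical atom triplet IDs.
sample = [
    "# VectorStringFormat = IDsAndValuesString",
    "#",
    "Cmpd1 2;A-D1-A-D1-B-D2 A-D1-B-D1-C-D2;1 2",
]
header, records = parse_fp_lines(sample)
print(header["VectorStringFormat"], records[0][0], records[0][1])  # IDsAndValuesString Cmpd1 2
```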

    Example of CSV *Text* file containing topological atom triplets
    fingerprints string data:

        "CompoundID","TopologicalAtomTripletsFingerprints"
        "Cmpd1","FingerprintsVector;TopologicalAtomTriplets:AtomicInvariantsAto
        mTypes:MinDistance1:MaxDistance10;3096;NumericalValues;IDsAndValuesStri
        ng;C.X1.BO1.H3-D1-C.X1.BO1.H3-D1-C.X3.BO3.H1-D2 C.X1.BO1.H3-D1-C.X2.BO2
        .H2-D10-C.X3.BO4-D9 C.X1.BO1.H3-D1-C.X2.BO2.H2-D3-N.X3.BO3-D4 C.X1....;
        1 2 2 2 2 2 2 2 8 8 4 8 4 4 2 2 2 2 4 2 2 2 4 2 2 2 2 1 2 2 4 4 4 2 2 2
        4 4 4 8 4 4 2 4 4 4 2 4 4 2 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2 2 8 8 ...
        ... ...
        ... ...

    The current release of MayaChemTools generates the following types of
    topological atom triplets fingerprints vector strings:

        FingerprintsVector;TopologicalAtomTriplets:AtomicInvariantsAtomTypes:M
        inDistance1:MaxDistance10;3096;NumericalValues;IDsAndValuesString;C.X1
        .BO1.H3-D1-C.X1.BO1.H3-D1-C.X3.BO3.H1-D2 C.X1.BO1.H3-D1-C.X2.BO2.H2-D1
        0-C.X3.BO4-D9 C.X1.BO1.H3-D1-C.X2.BO2.H2-D3-N.X3.BO3-D4 C.X1.BO1.H3-D1
        -C.X2.BO2.H2-D4-C.X2.BO2.H2-D5 C.X1.BO1.H3-D1-C.X2.BO2.H2-D6-C.X3....;
        1 2 2 2 2 2 2 2 8 8 4 8 4 4 2 2 2 2 4 2 2 2 4 2 2 2 2 1 2 2 4 4 4 2 2
        2 4 4 4 8 4 4 2 4 4 4 2 4 4 2 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2 2 8...

        FingerprintsVector;TopologicalAtomTriplets:AtomicInvariantsAtomTypes:M
        inDistance1:MaxDistance10;3096;NumericalValues;IDsAndValuesPairsString
        ;C.X1.BO1.H3-D1-C.X1.BO1.H3-D1-C.X3.BO3.H1-D2 1 C.X1.BO1.H3-D1-C.X2.BO
        2.H2-D10-C.X3.BO4-D9 2 C.X1.BO1.H3-D1-C.X2.BO2.H2-D3-N.X3.BO3-D4 2 C.X
        1.BO1.H3-D1-C.X2.BO2.H2-D4-C.X2.BO2.H2-D5 2 C.X1.BO1.H3-D1-C.X2.BO2.H2
        -D6-C.X3.BO3.H1-D5 2 C.X1.BO1.H3-D1-C.X2.BO2.H2-D6-C.X3.BO3.H1-D7 2...

        FingerprintsVector;TopologicalAtomTriplets:DREIDINGAtomTypes:MinDistan
        ce1:MaxDistance10;2377;NumericalValues;IDsAndValuesString;C_2-D1-C_2-D
        9-C_3-D10 C_2-D1-C_2-D9-C_R-D10 C_2-D1-C_3-D1-C_3-D2 C_2-D1-C_3-D10-C_
        3-D9 C_2-D1-C_3-D2-C_3-D3 C_2-D1-C_3-D2-C_R-D3 C_2-D1-C_3-D3-C_3-D4 C_
        2-D1-C_3-D3-N_R-D4 C_2-D1-C_3-D3-O_3-D2 C_2-D1-C_3-D4-C_3-D5 C_2-D...;
        1 1 1 2 1 1 3 1 1 2 2 1 1 1 1 1 1 1 1 2 1 3 4 5 1 1 6 4 2 2 3 1 1 1 2
        2 1 2 1 1 2 2 2 1 2 1 2 1 1 3 3 2 6 4 2 1 1 1 2 2 1 1 1 1 1 1 1 1 1...

        FingerprintsVector;TopologicalAtomTriplets:EStateAtomTypes:MinDistance
        1:MaxDistance10;3298;NumericalValues;IDsAndValuesString;aaCH-D1-aaCH-D
        1-aaCH-D2 aaCH-D1-aaCH-D1-aasC-D2 aaCH-D1-aaCH-D10-aaCH-D9 aaCH-D1-aaC
        H-D10-aasC-D9 aaCH-D1-aaCH-D2-aaCH-D3 aaCH-D1-aaCH-D2-aasC-D1 aaCH-D1-
        aaCH-D2-aasC-D3 aaCH-D1-aaCH-D3-aasC-D2 aaCH-D1-aaCH-D4-aasC-D5 aa...;
        6 4 24 4 16 8 8 4 8 8 8 12 10 14 4 16 24 4 12 2 2 4 1 10 2 2 15 2 2 2
        2 2 2 14 4 2 2 2 2 1 2 10 2 2 4 1 2 4 8 3 3 3 4 6 4 2 2 3 3 1 1 1 2 1
        2 2 4 2 3 2 1 2 4 5 3 2 2 1 2 4 3 2 8 12 6 2 2 4 4 7 1 4 2 4 2 2 2 ...

        FingerprintsVector;TopologicalAtomTriplets:FunctionalClassAtomTypes:Mi
        nDistance1:MaxDistance10;2182;NumericalValues;IDsAndValuesString;Ar-D1
        -Ar-D1-Ar-D2 Ar-D1-Ar-D1-Ar.HBA-D2 Ar-D1-Ar-D10-Ar-D9 Ar-D1-Ar-D10-Hal
        -D9 Ar-D1-Ar-D2-Ar-D2 Ar-D1-Ar-D2-Ar-D3 Ar-D1-Ar-D2-Ar.HBA-D1 Ar-D1-Ar
        -D2-Ar.HBA-D2 Ar-D1-Ar-D2-Ar.HBA-D3 Ar-D1-Ar-D2-HBD-D1 Ar-D1-Ar-D2...;
        27 1 32 2 2 63 3 2 1 2 1 2 3 1 1 40 3 1 2 2 2 2 4 2 2 47 4 2 2 1 2 1 5
        2 2 51 4 3 1 3 1 9 1 1 50 3 3 4 1 9 50 2 2 3 3 5 45 1 1 1 2 1 2 2 3 3
        4 4 3 2 1 1 3 4 5 5 3 1 2 3 2 3 5 7 2 7 3 7 1 1 2 2 2 2 3 1 4 3 1 2...

        FingerprintsVector;TopologicalAtomTriplets:MMFF94AtomTypes:MinDistance
        1:MaxDistance10;2966;NumericalValues;IDsAndValuesString;C5A-D1-C5A-D1-
        N5-D2 C5A-D1-C5A-D2-C5B-D2 C5A-D1-C5A-D3-CB-D2 C5A-D1-C5A-D3-CR-D2 C5A
        -D1-C5B-D1-C5B-D2 C5A-D1-C5B-D2-C=ON-D1 C5A-D1-C5B-D2-CB-D1 C5A-D1-C5B
        -D3-C=ON-D2 C5A-D1-C5B-D3-CB-D2 C5A-D1-C=ON-D3-NC=O-D2 C5A-D1-C=ON-D3-
        O=CN-D2 C5A-D1-C=ON-D4-NC=O-D3 C5A-D1-C=ON-D4-O=CN-D3 C5A-D1-CB-D1-...

        FingerprintsVector;TopologicalAtomTriplets:SLogPAtomTypes:MinDistance1
        :MaxDistance10;3710;NumericalValues;IDsAndValuesString;C1-D1-C1-D1-C11
        -D2 C1-D1-C1-D1-CS-D2 C1-D1-C1-D10-C5-D9 C1-D1-C1-D3-C10-D2 C1-D1-C1-D
        3-C5-D2 C1-D1-C1-D3-CS-D2 C1-D1-C1-D3-CS-D4 C1-D1-C1-D4-C10-D5 C1-D1-C
        1-D4-C11-D5 C1-D1-C1-D5-C10-D4 C1-D1-C1-D5-C5-D4 C1-D1-C1-D6-C11-D7 C1
        -D1-C1-D6-CS-D5 C1-D1-C1-D6-CS-D7 C1-D1-C1-D8-C11-D9 C1-D1-C1-D8-CS...

        FingerprintsVector;TopologicalAtomTriplets:SYBYLAtomTypes:MinDistance1
        :MaxDistance10;2332;NumericalValues;IDsAndValuesString;C.2-D1-C.2-D9-C
        .3-D10 C.2-D1-C.2-D9-C.ar-D10 C.2-D1-C.3-D1-C.3-D2 C.2-D1-C.3-D10-C.3-
        D9 C.2-D1-C.3-D2-C.3-D3 C.2-D1-C.3-D2-C.ar-D3 C.2-D1-C.3-D3-C.3-D4 C.2
        -D1-C.3-D3-N.ar-D4 C.2-D1-C.3-D3-O.3-D2 C.2-D1-C.3-D4-C.3-D5 C.2-D1-C.
        3-D5-C.3-D6 C.2-D1-C.3-D5-O.3-D4 C.2-D1-C.3-D6-C.3-D7 C.2-D1-C.3-D7...

        FingerprintsVector;TopologicalAtomTriplets:TPSAAtomTypes:MinDistance1:
        MaxDistance10;1007;NumericalValues;IDsAndValuesString;N21-D1-N7-D3-Non
        e-D4 N21-D1-N7-D5-None-D4 N21-D1-None-D1-None-D2 N21-D1-None-D2-None-D
        2 N21-D1-None-D2-None-D3 N21-D1-None-D3-None-D4 N21-D1-None-D4-None-D5
        N21-D1-None-D4-O3-D3 N21-D1-None-D4-O4-D3 N21-D1-None-D5-None-D6 N21-
        D1-None-D6-None-D7 N21-D1-None-D6-O4-D5 N21-D1-None-D7-None-D8 N21-...

        FingerprintsVector;TopologicalAtomTriplets:UFFAtomTypes:MinDistance1:M
        axDistance10;2377;NumericalValues;IDsAndValuesString;C_2-D1-C_2-D9-C_3
        -D10 C_2-D1-C_2-D9-C_R-D10 C_2-D1-C_3-D1-C_3-D2 C_2-D1-C_3-D10-C_3-D9
        C_2-D1-C_3-D2-C_3-D3 C_2-D1-C_3-D2-C_R-D3 C_2-D1-C_3-D3-C_3-D4 C_2-D1-
        C_3-D3-N_R-D4 C_2-D1-C_3-D3-O_3-D2 C_2-D1-C_3-D4-C_3-D5 C_2-D1-C_3-D5-
        C_3-D6 C_2-D1-C_3-D5-O_3-D4 C_2-D1-C_3-D6-C_3-D7 C_2-D1-C_3-D7-C_3-...

OPTIONS
    --AromaticityModel *MDLAromaticityModel | TriposAromaticityModel |
    MMFFAromaticityModel | ChemAxonBasicAromaticityModel |
    ChemAxonGeneralAromaticityModel | DaylightAromaticityModel |
    MayaChemToolsAromaticityModel*
        Specify aromaticity model to use during detection of aromaticity.
        Possible values in the current release are: *MDLAromaticityModel,
        TriposAromaticityModel, MMFFAromaticityModel,
        ChemAxonBasicAromaticityModel, ChemAxonGeneralAromaticityModel,
        DaylightAromaticityModel or MayaChemToolsAromaticityModel*. Default
        value: *MayaChemToolsAromaticityModel*.

        The supported aromaticity model names along with model specific
        control parameters are defined in AromaticityModelsData.csv, which
        is distributed with the current release and is available under the
        lib/data directory. The Molecule.pm module retrieves data from this
        file during class instantiation and makes it available to the
        DetectAromaticity method for detecting aromaticity corresponding to
        a specific model.

    -a, --AtomIdentifierType *AtomicInvariantsAtomTypes | DREIDINGAtomTypes
    | EStateAtomTypes | FunctionalClassAtomTypes | MMFF94AtomTypes |
    SLogPAtomTypes | SYBYLAtomTypes | TPSAAtomTypes | UFFAtomTypes*
        Specify atom identifier type to use for assignment of initial atom
        identifiers to non-hydrogen atoms during calculation of topological
        atom triplets fingerprints. Possible values in the current release
        are: *AtomicInvariantsAtomTypes, DREIDINGAtomTypes, EStateAtomTypes,
        FunctionalClassAtomTypes, MMFF94AtomTypes, SLogPAtomTypes,
        SYBYLAtomTypes, TPSAAtomTypes, UFFAtomTypes*. Default value:
        *AtomicInvariantsAtomTypes*.

    --AtomicInvariantsToUse *"AtomicInvariant,AtomicInvariant..."*
        This value is only used when -a, --AtomIdentifierType is set to
        *AtomicInvariantsAtomTypes*. It's a list of comma separated valid
        atomic invariant atom types.

        Possible values for atomic invariants are: *AS, X, BO, LBO, SB, DB,
        TB, H, Ar, RA, FC, MN, SM*. Default value: *AS,X,BO,H,FC*.

        The atomic invariants abbreviations correspond to:

            AS = Atom symbol corresponding to element symbol

            X<n> = Number of non-hydrogen atom neighbors or heavy atoms
            BO<n> = Sum of bond orders to non-hydrogen atom neighbors or heavy atoms
            LBO<n> = Largest bond order to non-hydrogen atom neighbors or heavy atoms
            SB<n> = Number of single bonds to non-hydrogen atom neighbors or heavy atoms
            DB<n> = Number of double bonds to non-hydrogen atom neighbors or heavy atoms
            TB<n> = Number of triple bonds to non-hydrogen atom neighbors or heavy atoms
            H<n> = Number of implicit and explicit hydrogens for atom
            Ar = Aromatic annotation indicating whether atom is aromatic
            RA = Ring atom annotation indicating whether atom is a ring atom
            FC<+n/-n> = Formal charge assigned to atom
            MN<n> = Mass number indicating isotope other than most abundant isotope
            SM<n> = Spin multiplicity of atom. Possible values: 1 (singlet), 2 (doublet) or
                    3 (triplet)

        Atom type generated by AtomTypes::AtomicInvariantsAtomTypes class
        corresponds to:

            AS.X<n>.BO<n>.LBO<n>.SB<n>.DB<n>.TB<n>.H<n>.Ar.RA.FC<+n/-n>.MN<n>.SM<n>

        Except for AS, which is a required atomic invariant in atom types,
        all other atomic invariants are optional. Atom type specification
        doesn't include atomic invariants with zero or undefined values.
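
        A minimal Python sketch of how an atom type string of this form could
        be assembled from per-atom invariant values, following the template
        above: invariants are emitted in template order, only those selected
        via --AtomicInvariantsToUse are included, and zero or missing values
        are dropped. The input dictionary, the helper name, and the rendering
        of formal charge as FC+1 or FC-1 are assumptions for illustration;
        AtomTypes::AtomicInvariantsAtomTypes derives the invariants from the
        molecule itself.

```python
# Invariant order from the documented template AS.X.BO.LBO.SB.DB.TB.H.Ar.RA.FC.MN.SM
TEMPLATE_ORDER = ["X", "BO", "LBO", "SB", "DB", "TB", "H", "Ar", "RA", "FC", "MN", "SM"]
FLAG_INVARIANTS = {"Ar", "RA"}  # annotations without a numeric suffix


def atomic_invariants_atom_type(symbol, invariants, to_use=("AS", "X", "BO", "H", "FC")):
    """Hypothetical helper: build e.g. 'C.X1.BO1.H3' from precomputed invariants."""
    parts = [symbol]  # AS (atom symbol) is always present
    for name in TEMPLATE_ORDER:
        value = invariants.get(name)
        if name not in to_use or not value:
            continue  # not requested, zero, False, or undefined
        if name in FLAG_INVARIANTS:
            parts.append(name)
        elif name == "FC":
            parts.append(f"FC{value:+d}")  # assumed rendering: FC+1, FC-1
        else:
            parts.append(f"{name}{value}")
    return ".".join(parts)


print(atomic_invariants_atom_type("C", {"X": 1, "BO": 1, "H": 3, "FC": 0}))  # C.X1.BO1.H3
```

        With the default invariants a methyl carbon yields C.X1.BO1.H3, the
        same atom type that appears in the example fingerprints.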

        In addition to abbreviations, the following descriptive words can
        also be used to specify atomic invariants:

            X : NumOfNonHydrogenAtomNeighbors or NumOfHeavyAtomNeighbors
            BO : SumOfBondOrdersToNonHydrogenAtoms or SumOfBondOrdersToHeavyAtoms
            LBO : LargestBondOrderToNonHydrogenAtoms or LargestBondOrderToHeavyAtoms
            SB : NumOfSingleBondsToNonHydrogenAtoms or NumOfSingleBondsToHeavyAtoms
            DB : NumOfDoubleBondsToNonHydrogenAtoms or NumOfDoubleBondsToHeavyAtoms
            TB : NumOfTripleBondsToNonHydrogenAtoms or NumOfTripleBondsToHeavyAtoms
            H : NumOfImplicitAndExplicitHydrogens
            Ar : Aromatic
            RA : RingAtom
            FC : FormalCharge
            MN : MassNumber
            SM : SpinMultiplicity

        The *AtomTypes::AtomicInvariantsAtomTypes* module is used to assign
        atomic invariant atom types.

    --FunctionalClassesToUse *"FunctionalClass1,FunctionalClass2..."*
        This value is only used when -a, --AtomIdentifierType is set to
        *FunctionalClassAtomTypes*. It's a list of comma separated valid
        functional classes.

        Possible values for atom functional classes are: *Ar, CA, H, HBA,
        HBD, Hal, NI, PI, RA*. Default value [ Ref 24 ]:
        *HBD,HBA,PI,NI,Ar,Hal*.

        The functional class abbreviations correspond to:

            HBD: HydrogenBondDonor
            HBA: HydrogenBondAcceptor
            PI : PositivelyIonizable
            NI : NegativelyIonizable
            Ar : Aromatic
            Hal : Halogen
            H : Hydrophobic
            RA : RingAtom
            CA : ChainAtom

        Functional class atom type specification for an atom corresponds to:

            Ar.CA.H.HBA.HBD.Hal.NI.PI.RA

        The *AtomTypes::FunctionalClassAtomTypes* module is used to assign
        functional class atom types. It uses the following definitions [ Ref
        60-61, Ref 65-66 ]:

            HydrogenBondDonor: NH, NH2, OH
            HydrogenBondAcceptor: N[!H], O
            PositivelyIonizable: +, NH2
            NegativelyIonizable: -, C(=O)OH, S(=O)OH, P(=O)OH
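
        The functional class atom type of an atom can be pictured as the
        atom's classes, restricted to those selected with
        --FunctionalClassesToUse and written in the fixed order
        Ar.CA.H.HBA.HBD.Hal.NI.PI.RA shown above. The Python sketch below
        assumes each atom's classes have already been assigned (it does not
        implement the substructure definitions) and labels an atom with no
        selected class None; that placeholder is an assumption borrowed from
        the None atom types in the TPSA example, not documented behavior for
        FunctionalClassAtomTypes. The helper name is hypothetical.

```python
# Fixed order of the functional class atom type specification.
CLASS_ORDER = ["Ar", "CA", "H", "HBA", "HBD", "Hal", "NI", "PI", "RA"]
DEFAULT_CLASSES_TO_USE = ("HBD", "HBA", "PI", "NI", "Ar", "Hal")


def functional_class_atom_type(atom_classes, classes_to_use=DEFAULT_CLASSES_TO_USE):
    """Hypothetical helper: join an atom's selected classes in specification order."""
    selected = [c for c in CLASS_ORDER if c in atom_classes and c in classes_to_use]
    # Assumption: an atom with no selected class is labeled "None".
    return ".".join(selected) if selected else "None"


# An aromatic ring atom that is also an acceptor; RA is not in the default list.
print(functional_class_atom_type({"HBA", "Ar", "RA"}))  # Ar.HBA
```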

    --CompoundID *DataFieldName or LabelPrefixString*
        This value is --CompoundIDMode specific and indicates how compound
        ID is generated.

        For the *DataField* value of the --CompoundIDMode option, it
        corresponds to the data field label whose value is used as the
        compound ID; otherwise, it's a prefix string used for generating
        compound IDs like LabelPrefixString<Number>. The default value,
        *Cmpd*, generates compound IDs which look like Cmpd<Number>.

        Examples for *DataField* value of --CompoundIDMode:

            MolID
            ExtReg

        Examples for *LabelPrefix* or *MolNameOrLabelPrefix* value of
        --CompoundIDMode:

            Compound

        The value specified above generates compound IDs which correspond to
        Compound<Number> instead of the default value of Cmpd<Number>.

    --CompoundIDLabel *text*
        Specify compound ID column label for CSV/TSV text file(s) used
        during *CompoundID* value of --DataFieldsMode option. Default value:
        *CompoundID*.

    --CompoundIDMode *DataField | MolName | LabelPrefix |
    MolNameOrLabelPrefix*
        Specify how to generate compound IDs written to FP or CSV/TSV text
        file(s) along with generated fingerprints for the *FP | text | all*
        values of the --output option: use an *SDFile(s)* data field value;
        use the molname line from *SDFile(s)*; generate a sequential ID with
        a specific prefix; or use a combination of MolName and LabelPrefix,
        with LabelPrefix values used for empty molname lines.

        Possible values: *DataField | MolName | LabelPrefix |
        MolNameOrLabelPrefix*. Default value: *LabelPrefix*.

        For the *MolNameOrLabelPrefix* value of --CompoundIDMode, the molname
        line in *SDFile(s)* takes precedence over sequential compound IDs
        generated using *LabelPrefix*, and only empty molname values are
        replaced with sequential compound IDs.

        This is only used for *CompoundID* value of --DataFieldsMode option.
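
        The four --CompoundIDMode values can be summarized with a small
        Python sketch. It assumes that the sequential number is the
        compound's 1-based position in the input, that a missing data field
        yields an empty ID, and that a molname line containing only
        whitespace counts as empty; the function and its arguments are
        hypothetical, not the script's internals.

```python
def compound_ids(mol_names, data_fields, mode="LabelPrefix", compound_id="Cmpd"):
    """Hypothetical helper mirroring the documented --CompoundIDMode choices.

    mol_names: molname line of each compound; data_fields: dict of SD data per
    compound; compound_id: data field label (DataField) or label prefix (others).
    """
    ids = []
    for number, (name, fields) in enumerate(zip(mol_names, data_fields), start=1):
        if mode == "DataField":
            ids.append(fields.get(compound_id, ""))
        elif mode == "MolName":
            ids.append(name)
        elif mode == "LabelPrefix":
            ids.append(f"{compound_id}{number}")
        elif mode == "MolNameOrLabelPrefix":
            # Molname wins; only empty molname lines fall back to the prefix.
            ids.append(name if name.strip() else f"{compound_id}{number}")
        else:
            raise ValueError(f"unknown CompoundIDMode: {mode}")
    return ids


print(compound_ids(["Aspirin", "", "Caffeine"], [{}, {}, {}], mode="MolNameOrLabelPrefix"))
# ['Aspirin', 'Cmpd2', 'Caffeine']
```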

    --DataFields *"FieldLabel1,FieldLabel2,..."*
        Comma delimited list of *SDFile(s)* data fields to extract and
        write to CSV/TSV text file(s) along with generated fingerprints for
        *text | all* values of --output option.

        This is only used for *Specify* value of --DataFieldsMode option.

        Examples:

            Extreg
            MolID,CompoundName

    -d, --DataFieldsMode *All | Common | Specify | CompoundID*
        Specify how data fields in *SDFile(s)* are transferred to output
        CSV/TSV text file(s) along with generated fingerprints for *text |
        all* values of --output option: transfer all SD data fields;
        transfer SD data fields common to all compounds; extract specified
        data fields; generate a compound ID using molname line, a compound
        prefix, or a combination of both. Possible values: *All | Common |
        Specify | CompoundID*. Default value: *CompoundID*.
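
        As an illustration only, the Python sketch below shows which SD data
        field labels each --DataFieldsMode value would select as output
        columns. Assumptions: every compound's data fields are available as a
        dictionary, All keeps labels in order of first appearance, Common
        keeps the first compound's order, and CompoundID yields the single
        --CompoundIDLabel column. The fingerprints column is not shown, and
        the helper name is hypothetical.

```python
def output_data_labels(compounds, mode="CompoundID", data_fields=(), compound_id_label="CompoundID"):
    """Hypothetical helper: choose CSV/TSV column labels for --DataFieldsMode.

    compounds: list of dicts mapping SD data field labels to values.
    """
    if mode == "CompoundID":
        return [compound_id_label]
    if mode == "Specify":
        return list(data_fields)
    if mode == "All":
        labels = []
        for fields in compounds:
            for label in fields:
                if label not in labels:  # keep first appearance only
                    labels.append(label)
        return labels
    if mode == "Common":
        if not compounds:
            return []
        shared = set(compounds[0]).intersection(*compounds[1:])
        return [label for label in compounds[0] if label in shared]
    raise ValueError(f"unknown DataFieldsMode: {mode}")


cmpds = [{"MolID": "1", "Name": "a", "MW": "10"}, {"Name": "b", "MolID": "2", "LogP": "1.2"}]
print(output_data_labels(cmpds, mode="Common"))  # ['MolID', 'Name']
```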

    -f, --Filter *Yes | No*
        Specify whether to check and filter compound data in SDFile(s).
        Possible values: *Yes or No*. Default value: *Yes*.

        By default, compound data is checked before calculating fingerprints
        and compounds containing atom data corresponding to non-element
        symbols or no atom data are ignored.

    --FingerprintsLabel *text*
        SD data label or text file column label to use for fingerprints
        string in output SD or CSV/TSV text file(s) specified by --output.
        Default value: *TopologicalAtomTripletsFingerprints*.

    -h, --help
        Print this help message.

    -k, --KeepLargestComponent *Yes | No*
        Generate fingerprints for only the largest component in the
        molecule. Possible values: *Yes or No*. Default value: *Yes*.

        For molecules containing multiple connected components, fingerprints
        can be generated in two different ways: use all connected components
        or just the largest connected component. By default, all atoms
        except those in the largest connected component are deleted before
        generation of fingerprints.
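
        One way to find the largest connected component, sketched in Python:
        a breadth-first search collects each component, and the one with the
        most atoms is kept. Measuring size by atom count and keeping the
        first of equally sized components are assumptions; this is not
        necessarily how MayaChemTools selects the component, and the helper
        name is hypothetical.

```python
from collections import deque


def largest_component(n_atoms, bonds):
    """Return the sorted atom indices of the largest connected component."""
    adjacency = [[] for _ in range(n_atoms)]
    for a, b in bonds:
        adjacency[a].append(b)
        adjacency[b].append(a)
    seen = set()
    largest = []
    for start in range(n_atoms):
        if start in seen:
            continue
        # Breadth-first search collects every atom reachable from start.
        component = []
        seen.add(start)
        queue = deque([start])
        while queue:
            atom = queue.popleft()
            component.append(atom)
            for nbr in adjacency[atom]:
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
        if len(component) > len(largest):  # ties keep the first component found
            largest = component
    return sorted(largest)


# Sodium acetate as a disconnected record: acetate atoms 0-3, Na+ as atom 4.
print(largest_component(5, [(0, 1), (1, 2), (1, 3)]))  # [0, 1, 2, 3]
```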

    --MinDistance *number*
        Minimum bond distance between atom pairs in atom triplets for
        generating topological atom triplets. Default value: *1*. Valid
        values: positive integers less than --MaxDistance.

    --MaxDistance *number*
        Maximum bond distance between atom pairs in atom triplets for
        generating topological atom triplets. Default value: *10*. Valid
        values: positive integers greater than --MinDistance.

    --OutDelim *comma | tab | semicolon*
        Delimiter for output CSV/TSV text file(s). Possible values: *comma,
        tab, or semicolon*. Default value: *comma*.

    --output *SD | FP | text | all*
        Type of output files to generate. Possible values: *SD, FP, text, or
        all*. Default value: *text*.

    -o, --overwrite
        Overwrite existing files.

    -q, --quote *Yes | No*
        Put quotes around column values in output CSV/TSV text file(s).
        Possible values: *Yes or No*. Default value: *Yes*.

    -r, --root *RootName*
        New file name is generated using the root: <Root>.<Ext>. Default for
        new file names: <SDFileName><TopologicalAtomTripletsFP>.<Ext>. The
        file type determines <Ext> value. The sdf, fpf, csv, and tsv <Ext>
        values are used for SD, FP, comma/semicolon, and tab delimited text
        files, respectively. This option is ignored for multiple input files.
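
        The naming rules above can be expressed as a short Python sketch. It
        reads <SDFileName><TopologicalAtomTripletsFP> as the SD file name
        without its extension followed by the literal text
        TopologicalAtomTripletsFP, which is an assumption about the notation.
        The helper name is hypothetical, and a root is meant to be passed
        only for a single input file.

```python
import os

# Extensions documented for each output kind.
EXTENSIONS = {"SD": "sdf", "FP": "fpf", "comma": "csv", "semicolon": "csv", "tab": "tsv"}


def output_file_names(sd_file, output="text", out_delim="comma", root=None):
    """Hypothetical helper: names of the files written for one input SD file."""
    if root:
        base = root
    else:
        # Assumed default: <SDFileName> without extension + "TopologicalAtomTripletsFP".
        base = os.path.splitext(os.path.basename(sd_file))[0] + "TopologicalAtomTripletsFP"
    kinds = {"SD": ["SD"], "FP": ["FP"], "text": [out_delim], "all": ["SD", "FP", out_delim]}[output]
    return [f"{base}.{EXTENSIONS[kind]}" for kind in kinds]


print(output_file_names("Sample.sdf", output="all", root="SampleTATFP"))
# ['SampleTATFP.sdf', 'SampleTATFP.fpf', 'SampleTATFP.csv']
```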

    -u, --UseTriangleInequality *Yes | No*
        Specify whether to apply the triangle distance inequality test to
        distances between atom pairs in atom triplets during atom triplets
        generation. Possible values: *Yes or No*. Default value: *No*.

        The triangle distance inequality test requires that the distance or
        binned distance between any two atoms in an atom triplet be less
        than the sum of the distances or binned distances between the other
        two atom pairs and greater than the absolute difference of those
        distances.

        For atom triplet ATx-Dyz-ATy-Dxz-ATz-Dxy to satisfy triangle inequality:

            Dyz > |Dxz - Dxy| and Dyz < Dxz + Dxy
            Dxz > |Dyz - Dxy| and Dxz < Dyz + Dxy
            Dxy > |Dyz - Dxz| and Dxy < Dyz + Dxz
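
        A direct Python transcription of the three conditions above, using
        strict inequalities exactly as written; the function name is
        hypothetical. Because the comparisons are strict, three atoms lying
        on one shortest path (for example Dyz = 1, Dxz = 1, Dxy = 2) fail
        the test, while alternate atoms of a six-membered ring (all
        distances 2) pass.

```python
def satisfies_triangle_inequality(dyz, dxz, dxy):
    """Check the documented strict triangle inequality for one atom triplet."""
    return (
        abs(dxz - dxy) < dyz < dxz + dxy
        and abs(dyz - dxy) < dxz < dyz + dxy
        and abs(dyz - dxz) < dxy < dyz + dxz
    )


print(satisfies_triangle_inequality(1, 1, 2))  # False: atoms on a single path
print(satisfies_triangle_inequality(2, 2, 2))  # True: alternate atoms of a 6-ring
```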

    -v, --VectorStringFormat *IDsAndValuesString | IDsAndValuesPairsString |
    ValuesAndIDsString | ValuesAndIDsPairsString*
        Format of fingerprints vector string data in output SD, FP or
        CSV/TSV text file(s) specified by the --output option. Possible
        values: *IDsAndValuesString | IDsAndValuesPairsString |
        ValuesAndIDsString | ValuesAndIDsPairsString*. Default value:
        *IDsAndValuesString*.

        Examples:

            FingerprintsVector;TopologicalAtomTriplets:AtomicInvariantsAtomTypes:M
            inDistance1:MaxDistance10;3096;NumericalValues;IDsAndValuesString;C.X1
            .BO1.H3-D1-C.X1.BO1.H3-D1-C.X3.BO3.H1-D2 C.X1.BO1.H3-D1-C.X2.BO2.H2-D1
            0-C.X3.BO4-D9 C.X1.BO1.H3-D1-C.X2.BO2.H2-D3-N.X3.BO3-D4 C.X1.BO1.H3-D1
            -C.X2.BO2.H2-D4-C.X2.BO2.H2-D5 C.X1.BO1.H3-D1-C.X2.BO2.H2-D6-C.X3....;
            1 2 2 2 2 2 2 2 8 8 4 8 4 4 2 2 2 2 4 2 2 2 4 2 2 2 2 1 2 2 4 4 4 2 2
            2 4 4 4 8 4 4 2 4 4 4 2 4 4 2 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2 2 8...

            FingerprintsVector;TopologicalAtomTriplets:AtomicInvariantsAtomTypes:M
            inDistance1:MaxDistance10;3096;NumericalValues;IDsAndValuesPairsString
            ;C.X1.BO1.H3-D1-C.X1.BO1.H3-D1-C.X3.BO3.H1-D2 1 C.X1.BO1.H3-D1-C.X2.BO
            2.H2-D10-C.X3.BO4-D9 2 C.X1.BO1.H3-D1-C.X2.BO2.H2-D3-N.X3.BO3-D4 2 C.X
            1.BO1.H3-D1-C.X2.BO2.H2-D4-C.X2.BO2.H2-D5 2 C.X1.BO1.H3-D1-C.X2.BO2.H2
            -D6-C.X3.BO3.H1-D5 2 C.X1.BO1.H3-D1-C.X2.BO2.H2-D6-C.X3.BO3.H1-D7 2...
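
        To show how the two example formats relate, the Python sketch below
        parses a fingerprints vector string into its description, count, and
        an ID-to-value mapping, and rewrites it in IDsAndValuesPairsString
        format. It assumes the semicolon-separated layout seen in the
        examples (vector type, description, count, values type, format, then
        the data), integer values, and, for the two ValuesAndIDs formats,
        which are not illustrated above, that values simply come before IDs.
        The function names are hypothetical, and the input is a shortened
        two-ID vector built from IDs and values in the examples.

```python
def parse_vector_string(vector):
    """Hypothetical parser returning (description, count, {id: value})."""
    fields = vector.split(";")
    _kind, description, count, _values_type, fmt = fields[:5]
    data = fields[5:]
    if fmt == "IDsAndValuesString":
        ids, values = data[0].split(), data[1].split()
    elif fmt == "ValuesAndIDsString":
        values, ids = data[0].split(), data[1].split()
    elif fmt == "IDsAndValuesPairsString":
        tokens = data[0].split()
        ids, values = tokens[0::2], tokens[1::2]
    elif fmt == "ValuesAndIDsPairsString":
        tokens = data[0].split()
        values, ids = tokens[0::2], tokens[1::2]
    else:
        raise ValueError(f"unsupported vector string format: {fmt}")
    return description, int(count), dict(zip(ids, (int(v) for v in values)))


def to_ids_and_values_pairs_string(description, counts):
    """Rewrite an {id: value} mapping as an IDsAndValuesPairsString vector."""
    pairs = " ".join(f"{id_} {value}" for id_, value in counts.items())
    return f"FingerprintsVector;{description};{len(counts)};NumericalValues;IDsAndValuesPairsString;{pairs}"


example = ("FingerprintsVector;TopologicalAtomTriplets:AtomicInvariantsAtomTypes:"
           "MinDistance1:MaxDistance10;2;NumericalValues;IDsAndValuesString;"
           "C.X1.BO1.H3-D1-C.X1.BO1.H3-D1-C.X3.BO3.H1-D2 C.X1.BO1.H3-D1-C.X2.BO2.H2-D3-N.X3.BO3-D4;1 2")
description, count, counts = parse_vector_string(example)
print(to_ids_and_values_pairs_string(description, counts))
```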

    -w, --WorkingDir *DirName*
        Location of working directory. Default value: current directory.

EXAMPLES
    To generate topological atom triplets fingerprints corresponding to bond
    distances from 1 through 10 using atomic invariants atom types in
    IDsAndValuesString format and create a SampleTATFP.csv file containing
    sequential compound IDs along with fingerprints vector strings data,
    type:

        % TopologicalAtomTripletsFingerprints.pl -r SampleTATFP -o Sample.sdf

    To generate topological atom triplets fingerprints corresponding to bond
    distances from 1 through 10 using atomic invariants atom types in
    IDsAndValuesString format and create SampleTATFP.sdf, SampleTATFP.fpf
    and SampleTATFP.csv files containing sequential compound IDs in CSV file
    along with fingerprints vector strings data, type:

        % TopologicalAtomTripletsFingerprints.pl --output all -r SampleTATFP
          -o Sample.sdf

    To generate topological atom triplets fingerprints corresponding to bond
    distances from 1 through 10 using atomic invariants atom types in
    IDsAndValuesPairsString format and create a SampleTATFP.csv file
    containing sequential compound IDs along with fingerprints vector
    strings data, type:

        % TopologicalAtomTripletsFingerprints.pl --VectorStringFormat
          IDsAndValuesPairsString -r SampleTATFP -o Sample.sdf

    To generate topological atom triplets fingerprints corresponding to bond
    distances from 1 through 10 using DREIDING atom types in
    IDsAndValuesString format and create a SampleTATFP.csv file containing
    sequential compound IDs along with fingerprints vector strings data,
    type:

        % TopologicalAtomTripletsFingerprints.pl -a DREIDINGAtomTypes
          -r SampleTATFP -o Sample.sdf

    To generate topological atom triplets fingerprints corresponding to bond
    distances from 1 through 10 using E-state atom types in
    IDsAndValuesString format and create a SampleTATFP.csv file containing
    sequential compound IDs along with fingerprints vector strings data,
    type:

        % TopologicalAtomTripletsFingerprints.pl -a EStateAtomTypes
          -r SampleTATFP -o Sample.sdf

    To generate topological atom triplets fingerprints corresponding to bond
    distances from 1 through 10 using functional class atom types in
    IDsAndValuesString format and create a SampleTATFP.csv file containing
    sequential compound IDs along with fingerprints vector strings data,
    type:

        % TopologicalAtomTripletsFingerprints.pl -a FunctionalClassAtomTypes
          -r SampleTATFP -o Sample.sdf

    To generate topological atom triplets fingerprints corresponding to bond
    distances from 1 through 10 using MMFF94 atom types in
    IDsAndValuesString format and create a SampleTATFP.csv file containing
    sequential compound IDs along with fingerprints vector strings data,
    type:

        % TopologicalAtomTripletsFingerprints.pl -a MMFF94AtomTypes
          -r SampleTATFP -o Sample.sdf

    To generate topological atom triplets fingerprints corresponding to bond
    distances from 1 through 10 using SLogP atom types in IDsAndValuesString
    format and create a SampleTATFP.csv file containing sequential compound
    IDs along with fingerprints vector strings data, type:

        % TopologicalAtomTripletsFingerprints.pl -a SLogPAtomTypes
          -r SampleTATFP -o Sample.sdf

    To generate topological atom triplets fingerprints corresponding to bond
    distances from 1 through 10 using SYBYL atom types in IDsAndValuesString
    format and create a SampleTATFP.csv file containing sequential compound
    IDs along with fingerprints vector strings data, type:

        % TopologicalAtomTripletsFingerprints.pl -a SYBYLAtomTypes
          -r SampleTATFP -o Sample.sdf

    To generate topological atom triplets fingerprints corresponding to bond
    distances from 1 through 10 using TPSA atom types in IDsAndValuesString
    format and create a SampleTATFP.csv file containing sequential compound
    IDs along with fingerprints vector strings data, type:

        % TopologicalAtomTripletsFingerprints.pl -a TPSAAtomTypes
          -r SampleTATFP -o Sample.sdf

    To generate topological atom triplets fingerprints corresponding to bond
    distances from 1 through 10 using UFF atom types in IDsAndValuesString
    format and create a SampleTATFP.csv file containing sequential compound
    IDs along with fingerprints vector strings data, type:

        % TopologicalAtomTripletsFingerprints.pl -a UFFAtomTypes
          -r SampleTATFP -o Sample.sdf

    To generate topological atom triplets fingerprints corresponding to bond
    distances from 1 through 6 using atomic invariants atom types in
    IDsAndValuesString format and create a SampleTATFP.csv file containing
    sequential compound IDs along with fingerprints vector strings data,
    type:

        % TopologicalAtomTripletsFingerprints.pl -a AtomicInvariantsAtomTypes
          --MinDistance 1 --MaxDistance 6 -r SampleTATFP -o Sample.sdf

    To generate topological atom triplets fingerprints corresponding to bond
    distances from 1 through 6 using only AS,X atomic invariants atom types
    in IDsAndValuesString format and create a SampleTATFP.csv file
    containing sequential compound IDs along with fingerprints vector
    strings data, type:

        % TopologicalAtomTripletsFingerprints.pl -a AtomicInvariantsAtomTypes
          --AtomicInvariantsToUse "AS,X" --MinDistance 1 --MaxDistance 6
          -r SampleTATFP -o Sample.sdf

    To generate topological atom triplets fingerprints corresponding to bond
    distances from 1 through 10 using atomic invariants atom types in
    IDsAndValuesString format and create a SampleTATFP.csv file containing
    compound IDs from the molecule name line along with fingerprints vector
    strings data, type:

        % TopologicalAtomTripletsFingerprints.pl -a AtomicInvariantsAtomTypes
          --DataFieldsMode CompoundID --CompoundIDMode MolName
          -r SampleTATFP -o Sample.sdf

    To generate topological atom triplets fingerprints corresponding to bond
    distances from 1 through 10 using atomic invariants atom types in
    IDsAndValuesString format and create a SampleTATFP.csv file containing
    compound IDs using a specified data field along with fingerprints vector
    strings data, type:

        % TopologicalAtomTripletsFingerprints.pl -a AtomicInvariantsAtomTypes
          --DataFieldsMode CompoundID --CompoundIDMode DataField --CompoundID
          Mol_ID -r SampleTATFP -o Sample.sdf

    To generate topological atom triplets fingerprints corresponding to bond
    distances from 1 through 10 using atomic invariants atom types in
    IDsAndValuesString format and create a SampleTATFP.csv file containing
    compound IDs using a combination of the molecule name line and an
    explicit compound prefix along with fingerprints vector strings data,
    type:

        % TopologicalAtomTripletsFingerprints.pl -a AtomicInvariantsAtomTypes
          --DataFieldsMode CompoundID --CompoundIDMode MolNameOrLabelPrefix
          --CompoundID Cmpd --CompoundIDLabel MolID -r SampleTATFP -o Sample.sdf

    To generate topological atom triplets fingerprints corresponding to bond
    distances from 1 through 10 using atomic invariants atom types in
    IDsAndValuesString format and create a SampleTATFP.csv file containing
    specified data field columns along with fingerprints vector strings
    data, type:

        % TopologicalAtomTripletsFingerprints.pl -a AtomicInvariantsAtomTypes
          --DataFieldsMode Specify --DataFields Mol_ID -r SampleTATFP
          -o Sample.sdf

    To generate topological atom triplets fingerprints corresponding to bond
    distances from 1 through 10 using atomic invariants atom types in
    IDsAndValuesString format and create a SampleTATFP.csv file containing
    common data field columns along with fingerprints vector strings data,
    type:

        % TopologicalAtomTripletsFingerprints.pl -a AtomicInvariantsAtomTypes
          --DataFieldsMode Common -r SampleTATFP -o Sample.sdf

    To generate topological atom triplets fingerprints corresponding to bond
    distances from 1 through 10 using atomic invariants atom types in
    IDsAndValuesString format and create SampleTATFP.sdf, SampleTATFP.fpf
    and SampleTATFP.csv files containing all data field columns in CSV file
    along with fingerprints data, type:

        % TopologicalAtomTripletsFingerprints.pl -a AtomicInvariantsAtomTypes
          --DataFieldsMode All --output all -r SampleTATFP
          -o Sample.sdf

AUTHOR
    Manish Sud <msud@san.rr.com>

SEE ALSO
    InfoFingerprintsFiles.pl, SimilarityMatricesFingerprints.pl,
    AtomNeighborhoodsFingerprints.pl, ExtendedConnectivityFingerprints.pl,
    MACCSKeysFingerprints.pl, PathLengthFingerprints.pl,
    TopologicalAtomTorsionsFingerprints.pl,
    TopologicalPharmacophoreAtomPairsFingerprints.pl,
    TopologicalPharmacophoreAtomTripletsFingerprints.pl

COPYRIGHT
    Copyright (C) 2015 Manish Sud. All rights reserved.

    This file is part of MayaChemTools.

    MayaChemTools is free software; you can redistribute it and/or modify it
    under the terms of the GNU Lesser General Public License as published by
    the Free Software Foundation; either version 3 of the License, or (at
    your option) any later version.