NAME
    TopologicalAtomTripletsFingerprints.pl - Generate topological atom
    triplets fingerprints for SD files

SYNOPSIS
    TopologicalAtomTripletsFingerprints.pl SDFile(s)...

    TopologicalAtomTripletsFingerprints.pl [--AromaticityModel
    *AromaticityModelType*] [-a, --AtomIdentifierType
    *AtomicInvariantsAtomTypes*] [--AtomicInvariantsToUse
    *"AtomicInvariant,AtomicInvariant..."*] [--FunctionalClassesToUse
    *"FunctionalClass1,FunctionalClass2..."*] [--CompoundID *DataFieldName
    or LabelPrefixString*] [--CompoundIDLabel *text*] [--CompoundIDMode]
    [--DataFields *"FieldLabel1,FieldLabel2,..."*] [-d, --DataFieldsMode
    *All | Common | Specify | CompoundID*] [-f, --Filter *Yes | No*]
    [--FingerprintsLabel *text*] [-h, --help] [-k, --KeepLargestComponent
    *Yes | No*] [--MinDistance *number*] [--MaxDistance *number*]
    [--OutDelim *comma | tab | semicolon*] [--output *SD | FP | text | all*]
    [-o, --overwrite] [-q, --quote *Yes | No*] [-r, --root *RootName*] [-u,
    --UseTriangleInequality *Yes | No*] [-v, --VectorStringFormat
    *IDsAndValuesString | IDsAndValuesPairsString |
    ValuesAndIDsString | ValuesAndIDsPairsString*] [-w, --WorkingDir
    *DirName*] SDFile(s)...

DESCRIPTION
    Generate topological atom triplets fingerprints for *SDFile(s)* and
    create appropriate SD, FP or CSV/TSV text file(s) containing
    fingerprints vector strings corresponding to molecular fingerprints.

    Multiple SDFile names are separated by spaces. The valid file extensions
    are *.sdf* and *.sd*. All other file names are ignored. All the SD files
    in the current directory can be specified either by **.sdf* or the
    current directory name.

    The current release of MayaChemTools supports generation of topological
    atom triplets fingerprints corresponding to the following -a,
    --AtomIdentifierType values:

        AtomicInvariantsAtomTypes, DREIDINGAtomTypes, EStateAtomTypes,
        FunctionalClassAtomTypes, MMFF94AtomTypes, SLogPAtomTypes,
        SYBYLAtomTypes, TPSAAtomTypes, UFFAtomTypes

    Based on the values specified for -a, --AtomIdentifierType and
    --AtomicInvariantsToUse, initial atom types are assigned to all
    non-hydrogen atoms in a molecule. Using the distance matrix for the
    molecule and the initial atom types assigned to non-hydrogen atoms, all
    unique atom triplets whose interatomic bond distances fall within
    --MinDistance and --MaxDistance are identified and counted. An atom
    triplet identifier is generated for each unique atom triplet; the
    format of the atom triplet identifier is:

        <ATx>-Dyz-<ATy>-Dxz-<ATz>-Dxy

        ATx, ATy, ATz: Atom types assigned to atom x, atom y, and atom z
        Dxy: Distance between atom x and atom y
        Dxz: Distance between atom x and atom z
        Dyz: Distance between atom y and atom z

        where <ATx>-Dyz <= <ATy>-Dxz <= <ATz>-Dxy

    The atom triplet identifiers for all unique atom triplets corresponding
    to non-hydrogen atoms constitute topological atom triplets fingerprints
    of the molecule.
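
    The procedure above can be sketched in a few lines of Python. This is a
    minimal illustration, not the MayaChemTools implementation: atom types
    are assumed to be already assigned (the strings are written by hand),
    topological distances come from a breadth-first search over a
    hypothetical bond list, a triplet is kept only if all three pairwise
    distances lie within the distance range, the three <AT>-D<n> terms are
    ordered by plain string comparison, and each unordered set of three
    atoms is counted once. The string ordering agrees with identifiers shown
    in this document (for example "aaCH-D1-aaCH-D10-aaCH-D9"), but the
    script's own counting convention may differ.

```python
from collections import Counter, deque
from itertools import combinations


def topological_distances(num_atoms, bonds):
    """All-pairs bond-count distances via BFS; None marks disconnected pairs."""
    neighbors = {atom: [] for atom in range(num_atoms)}
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)
    distances = [[None] * num_atoms for _ in range(num_atoms)]
    for start in range(num_atoms):
        distances[start][start] = 0
        queue = deque([start])
        while queue:
            atom = queue.popleft()
            for nbr in neighbors[atom]:
                if distances[start][nbr] is None:
                    distances[start][nbr] = distances[start][atom] + 1
                    queue.append(nbr)
    return distances


def atom_triplets_fingerprint(atom_types, bonds, min_distance=1, max_distance=10):
    """Count <ATx>-Dyz-<ATy>-Dxz-<ATz>-Dxy identifiers (illustrative sketch)."""
    distances = topological_distances(len(atom_types), bonds)
    counts = Counter()
    for x, y, z in combinations(range(len(atom_types)), 3):
        d_yz, d_xz, d_xy = distances[y][z], distances[x][z], distances[x][y]
        # Assumption: every pairwise distance must fall within the range.
        if any(d is None or not min_distance <= d <= max_distance
               for d in (d_yz, d_xz, d_xy)):
            continue
        # Pair each atom type with the distance of the side opposite to it,
        # then order the three terms by string comparison.
        terms = sorted([f"{atom_types[x]}-D{d_yz}",
                        f"{atom_types[y]}-D{d_xz}",
                        f"{atom_types[z]}-D{d_xy}"])
        counts["-".join(terms)] += 1
    return counts
```

    For the heavy atoms of isobutane (types C.X3.BO3.H1 for the central
    carbon and C.X1.BO1.H3 for each methyl), the sketch yields
    C.X1.BO1.H3-D1-C.X1.BO1.H3-D1-C.X3.BO3.H1-D2 three times and
    C.X1.BO1.H3-D2-C.X1.BO1.H3-D2-C.X1.BO1.H3-D2 once.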

    Example of *SD* file containing topological atom triplets fingerprints
    string data:

        ... ...
        ... ...
        $$$$
        ... ...
        ... ...
        ... ...
        41 44 0 0 0 0 0 0 0 0999 V2000
        -3.3652 1.4499 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
        ... ...
        2 3 1 0 0 0 0
        ... ...
        M END
        > <CmpdID>
        Cmpd1

        > <TopologicalAtomTripletsFingerprints>
        FingerprintsVector;TopologicalAtomTriplets:AtomicInvariantsAtomTypes:Mi
        nDistance1:MaxDistance10;3096;NumericalValues;IDsAndValuesString;C.X1.B
        O1.H3-D1-C.X1.BO1.H3-D1-C.X3.BO3.H1-D2 C.X1.BO1.H3-D1-C.X2.BO2.H2-D10-C
        .X3.BO4-D9 C.X1.BO1.H3-D1-C.X2.BO2.H2-D3-N.X3.BO3-D4 C.X1.BO1.H3-D1...;
        1 2 2 2 2 2 2 2 8 8 4 8 4 4 2 2 2 2 4 2 2 2 4 2 2 2 2 1 2 2 4 4 4 2 2 2
        4 4 4 8 4 4 2 4 4 4 2 4 4 2 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2 2 8 8 ...

        $$$$
        ... ...
        ... ...
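
    Data items such as <CmpdID> and <TopologicalAtomTripletsFingerprints>
    above follow the usual SD convention: a "> <Label>" header line, one or
    more value lines, and a terminating blank line. The Python sketch below,
    which is illustrative and not part of MayaChemTools, collects those
    items from the lines of a single record; multi-line values are joined
    with newlines.

```python
import re

# A data header looks like "> <Label>"; other text may sit before the label.
DATA_HEADER = re.compile(r"^>.*?<([^>]+)>")


def sd_data_fields(record_lines):
    """Return {label: value} for the data items of one SD record (sketch)."""
    fields = {}
    label = None
    for line in record_lines:
        header = DATA_HEADER.match(line)
        if header:
            label = header.group(1)
            fields[label] = []
        elif label is not None:
            if line.strip():
                fields[label].append(line.rstrip("\n"))
            else:
                label = None  # a blank line ends the current data item
    return {name: "\n".join(values) for name, values in fields.items()}
```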

    Example of *FP* file containing topological atom triplets fingerprints
    string data:

        #
        # Package = MayaChemTools 7.4
        # Release Date = Oct 21, 2010
        #
        # TimeStamp = Fri Mar 11 15:24:01 2011
        #
        # FingerprintsStringType = FingerprintsVector
        #
        # Description = TopologicalAtomTriplets:AtomicInvariantsAtomTypes:Mi...
        # VectorStringFormat = IDsAndValuesString
        # VectorValuesType = NumericalValues
        #
        Cmpd1 3096;C.X1.BO1.H3-D1-C.X1.BO1.H3-D1-C.X3.BO3.H1-D2...;1 2 2 2 2...
        Cmpd2 1093;C.X1.BO1.H3-D1-C.X1.BO1.H3-D3-C.X2.BO2.H2-D4...;2 2 2 2 2...
        ... ...
        ... ...
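
    The FP layout above can be read with a short Python sketch. The layout
    is inferred only from this example: "#" lines carry "Key = Value" header
    entries, and each data line holds a compound ID followed by whitespace
    and the NumValues;IDs;Values fields of an *IDsAndValuesString* vector.
    Other vector string formats are not handled, and integer values are
    assumed.

```python
def read_fp_lines(lines):
    """Parse header entries and IDsAndValuesString records from FP text (sketch)."""
    header, records = {}, {}
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("#"):
            key, sep, value = line[1:].partition("=")
            if sep:
                header[key.strip()] = value.strip()
            continue
        if not line.strip():
            continue
        if header.get("VectorStringFormat", "IDsAndValuesString") != "IDsAndValuesString":
            raise ValueError("this sketch only reads IDsAndValuesString data")
        compound_id, data = line.split(None, 1)
        num_values, ids, values = data.split(";")
        id_list = ids.split()
        value_list = [int(v) for v in values.split()]
        if not len(id_list) == len(value_list) == int(num_values):
            raise ValueError(f"inconsistent vector for {compound_id}")
        records[compound_id] = dict(zip(id_list, value_list))
    return header, records
```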

    Example of CSV *Text* file containing topological atom triplets
    fingerprints string data:

        "CompoundID","TopologicalAtomTripletsFingerprints"
        "Cmpd1","FingerprintsVector;TopologicalAtomTriplets:AtomicInvariantsAto
        mTypes:MinDistance1:MaxDistance10;3096;NumericalValues;IDsAndValuesStri
        ng;C.X1.BO1.H3-D1-C.X1.BO1.H3-D1-C.X3.BO3.H1-D2 C.X1.BO1.H3-D1-C.X2.BO2
        .H2-D10-C.X3.BO4-D9 C.X1.BO1.H3-D1-C.X2.BO2.H2-D3-N.X3.BO3-D4 C.X1....;
        1 2 2 2 2 2 2 2 8 8 4 8 4 4 2 2 2 2 4 2 2 2 4 2 2 2 2 1 2 2 4 4 4 2 2 2
        4 4 4 8 4 4 2 4 4 4 2 4 4 2 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2 2 8 8 ...
        ... ...
        ... ...

    The current release of MayaChemTools generates the following types of
    topological atom triplets fingerprints vector strings:

        FingerprintsVector;TopologicalAtomTriplets:AtomicInvariantsAtomTypes:M
        inDistance1:MaxDistance10;3096;NumericalValues;IDsAndValuesString;C.X1
        .BO1.H3-D1-C.X1.BO1.H3-D1-C.X3.BO3.H1-D2 C.X1.BO1.H3-D1-C.X2.BO2.H2-D1
        0-C.X3.BO4-D9 C.X1.BO1.H3-D1-C.X2.BO2.H2-D3-N.X3.BO3-D4 C.X1.BO1.H3-D1
        -C.X2.BO2.H2-D4-C.X2.BO2.H2-D5 C.X1.BO1.H3-D1-C.X2.BO2.H2-D6-C.X3....;
        1 2 2 2 2 2 2 2 8 8 4 8 4 4 2 2 2 2 4 2 2 2 4 2 2 2 2 1 2 2 4 4 4 2 2
        2 4 4 4 8 4 4 2 4 4 4 2 4 4 2 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2 2 8...

        FingerprintsVector;TopologicalAtomTriplets:AtomicInvariantsAtomTypes:M
        inDistance1:MaxDistance10;3096;NumericalValues;IDsAndValuesPairsString
        ;C.X1.BO1.H3-D1-C.X1.BO1.H3-D1-C.X3.BO3.H1-D2 1 C.X1.BO1.H3-D1-C.X2.BO
        2.H2-D10-C.X3.BO4-D9 2 C.X1.BO1.H3-D1-C.X2.BO2.H2-D3-N.X3.BO3-D4 2 C.X
        1.BO1.H3-D1-C.X2.BO2.H2-D4-C.X2.BO2.H2-D5 2 C.X1.BO1.H3-D1-C.X2.BO2.H2
        -D6-C.X3.BO3.H1-D5 2 C.X1.BO1.H3-D1-C.X2.BO2.H2-D6-C.X3.BO3.H1-D7 2...

        FingerprintsVector;TopologicalAtomTriplets:DREIDINGAtomTypes:MinDistan
        ce1:MaxDistance10;2377;NumericalValues;IDsAndValuesString;C_2-D1-C_2-D
        9-C_3-D10 C_2-D1-C_2-D9-C_R-D10 C_2-D1-C_3-D1-C_3-D2 C_2-D1-C_3-D10-C_
        3-D9 C_2-D1-C_3-D2-C_3-D3 C_2-D1-C_3-D2-C_R-D3 C_2-D1-C_3-D3-C_3-D4 C_
        2-D1-C_3-D3-N_R-D4 C_2-D1-C_3-D3-O_3-D2 C_2-D1-C_3-D4-C_3-D5 C_2-D...;
        1 1 1 2 1 1 3 1 1 2 2 1 1 1 1 1 1 1 1 2 1 3 4 5 1 1 6 4 2 2 3 1 1 1 2
        2 1 2 1 1 2 2 2 1 2 1 2 1 1 3 3 2 6 4 2 1 1 1 2 2 1 1 1 1 1 1 1 1 1...

        FingerprintsVector;TopologicalAtomTriplets:EStateAtomTypes:MinDistance
        1:MaxDistance10;3298;NumericalValues;IDsAndValuesString;aaCH-D1-aaCH-D
        1-aaCH-D2 aaCH-D1-aaCH-D1-aasC-D2 aaCH-D1-aaCH-D10-aaCH-D9 aaCH-D1-aaC
        H-D10-aasC-D9 aaCH-D1-aaCH-D2-aaCH-D3 aaCH-D1-aaCH-D2-aasC-D1 aaCH-D1-
        aaCH-D2-aasC-D3 aaCH-D1-aaCH-D3-aasC-D2 aaCH-D1-aaCH-D4-aasC-D5 aa...;
        6 4 24 4 16 8 8 4 8 8 8 12 10 14 4 16 24 4 12 2 2 4 1 10 2 2 15 2 2 2
        2 2 2 14 4 2 2 2 2 1 2 10 2 2 4 1 2 4 8 3 3 3 4 6 4 2 2 3 3 1 1 1 2 1
        2 2 4 2 3 2 1 2 4 5 3 2 2 1 2 4 3 2 8 12 6 2 2 4 4 7 1 4 2 4 2 2 2 ...

        FingerprintsVector;TopologicalAtomTriplets:FunctionalClassAtomTypes:Mi
        nDistance1:MaxDistance10;2182;NumericalValues;IDsAndValuesString;Ar-D1
        -Ar-D1-Ar-D2 Ar-D1-Ar-D1-Ar.HBA-D2 Ar-D1-Ar-D10-Ar-D9 Ar-D1-Ar-D10-Hal
        -D9 Ar-D1-Ar-D2-Ar-D2 Ar-D1-Ar-D2-Ar-D3 Ar-D1-Ar-D2-Ar.HBA-D1 Ar-D1-Ar
        -D2-Ar.HBA-D2 Ar-D1-Ar-D2-Ar.HBA-D3 Ar-D1-Ar-D2-HBD-D1 Ar-D1-Ar-D2...;
        27 1 32 2 2 63 3 2 1 2 1 2 3 1 1 40 3 1 2 2 2 2 4 2 2 47 4 2 2 1 2 1 5
        2 2 51 4 3 1 3 1 9 1 1 50 3 3 4 1 9 50 2 2 3 3 5 45 1 1 1 2 1 2 2 3 3
        4 4 3 2 1 1 3 4 5 5 3 1 2 3 2 3 5 7 2 7 3 7 1 1 2 2 2 2 3 1 4 3 1 2...

        FingerprintsVector;TopologicalAtomTriplets:MMFF94AtomTypes:MinDistance
        1:MaxDistance10;2966;NumericalValues;IDsAndValuesString;C5A-D1-C5A-D1-
        N5-D2 C5A-D1-C5A-D2-C5B-D2 C5A-D1-C5A-D3-CB-D2 C5A-D1-C5A-D3-CR-D2 C5A
        -D1-C5B-D1-C5B-D2 C5A-D1-C5B-D2-C=ON-D1 C5A-D1-C5B-D2-CB-D1 C5A-D1-C5B
        -D3-C=ON-D2 C5A-D1-C5B-D3-CB-D2 C5A-D1-C=ON-D3-NC=O-D2 C5A-D1-C=ON-D3-
        O=CN-D2 C5A-D1-C=ON-D4-NC=O-D3 C5A-D1-C=ON-D4-O=CN-D3 C5A-D1-CB-D1-...

        FingerprintsVector;TopologicalAtomTriplets:SLogPAtomTypes:MinDistance1
        :MaxDistance10;3710;NumericalValues;IDsAndValuesString;C1-D1-C1-D1-C11
        -D2 C1-D1-C1-D1-CS-D2 C1-D1-C1-D10-C5-D9 C1-D1-C1-D3-C10-D2 C1-D1-C1-D
        3-C5-D2 C1-D1-C1-D3-CS-D2 C1-D1-C1-D3-CS-D4 C1-D1-C1-D4-C10-D5 C1-D1-C
        1-D4-C11-D5 C1-D1-C1-D5-C10-D4 C1-D1-C1-D5-C5-D4 C1-D1-C1-D6-C11-D7 C1
        -D1-C1-D6-CS-D5 C1-D1-C1-D6-CS-D7 C1-D1-C1-D8-C11-D9 C1-D1-C1-D8-CS...

        FingerprintsVector;TopologicalAtomTriplets:SYBYLAtomTypes:MinDistance1
        :MaxDistance10;2332;NumericalValues;IDsAndValuesString;C.2-D1-C.2-D9-C
        .3-D10 C.2-D1-C.2-D9-C.ar-D10 C.2-D1-C.3-D1-C.3-D2 C.2-D1-C.3-D10-C.3-
        D9 C.2-D1-C.3-D2-C.3-D3 C.2-D1-C.3-D2-C.ar-D3 C.2-D1-C.3-D3-C.3-D4 C.2
        -D1-C.3-D3-N.ar-D4 C.2-D1-C.3-D3-O.3-D2 C.2-D1-C.3-D4-C.3-D5 C.2-D1-C.
        3-D5-C.3-D6 C.2-D1-C.3-D5-O.3-D4 C.2-D1-C.3-D6-C.3-D7 C.2-D1-C.3-D7...

        FingerprintsVector;TopologicalAtomTriplets:TPSAAtomTypes:MinDistance1:
        MaxDistance10;1007;NumericalValues;IDsAndValuesString;N21-D1-N7-D3-Non
        e-D4 N21-D1-N7-D5-None-D4 N21-D1-None-D1-None-D2 N21-D1-None-D2-None-D
        2 N21-D1-None-D2-None-D3 N21-D1-None-D3-None-D4 N21-D1-None-D4-None-D5
        N21-D1-None-D4-O3-D3 N21-D1-None-D4-O4-D3 N21-D1-None-D5-None-D6 N21-
        D1-None-D6-None-D7 N21-D1-None-D6-O4-D5 N21-D1-None-D7-None-D8 N21-...

        FingerprintsVector;TopologicalAtomTriplets:UFFAtomTypes:MinDistance1:M
        axDistance10;2377;NumericalValues;IDsAndValuesString;C_2-D1-C_2-D9-C_3
        -D10 C_2-D1-C_2-D9-C_R-D10 C_2-D1-C_3-D1-C_3-D2 C_2-D1-C_3-D10-C_3-D9
        C_2-D1-C_3-D2-C_3-D3 C_2-D1-C_3-D2-C_R-D3 C_2-D1-C_3-D3-C_3-D4 C_2-D1-
        C_3-D3-N_R-D4 C_2-D1-C_3-D3-O_3-D2 C_2-D1-C_3-D4-C_3-D5 C_2-D1-C_3-D5-
        C_3-D6 C_2-D1-C_3-D5-O_3-D4 C_2-D1-C_3-D6-C_3-D7 C_2-D1-C_3-D7-C_3-...

OPTIONS
    --AromaticityModel *MDLAromaticityModel | TriposAromaticityModel |
    MMFFAromaticityModel | ChemAxonBasicAromaticityModel |
    ChemAxonGeneralAromaticityModel | DaylightAromaticityModel |
    MayaChemToolsAromaticityModel*
        Specify aromaticity model to use during detection of aromaticity.
        Possible values in the current release are: *MDLAromaticityModel,
        TriposAromaticityModel, MMFFAromaticityModel,
        ChemAxonBasicAromaticityModel, ChemAxonGeneralAromaticityModel,
        DaylightAromaticityModel or MayaChemToolsAromaticityModel*. Default
        value: *MayaChemToolsAromaticityModel*.

        The supported aromaticity model names along with model specific
        control parameters are defined in AromaticityModelsData.csv, which
        is distributed with the current release and is available under the
        lib/data directory. The Molecule.pm module retrieves data from this
        file during class instantiation and makes it available to the
        DetectAromaticity method for detecting aromaticity corresponding to
        a specific model.

    -a, --AtomIdentifierType *AtomicInvariantsAtomTypes | DREIDINGAtomTypes
    | EStateAtomTypes | FunctionalClassAtomTypes | MMFF94AtomTypes |
    SLogPAtomTypes | SYBYLAtomTypes | TPSAAtomTypes | UFFAtomTypes*
        Specify atom identifier type to use for assignment of initial atom
        identifiers to non-hydrogen atoms during calculation of topological
        atom triplets fingerprints. Possible values in the current release
        are: *AtomicInvariantsAtomTypes, DREIDINGAtomTypes, EStateAtomTypes,
        FunctionalClassAtomTypes, MMFF94AtomTypes, SLogPAtomTypes,
        SYBYLAtomTypes, TPSAAtomTypes, UFFAtomTypes*. Default value:
        *AtomicInvariantsAtomTypes*.

    --AtomicInvariantsToUse *"AtomicInvariant,AtomicInvariant..."*
        This value is used only when the -a, --AtomIdentifierType option is
        set to *AtomicInvariantsAtomTypes*. It's a list of comma separated
        valid atomic invariant atom types.

        Possible values for atomic invariants are: *AS, X, BO, LBO, SB, DB,
        TB, H, Ar, RA, FC, MN, SM*. Default value: *AS,X,BO,H,FC*.

        The atomic invariants abbreviations correspond to:

            AS = Atom symbol corresponding to element symbol

            X<n> = Number of non-hydrogen atom neighbors or heavy atoms
            BO<n> = Sum of bond orders to non-hydrogen atom neighbors or heavy atoms
            LBO<n> = Largest bond order to non-hydrogen atom neighbors or heavy atoms
            SB<n> = Number of single bonds to non-hydrogen atom neighbors or heavy atoms
            DB<n> = Number of double bonds to non-hydrogen atom neighbors or heavy atoms
            TB<n> = Number of triple bonds to non-hydrogen atom neighbors or heavy atoms
            H<n> = Number of implicit and explicit hydrogens for atom
            Ar = Aromatic annotation indicating whether atom is aromatic
            RA = Ring atom annotation indicating whether atom is a ring atom
            FC<+n/-n> = Formal charge assigned to atom
            MN<n> = Mass number indicating isotope other than most abundant isotope
            SM<n> = Spin multiplicity of atom. Possible values: 1 (singlet), 2 (doublet) or
                    3 (triplet)

        Atom type generated by AtomTypes::AtomicInvariantsAtomTypes class
        corresponds to:

            AS.X<n>.BO<n>.LBO<n>.SB<n>.DB<n>.TB<n>.H<n>.Ar.RA.FC<+n/-n>.MN<n>.SM<n>

        Except for AS, which is a required atomic invariant in atom types,
        all other atomic invariants are optional. Atom type specification
        doesn't include atomic invariants with zero or undefined values.
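
        The naming rule above can be illustrated with a small Python
        sketch. It is not the AtomTypes::AtomicInvariantsAtomTypes code:
        invariant values are supplied by hand in a dictionary, the ordering
        follows the specification line above, Ar and RA are treated as
        flags, and rendering FC<+n/-n> as "FC+1" or "FC-1" is an assumption.

```python
# Order of optional invariants, following the atom type specification above.
INVARIANT_ORDER = ["X", "BO", "LBO", "SB", "DB", "TB", "H", "Ar", "RA", "FC", "MN", "SM"]
FLAG_INVARIANTS = {"Ar", "RA"}  # written without a number when true


def atomic_invariants_atom_type(symbol, invariants, to_use=("AS", "X", "BO", "H", "FC")):
    """Build an AS.X<n>.BO<n>... style atom type, skipping zero/undefined values."""
    parts = [symbol]  # AS is always present
    for name in INVARIANT_ORDER:
        value = invariants.get(name)
        if name not in to_use or not value:
            continue  # not requested, or zero, False or None
        if name in FLAG_INVARIANTS:
            parts.append(name)
        elif name == "FC":
            parts.append(f"FC{value:+d}")  # assumed rendering, e.g. FC+1
        else:
            parts.append(f"{name}{value}")
    return ".".join(parts)
```

        With the default invariants, a methyl carbon (X=1, BO=1, H=3, FC=0)
        gives C.X1.BO1.H3, and a carbon with three heavy-atom neighbors, a
        bond order sum of 4 and no hydrogens gives C.X3.BO4; both types
        appear in the example fingerprints above.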

        In addition to usage of abbreviations for specifying atomic
        invariants, the following descriptive words are also allowed:

            X : NumOfNonHydrogenAtomNeighbors or NumOfHeavyAtomNeighbors
            BO : SumOfBondOrdersToNonHydrogenAtoms or SumOfBondOrdersToHeavyAtoms
            LBO : LargestBondOrderToNonHydrogenAtoms or LargestBondOrderToHeavyAtoms
            SB : NumOfSingleBondsToNonHydrogenAtoms or NumOfSingleBondsToHeavyAtoms
            DB : NumOfDoubleBondsToNonHydrogenAtoms or NumOfDoubleBondsToHeavyAtoms
            TB : NumOfTripleBondsToNonHydrogenAtoms or NumOfTripleBondsToHeavyAtoms
            H : NumOfImplicitAndExplicitHydrogens
            Ar : Aromatic
            RA : RingAtom
            FC : FormalCharge
            MN : MassNumber
            SM : SpinMultiplicity

        The *AtomTypes::AtomicInvariantsAtomTypes* module is used to assign
        atomic invariant atom types.

    --FunctionalClassesToUse *"FunctionalClass1,FunctionalClass2..."*
        This value is used only when the -a, --AtomIdentifierType option is
        set to *FunctionalClassAtomTypes*. It's a list of comma separated
        valid functional classes.

        Possible values for atom functional classes are: *Ar, CA, H, HBA,
        HBD, Hal, NI, PI, RA*. Default value [ Ref 24 ]:
        *HBD,HBA,PI,NI,Ar,Hal*.

        The functional class abbreviations correspond to:

            HBD: HydrogenBondDonor
            HBA: HydrogenBondAcceptor
            PI : PositivelyIonizable
            NI : NegativelyIonizable
            Ar : Aromatic
            Hal : Halogen
            H : Hydrophobic
            RA : RingAtom
            CA : ChainAtom

        Functional class atom type specification for an atom corresponds to:

            Ar.CA.H.HBA.HBD.Hal.NI.PI.RA

        The *AtomTypes::FunctionalClassAtomTypes* module is used to assign
        functional class atom types. It uses the following definitions [ Ref
        60-61, Ref 65-66 ]:

            HydrogenBondDonor: NH, NH2, OH
            HydrogenBondAcceptor: N[!H], O
            PositivelyIonizable: +, NH2
            NegativelyIonizable: -, C(=O)OH, S(=O)OH, P(=O)OH
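
        The functional class atom type string can likewise be sketched in
        Python. The sketch assumes the classes an atom belongs to are
        already known (the definitions above are not evaluated), keeps only
        the classes selected by --FunctionalClassesToUse, and joins them in
        the Ar.CA.H.HBA.HBD.Hal.NI.PI.RA order. Returning "None" when no
        selected class applies is an assumption borrowed from the None atom
        type that appears in the TPSA example fingerprints above.

```python
# Canonical order of functional classes in an atom type string.
FUNCTIONAL_CLASS_ORDER = ["Ar", "CA", "H", "HBA", "HBD", "Hal", "NI", "PI", "RA"]
DEFAULT_CLASSES = ("HBD", "HBA", "PI", "NI", "Ar", "Hal")


def functional_class_atom_type(atom_classes, classes_to_use=DEFAULT_CLASSES):
    """Join the selected classes an atom belongs to in canonical order (sketch)."""
    selected = [name for name in FUNCTIONAL_CLASS_ORDER
                if name in atom_classes and name in classes_to_use]
    return ".".join(selected) if selected else "None"  # "None" is an assumption
```

        An atom tagged Ar, HBA and RA becomes Ar.HBA with the default
        classes, matching the Ar.HBA type in the functional class example
        fingerprints.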

    --CompoundID *DataFieldName or LabelPrefixString*
        The meaning of this value depends on --CompoundIDMode, and it
        determines how compound IDs are generated.

        For the *DataField* value of the --CompoundIDMode option, it is the
        data field label whose value is used as the compound ID; otherwise,
        it is a prefix string used for generating compound IDs like
        LabelPrefixString<Number>. The default value, *Cmpd*, generates
        compound IDs which look like Cmpd<Number>.

        Examples for *DataField* value of --CompoundIDMode:

            MolID
            ExtReg

        Examples for *LabelPrefix* or *MolNameOrLabelPrefix* value of
        --CompoundIDMode:

            Compound

        The value specified above generates compound IDs which correspond to
        Compound<Number> instead of the default value of Cmpd<Number>.

    --CompoundIDLabel *text*
        Specify the compound ID column label for CSV/TSV text file(s) used
        with the *CompoundID* value of the --DataFieldsMode option. Default
        value: *CompoundID*.

    --CompoundIDMode *DataField | MolName | LabelPrefix |
    MolNameOrLabelPrefix*
        Specify how to generate compound IDs and write them to FP or CSV/TSV
        text file(s) along with generated fingerprints for *FP | text | all*
        values of the --output option: use an *SDFile(s)* data field value;
        use the molname line from *SDFile(s)*; generate a sequential ID with
        a specific prefix; or use a combination of MolName and LabelPrefix,
        with LabelPrefix values used for empty molname lines.

        Possible values: *DataField | MolName | LabelPrefix |
        MolNameOrLabelPrefix*. Default value: *LabelPrefix*.

        For the *MolNameOrLabelPrefix* value of --CompoundIDMode, the
        molname line in *SDFile(s)* takes precedence over sequential
        compound IDs generated using *LabelPrefix*, and only empty molname
        values are replaced with sequential compound IDs.

        This is only used for the *CompoundID* value of the --DataFieldsMode
        option.
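
        The four --CompoundIDMode choices amount to a simple selection rule.
        The Python sketch below restates that rule; it is not the script's
        code, the function and argument names are hypothetical, sequential
        numbers are assumed to start at 1, and returning an empty string for
        a missing data field is an assumption.

```python
def compound_id(mode, number, mol_name="", data_fields=None, compound_id_option="Cmpd"):
    """Pick a compound ID following the --CompoundIDMode rules (sketch).

    compound_id_option plays the role of --CompoundID: a data field label in
    DataField mode, otherwise a label prefix. number is the sequential count.
    """
    label_prefix_id = f"{compound_id_option}{number}"
    if mode == "DataField":
        return (data_fields or {}).get(compound_id_option, "")
    if mode == "MolName":
        return mol_name
    if mode == "LabelPrefix":
        return label_prefix_id
    if mode == "MolNameOrLabelPrefix":
        # The molname line wins; only empty molname lines fall back to the prefix.
        return mol_name if mol_name.strip() else label_prefix_id
    raise ValueError(f"unknown --CompoundIDMode value: {mode}")
```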

    --DataFields *"FieldLabel1,FieldLabel2,..."*
        Comma delimited list of *SDFile(s)* data fields to extract and
        write to CSV/TSV text file(s) along with generated fingerprints for
        *text | all* values of the --output option.

        This is only used for the *Specify* value of the --DataFieldsMode
        option.

        Examples:

            Extreg
            MolID,CompoundName

    -d, --DataFieldsMode *All | Common | Specify | CompoundID*
        Specify how data fields in *SDFile(s)* are transferred to output
        CSV/TSV text file(s) along with generated fingerprints for *text |
        all* values of the --output option: transfer all SD data fields;
        transfer SD data fields common to all compounds; extract specified
        data fields; or generate a compound ID using the molname line, a
        compound prefix, or a combination of both. Possible values: *All |
        Common | Specify | CompoundID*. Default value: *CompoundID*.
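
        How the *All*, *Common* and *Specify* modes choose output columns
        can be sketched as set operations over the data field labels of each
        compound. This Python sketch is illustrative only; treating *All* as
        the union of labels in first-seen order is an assumption, and the
        *CompoundID* mode (covered by --CompoundIDMode) is left out.

```python
def select_data_field_labels(compound_labels, mode, specified=()):
    """Choose data field columns for All, Common or Specify modes (sketch).

    compound_labels holds one list of SD data field labels per compound.
    """
    if mode == "Specify":
        return list(specified)
    ordered = []  # union of labels in first-seen order
    for labels in compound_labels:
        for label in labels:
            if label not in ordered:
                ordered.append(label)
    if mode == "All":
        return ordered
    if mode == "Common":
        return [label for label in ordered
                if all(label in labels for labels in compound_labels)]
    raise ValueError(f"unsupported --DataFieldsMode value: {mode}")
```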

    -f, --Filter *Yes | No*
        Specify whether to check and filter compound data in SDFile(s).
        Possible values: *Yes or No*. Default value: *Yes*.

        By default, compound data is checked before calculating fingerprints
        and compounds containing atom data corresponding to non-element
        symbols or no atom data are ignored.

    --FingerprintsLabel *text*
        SD data label or text file column label to use for fingerprints
        string in output SD or CSV/TSV text file(s) specified by --output.
        Default value: *TopologicalAtomTripletsFingerprints*.

    -h, --help
        Print this help message.

    -k, --KeepLargestComponent *Yes | No*
        Generate fingerprints for only the largest component in the
        molecule. Possible values: *Yes or No*. Default value: *Yes*.

        For molecules containing multiple connected components, fingerprints
        can be generated in two different ways: use all connected components
        or just the largest connected component. By default, all atoms
        except for the largest connected component are deleted before
        generation of fingerprints.
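
        Keeping only the largest connected component is a standard graph
        operation. The Python sketch below, which is not the MayaChemTools
        code, finds components with an iterative depth-first search over a
        hypothetical bond list and keeps the one with the most atoms;
        measuring size by atom count and keeping the first component found
        on ties are assumptions.

```python
def largest_component(num_atoms, bonds):
    """Return sorted atom indices of the largest connected component (sketch)."""
    neighbors = {atom: [] for atom in range(num_atoms)}
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)
    seen, largest = set(), []
    for start in range(num_atoms):
        if start in seen:
            continue
        component, stack = [], [start]
        seen.add(start)
        while stack:  # iterative depth-first search
            atom = stack.pop()
            component.append(atom)
            for nbr in neighbors[atom]:
                if nbr not in seen:
                    seen.add(nbr)
                    stack.append(nbr)
        if len(component) > len(largest):
            largest = component  # ties keep the earlier component
    return sorted(largest)
```

        For sodium acetate drawn as a separate acetate ion and Na+ atom, for
        example, only the four acetate heavy atoms would be kept.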

    --MinDistance *number*
        Minimum bond distance between atoms in atom triplets for generating
        topological atom triplets. Default value: *1*. Valid values:
        positive integers less than --MaxDistance.

    --MaxDistance *number*
        Maximum bond distance between atoms in atom triplets for generating
        topological atom triplets. Default value: *10*. Valid values:
        positive integers greater than --MinDistance.

    --OutDelim *comma | tab | semicolon*
        Delimiter for output CSV/TSV text file(s). Possible values: *comma,
        tab, or semicolon*. Default value: *comma*.

    --output *SD | FP | text | all*
        Type of output files to generate. Possible values: *SD, FP, text, or
        all*. Default value: *text*.

    -o, --overwrite
        Overwrite existing files.

    -q, --quote *Yes | No*
        Put quotes around column values in output CSV/TSV text file(s).
        Possible values: *Yes or No*. Default value: *Yes*.
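
        The effect of --OutDelim and --quote on text output can be
        approximated with Python's csv module. This sketch is not the
        script's writer and the mapping of option values to csv settings is
        an assumption. When quoting is off it falls back to
        csv.QUOTE_MINIMAL, which still quotes any value containing the
        delimiter, such as a fingerprints vector string written with a
        semicolon delimiter.

```python
import csv
import io

DELIMITERS = {"comma": ",", "tab": "\t", "semicolon": ";"}


def fingerprints_table(rows, out_delim="comma", quote="Yes",
                       labels=("CompoundID", "TopologicalAtomTripletsFingerprints")):
    """Render (compound ID, vector string) rows as CSV/TSV text (sketch)."""
    buffer = io.StringIO()
    writer = csv.writer(
        buffer,
        delimiter=DELIMITERS[out_delim],
        quoting=csv.QUOTE_ALL if quote == "Yes" else csv.QUOTE_MINIMAL,
        lineterminator="\n",
    )
    writer.writerow(labels)
    writer.writerows(rows)
    return buffer.getvalue()
```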

    -r, --root *RootName*
        New file name is generated using the root: <Root>.<Ext>. Default for
        new file names: <SDFileName><TopologicalAtomTripletsFP>.<Ext>. The
        file type determines the <Ext> value. The sdf, fpf, csv, and tsv
        <Ext> values are used for SD, FP, comma/semicolon, and tab delimited
        text files, respectively. This option is ignored for multiple input
        files.
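
        The naming rule can be restated as a small Python function. It is a
        sketch under assumptions: <SDFileName> is taken to be the input file
        name without its directory and extension, names are relative to the
        working directory, and the rule that -r is ignored for multiple
        input files is left to the caller.

```python
import os

SUFFIX = "TopologicalAtomTripletsFP"


def output_file_names(sd_file, output="text", out_delim="comma", root=None):
    """List output file names for one input SD file (sketch)."""
    stem = root or os.path.splitext(os.path.basename(sd_file))[0] + SUFFIX
    text_ext = "tsv" if out_delim == "tab" else "csv"
    extensions = {
        "SD": ["sdf"],
        "FP": ["fpf"],
        "text": [text_ext],
        "all": ["sdf", "fpf", text_ext],
    }[output]
    return [f"{stem}.{ext}" for ext in extensions]
```

        With -r SampleTATFP and --output all, this gives SampleTATFP.sdf,
        SampleTATFP.fpf and SampleTATFP.csv, the files named in the EXAMPLES
        section.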

    -u, --UseTriangleInequality *Yes | No*
        Specify whether to apply the triangle inequality test to distances
        between atom pairs in atom triplets during atom triplets generation.
        Possible values: *Yes or No*. Default value: *No*.

        The triangle inequality test requires that the distance or binned
        distance between any two atoms in an atom triplet be less than the
        sum of the distances or binned distances of the other two atom pairs
        and greater than the absolute difference of those distances.

        For atom triplet ATx-Dyz-ATy-Dxz-ATz-Dxy to satisfy triangle inequality:

            Dyz > |Dxz - Dxy| and Dyz < Dxz + Dxy
            Dxz > |Dyz - Dxy| and Dxz < Dyz + Dxy
            Dxy > |Dyz - Dxz| and Dxy < Dyz + Dxz
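
        The three conditions translate directly into code. The Python sketch
        below applies them with the strict inequalities written above; it is
        an illustration, not the script's implementation.

```python
def satisfies_triangle_inequality(d_yz, d_xz, d_xy):
    """Strict triangle inequality test for the three distances of a triplet."""
    return (abs(d_xz - d_xy) < d_yz < d_xz + d_xy
            and abs(d_yz - d_xy) < d_xz < d_yz + d_xy
            and abs(d_yz - d_xz) < d_xy < d_yz + d_xz)
```

        Read strictly, the test rejects degenerate triplets in which one
        atom lies on the shortest path between the other two, such as
        C.X1.BO1.H3-D1-C.X1.BO1.H3-D1-C.X3.BO3.H1-D2 (distances 1, 1 and 2)
        from the examples above, while a triplet with distances 2, 2 and 2
        passes.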

    -v, --VectorStringFormat *IDsAndValuesString | IDsAndValuesPairsString |
    ValuesAndIDsString | ValuesAndIDsPairsString*
        Format of fingerprints vector string data in output SD, FP or
        CSV/TSV text file(s) specified by --output option. Possible values:
        *IDsAndValuesString | IDsAndValuesPairsString | ValuesAndIDsString |
        ValuesAndIDsPairsString*. Default value: *IDsAndValuesString*.

        Examples:

            FingerprintsVector;TopologicalAtomTriplets:AtomicInvariantsAtomTypes:M
            inDistance1:MaxDistance10;3096;NumericalValues;IDsAndValuesString;C.X1
            .BO1.H3-D1-C.X1.BO1.H3-D1-C.X3.BO3.H1-D2 C.X1.BO1.H3-D1-C.X2.BO2.H2-D1
            0-C.X3.BO4-D9 C.X1.BO1.H3-D1-C.X2.BO2.H2-D3-N.X3.BO3-D4 C.X1.BO1.H3-D1
            -C.X2.BO2.H2-D4-C.X2.BO2.H2-D5 C.X1.BO1.H3-D1-C.X2.BO2.H2-D6-C.X3....;
            1 2 2 2 2 2 2 2 8 8 4 8 4 4 2 2 2 2 4 2 2 2 4 2 2 2 2 1 2 2 4 4 4 2 2
            2 4 4 4 8 4 4 2 4 4 4 2 4 4 2 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2 2 8...

            FingerprintsVector;TopologicalAtomTriplets:AtomicInvariantsAtomTypes:M
            inDistance1:MaxDistance10;3096;NumericalValues;IDsAndValuesPairsString
            ;C.X1.BO1.H3-D1-C.X1.BO1.H3-D1-C.X3.BO3.H1-D2 1 C.X1.BO1.H3-D1-C.X2.BO
            2.H2-D10-C.X3.BO4-D9 2 C.X1.BO1.H3-D1-C.X2.BO2.H2-D3-N.X3.BO3-D4 2 C.X
            1.BO1.H3-D1-C.X2.BO2.H2-D4-C.X2.BO2.H2-D5 2 C.X1.BO1.H3-D1-C.X2.BO2.H2
            -D6-C.X3.BO3.H1-D5 2 C.X1.BO1.H3-D1-C.X2.BO2.H2-D6-C.X3.BO3.H1-D7 2...
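
        The two formats shown above differ only in how IDs and values are
        laid out after the fifth semicolon-separated field. The Python
        sketch below parses a vector string and rewrites it as an
        IDsAndValuesPairsString. The field layout is inferred from these
        examples, integer values are assumed, and the handling of the two
        ValuesAndIDs formats (values listed before IDs) is an assumption
        based on their names.

```python
def parse_fingerprints_vector(vector_string):
    """Return (description, {id: value}) from a FingerprintsVector string (sketch)."""
    kind, description, num_values, _values_type, layout, *data = vector_string.split(";")
    if kind != "FingerprintsVector":
        raise ValueError("not a FingerprintsVector string")
    if layout == "IDsAndValuesString":
        ids, values = data[0].split(), data[1].split()
    elif layout == "ValuesAndIDsString":  # assumed: values first, then IDs
        values, ids = data[0].split(), data[1].split()
    elif layout == "IDsAndValuesPairsString":
        tokens = data[0].split()
        ids, values = tokens[0::2], tokens[1::2]
    elif layout == "ValuesAndIDsPairsString":  # assumed: value-ID pairs
        tokens = data[0].split()
        values, ids = tokens[0::2], tokens[1::2]
    else:
        raise ValueError(f"unsupported vector string format: {layout}")
    if not len(ids) == len(values) == int(num_values):
        raise ValueError("number of IDs, values and NumValues disagree")
    return description, dict(zip(ids, (int(v) for v in values)))


def to_ids_and_values_pairs_string(description, id_values):
    """Rebuild a vector string in IDsAndValuesPairsString format (sketch)."""
    pairs = " ".join(f"{fid} {value}" for fid, value in id_values.items())
    return ";".join(["FingerprintsVector", description, str(len(id_values)),
                     "NumericalValues", "IDsAndValuesPairsString", pairs])
```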

    -w, --WorkingDir *DirName*
        Location of working directory. Default value: current directory.

EXAMPLES
    To generate topological atom triplets fingerprints corresponding to bond
    distances from 1 through 10 using atomic invariants atom types in
    IDsAndValuesString format and create a SampleTATFP.csv file containing
    sequential compound IDs along with fingerprints vector strings data,
    type:

        % TopologicalAtomTripletsFingerprints.pl -r SampleTATFP -o Sample.sdf

    To generate topological atom triplets fingerprints corresponding to bond
    distances from 1 through 10 using atomic invariants atom types in
    IDsAndValuesString format and create SampleTATFP.sdf, SampleTATFP.fpf
    and SampleTATFP.csv files containing sequential compound IDs in CSV file
    along with fingerprints vector strings data, type:

        % TopologicalAtomTripletsFingerprints.pl --output all -r SampleTATFP
          -o Sample.sdf

    To generate topological atom triplets fingerprints corresponding to bond
    distances from 1 through 10 using atomic invariants atom types in
    IDsAndValuesPairsString format and create a SampleTATFP.csv file
    containing sequential compound IDs along with fingerprints vector
    strings data, type:

        % TopologicalAtomTripletsFingerprints.pl --VectorStringFormat
          IDsAndValuesPairsString -r SampleTATFP -o Sample.sdf

    To generate topological atom triplets fingerprints corresponding to bond
    distances from 1 through 10 using DREIDING atom types in
    IDsAndValuesString format and create a SampleTATFP.csv file containing
    sequential compound IDs along with fingerprints vector strings data,
    type:

        % TopologicalAtomTripletsFingerprints.pl -a DREIDINGAtomTypes
          -r SampleTATFP -o Sample.sdf

    To generate topological atom triplets fingerprints corresponding to bond
    distances from 1 through 10 using E-state atom types in
    IDsAndValuesString format and create a SampleTATFP.csv file containing
    sequential compound IDs along with fingerprints vector strings data,
    type:

        % TopologicalAtomTripletsFingerprints.pl -a EStateAtomTypes
          -r SampleTATFP -o Sample.sdf

    To generate topological atom triplets fingerprints corresponding to bond
    distances from 1 through 10 using functional class atom types in
    IDsAndValuesString format and create a SampleTATFP.csv file containing
    sequential compound IDs along with fingerprints vector strings data,
    type:

        % TopologicalAtomTripletsFingerprints.pl -a FunctionalClassAtomTypes
          -r SampleTATFP -o Sample.sdf

    To generate topological atom triplets fingerprints corresponding to bond
    distances from 1 through 10 using MMFF94 atom types in IDsAndValuesString
    format and create a SampleTATFP.csv file containing sequential compound
    IDs along with fingerprints vector strings data, type:

        % TopologicalAtomTripletsFingerprints.pl -a MMFF94AtomTypes
          -r SampleTATFP -o Sample.sdf

    To generate topological atom triplets fingerprints corresponding to bond
    distances from 1 through 10 using SLogP atom types in IDsAndValuesString
    format and create a SampleTATFP.csv file containing sequential compound
    IDs along with fingerprints vector strings data, type:

        % TopologicalAtomTripletsFingerprints.pl -a SLogPAtomTypes
          -r SampleTATFP -o Sample.sdf

    To generate topological atom triplets fingerprints corresponding to bond
    distances from 1 through 10 using SYBYL atom types in IDsAndValuesString
    format and create a SampleTATFP.csv file containing sequential compound
    IDs along with fingerprints vector strings data, type:

        % TopologicalAtomTripletsFingerprints.pl -a SYBYLAtomTypes
          -r SampleTATFP -o Sample.sdf

    To generate topological atom triplets fingerprints corresponding to bond
    distances from 1 through 10 using TPSA atom types in IDsAndValuesString
    format and create a SampleTATFP.csv file containing sequential compound
    IDs along with fingerprints vector strings data, type:

        % TopologicalAtomTripletsFingerprints.pl -a TPSAAtomTypes
          -r SampleTATFP -o Sample.sdf

    To generate topological atom triplets fingerprints corresponding to bond
    distances from 1 through 10 using UFF atom types in IDsAndValuesString
    format and create a SampleTATFP.csv file containing sequential compound
    IDs along with fingerprints vector strings data, type:

        % TopologicalAtomTripletsFingerprints.pl -a UFFAtomTypes
          -r SampleTATFP -o Sample.sdf

    To generate topological atom triplets fingerprints corresponding to bond
    distances from 1 through 6 using atomic invariants atom types in
    IDsAndValuesString format and create a SampleTATFP.csv file containing
    sequential compound IDs along with fingerprints vector strings data,
    type:

        % TopologicalAtomTripletsFingerprints.pl -a AtomicInvariantsAtomTypes
          --MinDistance 1 --MaxDistance 6 -r SampleTATFP -o Sample.sdf

    To generate topological atom triplets fingerprints corresponding to bond
    distances from 1 through 6 using only AS,X atomic invariants atom types
    in IDsAndValuesString format and create a SampleTATFP.csv file
    containing sequential compound IDs along with fingerprints vector
    strings data, type:

        % TopologicalAtomTripletsFingerprints.pl -a AtomicInvariantsAtomTypes
          --AtomicInvariantsToUse "AS,X" --MinDistance 1 --MaxDistance 6
          -r SampleTATFP -o Sample.sdf

    To generate topological atom triplets fingerprints corresponding to bond
    distances from 1 through 10 using atomic invariants atom types in
    IDsAndValuesString format and create a SampleTATFP.csv file containing
    compound IDs from the molecule name line along with fingerprints vector
    strings data, type:

        % TopologicalAtomTripletsFingerprints.pl -a AtomicInvariantsAtomTypes
          --DataFieldsMode CompoundID --CompoundIDMode MolName
          -r SampleTATFP -o Sample.sdf

    To generate topological atom triplets fingerprints corresponding to bond
    distances from 1 through 10 using atomic invariants atom types in
    IDsAndValuesString format and create a SampleTATFP.csv file containing
    compound IDs from a specified data field along with fingerprints vector
    strings data, type:

        % TopologicalAtomTripletsFingerprints.pl -a AtomicInvariantsAtomTypes
          --DataFieldsMode CompoundID --CompoundIDMode DataField --CompoundID
          Mol_ID -r SampleTATFP -o Sample.sdf

    To generate topological atom triplets fingerprints corresponding to bond
    distances from 1 through 10 using atomic invariants atom types in
    IDsAndValuesString format and create a SampleTATFP.csv file containing
    compound IDs using a combination of the molecule name line and an
    explicit compound prefix along with fingerprints vector strings data,
    type:

        % TopologicalAtomTripletsFingerprints.pl -a AtomicInvariantsAtomTypes
          --DataFieldsMode CompoundID --CompoundIDMode MolNameOrLabelPrefix
          --CompoundID Cmpd --CompoundIDLabel MolID -r SampleTATFP -o Sample.sdf

    To generate topological atom triplets fingerprints corresponding to bond
    distances from 1 through 10 using atomic invariants atom types in
    IDsAndValuesString format and create a SampleTATFP.csv file containing
    specific data field columns along with fingerprints vector strings
    data, type:

        % TopologicalAtomTripletsFingerprints.pl -a AtomicInvariantsAtomTypes
          --DataFieldsMode Specify --DataFields Mol_ID -r SampleTATFP
          -o Sample.sdf

    To generate topological atom triplets fingerprints corresponding to bond
    distances from 1 through 10 using atomic invariants atom types in
    IDsAndValuesString format and create a SampleTATFP.csv file containing
    common data field columns along with fingerprints vector strings data,
    type:

        % TopologicalAtomTripletsFingerprints.pl -a AtomicInvariantsAtomTypes
          --DataFieldsMode Common -r SampleTATFP -o Sample.sdf

    To generate topological atom triplets fingerprints corresponding to bond
    distances from 1 through 10 using atomic invariants atom types in
    IDsAndValuesString format and create SampleTATFP.sdf, SampleTATFP.fpf
    and SampleTATFP.csv files containing all data field columns in the CSV
    file along with fingerprints data, type:

        % TopologicalAtomTripletsFingerprints.pl -a AtomicInvariantsAtomTypes
          --DataFieldsMode All --output all -r SampleTATFP
          -o Sample.sdf

AUTHOR
    Manish Sud <msud@san.rr.com>

SEE ALSO
    InfoFingerprintsFiles.pl, SimilarityMatricesFingerprints.pl,
    AtomNeighborhoodsFingerprints.pl, ExtendedConnectivityFingerprints.pl,
    MACCSKeysFingerprints.pl, PathLengthFingerprints.pl,
    TopologicalAtomTorsionsFingerprints.pl,
    TopologicalPharmacophoreAtomPairsFingerprints.pl,
    TopologicalPharmacophoreAtomTripletsFingerprints.pl

COPYRIGHT
    Copyright (C) 2015 Manish Sud. All rights reserved.

    This file is part of MayaChemTools.

    MayaChemTools is free software; you can redistribute it and/or modify it
    under the terms of the GNU Lesser General Public License as published by
    the Free Software Foundation; either version 3 of the License, or (at
    your option) any later version.