comparison docs/scripts/txt/TopologicalAtomTripletsFingerprints.txt @ 0:4816e4a8ae95 draft default tip

Uploaded
author deepakjadmin
date Wed, 20 Jan 2016 09:23:18 -0500
parents
children
comparison
equal deleted inserted replaced
-1:000000000000 0:4816e4a8ae95
1 NAME
2 TopologicalAtomTripletsFingerprints.pl - Generate topological atom
3 triplets fingerprints for SD files
4
5 SYNOPSIS
6 TopologicalAtomTripletsFingerprints.pl SDFile(s)...
7
8 TopologicalAtomTripletsFingerprints.pl [--AromaticityModel
9 *AromaticityModelType*] [-a, --AtomIdentifierType
10 *AtomicInvariantsAtomTypes*] [--AtomicInvariantsToUse
11 *"AtomicInvariant,AtomicInvariant..."*] [--FunctionalClassesToUse
12 *"FunctionalClass1,FunctionalClass2..."*] [--CompoundID *DataFieldName
13 or LabelPrefixString*] [--CompoundIDLabel *text*] [--CompoundIDMode]
14 [--DataFields *"FieldLabel1,FieldLabel2,..."*] [-d, --DataFieldsMode
15 *All | Common | Specify | CompoundID*] [-f, --Filter *Yes | No*]
16 [--FingerprintsLabel *text*] [-h, --help] [-k, --KeepLargestComponent
17 *Yes | No*] [--MinDistance *number*] [--MaxDistance *number*]
18 [--OutDelim *comma | tab | semicolon*] [--output *SD | FP | text | all*]
19 [-o, --overwrite] [-q, --quote *Yes | No*] [-r, --root *RootName*] [-u,
20 --UseTriangleInequality *Yes | No*] [-v, --VectorStringFormat
21 *ValuesString, IDsAndValuesString | IDsAndValuesPairsString |
22 ValuesAndIDsString | ValuesAndIDsPairsString*] [-w, --WorkingDir
23 dirname] SDFile(s)...
24
25 DESCRIPTION
26 Generate topological atom triplets fingerprints for *SDFile(s)* and
27 create appropriate SD, FP or CSV/TSV text file(s) containing
28 fingerprints vector strings corresponding to molecular fingerprints.
29
30 Multiple SDFile names are separated by spaces. The valid file extensions
31 are *.sdf* and *.sd*. All other file names are ignored. All the SD files
32 in a current directory can be specified either by **.sdf* or the current
33 directory name.
34
35 The current release of MayaChemTools supports generation of topological
36 atom triplets fingerprints corresponding to following -a,
37 --AtomIdentifierTypes:
38
39 AtomicInvariantsAtomTypes, DREIDINGAtomTypes, EStateAtomTypes,
40 FunctionalClassAtomTypes, MMFF94AtomTypes, SLogPAtomTypes,
41 SYBYLAtomTypes, TPSAAtomTypes, UFFAtomTypes
42
43 Based on the values specified for -a, --AtomIdentifierType and
44 --AtomicInvariantsToUse, initial atom types are assigned to all
45 non-hydrogen atoms in a molecule. Using the distance matrix for the
46 molecule and initial atom types assigned to non-hydrogen atoms, all
47 unique atom pairs within --MinDistance and --MaxDistance are identified
48 and counted. An atom triplet identifier is generated for each unique
49 atom triplet; the format of the atom triplet identifier is:
50
51 <ATx>-Dyz-<ATy>-Dxz-<ATz>-Dxy
52
53 ATx, ATy, ATz: Atom types assigned to atom x, atom y, and atom z
54 Dxy: Distance between atom x and atom y
55 Dxz: Distance between atom x and atom z
56 Dyz: Distance between atom y and atom z
57
58 where <AT1>-D23 <= <AT2>-D13 <= <AT3>-D12
59
60 The atom triplet identifiers for all unique atom triplets corresponding
61 to non-hydrogen atoms constitute topological atom triplets fingerprints
62 of the molecule.
63
64 Example of *SD* file containing topological atom triplets fingerprints
65 string data:
66
67 ... ...
68 ... ...
69 $$$$
70 ... ...
71 ... ...
72 ... ...
73 41 44 0 0 0 0 0 0 0 0999 V2000
74 -3.3652 1.4499 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
75 ... ...
76 2 3 1 0 0 0 0
77 ... ...
78 M END
79 > <CmpdID>
80 Cmpd1
81
82 > <TopologicalAtomTripletsFingerprints>
83 FingerprintsVector;TopologicalAtomTriplets:AtomicInvariantsAtomTypes:Mi
84 nDistance1:MaxDistance10;3096;NumericalValues;IDsAndValuesString;C.X1.B
85 O1.H3-D1-C.X1.BO1.H3-D1-C.X3.BO3.H1-D2 C.X1.BO1.H3-D1-C.X2.BO2.H2-D10-C
86 .X3.BO4-D9 C.X1.BO1.H3-D1-C.X2.BO2.H2-D3-N.X3.BO3-D4 C.X1.BO1.H3-D1...;
87 1 2 2 2 2 2 2 2 8 8 4 8 4 4 2 2 2 2 4 2 2 2 4 2 2 2 2 1 2 2 4 4 4 2 2 2
88 4 4 4 8 4 4 2 4 4 4 2 4 4 2 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2 2 8 8 ...
89
90 $$$$
91 ... ...
92 ... ...
93
94 Example of *FP* file containing topological atom triplets fingerprints
95 string data:
96
97 #
98 # Package = MayaChemTools 7.4
99 # Release Date = Oct 21, 2010
100 #
101 # TimeStamp = Fri Mar 11 15:24:01 2011
102 #
103 # FingerprintsStringType = FingerprintsVector
104 #
105 # Description = TopologicalAtomTriplets:AtomicInvariantsAtomTypes:Mi...
106 # VectorStringFormat = IDsAndValuesString
107 # VectorValuesType = NumericalValues
108 #
109 Cmpd1 3096;C.X1.BO1.H3-D1-C.X1.BO1.H3-D1-C.X3.BO3.H1-D2...;1 2 2 2 2...
110 Cmpd2 1093;C.X1.BO1.H3-D1-C.X1.BO1.H3-D3-C.X2.BO2.H2-D4...;2 2 2 2 2...
111 ... ...
112 ... ..
113
114 Example of CSV *Text* file containing topological atom triplets
115 fingerprints string data:
116
117 "CompoundID","TopologicalAtomTripletsFingerprints"
118 "Cmpd1","FingerprintsVector;TopologicalAtomTriplets:AtomicInvariantsAto
119 mTypes:MinDistance1:MaxDistance10;3096;NumericalValues;IDsAndValuesStri
120 ng;C.X1.BO1.H3-D1-C.X1.BO1.H3-D1-C.X3.BO3.H1-D2 C.X1.BO1.H3-D1-C.X2.BO2
121 .H2-D10-C.X3.BO4-D9 C.X1.BO1.H3-D1-C.X2.BO2.H2-D3-N.X3.BO3-D4 C.X1....;
122 1 2 2 2 2 2 2 2 8 8 4 8 4 4 2 2 2 2 4 2 2 2 4 2 2 2 2 1 2 2 4 4 4 2 2 2
123 4 4 4 8 4 4 2 4 4 4 2 4 4 2 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2 2 8 8 ...
124 ... ...
125 ... ...
126
127 The current release of MayaChemTools generates the following types of
128 topological atom triplets fingerprints vector strings:
129
130 FingerprintsVector;TopologicalAtomTriplets:AtomicInvariantsAtomTypes:M
131 inDistance1:MaxDistance10;3096;NumericalValues;IDsAndValuesString;C.X1
132 .BO1.H3-D1-C.X1.BO1.H3-D1-C.X3.BO3.H1-D2 C.X1.BO1.H3-D1-C.X2.BO2.H2-D1
133 0-C.X3.BO4-D9 C.X1.BO1.H3-D1-C.X2.BO2.H2-D3-N.X3.BO3-D4 C.X1.BO1.H3-D1
134 -C.X2.BO2.H2-D4-C.X2.BO2.H2-D5 C.X1.BO1.H3-D1-C.X2.BO2.H2-D6-C.X3....;
135 1 2 2 2 2 2 2 2 8 8 4 8 4 4 2 2 2 2 4 2 2 2 4 2 2 2 2 1 2 2 4 4 4 2 2
136 2 4 4 4 8 4 4 2 4 4 4 2 4 4 2 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2 2 8...
137
138 FingerprintsVector;TopologicalAtomTriplets:AtomicInvariantsAtomTypes:M
139 inDistance1:MaxDistance10;3096;NumericalValues;IDsAndValuesPairsString
140 ;C.X1.BO1.H3-D1-C.X1.BO1.H3-D1-C.X3.BO3.H1-D2 1 C.X1.BO1.H3-D1-C.X2.BO
141 2.H2-D10-C.X3.BO4-D9 2 C.X1.BO1.H3-D1-C.X2.BO2.H2-D3-N.X3.BO3-D4 2 C.X
142 1.BO1.H3-D1-C.X2.BO2.H2-D4-C.X2.BO2.H2-D5 2 C.X1.BO1.H3-D1-C.X2.BO2.H2
143 -D6-C.X3.BO3.H1-D5 2 C.X1.BO1.H3-D1-C.X2.BO2.H2-D6-C.X3.BO3.H1-D7 2...
144
145 FingerprintsVector;TopologicalAtomTriplets:DREIDINGAtomTypes:MinDistan
146 ce1:MaxDistance10;2377;NumericalValues;IDsAndValuesString;C_2-D1-C_2-D
147 9-C_3-D10 C_2-D1-C_2-D9-C_R-D10 C_2-D1-C_3-D1-C_3-D2 C_2-D1-C_3-D10-C_
148 3-D9 C_2-D1-C_3-D2-C_3-D3 C_2-D1-C_3-D2-C_R-D3 C_2-D1-C_3-D3-C_3-D4 C_
149 2-D1-C_3-D3-N_R-D4 C_2-D1-C_3-D3-O_3-D2 C_2-D1-C_3-D4-C_3-D5 C_2-D...;
150 1 1 1 2 1 1 3 1 1 2 2 1 1 1 1 1 1 1 1 2 1 3 4 5 1 1 6 4 2 2 3 1 1 1 2
151 2 1 2 1 1 2 2 2 1 2 1 2 1 1 3 3 2 6 4 2 1 1 1 2 2 1 1 1 1 1 1 1 1 1...
152
153 FingerprintsVector;TopologicalAtomTriplets:EStateAtomTypes:MinDistance
154 1:MaxDistance10;3298;NumericalValues;IDsAndValuesString;aaCH-D1-aaCH-D
155 1-aaCH-D2 aaCH-D1-aaCH-D1-aasC-D2 aaCH-D1-aaCH-D10-aaCH-D9 aaCH-D1-aaC
156 H-D10-aasC-D9 aaCH-D1-aaCH-D2-aaCH-D3 aaCH-D1-aaCH-D2-aasC-D1 aaCH-D1-
157 aaCH-D2-aasC-D3 aaCH-D1-aaCH-D3-aasC-D2 aaCH-D1-aaCH-D4-aasC-D5 aa...;
158 6 4 24 4 16 8 8 4 8 8 8 12 10 14 4 16 24 4 12 2 2 4 1 10 2 2 15 2 2 2
159 2 2 2 14 4 2 2 2 2 1 2 10 2 2 4 1 2 4 8 3 3 3 4 6 4 2 2 3 3 1 1 1 2 1
160 2 2 4 2 3 2 1 2 4 5 3 2 2 1 2 4 3 2 8 12 6 2 2 4 4 7 1 4 2 4 2 2 2 ...
161
162 FingerprintsVector;TopologicalAtomTriplets:FunctionalClassAtomTypes:Mi
163 nDistance1:MaxDistance10;2182;NumericalValues;IDsAndValuesString;Ar-D1
164 -Ar-D1-Ar-D2 Ar-D1-Ar-D1-Ar.HBA-D2 Ar-D1-Ar-D10-Ar-D9 Ar-D1-Ar-D10-Hal
165 -D9 Ar-D1-Ar-D2-Ar-D2 Ar-D1-Ar-D2-Ar-D3 Ar-D1-Ar-D2-Ar.HBA-D1 Ar-D1-Ar
166 -D2-Ar.HBA-D2 Ar-D1-Ar-D2-Ar.HBA-D3 Ar-D1-Ar-D2-HBD-D1 Ar-D1-Ar-D2...;
167 27 1 32 2 2 63 3 2 1 2 1 2 3 1 1 40 3 1 2 2 2 2 4 2 2 47 4 2 2 1 2 1 5
168 2 2 51 4 3 1 3 1 9 1 1 50 3 3 4 1 9 50 2 2 3 3 5 45 1 1 1 2 1 2 2 3 3
169 4 4 3 2 1 1 3 4 5 5 3 1 2 3 2 3 5 7 2 7 3 7 1 1 2 2 2 2 3 1 4 3 1 2...
170
171 FingerprintsVector;TopologicalAtomTriplets:MMFF94AtomTypes:MinDistance
172 1:MaxDistance10;2966;NumericalValues;IDsAndValuesString;C5A-D1-C5A-D1-
173 N5-D2 C5A-D1-C5A-D2-C5B-D2 C5A-D1-C5A-D3-CB-D2 C5A-D1-C5A-D3-CR-D2 C5A
174 -D1-C5B-D1-C5B-D2 C5A-D1-C5B-D2-C=ON-D1 C5A-D1-C5B-D2-CB-D1 C5A-D1-C5B
175 -D3-C=ON-D2 C5A-D1-C5B-D3-CB-D2 C5A-D1-C=ON-D3-NC=O-D2 C5A-D1-C=ON-D3-
176 O=CN-D2 C5A-D1-C=ON-D4-NC=O-D3 C5A-D1-C=ON-D4-O=CN-D3 C5A-D1-CB-D1-...
177
178 FingerprintsVector;TopologicalAtomTriplets:SLogPAtomTypes:MinDistance1
179 :MaxDistance10;3710;NumericalValues;IDsAndValuesString;C1-D1-C1-D1-C11
180 -D2 C1-D1-C1-D1-CS-D2 C1-D1-C1-D10-C5-D9 C1-D1-C1-D3-C10-D2 C1-D1-C1-D
181 3-C5-D2 C1-D1-C1-D3-CS-D2 C1-D1-C1-D3-CS-D4 C1-D1-C1-D4-C10-D5 C1-D1-C
182 1-D4-C11-D5 C1-D1-C1-D5-C10-D4 C1-D1-C1-D5-C5-D4 C1-D1-C1-D6-C11-D7 C1
183 -D1-C1-D6-CS-D5 C1-D1-C1-D6-CS-D7 C1-D1-C1-D8-C11-D9 C1-D1-C1-D8-CS...
184
185 FingerprintsVector;TopologicalAtomTriplets:SYBYLAtomTypes:MinDistance1
186 :MaxDistance10;2332;NumericalValues;IDsAndValuesString;C.2-D1-C.2-D9-C
187 .3-D10 C.2-D1-C.2-D9-C.ar-D10 C.2-D1-C.3-D1-C.3-D2 C.2-D1-C.3-D10-C.3-
188 D9 C.2-D1-C.3-D2-C.3-D3 C.2-D1-C.3-D2-C.ar-D3 C.2-D1-C.3-D3-C.3-D4 C.2
189 -D1-C.3-D3-N.ar-D4 C.2-D1-C.3-D3-O.3-D2 C.2-D1-C.3-D4-C.3-D5 C.2-D1-C.
190 3-D5-C.3-D6 C.2-D1-C.3-D5-O.3-D4 C.2-D1-C.3-D6-C.3-D7 C.2-D1-C.3-D7...
191
192 FingerprintsVector;TopologicalAtomTriplets:TPSAAtomTypes:MinDistance1:
193 MaxDistance10;1007;NumericalValues;IDsAndValuesString;N21-D1-N7-D3-Non
194 e-D4 N21-D1-N7-D5-None-D4 N21-D1-None-D1-None-D2 N21-D1-None-D2-None-D
195 2 N21-D1-None-D2-None-D3 N21-D1-None-D3-None-D4 N21-D1-None-D4-None-D5
196 N21-D1-None-D4-O3-D3 N21-D1-None-D4-O4-D3 N21-D1-None-D5-None-D6 N21-
197 D1-None-D6-None-D7 N21-D1-None-D6-O4-D5 N21-D1-None-D7-None-D8 N21-...
198
199 FingerprintsVector;TopologicalAtomTriplets:UFFAtomTypes:MinDistance1:M
200 axDistance10;2377;NumericalValues;IDsAndValuesString;C_2-D1-C_2-D9-C_3
201 -D10 C_2-D1-C_2-D9-C_R-D10 C_2-D1-C_3-D1-C_3-D2 C_2-D1-C_3-D10-C_3-D9
202 C_2-D1-C_3-D2-C_3-D3 C_2-D1-C_3-D2-C_R-D3 C_2-D1-C_3-D3-C_3-D4 C_2-D1-
203 C_3-D3-N_R-D4 C_2-D1-C_3-D3-O_3-D2 C_2-D1-C_3-D4-C_3-D5 C_2-D1-C_3-D5-
204 C_3-D6 C_2-D1-C_3-D5-O_3-D4 C_2-D1-C_3-D6-C_3-D7 C_2-D1-C_3-D7-C_3-...
205
206 OPTIONS
207 --AromaticityModel *MDLAromaticityModel | TriposAromaticityModel |
208 MMFFAromaticityModel | ChemAxonBasicAromaticityModel |
209 ChemAxonGeneralAromaticityModel | DaylightAromaticityModel |
210 MayaChemToolsAromaticityModel*
211 Specify aromaticity model to use during detection of aromaticity.
212 Possible values in the current release are: *MDLAromaticityModel,
213 TriposAromaticityModel, MMFFAromaticityModel,
214 ChemAxonBasicAromaticityModel, ChemAxonGeneralAromaticityModel,
215 DaylightAromaticityModel or MayaChemToolsAromaticityModel*. Default
216 value: *MayaChemToolsAromaticityModel*.
217
218 The supported aromaticity model names along with model specific
219 control parameters are defined in AromaticityModelsData.csv, which
220 is distributed with the current release and is available under
221 lib/data directory. Molecule.pm module retrieves data from this file
222 during class instantiation and makes it available to method
223 DetectAromaticity for detecting aromaticity corresponding to a
224 specific model.
225
226 -a, --AtomIdentifierType *AtomicInvariantsAtomTypes | DREIDINGAtomTypes
227 | EStateAtomTypes | FunctionalClassAtomTypes | MMFF94AtomTypes |
228 SLogPAtomTypes | SYBYLAtomTypes | TPSAAtomTypes | UFFAtomTypes*
229 Specify atom identifier type to use for assignment of initial atom
230 identifier to non-hydrogen atoms during calculation of topological
231 atom triplets fingerprints. Possible values in the current release
232 are: *AtomicInvariantsAtomTypes, DREIDINGAtomTypes, EStateAtomTypes,
233 FunctionalClassAtomTypes, MMFF94AtomTypes, SLogPAtomTypes,
234 SYBYLAtomTypes, TPSAAtomTypes, UFFAtomTypes*. Default value:
235 *AtomicInvariantsAtomTypes*.
236
237 --AtomicInvariantsToUse *"AtomicInvariant,AtomicInvariant..."*
238 This value is used during *AtomicInvariantsAtomTypes* value of a,
239 --AtomIdentifierType option. It's a list of comma separated valid
240 atomic invariant atom types.
241
242 Possible values for atomic invariants are: *AS, X, BO, LBO, SB, DB,
243 TB, H, Ar, RA, FC, MN, SM*. Default value: *AS,X,BO,H,FC*.
244
245 The atomic invariants abbreviations correspond to:
246
247 AS = Atom symbol corresponding to element symbol
248
249 X<n> = Number of non-hydrogen atom neighbors or heavy atoms
250 BO<n> = Sum of bond orders to non-hydrogen atom neighbors or heavy atoms
251 LBO<n> = Largest bond order of non-hydrogen atom neighbors or heavy atoms
252 SB<n> = Number of single bonds to non-hydrogen atom neighbors or heavy atoms
253 DB<n> = Number of double bonds to non-hydrogen atom neighbors or heavy atoms
254 TB<n> = Number of triple bonds to non-hydrogen atom neighbors or heavy atoms
255 H<n> = Number of implicit and explicit hydrogens for atom
256 Ar = Aromatic annotation indicating whether atom is aromatic
257 RA = Ring atom annotation indicating whether atom is a ring
258 FC<+n/-n> = Formal charge assigned to atom
259 MN<n> = Mass number indicating isotope other than most abundant isotope
260 SM<n> = Spin multiplicity of atom. Possible values: 1 (singlet), 2 (doublet) or
261 3 (triplet)
262
263 Atom type generated by AtomTypes::AtomicInvariantsAtomTypes class
264 corresponds to:
265
266 AS.X<n>.BO<n>.LBO<n>.<SB><n>.<DB><n>.<TB><n>.H<n>.Ar.RA.FC<+n/-n>.MN<n>.SM<n>
267
268 Except for AS which is a required atomic invariant in atom types,
269 all other atomic invariants are optional. Atom type specification
270 doesn't include atomic invariants with zero or undefined values.
271
272 In addition to usage of abbreviations for specifying atomic
273 invariants, the following descriptive words are also allowed:
274
275 X : NumOfNonHydrogenAtomNeighbors or NumOfHeavyAtomNeighbors
276 BO : SumOfBondOrdersToNonHydrogenAtoms or SumOfBondOrdersToHeavyAtoms
277 LBO : LargestBondOrderToNonHydrogenAtoms or LargestBondOrderToHeavyAtoms
278 SB : NumOfSingleBondsToNonHydrogenAtoms or NumOfSingleBondsToHeavyAtoms
279 DB : NumOfDoubleBondsToNonHydrogenAtoms or NumOfDoubleBondsToHeavyAtoms
280 TB : NumOfTripleBondsToNonHydrogenAtoms or NumOfTripleBondsToHeavyAtoms
281 H : NumOfImplicitAndExplicitHydrogens
282 Ar : Aromatic
283 RA : RingAtom
284 FC : FormalCharge
285 MN : MassNumber
286 SM : SpinMultiplicity
287
288 *AtomTypes::AtomicInvariantsAtomTypes* module is used to assign
289 atomic invariant atom types.
290
291 --FunctionalClassesToUse *"FunctionalClass1,FunctionalClass2..."*
292 This value is used during *FunctionalClassAtomTypes* value of a,
293 --AtomIdentifierType option. It's a list of comma separated valid
294 functional classes.
295
296 Possible values for atom functional classes are: *Ar, CA, H, HBA,
297 HBD, Hal, NI, PI, RA*. Default value [ Ref 24 ]:
298 *HBD,HBA,PI,NI,Ar,Hal*.
299
300 The functional class abbreviations correspond to:
301
302 HBD: HydrogenBondDonor
303 HBA: HydrogenBondAcceptor
304 PI : PositivelyIonizable
305 NI : NegativelyIonizable
306 Ar : Aromatic
307 Hal : Halogen
308 H : Hydrophobic
309 RA : RingAtom
310 CA : ChainAtom
311
312 Functional class atom type specification for an atom corresponds to:
313
314 Ar.CA.H.HBA.HBD.Hal.NI.PI.RA
315
316 *AtomTypes::FunctionalClassAtomTypes* module is used to assign
317 functional class atom types. It uses following definitions [ Ref
318 60-61, Ref 65-66 ]:
319
320 HydrogenBondDonor: NH, NH2, OH
321 HydrogenBondAcceptor: N[!H], O
322 PositivelyIonizable: +, NH2
323 NegativelyIonizable: -, C(=O)OH, S(=O)OH, P(=O)OH
324
325 --CompoundID *DataFieldName or LabelPrefixString*
326 This value is --CompoundIDMode specific and indicates how compound
327 ID is generated.
328
329 For *DataField* value of --CompoundIDMode option, it corresponds to
330 datafield label name whose value is used as compound ID; otherwise,
331 it's a prefix string used for generating compound IDs like
332 LabelPrefixString<Number>. Default value, *Cmpd*, generates compound
333 IDs which look like Cmpd<Number>.
334
335 Examples for *DataField* value of --CompoundIDMode:
336
337 MolID
338 ExtReg
339
340 Examples for *LabelPrefix* or *MolNameOrLabelPrefix* value of
341 --CompoundIDMode:
342
343 Compound
344
345 The value specified above generates compound IDs which correspond to
346 Compound<Number> instead of default value of Cmpd<Number>.
347
348 --CompoundIDLabel *text*
349 Specify compound ID column label for CSV/TSV text file(s) used
350 during *CompoundID* value of --DataFieldsMode option. Default value:
351 *CompoundID*.
352
353 --CompoundIDMode *DataField | MolName | LabelPrefix |
354 MolNameOrLabelPrefix*
355 Specify how to generate compound IDs and write to FP or CSV/TSV text
356 file(s) along with generated fingerprints for *FP | text | all*
357 values of --output option: use a *SDFile(s)* datafield value; use
358 molname line from *SDFile(s)*; generate a sequential ID with
359 specific prefix; use combination of both MolName and LabelPrefix
360 with usage of LabelPrefix values for empty molname lines.
361
362 Possible values: *DataField | MolName | LabelPrefix |
363 MolNameOrLabelPrefix*. Default value: *LabelPrefix*.
364
365 For *MolNameAndLabelPrefix* value of --CompoundIDMode, molname line
366 in *SDFile(s)* takes precedence over sequential compound IDs
367 generated using *LabelPrefix* and only empty molname values are
368 replaced with sequential compound IDs.
369
370 This is only used for *CompoundID* value of --DataFieldsMode option.
371
372 --DataFields *"FieldLabel1,FieldLabel2,..."*
373 Comma delimited list of *SDFiles(s)* data fields to extract and
374 write to CSV/TSV text file(s) along with generated fingerprints for
375 *text | all* values of --output option.
376
377 This is only used for *Specify* value of --DataFieldsMode option.
378
379 Examples:
380
381 Extreg
382 MolID,CompoundName
383
384 -d, --DataFieldsMode *All | Common | Specify | CompoundID*
385 Specify how data fields in *SDFile(s)* are transferred to output
386 CSV/TSV text file(s) along with generated fingerprints for *text |
387 all* values of --output option: transfer all SD data field; transfer
388 SD data files common to all compounds; extract specified data
389 fields; generate a compound ID using molname line, a compound
390 prefix, or a combination of both. Possible values: *All | Common |
391 specify | CompoundID*. Default value: *CompoundID*.
392
393 -f, --Filter *Yes | No*
394 Specify whether to check and filter compound data in SDFile(s).
395 Possible values: *Yes or No*. Default value: *Yes*.
396
397 By default, compound data is checked before calculating fingerprints
398 and compounds containing atom data corresponding to non-element
399 symbols or no atom data are ignored.
400
401 --FingerprintsLabel *text*
402 SD data label or text file column label to use for fingerprints
403 string in output SD or CSV/TSV text file(s) specified by --output.
404 Default value: *TopologicalAtomTripletsFingerprints*.
405
406 -h, --help
407 Print this help message.
408
409 -k, --KeepLargestComponent *Yes | No*
410 Generate fingerprints for only the largest component in molecule.
411 Possible values: *Yes or No*. Default value: *Yes*.
412
413 For molecules containing multiple connected components, fingerprints
414 can be generated in two different ways: use all connected components
415 or just the largest connected component. By default, all atoms
416 except for the largest connected component are deleted before
417 generation of fingerprints.
418
419 --MinDistance *number*
420 Minimum bond distance between atom triplets for generating
421 topological atom triplets. Default value: *1*. Valid values:
422 positive integers and less than --MaxDistance.
423
424 --MaxDistance *number*
425 Maximum bond distance between atom triplets for generating
426 topological atom triplets. Default value: *10*. Valid values:
427 positive integers and greater than --MinDistance.
428
429 --OutDelim *comma | tab | semicolon*
430 Delimiter for output CSV/TSV text file(s). Possible values: *comma,
431 tab, or semicolon* Default value: *comma*
432
433 --output *SD | FP | text | all*
434 Type of output files to generate. Possible values: *SD, FP, text, or
435 all*. Default value: *text*.
436
437 -o, --overwrite
438 Overwrite existing files.
439
440 -q, --quote *Yes | No*
441 Put quote around column values in output CSV/TSV text file(s).
442 Possible values: *Yes or No*. Default value: *Yes*.
443
444 -r, --root *RootName*
445 New file name is generated using the root: <Root>.<Ext>. Default for
446 new file names: <SDFileName><TopologicalAtomTripletsFP>.<Ext>. The
447 file type determines <Ext> value. The sdf, fpf, csv, and tsv <Ext>
448 values are used for SD, FP, comma/semicolon, and tab delimited text
449 files, respectively.This option is ignored for multiple input files.
450
451 -u, --UseTriangleInequality *Yes | No*
452 Specify whether to imply triangle distance inequality test to
453 distances between atom pairs in atom triplets during generation of
454 atom triplets generation. Possible values: *Yes or No*. Default
455 value: *No*.
456
457 Triangle distance inequality test implies that distance or binned
458 distance between any two atom pairs in an atom triplet must be less
459 than the sum of distances or binned distances between other two
460 atoms pairs and greater than the difference of their distances.
461
462 For atom triplet ATx-Dyz-ATy-Dxz-ATz-Dxy to satisfy triangle inequality:
463
464 Dyz > |Dxz - Dxy| and Dyz < Dxz + Dxy
465 Dxz > |Dyz - Dxy| and Dyz < Dyz + Dxy
466 Dxy > |Dyz - Dxz| and Dxy < Dyz + Dxz
467
468 -v, --VectorStringFormat *IDsAndValuesString | IDsAndValuesPairsString |
469 ValuesAndIDsString | ValuesAndIDsPairsString*
470 Format of fingerprints vector string data in output SD, FP or
471 CSV/TSV text file(s) specified by --output option. Possible values:
472 *IDsAndValuesString | IDsAndValuesPairsString | ValuesAndIDsString |
473 ValuesAndIDsPairsString*. Default value: *IDsAndValuesString*.
474
475 Examples:
476
477 FingerprintsVector;TopologicalAtomTriplets:AtomicInvariantsAtomTypes:M
478 inDistance1:MaxDistance10;3096;NumericalValues;IDsAndValuesString;C.X1
479 .BO1.H3-D1-C.X1.BO1.H3-D1-C.X3.BO3.H1-D2 C.X1.BO1.H3-D1-C.X2.BO2.H2-D1
480 0-C.X3.BO4-D9 C.X1.BO1.H3-D1-C.X2.BO2.H2-D3-N.X3.BO3-D4 C.X1.BO1.H3-D1
481 -C.X2.BO2.H2-D4-C.X2.BO2.H2-D5 C.X1.BO1.H3-D1-C.X2.BO2.H2-D6-C.X3....;
482 1 2 2 2 2 2 2 2 8 8 4 8 4 4 2 2 2 2 4 2 2 2 4 2 2 2 2 1 2 2 4 4 4 2 2
483 2 4 4 4 8 4 4 2 4 4 4 2 4 4 2 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2 2 8...
484
485 FingerprintsVector;TopologicalAtomTriplets:AtomicInvariantsAtomTypes:M
486 inDistance1:MaxDistance10;3096;NumericalValues;IDsAndValuesPairsString
487 ;C.X1.BO1.H3-D1-C.X1.BO1.H3-D1-C.X3.BO3.H1-D2 1 C.X1.BO1.H3-D1-C.X2.BO
488 2.H2-D10-C.X3.BO4-D9 2 C.X1.BO1.H3-D1-C.X2.BO2.H2-D3-N.X3.BO3-D4 2 C.X
489 1.BO1.H3-D1-C.X2.BO2.H2-D4-C.X2.BO2.H2-D5 2 C.X1.BO1.H3-D1-C.X2.BO2.H2
490 -D6-C.X3.BO3.H1-D5 2 C.X1.BO1.H3-D1-C.X2.BO2.H2-D6-C.X3.BO3.H1-D7 2...
491
492 -w, --WorkingDir *DirName*
493 Location of working directory. Default value: current directory.
494
495 EXAMPLES
496 To generate topological atom triplets fingerprints corresponding to bond
497 distances from 1 through 10 using atomic invariants atom types in
498 IDsAndValuesString format and create a SampleTATFP.csv file containing
499 sequential compound IDs along with fingerprints vector strings data,
500 type:
501
502 % TopologicalAtomTripletsFingerprints.pl -r SampleTATFP -o Sample.sdf
503
504 To generate topological atom triplets fingerprints corresponding to bond
505 distances from 1 through 10 using atomic invariants atom types in
506 IDsAndValuesString format and create SampleTATFP.sdf, SampleTATFP.fpf
507 and SampleTATFP.csv files containing sequential compound IDs in CSV file
508 along with fingerprints vector strings data, type:
509
510 % TopologicalAtomTripletsFingerprints.pl --output all -r SampleTATFP
511 -o Sample.sdf
512
513 To generate topological atom triplets fingerprints corresponding to bond
514 distances from 1 through 10 using atomic invariants atom types in
515 IDsAndValuesPairsString format and create a SampleTATFP.csv file
516 containing sequential compound IDs along with fingerprints vector
517 strings data, type:
518
519 % TopologicalAtomTripletsFingerprints.pl --VectorStringFormat
520 IDsAndValuesPairsString -r SampleTATFP -o Sample.sdf
521
522 To generate topological atom triplets fingerprints corresponding to bond
523 distances from 1 through 10 using DREIDING atom types in
524 IDsAndValuesString format and create a SampleTATFP.csv file containing
525 sequential compound IDs along with fingerprints vector strings data,
526 type:
527
528 % TopologicalAtomTripletsFingerprints.pl -a DREIDINGAtomTypes
529 -r SampleTATFP -o Sample.sdf
530
531 To generate topological atom triplets fingerprints corresponding to bond
532 distances from 1 through 10 using E-state atom types in
533 IDsAndValuesString format and create a SampleTATFP.csv file containing
534 sequential compound IDs along with fingerprints vector strings data,
535 type:
536
537 % TopologicalAtomTripletsFingerprints.pl -a EStateAtomTypes
538 -r SampleTATFP -o Sample.sdf
539
540 To generate topological atom triplets fingerprints corresponding to bond
541 distances from 1 through 10 using functional class atom types in
542 IDsAndValuesString format and create a SampleTATFP.csv file containing
543 sequential compound IDs along with fingerprints vector strings data,
544 type:
545
546 % TopologicalAtomTripletsFingerprints.pl -a FunctionalClassAtomTypes
547 -r SampleTATFP -o Sample.sdf
548
549 To generate topological atom triplets fingerprints corresponding to bond
550 distances from 1 through 10 using DREIDING atom types in
551 IDsAndValuesString format and create a SampleTATFP.csv file containing
552 sequential compound IDs along with fingerprints vector strings data,
553 type:
554
555 % TopologicalAtomTripletsFingerprints.pl -a DREIDINGAtomTypes
556 -r SampleTATFP -o Sample.sdf
557
558 To generate topological atom triplets fingerprints corresponding to bond
559 distances from 1 through 10 using MM94 atom types in IDsAndValuesString
560 format and create a SampleTATFP.csv file containing sequential compound
561 IDs along with fingerprints vector strings data, type:
562
563 % TopologicalAtomTripletsFingerprints.pl -a MMFF94AtomTypes
564 -r SampleTATFP -o Sample.sdf
565
566 To generate topological atom triplets fingerprints corresponding to bond
567 distances from 1 through 10 using SLogP atom types in IDsAndValuesString
568 format and create a SampleTATFP.csv file containing sequential compound
569 IDs along with fingerprints vector strings data, type:
570
571 % TopologicalAtomTripletsFingerprints.pl -a SLogPAtomTypes
572 -r SampleTATFP -o Sample.sdf
573
574 To generate topological atom triplets fingerprints corresponding to bond
575 distances from 1 through 10 using SYBYL atom types in IDsAndValuesString
576 format and create a SampleTATFP.csv file containing sequential compound
577 IDs along with fingerprints vector strings data, type:
578
579 % TopologicalAtomTripletsFingerprints.pl -a SYBYLAtomTypes
580 -r SampleTATFP -o Sample.sdf
581
582 To generate topological atom triplets fingerprints corresponding to bond
583 distances from 1 through 10 using TPSA atom types in IDsAndValuesString
584 format and create a SampleTATFP.csv file containing sequential compound
585 IDs along with fingerprints vector strings data, type:
586
587 % TopologicalAtomTripletsFingerprints.pl -a TPSAAtomTypes
588 -r SampleTATFP -o Sample.sdf
589
590 To generate topological atom triplets fingerprints corresponding to bond
591 distances from 1 through 10 using UFF atom types in IDsAndValuesString
592 format and create a SampleTATFP.csv file containing sequential compound
593 IDs along with fingerprints vector strings data, type:
594
595 % TopologicalAtomTripletsFingerprints.pl -a UFFAtomTypes
596 -r SampleTATFP -o Sample.sdf
597
598 To generate topological atom triplets fingerprints corresponding to bond
599 distances from 1 through 6 using atomic invariants atom types in
600 IDsAndValuesString format and create a SampleTATFP.csv file containing
601 sequential compound IDs along with fingerprints vector strings data,
602 type:
603
604 % TopologicalAtomTripletsFingerprints.pl -a AtomicInvariantsAtomTypes
605 --MinDistance 1 --MaxDistance 6 -r SampleTATFP -o Sample.sdf
606
607 To generate topological atom triplets fingerprints corresponding to bond
608 distances from 1 through 10 using only AS,X atomic invariants atom types
609 in IDsAndValuesString format and create a SampleTATFP.csv file
610 containing sequential compound IDs along with fingerprints vector
611 strings data, type:
612
613 % TopologicalAtomTripletsFingerprints.pl -a AtomicInvariantsAtomTypes
614 --AtomicInvariantsToUse "AS,X" --MinDistance 1 --MaxDistance 6
615 -r SampleTATFP -o Sample.sdf
616
617 To generate topological atom triplets fingerprints corresponding to bond
618 distances from 1 through 10 using atomic invariants atom types in
619 IDsAndValuesString format and create a SampleTATFP.csv file containing
620 compound ID from molecule name line along with fingerprints vector
621 strings data, type:
622
623 % TopologicalAtomTripletsFingerprints.pl -a AtomicInvariantsAtomTypes
624 --DataFieldsMode CompoundID -CompoundIDMode MolName
625 -r SampleTATFP -o Sample.sdf
626
627 To generate topological atom triplets fingerprints corresponding to bond
628 distances from 1 through 10 using atomic invariants atom types in
629 IDsAndValuesString format and create a SampleTATFP.csv file containing
630 compound IDs using specified data field along with fingerprints vector
631 strings data, type:
632
633 % TopologicalAtomTripletsFingerprints.pl -a AtomicInvariantsAtomTypes
634 --DataFieldsMode CompoundID -CompoundIDMode DataField --CompoundID
635 Mol_ID -r SampleTATFP -o Sample.sdf
636
637 To generate topological atom triplets fingerprints corresponding to bond
638 distances from 1 through 10 using atomic invariants atom types in
639 IDsAndValuesString format and create a SampleTATFP.csv file containing
640 compound ID using combination of molecule name line and an explicit
641 compound prefix along with fingerprints vector strings data, type:
642
643 % TopologicalAtomTripletsFingerprints.pl -a AtomicInvariantsAtomTypes
644 --DataFieldsMode CompoundID -CompoundIDMode MolnameOrLabelPrefix
645 --CompoundID Cmpd --CompoundIDLabel MolID -r SampleTATFP -o Sample.sdf
646
647 To generate topological atom triplets fingerprints corresponding to bond
648 distances from 1 through 10 using atomic invariants atom types in
649 IDsAndValuesString format and create a SampleTATFP.csv file containing
650 specific data fields columns along with fingerprints vector strings
651 data, type:
652
653 % TopologicalAtomTripletsFingerprints.pl -a AtomicInvariantsAtomTypes
654 --DataFieldsMode Specify --DataFields Mol_ID -r SampleTATFP
655 -o Sample.sdf
656
657 To generate topological atom triplets fingerprints corresponding to bond
658 distances from 1 through 10 using atomic invariants atom types in
659 IDsAndValuesString format and create a SampleTATFP.csv file containing
660 common data fields columns along with fingerprints vector strings data,
661 type:
662
663 % TopologicalAtomTripletsFingerprints.pl -a AtomicInvariantsAtomTypes
664 --DataFieldsMode Common -r SampleTATFP -o Sample.sdf
665
666 To generate topological atom triplets fingerprints corresponding to bond
667 distances from 1 through 10 using atomic invariants atom types in
668 IDsAndValuesString format and create SampleTATFP.sdf, SampleTATFP.fpf
669 and SampleTATFP.csv files containing all data fields columns in CSV file
670 along with fingerprints data, type:
671
672 % TopologicalAtomTripletsFingerprints.pl -a AtomicInvariantsAtomTypes
673 --DataFieldsMode All --output all -r SampleTATFP
674 -o Sample.sdf
675
676 AUTHOR
677 Manish Sud <msud@san.rr.com>
678
679 SEE ALSO
680 InfoFingerprintsFiles.pl, SimilarityMatricesFingerprints.pl,
681 AtomNeighborhoodsFingerprints.pl, ExtendedConnectivityFingerprints.pl,
682 MACCSKeysFingerprints.pl, PathLengthFingerprints.pl,
683 TopologicalAtomTorsionsFingerprints.pl,
684 TopologicalPharmacophoreAtomPairsFingerprints.pl,
685 TopologicalPharmacophoreAtomTripletsFingerprints.pl
686
687 COPYRIGHT
688 Copyright (C) 2015 Manish Sud. All rights reserved.
689
690 This file is part of MayaChemTools.
691
692 MayaChemTools is free software; you can redistribute it and/or modify it
693 under the terms of the GNU Lesser General Public License as published by
694 the Free Software Foundation; either version 3 of the License, or (at
695 your option) any later version.
696