comparison docs/scripts/txt/TopologicalAtomPairsFingerprints.txt @ 0:4816e4a8ae95 draft default tip

Uploaded
author deepakjadmin
date Wed, 20 Jan 2016 09:23:18 -0500
-1:000000000000 0:4816e4a8ae95
1 NAME
2 TopologicalAtomPairsFingerprints.pl - Generate topological atom pairs
3 fingerprints for SD files
4
5 SYNOPSIS
6 TopologicalAtomPairsFingerprints.pl SDFile(s)...
7
8 TopologicalAtomPairsFingerprints.pl [--AromaticityModel
9 *AromaticityModelType*] [-a, --AtomIdentifierType
10 *AtomicInvariantsAtomTypes*] [--AtomicInvariantsToUse
11 *"AtomicInvariant,AtomicInvariant..."*] [--FunctionalClassesToUse
12 *"FunctionalClass1,FunctionalClass2..."*] [--CompoundID *DataFieldName
13 or LabelPrefixString*] [--CompoundIDLabel *text*] [--CompoundIDMode *DataField | MolName | LabelPrefix | MolNameOrLabelPrefix*]
14 [--DataFields *"FieldLabel1,FieldLabel2,..."*] [-d, --DataFieldsMode
15 *All | Common | Specify | CompoundID*] [-f, --Filter *Yes | No*]
16 [--FingerprintsLabel *text*] [-h, --help] [-k, --KeepLargestComponent
17 *Yes | No*] [--MinDistance *number*] [--MaxDistance *number*]
18 [--OutDelim *comma | tab | semicolon*] [--output *SD | FP | text | all*]
19 [-o, --overwrite] [-q, --quote *Yes | No*] [-r, --root *RootName*] [-v,
20 --VectorStringFormat *IDsAndValuesString |
21 IDsAndValuesPairsString | ValuesAndIDsString | ValuesAndIDsPairsString*]
22 [-w, --WorkingDir *DirName*] SDFile(s)...
23
24 DESCRIPTION
25 Generate topological atom pairs fingerprints [ Ref 57, Ref 59, Ref 72 ]
26 for *SDFile(s)* and create appropriate SD, FP or CSV/TSV text file(s)
27 containing fingerprints vector strings corresponding to molecular
28 fingerprints.
29
30 Multiple SDFile names are separated by spaces. The valid file extensions
31 are *.sdf* and *.sd*. All other file names are ignored. All the SD files
32 in the current directory can be specified either by **.sdf* or the current
33 directory name.
34
35 The current release of MayaChemTools supports generation of topological
36 atom pairs corresponding to the following -a, --AtomIdentifierType values:
37
38 AtomicInvariantsAtomTypes, DREIDINGAtomTypes, EStateAtomTypes,
39 FunctionalClassAtomTypes, MMFF94AtomTypes, SLogPAtomTypes,
40 SYBYLAtomTypes, TPSAAtomTypes, UFFAtomTypes
41
42 Based on the values specified for -a, --AtomIdentifierType and
43 --AtomicInvariantsToUse, initial atom types are assigned to all
44 non-hydrogen atoms in a molecule. Using the distance matrix for the
45 molecule and initial atom types assigned to non-hydrogen atoms, all
46 unique atom pairs within --MinDistance and --MaxDistance are identified
47 and counted. An atom pair identifier is generated for each unique atom
48 pair; the format of the atom pair identifier is:
49
50 <AtomType1>-D<n>-<AtomType2>
51
52 AtomType1, AtomType2: Atom types assigned to atom1 and atom2
53 D<n>: Bond distance n between atom1 and atom2
54
55 where AtomType1 <= AtomType2
56
57 The atom pair identifiers for all unique atom pairs corresponding to
58 non-hydrogen atoms constitute topological atom pairs fingerprints of the
59 molecule.
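
The procedure described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the MayaChemTools implementation: the molecule is a hypothetical adjacency list of non-hydrogen atoms, atom types are supplied as plain strings (element symbols here), and the function names bond_distances and topological_atom_pairs are invented for this example.

```python
from collections import Counter, deque


def bond_distances(adjacency, start):
    """Breadth-first search: bond distance from start to every reachable atom."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        for nbr in adjacency[atom]:
            if nbr not in dist:
                dist[nbr] = dist[atom] + 1
                queue.append(nbr)
    return dist


def topological_atom_pairs(adjacency, atom_types, min_distance=1, max_distance=10):
    """Count atom pairs keyed as <AtomType1>-D<n>-<AtomType2>."""
    counts = Counter()
    atoms = sorted(adjacency)
    for i, a in enumerate(atoms):
        dist = bond_distances(adjacency, a)
        for b in atoms[i + 1:]:
            d = dist.get(b)
            if d is None or not (min_distance <= d <= max_distance):
                continue
            # Order the two atom types so that AtomType1 <= AtomType2.
            t1, t2 = sorted((atom_types[a], atom_types[b]))
            counts[f"{t1}-D{d}-{t2}"] += 1
    return counts


# Heavy-atom graph of ethanol (C-C-O), typed by element symbol only (assumption).
adjacency = {0: [1], 1: [0, 2], 2: [1]}
atom_types = {0: "C", 1: "C", 2: "O"}
pairs = topological_atom_pairs(adjacency, atom_types)
```

Each unordered pair of atoms is visited once, so the counts correspond to unique atom pairs. Replacing the element symbols with any of the supported atom type strings changes the identifiers but not the counting logic.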
60
61 Example of *SD* file containing topological atom pairs fingerprints
62 string data:
63
64 ... ...
65 ... ...
66 $$$$
67 ... ...
68 ... ...
69 ... ...
70 41 44 0 0 0 0 0 0 0 0999 V2000
71 -3.3652 1.4499 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
72 ... ...
73 2 3 1 0 0 0 0
74 ... ...
75 M END
76 > <CmpdID>
77 Cmpd1
78
79 > <TopologicalAtomPairsFingerprints>
80 FingerprintsVector;TopologicalAtomPairs:AtomicInvariantsAtomTypes:MinDi
81 stance1:MaxDistance10;223;NumericalValues;IDsAndValuesString;C.X1.BO1.H
82 3-D1-C.X3.BO3.H1 C.X2.BO2.H2-D1-C.X2.BO2.H2 C.X2.BO2.H2-D1-C.X3.BO3.H1
83 C.X2.BO2.H2-D1-C.X3.BO4 C.X2.BO2.H2-D1-N.X3.BO3 C.X2.BO3.H1-D1-C.X2...;
84 2 1 4 1 1 10 8 1 2 6 1 2 2 1 2 1 2 2 1 2 1 5 1 10 12 2 2 1 2 1 9 1 3 1
85 1 1 2 2 1 3 6 1 6 14 2 2 2 3 1 3 1 8 2 2 1 3 2 6 1 2 2 5 1 3 1 23 1 ...
86
87 $$$$
88 ... ...
89 ... ...
90
91 Example of *FP* file containing topological atom pairs fingerprints
92 string data:
93
94 #
95 # Package = MayaChemTools 7.4
96 # Release Date = Oct 21, 2010
97 #
98 # TimeStamp = Fri Mar 11 15:04:36 2011
99 #
100 # FingerprintsStringType = FingerprintsVector
101 #
102 # Description = TopologicalAtomPairs:AtomicInvariantsAtomTypes:MinDi...
103 # VectorStringFormat = IDsAndValuesString
104 # VectorValuesType = NumericalValues
105 #
106 Cmpd1 223;C.X1.BO1.H3-D1-C.X3.BO3.H1 C.X2.BO2.H2-D1-C.X2.BO2.H2...;1 1...
107 Cmpd2 128;C.X1.BO1.H3-D1-C.X2.BO2.H2 C.X1.BO1.H3-D1-C.X3.BO4...;1 1...
108 ... ...
109 ... ...
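
A file laid out like the FP example above could be read with a short Python routine. This is a sketch only: it assumes the IDsAndValuesString layout shown in the example, assumes the compound ID is separated from the vector by whitespace, and the function name parse_fp_text is hypothetical.

```python
def parse_fp_text(text):
    """Split FP file text into header key/value pairs and per-compound records."""
    header, records = {}, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Header lines look like "# Key = Value"; bare "#" lines are skipped.
            key, sep, value = line[1:].partition("=")
            if sep:
                header[key.strip()] = value.strip()
            continue
        # Data lines: compound ID, whitespace, then "size;IDs;values".
        compound_id, vector = line.split(None, 1)
        size, ids, values = vector.split(";")
        records.append({
            "id": compound_id,
            "size": int(size),
            "ids": ids.split(),
            "values": [int(v) for v in values.split()],
        })
    return header, records


# Small made-up sample in the same layout as the example above.
sample = """#
# Package = MayaChemTools 7.4
# VectorStringFormat = IDsAndValuesString
#
Cmpd1 3;C-D1-C C-D1-O C-D2-O;1 1 1
"""
header, records = parse_fp_text(sample)
```

A real reader would check the VectorStringFormat header value before choosing how to split the vector fields.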
110
111 Example of CSV *Text* file containing topological atom pairs
112 fingerprints string data:
113
114 "CompoundID","TopologicalAtomPairsFingerprints"
115 "Cmpd1","FingerprintsVector;TopologicalAtomPairs:AtomicInvariantsAtomTy
116 pes:MinDistance1:MaxDistance10;223;NumericalValues;IDsAndValuesString;C
117 .X1.BO1.H3-D1-C.X3.BO3.H1 C.X2.BO2.H2-D1-C.X2.BO2.H2 C.X2.BO2.H2-D1-C.X
118 3.BO3.H1 C.X2.BO2.H2-D1-C.X3.BO4 C.X2.BO2.H2-D1-N.X3.BO3 C.X2.BO3.H1...;
119 2 1 4 1 1 10 8 1 2 6 1 2 2 1 2 1 2 2 1 2 1 5 1 10 12 2 2 1 2 1 9 1 3 1
120 1 1 2 2 1 3 6 1 6 14 2 2 2 3 1 3 1 8 2 2 1 3 2 6 1 2 2 5 1 3 1 23 1 ...
121 ... ...
122 ... ...
123
124 The current release of MayaChemTools generates the following types of
125 topological atom pairs fingerprints vector strings:
126
127 FingerprintsVector;TopologicalAtomPairs:AtomicInvariantsAtomTypes:MinD
128 istance1:MaxDistance10;223;NumericalValues;IDsAndValuesString;C.X1.BO1
129 .H3-D1-C.X3.BO3.H1 C.X2.BO2.H2-D1-C.X2.BO2.H2 C.X2.BO2.H2-D1-C.X3.BO3.
130 H1 C.X2.BO2.H2-D1-C.X3.BO4 C.X2.BO2.H2-D1-N.X3.BO3 C.X2.BO3.H1-D1-...;
131 2 1 4 1 1 10 8 1 2 6 1 2 2 1 2 1 2 2 1 2 1 5 1 10 12 2 2 1 2 1 9 1 3 1
132 1 1 2 2 1 3 6 1 6 14 2 2 2 3 1 3 1 8 2 2 1 3 2 6 1 2 2 5 1 3 1 23 1...
133
134 FingerprintsVector;TopologicalAtomPairs:AtomicInvariantsAtomTypes:MinD
135 istance1:MaxDistance10;223;NumericalValues;IDsAndValuesPairsString;C.X
136 1.BO1.H3-D1-C.X3.BO3.H1 2 C.X2.BO2.H2-D1-C.X2.BO2.H2 1 C.X2.BO2.H2-D1-
137 C.X3.BO3.H1 4 C.X2.BO2.H2-D1-C.X3.BO4 1 C.X2.BO2.H2-D1-N.X3.BO3 1 C.X2
138 .BO3.H1-D1-C.X2.BO3.H1 10 C.X2.BO3.H1-D1-C.X3.BO4 8 C.X3.BO3.H1-D1-C.X
139 3.BO4 1 C.X3.BO3.H1-D1-O.X1.BO1.H1 2 C.X3.BO4-D1-C.X3.BO4 6 C.X3.BO...
140
141 FingerprintsVector;TopologicalAtomPairs:DREIDINGAtomTypes:MinDistance1
142 :MaxDistance10;157;NumericalValues;IDsAndValuesString;C_2-D1-C_3 C_2-D
143 1-C_R C_2-D1-N_3 C_2-D1-O_2 C_2-D1-O_3 C_3-D1-C_3 C_3-D1-C_R C_3-D1-N_
144 R C_3-D1-O_3 C_R-D1-C_R C_R-D1-F_ C_R-D1-N_3 C_R-D1-N_R C_2-D2-C_3 C_2...;
145 1 1 1 2 1 7 1 1 2 23 1 1 2 1 3 5 5 2 1 5 28 2 3 3 1 1 1 2 4 1 1 4 9 3
146 1 4 24 2 4 3 3 4 5 5 14 1 1 2 3 22 1 3 4 4 1 1 1 1 2 2 5 1 4 21 3 1...
147
148 FingerprintsVector;TopologicalAtomPairs:EStateAtomTypes:MinDistance1:M
149 axDistance10;251;NumericalValues;IDsAndValuesString;aaCH-D1-aaCH aaCH-
150 D1-aasC aasC-D1-aasC aasC-D1-aasN aasC-D1-dssC aasC-D1-sF aasC-D1-ssNH
151 aasC-D1-sssCH aasN-D1-ssCH2 dO-D1-dssC dssC-D1-sOH dssC-D1-ssCH2 d...;
152 10 8 5 2 1 1 1 1 1 2 1 1 1 2 2 1 4 10 12 2 2 6 3 1 3 2 2 1 1 1 1 1 1 1
153 1 1 5 2 1 1 6 12 2 2 2 2 6 1 3 2 2 5 2 2 1 2 1 1 1 1 1 1 3 1 3 19 2...
154
155 FingerprintsVector;TopologicalAtomPairs:FunctionalClassAtomTypes:MinDi
156 stance1:MaxDistance10;144;NumericalValues;IDsAndValuesString;Ar-D1-Ar
157 Ar-D1-Ar.HBA Ar-D1-HBD Ar-D1-Hal Ar-D1-None Ar.HBA-D1-None HBA-D1-NI H
158 BA-D1-None HBA.HBD-D1-NI HBA.HBD-D1-None HBD-D1-None NI-D1-None No...;
159 23 2 1 1 2 1 1 1 1 2 1 1 7 28 3 1 3 2 8 2 1 1 1 5 1 5 24 3 3 4 2 13 4
160 1 1 4 1 5 22 4 4 3 1 19 1 1 1 1 1 2 2 3 1 1 8 25 4 5 2 3 1 26 1 4 1 ...
161
162 FingerprintsVector;TopologicalAtomPairs:MMFF94AtomTypes:MinDistance1:M
163 axDistance10;227;NumericalValues;IDsAndValuesPairsString;C5A-D1-C5B 2
164 C5A-D1-CB 1 C5A-D1-CR 1 C5A-D1-N5 2 C5B-D1-C5B 1 C5B-D1-C=ON 1 C5B-D1-
165 CB 1 C=ON-D1-NC=O 1 C=ON-D1-O=CN 1 CB-D1-CB 18 CB-D1-F 1 CB-D1-NC=O 1
166 COO-D1-CR 1 COO-D1-O=CO 1 COO-D1-OC=O 1 CR-D1-CR 7 CR-D1-N5 1 CR-D1-OR
167 2 C5A-D2-C5A 1 C5A-D2-C5B 2 C5A-D2-C=ON 1 C5A-D2-CB 3 C5A-D2-CR 4 ...
168
169 FingerprintsVector;TopologicalAtomPairs:SLogPAtomTypes:MinDistance1:Ma
170 xDistance10;329;NumericalValues;IDsAndValuesPairsString;C1-D1-C10 1 C1
171 -D1-C11 2 C1-D1-C5 1 C1-D1-CS 4 C10-D1-N11 1 C11-D1-C21 1 C14-D1-C18 2
172 C14-D1-F 1 C18-D1-C18 10 C18-D1-C20 4 C18-D1-C22 2 C20-D1-C20 3 C20-D
173 1-C21 1 C20-D1-N11 1 C21-D1-C21 1 C21-D1-C5 1 C21-D1-N11 1 C22-D1-N4 1
174 C5-D1-N4 1 C5-D1-O10 1 C5-D1-O2 1 C5-D1-O9 1 CS-D1-O2 2 C1-D2-C1 3...
175
176 FingerprintsVector;TopologicalAtomPairs:SYBYLAtomTypes:MinDistance1:Ma
177 xDistance10;159;NumericalValues;IDsAndValuesPairsString;C.2-D1-C.3 1 C
178 .2-D1-C.ar 1 C.2-D1-N.am 1 C.2-D1-O.2 1 C.2-D1-O.co2 2 C.3-D1-C.3 7 C.
179 3-D1-C.ar 1 C.3-D1-N.ar 1 C.3-D1-O.3 2 C.ar-D1-C.ar 23 C.ar-D1-F 1 C.a
180 r-D1-N.am 1 C.ar-D1-N.ar 2 C.2-D2-C.3 1 C.2-D2-C.ar 3 C.3-D2-C.3 5 C.3
181 -D2-C.ar 5 C.3-D2-N.ar 2 C.3-D2-O.3 4 C.3-D2-O.co2 2 C.ar-D2-C.ar 2...
182
183 FingerprintsVector;TopologicalAtomPairs:TPSAAtomTypes:MinDistance1:Max
184 Distance10;64;NumericalValues;IDsAndValuesPairsString;N21-D1-None 3 N7
185 -D1-None 2 None-D1-None 34 None-D1-O3 2 None-D1-O4 3 N21-D2-None 5 N7-
186 D2-None 3 N7-D2-O3 1 None-D2-None 44 None-D2-O3 2 None-D2-O4 5 O3-D2-O
187 4 1 N21-D3-None 7 N7-D3-None 4 None-D3-None 45 None-D3-O3 4 None-D3-O4
188 5 N21-D4-N7 1 N21-D4-None 5 N21-D4-O3 1 N21-D4-O4 1 N7-D4-None 4 N...
189
190 FingerprintsVector;TopologicalAtomPairs:UFFAtomTypes:MinDistance1:MaxD
191 istance10;157;NumericalValues;IDsAndValuesPairsString;C_2-D1-C_3 1 C_2
192 -D1-C_R 1 C_2-D1-N_3 1 C_2-D1-O_2 2 C_2-D1-O_3 1 C_3-D1-C_3 7 C_3-D1-C
193 _R 1 C_3-D1-N_R 1 C_3-D1-O_3 2 C_R-D1-C_R 23 C_R-D1-F_ 1 C_R-D1-N_3 1
194 C_R-D1-N_R 2 C_2-D2-C_3 1 C_2-D2-C_R 3 C_3-D2-C_3 5 C_3-D2-C_R 5 C_3-D
195 2-N_R 2 C_3-D2-O_2 1 C_3-D2-O_3 5 C_R-D2-C_R 28 C_R-D2-F_ 2 C_R-D2-...
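
The vector strings listed above share a semicolon-delimited layout, so they can be split into a small dictionary. The following Python sketch handles only the two formats shown in this document (IDsAndValuesString and IDsAndValuesPairsString); the function name parse_fingerprints_vector and the returned key names are assumptions made for illustration.

```python
def parse_fingerprints_vector(vector):
    """Parse a FingerprintsVector string into its header fields and ID counts."""
    fields = vector.split(";")
    kind, description, size, values_type, fmt = fields[:5]
    if fmt == "IDsAndValuesString":
        # IDs and values are stored in two separate space-delimited fields.
        ids, values = fields[5].split(), fields[6].split()
    elif fmt == "IDsAndValuesPairsString":
        # IDs and values alternate inside a single space-delimited field.
        tokens = fields[5].split()
        ids, values = tokens[0::2], tokens[1::2]
    else:
        raise ValueError(f"format not handled in this sketch: {fmt}")
    return {
        "kind": kind,
        "description": description,
        "size": int(size),
        "values_type": values_type,
        "format": fmt,
        "counts": dict(zip(ids, (int(v) for v in values))),
    }


# Shortened, made-up vectors in the same layout as the examples above.
pairs_vector = (
    "FingerprintsVector;TopologicalAtomPairs:UFFAtomTypes:MinDistance1:"
    "MaxDistance10;2;NumericalValues;IDsAndValuesPairsString;"
    "C_2-D1-C_3 1 C_3-D1-C_3 7"
)
ids_values_vector = (
    "FingerprintsVector;TopologicalAtomPairs:UFFAtomTypes:MinDistance1:"
    "MaxDistance10;2;NumericalValues;IDsAndValuesString;"
    "C_2-D1-C_3 C_3-D1-C_3;1 7"
)
parsed_pairs = parse_fingerprints_vector(pairs_vector)
parsed_ids_values = parse_fingerprints_vector(ids_values_vector)
```

Both calls yield the same counts, which is a quick way to confirm that the two formats carry identical information.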
196
197 OPTIONS
198 --AromaticityModel *MDLAromaticityModel | TriposAromaticityModel |
199 MMFFAromaticityModel | ChemAxonBasicAromaticityModel |
200 ChemAxonGeneralAromaticityModel | DaylightAromaticityModel |
201 MayaChemToolsAromaticityModel*
202 Specify aromaticity model to use during detection of aromaticity.
203 Possible values in the current release are: *MDLAromaticityModel,
204 TriposAromaticityModel, MMFFAromaticityModel,
205 ChemAxonBasicAromaticityModel, ChemAxonGeneralAromaticityModel,
206 DaylightAromaticityModel or MayaChemToolsAromaticityModel*. Default
207 value: *MayaChemToolsAromaticityModel*.
208
209 The supported aromaticity model names along with model specific
210 control parameters are defined in AromaticityModelsData.csv, which
211 is distributed with the current release and is available under
212 lib/data directory. Molecule.pm module retrieves data from this file
213 during class instantiation and makes it available to method
214 DetectAromaticity for detecting aromaticity corresponding to a
215 specific model.
216
217 -a, --AtomIdentifierType *AtomicInvariantsAtomTypes | DREIDINGAtomTypes
218 | EStateAtomTypes | FunctionalClassAtomTypes | MMFF94AtomTypes |
219 SLogPAtomTypes | SYBYLAtomTypes | TPSAAtomTypes | UFFAtomTypes*
220 Specify atom identifier type to use for assignment of initial atom
221 identifier to non-hydrogen atoms during calculation of topological
222 atom pairs fingerprints. Possible values in the current release are:
223 *AtomicInvariantsAtomTypes, DREIDINGAtomTypes, EStateAtomTypes,
224 FunctionalClassAtomTypes, MMFF94AtomTypes, SLogPAtomTypes,
225 SYBYLAtomTypes, TPSAAtomTypes, UFFAtomTypes*. Default value:
226 *AtomicInvariantsAtomTypes*.
227
228 --AtomicInvariantsToUse *"AtomicInvariant,AtomicInvariant..."*
229 This value is used during *AtomicInvariantsAtomTypes* value of a,
230 --AtomIdentifierType option. It's a list of comma separated valid
231 atomic invariant atom types.
232
233 Possible values for atomic invariants are: *AS, X, BO, LBO, SB, DB,
234 TB, H, Ar, RA, FC, MN, SM*. Default value: *AS,X,BO,H,FC*.
235
236 The atomic invariants abbreviations correspond to:
237
238 AS = Atom symbol corresponding to element symbol
239
240 X<n> = Number of non-hydrogen atom neighbors or heavy atoms
241 BO<n> = Sum of bond orders to non-hydrogen atom neighbors or heavy atoms
242 LBO<n> = Largest bond order of non-hydrogen atom neighbors or heavy atoms
243 SB<n> = Number of single bonds to non-hydrogen atom neighbors or heavy atoms
244 DB<n> = Number of double bonds to non-hydrogen atom neighbors or heavy atoms
245 TB<n> = Number of triple bonds to non-hydrogen atom neighbors or heavy atoms
246 H<n> = Number of implicit and explicit hydrogens for atom
247 Ar = Aromatic annotation indicating whether atom is aromatic
248 RA = Ring atom annotation indicating whether atom is in a ring
249 FC<+n/-n> = Formal charge assigned to atom
250 MN<n> = Mass number indicating isotope other than most abundant isotope
251 SM<n> = Spin multiplicity of atom. Possible values: 1 (singlet), 2 (doublet) or
252 3 (triplet)
253
254 Atom type generated by AtomTypes::AtomicInvariantsAtomTypes class
255 corresponds to:
256
257 AS.X<n>.BO<n>.LBO<n>.SB<n>.DB<n>.TB<n>.H<n>.Ar.RA.FC<+n/-n>.MN<n>.SM<n>
258
259 Except for AS which is a required atomic invariant in atom types,
260 all other atomic invariants are optional. Atom type specification
261 doesn't include atomic invariants with zero or undefined values.
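
The rule that zero or undefined invariants are left out of the atom type can be illustrated with a short Python sketch. The invariant values are passed in by hand here; computing them from a molecule is outside the scope of this example, and the function name atomic_invariants_atom_type is hypothetical.

```python
# Order of the optional invariants, following the atom type layout above.
INVARIANT_ORDER = ["X", "BO", "LBO", "SB", "DB", "TB", "H", "Ar", "RA", "FC", "MN", "SM"]


def atomic_invariants_atom_type(symbol, invariants):
    """Join the atom symbol with every invariant that has a non-zero value."""
    parts = [symbol]
    for name in INVARIANT_ORDER:
        value = invariants.get(name)
        if not value:
            continue  # zero or undefined invariants are omitted
        if name in ("Ar", "RA"):
            parts.append(name)  # plain annotations without a number
        elif name == "FC":
            parts.append(f"FC{value:+d}")  # formal charge carries its sign
        else:
            parts.append(f"{name}{value}")
    return ".".join(parts)


# A methyl carbon with the default invariants AS,X,BO,H,FC.
methyl = atomic_invariants_atom_type("C", {"X": 1, "BO": 1, "H": 3, "FC": 0})
amine = atomic_invariants_atom_type("N", {"X": 3, "BO": 3, "H": 0, "FC": 0})
```

The two results have the same shape as the identifiers C.X1.BO1.H3 and N.X3.BO3 that appear in the fingerprints vector examples in this document.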
262
263 In addition to usage of abbreviations for specifying atomic
264 invariants, the following descriptive words are also allowed:
265
266 X : NumOfNonHydrogenAtomNeighbors or NumOfHeavyAtomNeighbors
267 BO : SumOfBondOrdersToNonHydrogenAtoms or SumOfBondOrdersToHeavyAtoms
268 LBO : LargestBondOrderToNonHydrogenAtoms or LargestBondOrderToHeavyAtoms
269 SB : NumOfSingleBondsToNonHydrogenAtoms or NumOfSingleBondsToHeavyAtoms
270 DB : NumOfDoubleBondsToNonHydrogenAtoms or NumOfDoubleBondsToHeavyAtoms
271 TB : NumOfTripleBondsToNonHydrogenAtoms or NumOfTripleBondsToHeavyAtoms
272 H : NumOfImplicitAndExplicitHydrogens
273 Ar : Aromatic
274 RA : RingAtom
275 FC : FormalCharge
276 MN : MassNumber
277 SM : SpinMultiplicity
278
279 *AtomTypes::AtomicInvariantsAtomTypes* module is used to assign
280 atomic invariant atom types.
281
282 --FunctionalClassesToUse *"FunctionalClass1,FunctionalClass2..."*
283 This value is used during *FunctionalClassAtomTypes* value of a,
284 --AtomIdentifierType option. It's a list of comma separated valid
285 functional classes.
286
287 Possible values for atom functional classes are: *Ar, CA, H, HBA,
288 HBD, Hal, NI, PI, RA*. Default value [ Ref 24 ]:
289 *HBD,HBA,PI,NI,Ar,Hal*.
290
291 The functional class abbreviations correspond to:
292
293 HBD: HydrogenBondDonor
294 HBA: HydrogenBondAcceptor
295 PI : PositivelyIonizable
296 NI : NegativelyIonizable
297 Ar : Aromatic
298 Hal : Halogen
299 H : Hydrophobic
300 RA : RingAtom
301 CA : ChainAtom
302
303 Functional class atom type specification for an atom corresponds to:
304
305 Ar.CA.H.HBA.HBD.Hal.NI.PI.RA
306
307 *AtomTypes::FunctionalClassAtomTypes* module is used to assign
308 functional class atom types. It uses the following definitions [ Ref
309 60-61, Ref 65-66 ]:
310
311 HydrogenBondDonor: NH, NH2, OH
312 HydrogenBondAcceptor: N[!H], O
313 PositivelyIonizable: +, NH2
314 NegativelyIonizable: -, C(=O)OH, S(=O)OH, P(=O)OH
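
A functional class atom type can be pictured as the selected class abbreviations joined in the order Ar.CA.H.HBA.HBD.Hal.NI.PI.RA. The Python sketch below assumes the classes matched by an atom are already known, and it assumes that an atom matching none of the selected classes is labeled None, as suggested by identifiers such as Ar-D1-None in the examples in this document. The function name functional_class_atom_type is hypothetical.

```python
CLASS_ORDER = ["Ar", "CA", "H", "HBA", "HBD", "Hal", "NI", "PI", "RA"]
DEFAULT_CLASSES = ("HBD", "HBA", "PI", "NI", "Ar", "Hal")


def functional_class_atom_type(atom_classes, classes_to_use=DEFAULT_CLASSES):
    """Join the classes an atom matches, restricted to the classes in use."""
    selected = [c for c in CLASS_ORDER if c in atom_classes and c in classes_to_use]
    # Assumption: atoms without any selected class are labeled "None".
    return ".".join(selected) if selected else "None"


donor_acceptor = functional_class_atom_type({"HBD", "HBA"})
aromatic_acceptor = functional_class_atom_type({"Ar", "HBA"})
ring_only = functional_class_atom_type({"RA"})
```

With the default class list, a ring atom that matches no other class falls back to None because RA is not among the default classes.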
315
316 --CompoundID *DataFieldName or LabelPrefixString*
317 This value is --CompoundIDMode specific and indicates how compound
318 ID is generated.
319
320 For *DataField* value of --CompoundIDMode option, it corresponds to
321 datafield label name whose value is used as compound ID; otherwise,
322 it's a prefix string used for generating compound IDs like
323 LabelPrefixString<Number>. Default value, *Cmpd*, generates compound
324 IDs which look like Cmpd<Number>.
325
326 Examples for *DataField* value of --CompoundIDMode:
327
328 MolID
329 ExtReg
330
331 Examples for *LabelPrefix* or *MolNameOrLabelPrefix* value of
332 --CompoundIDMode:
333
334 Compound
335
336 The value specified above generates compound IDs which correspond to
337 Compound<Number> instead of default value of Cmpd<Number>.
338
339 --CompoundIDLabel *text*
340 Specify compound ID column label for CSV/TSV text file(s) used
341 during *CompoundID* value of --DataFieldsMode option. Default value:
342 *CompoundID*.
343
344 --CompoundIDMode *DataField | MolName | LabelPrefix |
345 MolNameOrLabelPrefix*
346 Specify how to generate compound IDs and write to FP or CSV/TSV text
347 file(s) along with generated fingerprints for *FP | text | all*
348 values of --output option: use a *SDFile(s)* datafield value; use
349 molname line from *SDFile(s)*; generate a sequential ID with
350 specific prefix; use combination of both MolName and LabelPrefix
351 with usage of LabelPrefix values for empty molname lines.
352
353 Possible values: *DataField | MolName | LabelPrefix |
354 MolNameOrLabelPrefix*. Default value: *LabelPrefix*.
355
356 For *MolNameOrLabelPrefix* value of --CompoundIDMode, molname line
357 in *SDFile(s)* takes precedence over sequential compound IDs
358 generated using *LabelPrefix* and only empty molname values are
359 replaced with sequential compound IDs.
360
361 This is only used for *CompoundID* value of --DataFieldsMode option.
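
The four --CompoundIDMode values can be summarized in a few lines of Python. This is a sketch of the selection logic as described above, not the script's own code; the function name compound_id and its parameters are hypothetical, and returning an empty string for a missing data field is an assumption.

```python
def compound_id(mode, number, mol_name="", data_fields=None, compound_id_value="Cmpd"):
    """Pick a compound ID according to the --CompoundIDMode value."""
    data_fields = data_fields or {}
    if mode == "DataField":
        # compound_id_value names the SD data field holding the ID.
        return data_fields.get(compound_id_value, "")
    if mode == "MolName":
        return mol_name
    if mode == "LabelPrefix":
        # compound_id_value is the prefix of a sequential ID.
        return f"{compound_id_value}{number}"
    if mode == "MolNameOrLabelPrefix":
        # The molname line wins; only empty names get a sequential ID.
        return mol_name if mol_name else f"{compound_id_value}{number}"
    raise ValueError(f"unknown CompoundIDMode: {mode}")


default_id = compound_id("LabelPrefix", 1)
named_id = compound_id("MolNameOrLabelPrefix", 2, mol_name="Aspirin")
fallback_id = compound_id("MolNameOrLabelPrefix", 3, mol_name="")
field_id = compound_id("DataField", 4, data_fields={"MolID": "M42"},
                       compound_id_value="MolID")
```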
362
363 --DataFields *"FieldLabel1,FieldLabel2,..."*
364 Comma delimited list of *SDFile(s)* data fields to extract and
365 write to CSV/TSV text file(s) along with generated fingerprints for
366 *text | all* values of --output option.
367
368 This is only used for *Specify* value of --DataFieldsMode option.
369
370 Examples:
371
372 Extreg
373 MolID,CompoundName
374
375 -d, --DataFieldsMode *All | Common | Specify | CompoundID*
376 Specify how data fields in *SDFile(s)* are transferred to output
377 CSV/TSV text file(s) along with generated fingerprints for *text |
378 all* values of --output option: transfer all SD data fields;
379 transfer SD data fields common to all compounds; extract specified
380 data fields; generate a compound ID using molname line, a compound
381 prefix, or a combination of both. Possible values: *All | Common |
382 Specify | CompoundID*. Default value: *CompoundID*.
383
384 -f, --Filter *Yes | No*
385 Specify whether to check and filter compound data in SDFile(s).
386 Possible values: *Yes or No*. Default value: *Yes*.
387
388 By default, compound data is checked before calculating fingerprints
389 and compounds containing atom data corresponding to non-element
390 symbols or no atom data are ignored.
391
392 --FingerprintsLabel *text*
393 SD data label or text file column label to use for fingerprints
394 string in output SD or CSV/TSV text file(s) specified by --output.
395 Default value: *TopologicalAtomPairsFingerprints*.
396
397 -h, --help
398 Print this help message.
399
400 -k, --KeepLargestComponent *Yes | No*
401 Generate fingerprints for only the largest component in molecule.
402 Possible values: *Yes or No*. Default value: *Yes*.
403
404 For molecules containing multiple connected components, fingerprints
405 can be generated in two different ways: use all connected components
406 or just the largest connected component. By default, all atoms
407 except for the largest connected component are deleted before
408 generation of fingerprints.
409
410 --MinDistance *number*
411 Minimum bond distance between atom pairs for generating topological
412 atom pairs. Default value: *1*. Valid values: positive integers and
413 less than --MaxDistance.
414
415 --MaxDistance *number*
416 Maximum bond distance between atom pairs for generating topological
417 atom pairs. Default value: *10*. Valid values: positive integers and
418 greater than --MinDistance.
419
420 --OutDelim *comma | tab | semicolon*
421 Delimiter for output CSV/TSV text file(s). Possible values: *comma,
422 tab, or semicolon*. Default value: *comma*.
423
424 --output *SD | FP | text | all*
425 Type of output files to generate. Possible values: *SD, FP, text, or
426 all*. Default value: *text*.
427
428 -o, --overwrite
429 Overwrite existing files.
430
431 -q, --quote *Yes | No*
432 Put quote around column values in output CSV/TSV text file(s).
433 Possible values: *Yes or No*. Default value: *Yes*.
434
435 -r, --root *RootName*
436 New file name is generated using the root: <Root>.<Ext>. Default for
437 new file names: <SDFileName><TopologicalAtomPairsFP>.<Ext>. The file
438 type determines <Ext> value. The sdf, fpf, csv, and tsv <Ext> values
439 are used for SD, FP, comma/semicolon, and tab delimited text files,
440 respectively. This option is ignored for multiple input files.
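
The mapping from output type and delimiter to file extension can be written out as a small Python sketch. It covers only the case where a root name is given with -r, --root; the function name output_file_names is hypothetical and the option values are assumed to be spelled exactly as in this document.

```python
def output_file_names(root, output="text", out_delim="comma"):
    """List the file names expected for a given --root, --output and --OutDelim."""
    # Comma and semicolon delimited text files use csv; tab delimited files use tsv.
    text_ext = "tsv" if out_delim == "tab" else "csv"
    ext_by_type = {"SD": "sdf", "FP": "fpf", "text": text_ext}
    types = ["SD", "FP", "text"] if output == "all" else [output]
    return [f"{root}.{ext_by_type[t]}" for t in types]


all_files = output_file_names("SampleTAPFP", output="all")
tab_file = output_file_names("SampleTAPFP", output="text", out_delim="tab")
```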
441
442 -v, --VectorStringFormat *IDsAndValuesString | IDsAndValuesPairsString |
443 ValuesAndIDsString | ValuesAndIDsPairsString*
444 Format of fingerprints vector string data in output SD, FP or
445 CSV/TSV text file(s) specified by --output option. Possible values:
446 *IDsAndValuesString | IDsAndValuesPairsString | ValuesAndIDsString |
447 ValuesAndIDsPairsString*. Default value: *IDsAndValuesString*.
448
449 Examples:
450
451 FingerprintsVector;TopologicalAtomPairs:AtomicInvariantsAtomTypes:MinD
452 istance1:MaxDistance10;223;NumericalValues;IDsAndValuesString;C.X1.BO1
453 .H3-D1-C.X3.BO3.H1 C.X2.BO2.H2-D1-C.X2.BO2.H2 C.X2.BO2.H2-D1-C.X3.BO3.
454 H1 C.X2.BO2.H2-D1-C.X3.BO4 C.X2.BO2.H2-D1-N.X3.BO3 C.X2.BO3.H1-D1-...;
455 2 1 4 1 1 10 8 1 2 6 1 2 2 1 2 1 2 2 1 2 1 5 1 10 12 2 2 1 2 1 9 1 3 1
456 1 1 2 2 1 3 6 1 6 14 2 2 2 3 1 3 1 8 2 2 1 3 2 6 1 2 2 5 1 3 1 23 1...
457
458 FingerprintsVector;TopologicalAtomPairs:AtomicInvariantsAtomTypes:MinD
459 istance1:MaxDistance10;223;NumericalValues;IDsAndValuesPairsString;C.X
460 1.BO1.H3-D1-C.X3.BO3.H1 2 C.X2.BO2.H2-D1-C.X2.BO2.H2 1 C.X2.BO2.H2-D1-
461 C.X3.BO3.H1 4 C.X2.BO2.H2-D1-C.X3.BO4 1 C.X2.BO2.H2-D1-N.X3.BO3 1 C.X2
462 .BO3.H1-D1-C.X2.BO3.H1 10 C.X2.BO3.H1-D1-C.X3.BO4 8 C.X3.BO3.H1-D1-C.X
463 3.BO4 1 C.X3.BO3.H1-D1-O.X1.BO1.H1 2 C.X3.BO4-D1-C.X3.BO4 6 C.X3.BO...
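
To make the difference between the two example formats concrete, the following Python sketch writes the same counts in either layout. Only the two formats shown in the examples above are handled, the IDs are emitted in the order supplied by the caller, and the function name format_vector is hypothetical.

```python
def format_vector(counts, description, fmt="IDsAndValuesString"):
    """Build a FingerprintsVector string from a mapping of atom pair ID to count."""
    ids = list(counts)  # keep the caller's ordering
    header = f"FingerprintsVector;{description};{len(ids)};NumericalValues;{fmt}"
    if fmt == "IDsAndValuesString":
        # All IDs first, then all values, in two semicolon-separated fields.
        body = " ".join(ids) + ";" + " ".join(str(counts[i]) for i in ids)
    elif fmt == "IDsAndValuesPairsString":
        # Each ID is followed directly by its value.
        body = " ".join(f"{i} {counts[i]}" for i in ids)
    else:
        raise ValueError(f"format not handled in this sketch: {fmt}")
    return f"{header};{body}"


counts = {"C-D1-C": 1, "C-D1-O": 1, "C-D2-O": 1}
description = "TopologicalAtomPairs:AtomicInvariantsAtomTypes:MinDistance1:MaxDistance10"
ids_and_values = format_vector(counts, description)
ids_and_values_pairs = format_vector(counts, description, "IDsAndValuesPairsString")
```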
464
465 -w, --WorkingDir *DirName*
466 Location of working directory. Default value: current directory.
467
468 EXAMPLES
469 To generate topological atom pairs fingerprints corresponding to bond
470 distances from 1 through 10 using atomic invariants atom types in
471 IDsAndValuesString format and create a SampleTAPFP.csv file containing
472 sequential compound IDs along with fingerprints vector strings data,
473 type:
474
475 % TopologicalAtomPairsFingerprints.pl -r SampleTAPFP -o Sample.sdf
476
477 To generate topological atom pairs fingerprints corresponding to bond
478 distances from 1 through 10 using atomic invariants atom types in
479 IDsAndValuesString format and create SampleTAPFP.sdf, SampleTAPFP.fpf
480 and SampleTAPFP.csv files containing sequential compound IDs in CSV file
481 along with fingerprints vector strings data, type:
482
483 % TopologicalAtomPairsFingerprints.pl --output all -r SampleTAPFP
484 -o Sample.sdf
485
486 To generate topological atom pairs fingerprints corresponding to bond
487 distances from 1 through 10 using DREIDING atom types in
488 IDsAndValuesString format and create a SampleTAPFP.csv file containing
489 sequential compound IDs along with fingerprints vector strings data,
490 type:
491
492 % TopologicalAtomPairsFingerprints.pl -a DREIDINGAtomTypes
493 -r SampleTAPFP -o Sample.sdf
494
495 To generate topological atom pairs fingerprints corresponding to bond
496 distances from 1 through 10 using E-state atom types in IDsAndValuesString
497 format and create a SampleTAPFP.csv file containing sequential compound
498 IDs along with fingerprints vector strings data, type:
499
500 % TopologicalAtomPairsFingerprints.pl -a EStateAtomTypes
501 -r SampleTAPFP -o Sample.sdf
502
512 To generate topological atom pairs fingerprints corresponding to bond
513 distances from 1 through 10 using functional class atom types in
514 IDsAndValuesString format and create a SampleTAPFP.csv file containing
515 sequential compound IDs along with fingerprints vector strings data,
516 type:
517
518 % TopologicalAtomPairsFingerprints.pl -a FunctionalClassAtomTypes
519 -r SampleTAPFP -o Sample.sdf
520
521 To generate topological atom pairs fingerprints corresponding to bond
522 distances from 1 through 10 using MMFF94 atom types in
523 IDsAndValuesString format and create a SampleTAPFP.csv file containing
524 sequential compound IDs along with fingerprints vector strings data,
525 type:
526
527 % TopologicalAtomPairsFingerprints.pl -a MMFF94AtomTypes
528 -r SampleTAPFP -o Sample.sdf
529
530 To generate topological atom pairs fingerprints corresponding to bond
531 distances from 1 through 10 using SLogP atom types in IDsAndValuesString
532 format and create a SampleTAPFP.csv file containing sequential compound
533 IDs along with fingerprints vector strings data, type:
534
535 % TopologicalAtomPairsFingerprints.pl -a SLogPAtomTypes
536 -r SampleTAPFP -o Sample.sdf
537
538 To generate topological atom pairs fingerprints corresponding to bond
539 distances from 1 through 10 using SYBYL atom types in IDsAndValuesString
540 format and create a SampleTAPFP.csv file containing sequential compound
541 IDs along with fingerprints vector strings data, type:
542
543 % TopologicalAtomPairsFingerprints.pl -a SYBYLAtomTypes
544 -r SampleTAPFP -o Sample.sdf
545
546 To generate topological atom pairs fingerprints corresponding to bond
547 distances from 1 through 10 using TPSA atom types in IDsAndValuesString
548 format and create a SampleTAPFP.csv file containing sequential compound
549 IDs along with fingerprints vector strings data, type:
550
551 % TopologicalAtomPairsFingerprints.pl -a TPSAAtomTypes
552 -r SampleTAPFP -o Sample.sdf
553
554 To generate topological atom pairs fingerprints corresponding to bond
555 distances from 1 through 10 using UFF atom types in IDsAndValuesString
556 format and create a SampleTAPFP.csv file containing sequential compound
557 IDs along with fingerprints vector strings data, type:
558
559 % TopologicalAtomPairsFingerprints.pl -a UFFAtomTypes
560 -r SampleTAPFP -o Sample.sdf
561
562 To generate topological atom pairs fingerprints corresponding to bond
563 distances from 1 through 10 using atomic invariants atom types in
564 IDsAndValuesPairsString format and create a SampleTAPFP.csv file
565 containing sequential compound IDs along with fingerprints vector
566 strings data, type:
567
568 % TopologicalAtomPairsFingerprints.pl --VectorStringFormat
569 IDsAndValuesPairsString -r SampleTAPFP -o Sample.sdf
570
571 To generate topological atom pairs fingerprints corresponding to bond
572 distances from 1 through 6 using atomic invariants atom types in
573 IDsAndValuesString format and create a SampleTAPFP.csv file containing
574 sequential compound IDs along with fingerprints vector strings data,
575 type:
576
577 % TopologicalAtomPairsFingerprints.pl -a AtomicInvariantsAtomTypes
578 --MinDistance 1 --MaxDistance 6 -r SampleTAPFP -o Sample.sdf
579
580 To generate topological atom pairs fingerprints corresponding to bond
581 distances from 1 through 6 using only AS,X atomic invariants atom types
582 in IDsAndValuesString format and create a SampleTAPFP.csv file
583 containing sequential compound IDs along with fingerprints vector
584 strings data, type:
585
586 % TopologicalAtomPairsFingerprints.pl -a AtomicInvariantsAtomTypes
587 --AtomicInvariantsToUse "AS,X" --MinDistance 1 --MaxDistance 6
588 -r SampleTAPFP -o Sample.sdf
589
590 To generate topological atom pairs fingerprints corresponding to bond
591 distances from 1 through 10 using atomic invariants atom types in
592 IDsAndValuesString format and create a SampleTAPFP.csv file containing
593 compound ID from molecule name line along with fingerprints vector
594 strings data, type:
595
596 % TopologicalAtomPairsFingerprints.pl -a AtomicInvariantsAtomTypes
597 --DataFieldsMode CompoundID --CompoundIDMode MolName
598 -r SampleTAPFP -o Sample.sdf
599
600 To generate topological atom pairs fingerprints corresponding to bond
601 distances from 1 through 10 using atomic invariants atom types in
602 IDsAndValuesString format and create a SampleTAPFP.csv file containing
603 compound IDs using specified data field along with fingerprints vector
604 strings data, type:
605
606 % TopologicalAtomPairsFingerprints.pl -a AtomicInvariantsAtomTypes
607 --DataFieldsMode CompoundID --CompoundIDMode DataField --CompoundID
608 Mol_ID -r SampleTAPFP -o Sample.sdf
609
610 To generate topological atom pairs fingerprints corresponding to bond
611 distances from 1 through 10 using atomic invariants atom types in
612 IDsAndValuesString format and create a SampleTAPFP.csv file containing
613 compound ID using combination of molecule name line and an explicit
614 compound prefix along with fingerprints vector strings data, type:
615
616 % TopologicalAtomPairsFingerprints.pl -a AtomicInvariantsAtomTypes
617 --DataFieldsMode CompoundID --CompoundIDMode MolNameOrLabelPrefix
618 --CompoundID Cmpd --CompoundIDLabel MolID -r SampleTAPFP -o Sample.sdf
619
620 To generate topological atom pairs fingerprints corresponding to bond
621 distances from 1 through 10 using atomic invariants atom types in
622 IDsAndValuesString format and create a SampleTAPFP.csv file containing
623 specific data fields columns along with fingerprints vector strings
624 data, type:
625
626 % TopologicalAtomPairsFingerprints.pl -a AtomicInvariantsAtomTypes
627 --DataFieldsMode Specify --DataFields Mol_ID -r SampleTAPFP
628 -o Sample.sdf
629
630 To generate topological atom pairs fingerprints corresponding to bond
631 distances from 1 through 10 using atomic invariants atom types in
632 IDsAndValuesString format and create a SampleTAPFP.csv file containing
633 common data fields columns along with fingerprints vector strings data,
634 type:
635
636 % TopologicalAtomPairsFingerprints.pl -a AtomicInvariantsAtomTypes
637 --DataFieldsMode Common -r SampleTAPFP -o Sample.sdf
638
639 To generate topological atom pairs fingerprints corresponding to bond
640 distances from 1 through 10 using atomic invariants atom types in
641 IDsAndValuesString format and create SampleTAPFP.sdf, SampleTAPFP.fpf
642 and SampleTAPFP.csv files containing all data fields columns in CSV file
643 along with fingerprints data, type:
644
645 % TopologicalAtomPairsFingerprints.pl -a AtomicInvariantsAtomTypes
646 --DataFieldsMode All --output all -r SampleTAPFP
647 -o Sample.sdf
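
When the script is driven from another program, the command lines shown above can be assembled as argument lists. The Python sketch below uses only options documented here; it assumes TopologicalAtomPairsFingerprints.pl is executable and on the PATH, and the function names build_command and run_fingerprints are hypothetical.

```python
import subprocess


def build_command(sd_file, root, atom_identifier_type="AtomicInvariantsAtomTypes",
                  min_distance=1, max_distance=10, output="text"):
    """Assemble the argument list for one fingerprints run."""
    return [
        "TopologicalAtomPairsFingerprints.pl",
        "-a", atom_identifier_type,
        "--MinDistance", str(min_distance),
        "--MaxDistance", str(max_distance),
        "--output", output,
        "-r", root,
        "-o",  # overwrite existing output files
        sd_file,
    ]


def run_fingerprints(sd_file, root, **options):
    """Run the script; raises CalledProcessError on a non-zero exit status."""
    subprocess.run(build_command(sd_file, root, **options), check=True)


# Build, but do not execute, the command for distances 1 through 6.
command = build_command("Sample.sdf", "SampleTAPFP", max_distance=6)
```

Passing the arguments as a list avoids shell quoting issues for values such as "AS,X" or data field labels containing spaces.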
648
649 AUTHOR
650 Manish Sud <msud@san.rr.com>
651
652 SEE ALSO
653 InfoFingerprintsFiles.pl, SimilarityMatricesFingerprints.pl,
654 AtomNeighborhoodsFingerprints.pl, ExtendedConnectivityFingerprints.pl,
655 MACCSKeysFingerprints.pl, PathLengthFingerprints.pl,
656 TopologicalAtomTorsionsFingerprints.pl,
657 TopologicalPharmacophoreAtomPairsFingerprints.pl,
658 TopologicalPharmacophoreAtomTripletsFingerprints.pl
659
660 COPYRIGHT
661 Copyright (C) 2015 Manish Sud. All rights reserved.
662
663 This file is part of MayaChemTools.
664
665 MayaChemTools is free software; you can redistribute it and/or modify it
666 under the terms of the GNU Lesser General Public License as published by
667 the Free Software Foundation; either version 3 of the License, or (at
668 your option) any later version.
669