comparison docs/scripts/txt/AtomNeighborhoodsFingerprints.txt @ 0:4816e4a8ae95 draft default tip

Uploaded
author deepakjadmin
date Wed, 20 Jan 2016 09:23:18 -0500
1 NAME
2 AtomNeighborhoodsFingerprints.pl - Generate atom neighborhoods
3 fingerprints for SD files
4
5 SYNOPSIS
6 AtomNeighborhoodsFingerprints.pl SDFile(s)...
7
8 AtomNeighborhoodsFingerprints.pl [--AromaticityModel
9 *AromaticityModelType*] [-a, --AtomIdentifierType
10 *AtomicInvariantsAtomTypes | DREIDINGAtomTypes | EStateAtomTypes |
11 FunctionalClassAtomTypes | MMFF94AtomTypes | SLogPAtomTypes |
12 SYBYLAtomTypes | TPSAAtomTypes | UFFAtomTypes*] [--AtomicInvariantsToUse
13 *"AtomicInvariant,AtomicInvariant..."*] [--FunctionalClassesToUse
14 *"FunctionalClass1,FunctionalClass2..."*] [--CompoundID *DataFieldName
15 or LabelPrefixString*] [--CompoundIDLabel *text*] [--CompoundIDMode]
16 [--DataFields *"FieldLabel1,FieldLabel2,..."*] [-d, --DataFieldsMode
17 *All | Common | Specify | CompoundID*] [-f, --Filter *Yes | No*]
18 [--FingerprintsLabel *text*] [-h, --help] [-k, --KeepLargestComponent
19 *Yes | No*] [--MinNeighborhoodRadius *number*] [--MaxNeighborhoodRadius
20 *number*] [--OutDelim *comma | tab | semicolon*] [--output *SD | FP |
21 text | all*] [-o, --overwrite] [-q, --quote *Yes | No*] [-r, --root
22 *RootName*] [-w, --WorkingDir *DirName*] SDFile(s)...
23
24 DESCRIPTION
25 Generate atom neighborhoods fingerprints [ Ref 53-56, Ref 73 ] for
26 *SDFile(s)* and create appropriate SD, FP or CSV/TSV text file(s)
27 containing fingerprints vector strings corresponding to molecular
28 fingerprints.
29
30 Multiple SDFile names are separated by spaces. The valid file extensions
31 are *.sdf* and *.sd*. All other file names are ignored. All the SD files
32 in the current directory can be specified either by **.sdf* or the current
33 directory name.
34
35 The current release of MayaChemTools supports generation of atom
36 neighborhoods fingerprints corresponding to the following -a,
37 --AtomIdentifierType values:
38
39 AtomicInvariantsAtomTypes, DREIDINGAtomTypes, EStateAtomTypes,
40 FunctionalClassAtomTypes, MMFF94AtomTypes, SLogPAtomTypes,
41 SYBYLAtomTypes, TPSAAtomTypes, UFFAtomTypes
42
43 Based on the values specified for -a, --AtomIdentifierType and
44 --AtomicInvariantsToUse, initial atom types are assigned to all
45 non-hydrogen atoms in a molecule. Using atom neighborhoods around each
46 non-hydrogen central atom corresponding to radii between the specified
47 --MinNeighborhoodRadius and --MaxNeighborhoodRadius values, unique atom
48 types at each radius level are counted and an atom neighborhood
49 identifier is generated.
50
51 The format of an atom neighborhood identifier around a central
52 non-hydrogen atom at a specific radius is:
53
54 NR<n>-<AtomType>-ATC<n>
55
56 NR: Neighborhood radius
57 AtomType: Assigned atom type
58 ATC: Atom type count
59
60 The atom neighborhood identifier for a non-hydrogen central atom
61 corresponding to all specified radii is generated by concatenating
62 neighborhood identifiers at each radius, using a colon as the delimiter:
63
64 NR<n>-<AtomType>-ATC<n>:NR<n>-<AtomType>-ATC<n>:...
65
66 The atom neighborhood identifiers for all non-hydrogen central atoms at
67 all specified radii are concatenated using a space as the delimiter and
68 constitute the atom neighborhood fingerprint of the molecule.
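
The identifier construction described above can be sketched in Python. This is an illustration only, not the MayaChemTools implementation: it assumes atom types have already been assigned to the non-hydrogen atoms, that a neighborhood at radius n contains the atoms exactly n bonds away from the central atom, that atom types within a radius are listed in sorted order, and that radius levels containing no atoms are skipped. The function names and the input layout (a dict of atom types plus a list of bonds) are hypothetical.

```python
from collections import Counter, deque


def atoms_by_distance(bonds, start, max_radius):
    """Breadth-first search: map each radius to the atoms exactly that many bonds away."""
    neighbors = {}
    for a, b in bonds:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    distance = {start: 0}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        if distance[atom] == max_radius:
            continue  # do not expand beyond the largest requested radius
        for nbr in neighbors.get(atom, ()):
            if nbr not in distance:
                distance[nbr] = distance[atom] + 1
                queue.append(nbr)
    shells = {}
    for atom, d in distance.items():
        shells.setdefault(d, []).append(atom)
    return shells


def neighborhood_identifier(atom_types, bonds, center, min_radius=0, max_radius=2):
    """Build NR<n>-<AtomType>-ATC<n> entries for one central atom, joined by colons."""
    shells = atoms_by_distance(bonds, center, max_radius)
    parts = []
    for radius in range(min_radius, max_radius + 1):
        counts = Counter(atom_types[a] for a in shells.get(radius, []))
        for atom_type in sorted(counts):  # sorted order is an assumption
            parts.append(f"NR{radius}-{atom_type}-ATC{counts[atom_type]}")
    return ":".join(parts)


def neighborhood_fingerprint(atom_types, bonds, min_radius=0, max_radius=2):
    """Join the identifiers of all central atoms with spaces."""
    return " ".join(
        neighborhood_identifier(atom_types, bonds, center, min_radius, max_radius)
        for center in sorted(atom_types)
    )


# Toy fragment: four heavy atoms with hand-assigned atomic invariants atom types.
toy_types = {0: "C.X1.BO1.H3", 1: "C.X3.BO3.H1", 2: "C.X1.BO1.H3", 3: "C.X3.BO4"}
toy_bonds = [(0, 1), (1, 2), (1, 3)]
print(neighborhood_identifier(toy_types, toy_bonds, 0))
```

For atom 0 of this toy fragment the sketch produces an identifier of the same form as the first identifier in the SD file example that follows.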
69
70 Example of an *SD* file containing atom neighborhood fingerprints string
71 data:
72
73 ... ...
74 ... ...
75 $$$$
76 ... ...
77 ... ...
78 ... ...
79 41 44 0 0 0 0 0 0 0 0999 V2000
80 -3.3652 1.4499 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
81 ... ...
82 2 3 1 0 0 0 0
83 ... ...
84 M END
85 > <CmpdID>
86 Cmpd1
87
88 > <AtomNeighborhoodsFingerprints>
89 FingerprintsVector;AtomNeighborhoods:AtomicInvariantsAtomTypes:MinRadiu
90 s0:MaxRadius2;41;AlphaNumericalValues;ValuesString;NR0-C.X1.BO1.H3-ATC1
91 :NR1-C.X3.BO3.H1-ATC1:NR2-C.X1.BO1.H3-ATC1:NR2-C.X3.BO4-ATC1 NR0-C.X1.B
92 O1.H3-ATC1:NR1-C.X3.BO3.H1-ATC1:NR2-C.X1.BO1.H3-ATC1:NR2-C.X3.BO4-ATC1
93 NR0-C.X2.BO2.H2-ATC1:NR1-C.X2.BO2.H2-ATC1:NR1-C.X3.BO3.H1-ATC1:NR2-C...
94
95 $$$$
96 ... ...
97 ... ...
98
99 Example of an *FP* file containing atom neighborhood fingerprints string
100 data:
101
102 #
103 # Package = MayaChemTools 7.4
104 # Release Date = Oct 21, 2010
105 #
106 # TimeStamp = Fri Mar 11 14:15:27 2011
107 #
108 # FingerprintsStringType = FingerprintsVector
109 #
110 # Description = AtomNeighborhoods:AtomicInvariantsAtomTypes:MinRadiu...
111 # VectorStringFormat = ValuesString
112 # VectorValuesType = AlphaNumericalValues
113 #
114 Cmpd1 41;NR0-C.X1.BO1.H3-ATC1:NR1-C.X3.BO3.H1-ATC1:NR2-C.X1.BO1.H3-A...
115 Cmpd2 23;NR0-C.X1.BO1.H3-ATC1:NR1-C.X2.BO2.H2-ATC1:NR2-C.X3.BO3.H1-A...
116 ... ...
117 ... ...
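
An FP file laid out like the example above could be read with a few lines of Python. This is a minimal sketch based only on that example: it assumes header lines start with "#" and carry "Key = Value" pairs, and that each data line holds a compound ID, whitespace, and then "<count>;<values>". The function name parse_fp_text is hypothetical and not part of MayaChemTools.

```python
def parse_fp_text(text):
    """Split FP-file text into header key/value pairs and per-compound records."""
    header = {}
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Header lines look like "# Key = Value"; bare "#" lines are spacers.
            key, sep, value = line[1:].partition("=")
            if sep:
                header[key.strip()] = value.strip()
            continue
        # Data lines: compound ID, whitespace, then "<count>;<values>".
        compound_id, rest = line.split(None, 1)
        count, _, values = rest.partition(";")
        # Each whitespace-separated value is the identifier of one central atom.
        records.append((compound_id, int(count), values.split()))
    return header, records
```

The sketch expects complete, unwrapped data lines; the elided "..." lines in the example above are not valid input for it.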
118
119 Example of a CSV *Text* file containing atom neighborhood fingerprints
120 string data:
121
122 "CompoundID","AtomNeighborhoodsFingerprints"
123 "Cmpd1","FingerprintsVector;AtomNeighborhoods:AtomicInvariantsAtomTypes
124 :MinRadius0:MaxRadius2;41;AlphaNumericalValues;ValuesString;NR0-C.X1.B
125 O1.H3-ATC1:NR1-C.X3.BO3.H1-ATC1:NR2-C.X1.BO1.H3-ATC1:NR2-C.X3.BO4-ATC1
126 NR0-C.X1.BO1.H3-ATC1:NR1-C.X3.BO3.H1-ATC1:NR2-C.X1.BO1.H3-ATC1:NR2-C.X3
127 .BO4-ATC1 NR0-C.X2.BO2.H2-ATC1:NR1-C.X2.BO2.H2-ATC1:NR1-C.X3.BO3.H1..."
128 ... ...
129 ... ...
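
Because the fingerprints column is quoted and contains semicolons and spaces, a CSV-aware reader is safer than splitting lines by hand. The sketch below uses Python's standard csv module; the column names are the defaults shown in the example above, and the function name and the delimiter mapping are assumptions made for illustration.

```python
import csv
import io

# Map the documented --OutDelim values to the actual delimiter characters.
DELIMITERS = {"comma": ",", "tab": "\t", "semicolon": ";"}


def read_fingerprints_text(handle, out_delim="comma",
                           id_column="CompoundID",
                           fp_column="AtomNeighborhoodsFingerprints"):
    """Return a dict mapping compound ID to its fingerprints vector string."""
    reader = csv.DictReader(handle, delimiter=DELIMITERS[out_delim])
    return {row[id_column]: row[fp_column] for row in reader}


# Small in-memory example with a shortened, made-up fingerprint string.
example = io.StringIO(
    '"CompoundID","AtomNeighborhoodsFingerprints"\n'
    '"Cmpd1","FingerprintsVector;Demo;1;AlphaNumericalValues;ValuesString;NR0-A-ATC1"\n'
)
fingerprints = read_fingerprints_text(example)
```

Quoting matters most with the semicolon delimiter, since the vector string itself contains semicolons.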
130
131 The current release of MayaChemTools generates the following types of
132 atom neighborhoods fingerprints vector strings:
133
134 FingerprintsVector;AtomNeighborhoods:AtomicInvariantsAtomTypes:MinRadi
135 us0:MaxRadius2;41;AlphaNumericalValues;ValuesString;NR0-C.X1.BO1.H3-AT
136 C1:NR1-C.X3.BO3.H1-ATC1:NR2-C.X1.BO1.H3-ATC1:NR2-C.X3.BO4-ATC1 NR0-C.X
137 1.BO1.H3-ATC1:NR1-C.X3.BO3.H1-ATC1:NR2-C.X1.BO1.H3-ATC1:NR2-C.X3.BO4-A
138 TC1 NR0-C.X2.BO2.H2-ATC1:NR1-C.X2.BO2.H2-ATC1:NR1-C.X3.BO3.H1-ATC1:NR2
139 -C.X2.BO2.H2-ATC1:NR2-N.X3.BO3-ATC1:NR2-O.X1.BO1.H1-ATC1 NR0-C.X2.B...
140
141 FingerprintsVector;AtomNeighborhoods:DREIDINGAtomTypes:MinRadius0:MaxR
142 adius2;41;AlphaNumericalValues;ValuesString;NR0-C_2-ATC1:NR1-C_3-ATC1:
143 NR1-O_2-ATC1:NR1-O_3-ATC1:NR2-C_3-ATC1 NR0-C_2-ATC1:NR1-C_R-ATC1:NR1-N
144 _3-ATC1:NR1-O_2-ATC1:NR2-C_R-ATC3 NR0-C_3-ATC1:NR1-C_2-ATC1:NR1-C_3-AT
145 C1:NR2-C_3-ATC1:NR2-O_2-ATC1:NR2-O_3-ATC2 NR0-C_3-ATC1:NR1-C_3-ATC1:NR
146 1-N_R-ATC1:NR2-C_3-ATC1:NR2-C_R-ATC2 NR0-C_3-ATC1:NR1-C_3-ATC1:NR2-...
147
148 FingerprintsVector;AtomNeighborhoods:EStateAtomTypes:MinRadius0:MaxRad
149 ius2;41;AlphaNumericalValues;ValuesString;NR0-aaCH-ATC1:NR1-aaCH-ATC1:
150 NR1-aasC-ATC1:NR2-aaCH-ATC1:NR2-aasC-ATC1:NR2-sF-ATC1 NR0-aaCH-ATC1:NR
151 1-aaCH-ATC1:NR1-aasC-ATC1:NR2-aaCH-ATC1:NR2-aasC-ATC1:NR2-sF-ATC1 NR0-
152 aaCH-ATC1:NR1-aaCH-ATC1:NR1-aasC-ATC1:NR2-aaCH-ATC1:NR2-aasC-ATC2 NR0-
153 aaCH-ATC1:NR1-aaCH-ATC1:NR1-aasC-ATC1:NR2-aaCH-ATC1:NR2-aasC-ATC2 N...
154
155 FingerprintsVector;AtomNeighborhoods:FunctionalClassAtomTypes:MinRadiu
156 s0:MaxRadius2;41;AlphaNumericalValues;ValuesString;NR0-Ar-ATC1:NR1-Ar-
157 ATC1:NR1-Ar.HBA-ATC1:NR1-None-ATC1:NR2-Ar-ATC2:NR2-None-ATC4 NR0-Ar-AT
158 C1:NR1-Ar-ATC2:NR1-Ar.HBA-ATC1:NR2-Ar-ATC5:NR2-None-ATC1 NR0-Ar-ATC1:N
159 R1-Ar-ATC2:NR1-HBD-ATC1:NR2-Ar-ATC2:NR2-None-ATC1 NR0-Ar-ATC1:NR1-Ar-A
160 TC2:NR1-Hal-ATC1:NR2-Ar-ATC2 NR0-Ar-ATC1:NR1-Ar-ATC2:NR1-None-ATC1:...
161
162 FingerprintsVector;AtomNeighborhoods:MMFF94AtomTypes:MinRadius0:MaxRad
163 ius2;41;AlphaNumericalValues;ValuesString;NR0-C5A-ATC1:NR1-C5B-ATC1:NR
164 1-CB-ATC1:NR1-N5-ATC1:NR2-C5A-ATC1:NR2-C5B-ATC1:NR2-CB-ATC3:NR2-CR-ATC
165 1 NR0-C5A-ATC1:NR1-C5B-ATC1:NR1-CR-ATC1:NR1-N5-ATC1:NR2-C5A-ATC1:NR2-C
166 5B-ATC1:NR2-C=ON-ATC1:NR2-CR-ATC3 NR0-C5B-ATC1:NR1-C5A-ATC1:NR1-C5B-AT
167 C1:NR1-C=ON-ATC1:NR2-C5A-ATC1:NR2-CB-ATC1:NR2-CR-ATC1:NR2-N5-ATC1:N...
168
169 FingerprintsVector;AtomNeighborhoods:SLogPAtomTypes:MinRadius0:MaxRadi
170 us2;41;AlphaNumericalValues;ValuesString;NR0-C1-ATC1:NR1-C10-ATC1:NR1-
171 CS-ATC1:NR2-C1-ATC1:NR2-N11-ATC1:NR2-O2-ATC1 NR0-C1-ATC1:NR1-C11-ATC1:
172 NR2-C1-ATC1:NR2-C21-ATC1 NR0-C1-ATC1:NR1-C11-ATC1:NR2-C1-ATC1:NR2-C21-
173 ATC1 NR0-C1-ATC1:NR1-C5-ATC1:NR1-CS-ATC1:NR2-C1-ATC1:NR2-O2-ATC2:NR2-O
174 9-ATC1 NR0-C1-ATC1:NR1-CS-ATC2:NR2-C1-ATC2:NR2-O2-ATC2 NR0-C10-ATC1...
175
176 FingerprintsVector;AtomNeighborhoods:SYBYLAtomTypes:MinRadius0:MaxRadi
177 us2;41;AlphaNumericalValues;ValuesString;NR0-C.2-ATC1:NR1-C.3-ATC1:NR1
178 -O.co2-ATC2:NR2-C.3-ATC1 NR0-C.2-ATC1:NR1-C.ar-ATC1:NR1-N.am-ATC1:NR1-
179 O.2-ATC1:NR2-C.ar-ATC3 NR0-C.3-ATC1:NR1-C.2-ATC1:NR1-C.3-ATC1:NR2-C.3-
180 ATC1:NR2-O.3-ATC1:NR2-O.co2-ATC2 NR0-C.3-ATC1:NR1-C.3-ATC1:NR1-N.ar-AT
181 C1:NR2-C.3-ATC1:NR2-C.ar-ATC2 NR0-C.3-ATC1:NR1-C.3-ATC1:NR2-C.3-ATC...
182
183 FingerprintsVector;AtomNeighborhoods:TPSAAtomTypes:MinRadius0:MaxRadiu
184 s2;41;AlphaNumericalValues;ValuesString;NR0-N21-ATC1:NR1-None-ATC3:NR2
185 -None-ATC5 NR0-N7-ATC1:NR1-None-ATC2:NR2-None-ATC3:NR2-O3-ATC1 NR0-Non
186 e-ATC1:NR1-N21-ATC1:NR1-None-ATC1:NR2-None-ATC3 NR0-None-ATC1:NR1-N21-
187 ATC1:NR1-None-ATC2:NR2-None-ATC6 NR0-None-ATC1:NR1-N21-ATC1:NR1-None-A
188 TC2:NR2-None-ATC6 NR0-None-ATC1:NR1-N7-ATC1:NR1-None-ATC1:NR1-O3-AT...
189
190 FingerprintsVector;AtomNeighborhoods:UFFAtomTypes:MinRadius0:MaxRadius
191 2;41;AlphaNumericalValues;ValuesString;NR0-C_2-ATC1:NR1-C_3-ATC1:NR1-O
192 _2-ATC1:NR1-O_3-ATC1:NR2-C_3-ATC1 NR0-C_2-ATC1:NR1-C_R-ATC1:NR1-N_3-AT
193 C1:NR1-O_2-ATC1:NR2-C_R-ATC3 NR0-C_3-ATC1:NR1-C_2-ATC1:NR1-C_3-ATC1:NR
194 2-C_3-ATC1:NR2-O_2-ATC1:NR2-O_3-ATC2 NR0-C_3-ATC1:NR1-C_3-ATC1:NR1-N_R
195 -ATC1:NR2-C_3-ATC1:NR2-C_R-ATC2 NR0-C_3-ATC1:NR1-C_3-ATC1:NR2-C_3-A...
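
The vector strings above share one layout: six semicolon-separated fields, the last of which holds the space-separated per-atom identifiers. A Python sketch for taking such a string apart follows. It is inferred from the examples only, not from the MayaChemTools source: the field names are hypothetical, atom types are assumed never to contain a colon or a space, and the input must be one complete, unwrapped string (the examples above are wrapped and truncated for display).

```python
import re

# One entry looks like NR<n>-<AtomType>-ATC<n>. The greedy middle group keeps
# atom types that themselves contain "-" (for example a negative formal charge).
ENTRY = re.compile(r"^NR(\d+)-(.+)-ATC(\d+)$")


def parse_fingerprints_vector(vector):
    """Split a fingerprints vector string into its header fields and entries."""
    kind, description, size, values_type, string_format, values = vector.split(";", 5)
    atoms = []
    for atom_identifier in values.split():
        entries = []
        for entry in atom_identifier.split(":"):
            match = ENTRY.match(entry)
            if match is None:
                raise ValueError(f"unrecognized entry: {entry!r}")
            radius, atom_type, count = match.groups()
            entries.append((int(radius), atom_type, int(count)))
        atoms.append(entries)
    return {
        "kind": kind,
        "description": description.split(":"),
        "size": int(size),
        "values_type": values_type,
        "string_format": string_format,
        "atoms": atoms,
    }
```

In the examples the numeric third field appears to be the number of per-atom identifiers, so comparing it with the length of the parsed atoms list is a cheap sanity check.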
196
197 OPTIONS
198 --AromaticityModel *MDLAromaticityModel | TriposAromaticityModel |
199 MMFFAromaticityModel | ChemAxonBasicAromaticityModel |
200 ChemAxonGeneralAromaticityModel | DaylightAromaticityModel |
201 MayaChemToolsAromaticityModel*
202 Specify aromaticity model to use during detection of aromaticity.
203 Possible values in the current release are: *MDLAromaticityModel,
204 TriposAromaticityModel, MMFFAromaticityModel,
205 ChemAxonBasicAromaticityModel, ChemAxonGeneralAromaticityModel,
206 DaylightAromaticityModel or MayaChemToolsAromaticityModel*. Default
207 value: *MayaChemToolsAromaticityModel*.
208
209 The supported aromaticity model names along with model specific
210 control parameters are defined in AromaticityModelsData.csv, which
211 is distributed with the current release and is available under
212 lib/data directory. Molecule.pm module retrieves data from this file
213 during class instantiation and makes it available to method
214 DetectAromaticity for detecting aromaticity corresponding to a
215 specific model.
216
217 -a, --AtomIdentifierType *AtomicInvariantsAtomTypes | DREIDINGAtomTypes
218 | EStateAtomTypes | FunctionalClassAtomTypes | MMFF94AtomTypes |
219 SLogPAtomTypes | SYBYLAtomTypes | TPSAAtomTypes | UFFAtomTypes*
220 Specify atom identifier type to use for assignment of initial atom
221 identifier to non-hydrogen atoms during calculation of atom
222 neighborhoods fingerprints. Possible values in the current release
223 are: *AtomicInvariantsAtomTypes, DREIDINGAtomTypes, EStateAtomTypes,
224 FunctionalClassAtomTypes, MMFF94AtomTypes, SLogPAtomTypes,
225 SYBYLAtomTypes, TPSAAtomTypes, UFFAtomTypes*. Default value:
226 *AtomicInvariantsAtomTypes*.
227
228 --AtomicInvariantsToUse *"AtomicInvariant,AtomicInvariant..."*
229 This value is used for the *AtomicInvariantsAtomTypes* value of the -a,
230 --AtomIdentifierType option. It's a comma separated list of valid
231 atomic invariants.
232
233 Possible values for atomic invariants are: *AS, X, BO, LBO, SB, DB,
234 TB, H, Ar, RA, FC, MN, SM*. Default value: *AS,X,BO,H,FC*.
235
236 The atomic invariants abbreviations correspond to:
237
238 AS = Atom symbol corresponding to element symbol
239
240 X<n> = Number of non-hydrogen atom neighbors or heavy atoms
241 BO<n> = Sum of bond orders to non-hydrogen atom neighbors or heavy atoms
242 LBO<n> = Largest bond order of non-hydrogen atom neighbors or heavy atoms
243 SB<n> = Number of single bonds to non-hydrogen atom neighbors or heavy atoms
244 DB<n> = Number of double bonds to non-hydrogen atom neighbors or heavy atoms
245 TB<n> = Number of triple bonds to non-hydrogen atom neighbors or heavy atoms
246 H<n> = Number of implicit and explicit hydrogens for atom
247 Ar = Aromatic annotation indicating whether atom is aromatic
248 RA = Ring atom annotation indicating whether atom is in a ring
249 FC<+n/-n> = Formal charge assigned to atom
250 MN<n> = Mass number indicating isotope other than most abundant isotope
251 SM<n> = Spin multiplicity of atom. Possible values: 1 (singlet), 2 (doublet) or
252 3 (triplet)
253
254 Atom type generated by AtomTypes::AtomicInvariantsAtomTypes class
255 corresponds to:
256
257 AS.X<n>.BO<n>.LBO<n>.SB<n>.DB<n>.TB<n>.H<n>.Ar.RA.FC<+n/-n>.MN<n>.SM<n>
258
259 Except for AS which is a required atomic invariant in atom types,
260 all other atomic invariants are optional. Atom type specification
261 doesn't include atomic invariants with zero or undefined values.
262
263 In addition to usage of abbreviations for specifying atomic
264 invariants, the following descriptive words are also allowed:
265
266 X : NumOfNonHydrogenAtomNeighbors or NumOfHeavyAtomNeighbors
267 BO : SumOfBondOrdersToNonHydrogenAtoms or SumOfBondOrdersToHeavyAtoms
268 LBO : LargestBondOrderToNonHydrogenAtoms or LargestBondOrderToHeavyAtoms
269 SB : NumOfSingleBondsToNonHydrogenAtoms or NumOfSingleBondsToHeavyAtoms
270 DB : NumOfDoubleBondsToNonHydrogenAtoms or NumOfDoubleBondsToHeavyAtoms
271 TB : NumOfTripleBondsToNonHydrogenAtoms or NumOfTripleBondsToHeavyAtoms
272 H : NumOfImplicitAndExplicitHydrogens
273 Ar : Aromatic
274 RA : RingAtom
275 FC : FormalCharge
276 MN : MassNumber
277 SM : SpinMultiplicity
278
279 *AtomTypes::AtomicInvariantsAtomTypes* module is used to assign
280 atomic invariant atom types.
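
How an atomic invariants atom type string is put together can be illustrated with a short Python sketch. It is not the AtomTypes::AtomicInvariantsAtomTypes code: it assumes the invariant values for an atom are already known and supplied in a dict, treats Ar and RA as flags without a number, and writes the formal charge with an explicit sign. The function name and the input layout are hypothetical.

```python
# Order of invariants in the atom type string, as listed above.
INVARIANT_ORDER = ["AS", "X", "BO", "LBO", "SB", "DB", "TB",
                   "H", "Ar", "RA", "FC", "MN", "SM"]
FLAG_INVARIANTS = {"Ar", "RA"}  # annotations written without a number


def atomic_invariants_atom_type(invariants, to_use=("AS", "X", "BO", "H", "FC")):
    """Compose an atom type such as C.X1.BO1.H3 from precomputed invariants."""
    if "AS" not in to_use or not invariants.get("AS"):
        raise ValueError("AS (atom symbol) is a required atomic invariant")
    parts = []
    for name in INVARIANT_ORDER:
        if name not in to_use:
            continue
        value = invariants.get(name)
        if name == "AS":
            parts.append(str(value))
        elif not value:
            continue  # zero or undefined invariants are left out
        elif name in FLAG_INVARIANTS:
            parts.append(name)
        elif name == "FC":
            parts.append(f"FC{value:+d}")  # explicit sign, e.g. FC+1 or FC-1
        else:
            parts.append(f"{name}{value}")
    return ".".join(parts)


# A methyl carbon: one heavy-atom neighbor, bond order sum 1, three hydrogens.
methyl = atomic_invariants_atom_type({"AS": "C", "X": 1, "BO": 1, "H": 3})
```

With the default invariants the methyl carbon above yields C.X1.BO1.H3, the same type that appears in the fingerprint examples earlier in this document.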
281
282 --FunctionalClassesToUse *"FunctionalClass1,FunctionalClass2..."*
283 This value is used for the *FunctionalClassAtomTypes* value of the -a,
284 --AtomIdentifierType option. It's a comma separated list of valid
285 functional classes.
286
287 Possible values for atom functional classes are: *Ar, CA, H, HBA,
288 HBD, Hal, NI, PI, RA*. Default value [ Ref 24 ]:
289 *HBD,HBA,PI,NI,Ar,Hal*.
290
291 The functional class abbreviations correspond to:
292
293 HBD: HydrogenBondDonor
294 HBA: HydrogenBondAcceptor
295 PI : PositivelyIonizable
296 NI : NegativelyIonizable
297 Ar : Aromatic
298 Hal : Halogen
299 H : Hydrophobic
300 RA : RingAtom
301 CA : ChainAtom
302
303 Functional class atom type specification for an atom corresponds to:
304
305 Ar.CA.H.HBA.HBD.Hal.NI.PI.RA
306
307 *AtomTypes::FunctionalClassAtomTypes* module is used to assign
308 functional class atom types. It uses the following definitions [ Ref
309 60-61, Ref 65-66 ]:
310
311 HydrogenBondDonor: NH, NH2, OH
312 HydrogenBondAcceptor: N[!H], O
313 PositivelyIonizable: +, NH2
314 NegativelyIonizable: -, C(=O)OH, S(=O)OH, P(=O)OH
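
The composition of a functional class atom type can be sketched in Python as follows. The sketch assumes the set of classes an atom belongs to has already been determined (the matching rules above are not implemented), keeps only the classes requested through --FunctionalClassesToUse, and joins them in the order shown in the specification above. Returning "None" for an atom with no selected class is inferred from the fingerprint examples earlier in this document; the function name is hypothetical.

```python
# Order of abbreviations in the atom type string, as in the specification above.
CLASS_ORDER = ["Ar", "CA", "H", "HBA", "HBD", "Hal", "NI", "PI", "RA"]
DEFAULT_CLASSES = ("HBD", "HBA", "PI", "NI", "Ar", "Hal")


def functional_class_atom_type(atom_classes, to_use=DEFAULT_CLASSES):
    """Join the selected classes of one atom with dots, or return "None"."""
    selected = [name for name in CLASS_ORDER
                if name in atom_classes and name in to_use]
    return ".".join(selected) if selected else "None"


# An aromatic hydrogen bond acceptor, such as a pyridine-like nitrogen.
aromatic_acceptor = functional_class_atom_type({"HBA", "Ar"})
```

Classes outside the requested list are ignored, so a ring atom with no other class maps to "None" under the default settings.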
315
316 --CompoundID *DataFieldName or LabelPrefixString*
317 This value is --CompoundIDMode specific and indicates how compound
318 ID is generated.
319
320 For *DataField* value of --CompoundIDMode option, it corresponds to
321 datafield label name whose value is used as compound ID; otherwise,
322 it's a prefix string used for generating compound IDs like
323 LabelPrefixString<Number>. Default value, *Cmpd*, generates compound
324 IDs which look like Cmpd<Number>.
325
326 Examples for *DataField* value of --CompoundIDMode:
327
328 MolID
329 ExtReg
330
331 Examples for *LabelPrefix* or *MolNameOrLabelPrefix* value of
332 --CompoundIDMode:
333
334 Compound
335
336 The value specified above generates compound IDs which correspond to
337 Compound<Number> instead of default value of Cmpd<Number>.
338
339 --CompoundIDLabel *text*
340 Specify compound ID column label for FP or CSV/TSV text file(s) used
341 during *CompoundID* value of --DataFieldsMode option. Default:
342 *CompoundID*.
343
344 --CompoundIDMode *DataField | MolName | LabelPrefix |
345 MolNameOrLabelPrefix*
346 Specify how to generate compound IDs and write to FP or CSV/TSV text
347 file(s) along with generated fingerprints for *FP | text | all*
348 values of --output option: use a *SDFile(s)* datafield value; use
349 molname line from *SDFile(s)*; generate a sequential ID with
350 specific prefix; use combination of both MolName and LabelPrefix
351 with usage of LabelPrefix values for empty molname lines.
352
353 Possible values: *DataField | MolName | LabelPrefix |
354 MolNameOrLabelPrefix*. Default: *LabelPrefix*.
355
356 For *MolNameOrLabelPrefix* value of --CompoundIDMode, molname line
357 in *SDFile(s)* takes precedence over sequential compound IDs
358 generated using *LabelPrefix* and only empty molname values are
359 replaced with sequential compound IDs.
360
361 This is only used for *CompoundID* value of --DataFieldsMode option.
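
The four --CompoundIDMode values can be summarized with a small Python sketch. It mirrors only the behavior described above and is not the script's own logic: the record layout (a dict with "molname" and "fields" keys) and the function name are hypothetical, and the handling of a missing data field is not specified in this document, so the sketch simply lets the lookup fail.

```python
def resolve_compound_id(record, number, mode="LabelPrefix", compound_id="Cmpd"):
    """Pick a compound ID for one SD record according to the chosen mode."""
    mol_name = record.get("molname", "").strip()
    if mode == "DataField":
        # --CompoundID names the data field whose value becomes the ID.
        return record["fields"][compound_id]
    if mode == "MolName":
        return mol_name
    if mode == "LabelPrefix":
        # --CompoundID is a prefix; IDs look like Cmpd<Number>.
        return f"{compound_id}{number}"
    if mode == "MolNameOrLabelPrefix":
        # The molname line wins; only empty names get a sequential ID.
        return mol_name or f"{compound_id}{number}"
    raise ValueError(f"unknown CompoundIDMode: {mode}")


record = {"molname": "", "fields": {"MolID": "M-17"}}
fallback_id = resolve_compound_id(record, 3, mode="MolNameOrLabelPrefix")
```

Here the empty molname line makes MolNameOrLabelPrefix fall back to the sequential form.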
362
363 --DataFields *"FieldLabel1,FieldLabel2,..."*
364 Comma delimited list of *SDFile(s)* data fields to extract and
365 write to CSV/TSV text file(s) along with generated fingerprints for
366 *text | all* values of --output option.
367
368 This is only used for *Specify* value of --DataFieldsMode option.
369
370 Examples:
371
372 Extreg
373 MolID,CompoundName
374
375 -d, --DataFieldsMode *All | Common | Specify | CompoundID*
376 Specify how data fields in *SDFile(s)* are transferred to output
377 CSV/TSV text file(s) along with generated fingerprints for *text |
378 all* values of --output option: transfer all SD data fields; transfer
379 SD data fields common to all compounds; extract specified data
380 fields; generate a compound ID using molname line, a compound
381 prefix, or a combination of both. Possible values: *All | Common |
382 Specify | CompoundID*. Default value: *CompoundID*.
383
384 -f, --Filter *Yes | No*
385 Specify whether to check and filter compound data in SDFile(s).
386 Possible values: *Yes or No*. Default value: *Yes*.
387
388 By default, compound data is checked before calculating fingerprints
389 and compounds containing atom data corresponding to non-element
390 symbols or no atom data are ignored.
391
392 --FingerprintsLabel *text*
393 SD data label or text file column label to use for fingerprints
394 string in output SD or CSV/TSV text file(s) specified by --output.
395 Default value: *AtomNeighborhoodsFingerprints*.
396
397 -h, --help
398 Print this help message.
399
400 -k, --KeepLargestComponent *Yes | No*
401 Generate fingerprints for only the largest component in molecule.
402 Possible values: *Yes or No*. Default value: *Yes*.
403
404 For molecules containing multiple connected components, fingerprints
405 can be generated in two different ways: use all connected components
406 or just the largest connected component. By default, all atoms
407 except for the largest connected component are deleted before
408 generation of fingerprints.
409
410 --MinNeighborhoodRadius *number*
411 Minimum atom neighborhood radius for generating atom neighborhoods.
412 Default value: *0*. Valid values: non-negative integers less than
413 --MaxNeighborhoodRadius. A neighborhood radius of zero corresponds to
414 the list of non-hydrogen atoms.
415
416 --MaxNeighborhoodRadius *number*
417 Maximum atom neighborhood radius for generating atom neighborhoods.
418 Default value: *2*. Valid values: positive integers greater than
419 --MinNeighborhoodRadius.
420
421 --OutDelim *comma | tab | semicolon*
422 Delimiter for output CSV/TSV text file(s). Possible values: *comma,
423 tab, or semicolon*. Default value: *comma*.
424
425 --output *SD | FP | text | all*
426 Type of output files to generate. Possible values: *SD, FP, text, or
427 all*. Default value: *text*.
428
429 -o, --overwrite
430 Overwrite existing files.
431
432 -q, --quote *Yes | No*
433 Put quote around column values in output CSV/TSV text file(s).
434 Possible values: *Yes or No*. Default value: *Yes*.
435
436 -r, --root *RootName*
437 New file name is generated using the root: <Root>.<Ext>. Default for
438 new file names: <SDFileName><AtomNeighborhoodsFP>.<Ext>. The file
439 type determines <Ext> value. The sdf, fpf, csv, and tsv <Ext> values
440 are used for SD, FP, comma/semicolon, and tab delimited text files,
441 respectively. This option is ignored for multiple input files.
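
The naming rule above can be written out as a short Python sketch. It is an interpretation of the text, not the script's code: it assumes <SDFileName> means the input file name without its extension, that the literal suffix AtomNeighborhoodsFP is appended when no root is given, and that option values are matched exactly as spelled here. The function name is hypothetical.

```python
import os


def output_file_names(sd_file, output="text", out_delim="comma", root=None):
    """List the output file names implied by --output, --OutDelim and --root."""
    if root is None:
        # Assumed default: <SDFileName>AtomNeighborhoodsFP
        base = os.path.splitext(os.path.basename(sd_file))[0]
        root = base + "AtomNeighborhoodsFP"
    # Comma and semicolon delimited text files both use the csv extension.
    text_ext = "tsv" if out_delim == "tab" else "csv"
    by_type = {
        "SD": f"{root}.sdf",
        "FP": f"{root}.fpf",
        "text": f"{root}.{text_ext}",
    }
    if output == "all":
        return [by_type["SD"], by_type["FP"], by_type["text"]]
    return [by_type[output]]


names = output_file_names("Sample.sdf", output="all", root="SampleANFP")
```

With -r SampleANFP and --output all this gives SampleANFP.sdf, SampleANFP.fpf and SampleANFP.csv, the same names used in the EXAMPLES section.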
442
443 -w, --WorkingDir *DirName*
444 Location of working directory. Default: current directory.
445
446 EXAMPLES
447 To generate atom neighborhoods fingerprints corresponding to atom
448 neighborhood radii from 0 to 2 using atomic invariants atom types in
449 vector string format and create a SampleANFP.csv file containing
450 sequential compound IDs along with fingerprints vector strings data,
451 type:
452
453 % AtomNeighborhoodsFingerprints.pl -r SampleANFP -o Sample.sdf
454
455 To generate atom neighborhoods fingerprints corresponding to atom
456 neighborhood radii from 0 to 2 using DREIDING atom types in vector
457 string format and create a SampleANFP.csv file containing sequential
458 compound IDs along with fingerprints vector strings data, type:
459
460 % AtomNeighborhoodsFingerprints.pl -a DREIDINGAtomTypes -r SampleANFP
461 -o Sample.sdf
462
463 To generate atom neighborhoods fingerprints corresponding to atom
464 neighborhood radii from 0 to 2 using EState atom types in vector
465 string format and create a SampleANFP.csv file containing sequential
466 compound IDs along with fingerprints vector strings data, type:
467
468 % AtomNeighborhoodsFingerprints.pl -a EStateAtomTypes -r SampleANFP
469 -o Sample.sdf
470
471 To generate atom neighborhoods fingerprints corresponding to atom
472 neighborhood radii from 0 to 2 using SYBYL atom types in vector string
473 format and create a SampleANFP.csv file containing sequential compound
474 IDs along with fingerprints vector strings data, type:
475
476 % AtomNeighborhoodsFingerprints.pl -a SYBYLAtomTypes -r SampleANFP
477 -o Sample.sdf
478
479 To generate atom neighborhoods fingerprints corresponding to atom
480 neighborhood radii from 0 to 2 using FunctionalClass atom types in
481 vector string format and create a SampleANFP.csv file containing
482 sequential compound IDs along with fingerprints vector strings data,
483 type:
484
485 % AtomNeighborhoodsFingerprints.pl -a FunctionalClassAtomTypes
486 -r SampleANFP -o Sample.sdf
487
488 To generate atom neighborhoods fingerprints corresponding to atom
489 neighborhood radii from 0 to 2 using MMFF94 atom types in vector string
490 format and create a SampleANFP.csv file containing sequential compound
491 IDs along with fingerprints vector strings data, type:
492
493 % AtomNeighborhoodsFingerprints.pl -a MMFF94AtomTypes -r SampleANFP
494 -o Sample.sdf
495
496 To generate atom neighborhoods fingerprints corresponding to atom
497 neighborhood radii from 0 to 2 using SLogP atom types in vector string
498 format and create a SampleANFP.csv file containing sequential compound
499 IDs along with fingerprints vector strings data, type:
500
501 % AtomNeighborhoodsFingerprints.pl -a SLogPAtomTypes -r SampleANFP
502 -o Sample.sdf
503
512 To generate atom neighborhoods fingerprints corresponding to atom
513 neighborhood radii from 0 to 2 using TPSA atom types in vector string
514 format and create a SampleANFP.csv file containing sequential compound
515 IDs along with fingerprints vector strings data, type:
516
517 % AtomNeighborhoodsFingerprints.pl -a TPSAAtomTypes -r SampleANFP
518 -o Sample.sdf
519
520 To generate atom neighborhoods fingerprints corresponding to atom
521 neighborhood radii from 0 to 2 using UFF atom types in vector string
522 format and create a SampleANFP.csv file containing sequential compound
523 IDs along with fingerprints vector strings data, type:
524
525 % AtomNeighborhoodsFingerprints.pl -a UFFAtomTypes -r SampleANFP
526 -o Sample.sdf
527
528 To generate atom neighborhoods fingerprints corresponding to atom
529 neighborhood radii from 0 to 2 using atomic invariants atom types in
530 vector string format and create SampleANFP.sdf, SampleANFP.fpf and
531 SampleANFP.csv files containing sequential compound IDs in CSV file
532 along with fingerprints vector strings data, type:
533
534 % AtomNeighborhoodsFingerprints.pl --output all -r SampleANFP
535 -o Sample.sdf
536
537 To generate atom neighborhoods fingerprints corresponding to atom
538 neighborhood radii from 1 to 3 using atomic invariants atom types in
539 vector string format and create a SampleANFP.csv file containing
540 sequential compound IDs along with fingerprints vector strings data,
541 type:
542
543 % AtomNeighborhoodsFingerprints.pl -a AtomicInvariantsAtomTypes
544 --MinNeighborhoodRadius 1 --MaxNeighborhoodRadius 3 -r SampleANFP
545 -o Sample.sdf
546
547 To generate atom neighborhoods fingerprints corresponding to atom
548 neighborhood radii from 0 to 3 using only AS,X atomic invariants atom
549 types in vector string format and create a SampleANFP.csv file
550 containing sequential compound IDs along with fingerprints vector
551 strings data, type:
552
553 % AtomNeighborhoodsFingerprints.pl -a AtomicInvariantsAtomTypes
554 --AtomicInvariantsToUse "AS,X" --MinNeighborhoodRadius 0
555 --MaxNeighborhoodRadius 3 -r SampleANFP -o Sample.sdf
556
557 To generate atom neighborhoods fingerprints corresponding to atom
558 neighborhood radii from 0 to 2 using atomic invariants atom types in
559 vector string format and create a SampleANFP.csv file containing
560 compound ID from molecule name line along with fingerprints vector
561 strings data, type:
562
563 % AtomNeighborhoodsFingerprints.pl -a AtomicInvariantsAtomTypes
564 --DataFieldsMode CompoundID --CompoundIDMode MolName
565 -r SampleANFP -o Sample.sdf
566
567 To generate atom neighborhoods fingerprints corresponding to atom
568 neighborhood radii from 0 to 2 using atomic invariants atom types in
569 vector string format and create a SampleANFP.csv file containing
570 compound IDs using specified data field along with fingerprints vector
571 strings data, type:
572
573 % AtomNeighborhoodsFingerprints.pl -a AtomicInvariantsAtomTypes
574 --DataFieldsMode CompoundID --CompoundIDMode DataField --CompoundID
575 Mol_ID -r SampleANFP -o Sample.sdf
576
577 To generate atom neighborhoods fingerprints corresponding to atom
578 neighborhood radii from 0 to 2 using atomic invariants atom types in
579 vector string format and create a SampleANFP.csv file containing
580 compound ID using combination of molecule name line and an explicit
581 compound prefix along with fingerprints vector strings data, type:
582
583 % AtomNeighborhoodsFingerprints.pl -a AtomicInvariantsAtomTypes
584 --DataFieldsMode CompoundID --CompoundIDMode MolNameOrLabelPrefix
585 --CompoundID Cmpd --CompoundIDLabel MolID -r SampleANFP -o Sample.sdf
586
587 To generate atom neighborhoods fingerprints corresponding to atom
588 neighborhood radii from 0 to 2 using atomic invariants atom types in
589 vector string format and create a SampleANFP.csv file containing
590 specific data fields columns along with fingerprints vector strings
591 data, type:
592
593 % AtomNeighborhoodsFingerprints.pl -a AtomicInvariantsAtomTypes
594 --DataFieldsMode Specify --DataFields Mol_ID -r SampleANFP
595 -o Sample.sdf
596
597 To generate atom neighborhoods fingerprints corresponding to atom
598 neighborhood radii from 0 to 2 using atomic invariants atom types in
599 vector string format and create a SampleANFP.csv file containing common
600 data fields columns along with fingerprints vector strings data, type:
601
602 % AtomNeighborhoodsFingerprints.pl -a AtomicInvariantsAtomTypes
603 --DataFieldsMode Common -r SampleANFP -o Sample.sdf
604
605 To generate atom neighborhoods fingerprints corresponding to atom
606 neighborhood radii from 0 to 2 using atomic invariants atom types in
607 vector string format and create SampleANFP.sdf, SampleANFP.fpf and
608 SampleANFP.csv files containing all data fields columns in CSV file
609 along with fingerprints data, type:
610
611 % AtomNeighborhoodsFingerprints.pl -a AtomicInvariantsAtomTypes
612 --DataFieldsMode All --output all -r SampleANFP
613 -o Sample.sdf
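
When the script is driven from another program, the command lines above can be assembled as an argument list. The Python sketch below uses only options documented in this file; the helper names are hypothetical, and it assumes AtomNeighborhoodsFingerprints.pl is executable and on the PATH.

```python
import subprocess


def build_command(sd_file, root, atom_identifier_type=None,
                  min_radius=None, max_radius=None, output=None, overwrite=True):
    """Assemble an argument list equivalent to the command lines shown above."""
    command = ["AtomNeighborhoodsFingerprints.pl"]
    if atom_identifier_type is not None:
        command += ["-a", atom_identifier_type]
    if min_radius is not None:
        command += ["--MinNeighborhoodRadius", str(min_radius)]
    if max_radius is not None:
        command += ["--MaxNeighborhoodRadius", str(max_radius)]
    if output is not None:
        command += ["--output", output]
    command += ["-r", root]
    if overwrite:
        command.append("-o")
    command.append(sd_file)
    return command


def run_fingerprints(*args, **kwargs):
    """Run the script and raise CalledProcessError if it exits with an error."""
    return subprocess.run(build_command(*args, **kwargs), check=True)


command = build_command("Sample.sdf", "SampleANFP",
                        atom_identifier_type="DREIDINGAtomTypes")
```

Passing the arguments as a list avoids shell quoting problems with values such as "AS,X" or data field labels that contain spaces.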
614
615 AUTHOR
616 Manish Sud <msud@san.rr.com>
617
618 SEE ALSO
619 InfoFingerprintsFiles.pl, SimilarityMatricesFingerprints.pl,
620 SimilaritySearchingFingerprints.pl, ExtendedConnectivityFingerprints.pl,
621 MACCSKeysFingerprints.pl, PathLengthFingerprints.pl,
622 TopologicalAtomPairsFingerprints.pl,
623 TopologicalAtomTorsionsFingerprints.pl,
624 TopologicalPharmacophoreAtomPairsFingerprints.pl,
625 TopologicalPharmacophoreAtomTripletsFingerprints.pl
626
627 COPYRIGHT
628 Copyright (C) 2015 Manish Sud. All rights reserved.
629
630 This file is part of MayaChemTools.
631
632 MayaChemTools is free software; you can redistribute it and/or modify it
633 under the terms of the GNU Lesser General Public License as published by
634 the Free Software Foundation; either version 3 of the License, or (at
635 your option) any later version.
636