comparison docs/scripts/txt/PathLengthFingerprints.txt @ 0:4816e4a8ae95 draft default tip
Uploaded
author | deepakjadmin |
---|---|
date | Wed, 20 Jan 2016 09:23:18 -0500 |
parents | |
children |
-1:000000000000 | 0:4816e4a8ae95 |
---|---|
1 NAME | |
2 PathLengthFingerprints.pl - Generate atom path length based fingerprints | |
3 for SD files | |
4 | |
5 SYNOPSIS | |
6 PathLengthFingerprints.pl SDFile(s)... | |
7 | |
8 PathLengthFingerprints.pl [--AromaticityModel *AromaticityModelType*] | |
9 [-a, --AtomIdentifierType *AtomicInvariantsAtomTypes*] | |
10 [--AtomicInvariantsToUse *"AtomicInvariant1,AtomicInvariant2..."*] | |
11 [--FunctionalClassesToUse *"FunctionalClass1,FunctionalClass2..."*] | |
12 [--BitsOrder *Ascending | Descending*] [-b, --BitStringFormat | |
13 *BinaryString | HexadecimalString*] [--CompoundID *DataFieldName or | |
14 LabelPrefixString*] [--CompoundIDLabel *text*] [--CompoundIDMode | |
15 *DataField | MolName | LabelPrefix | MolNameOrLabelPrefix*] | |
16 [--DataFields *"FieldLabel1,FieldLabel2,... "*] [-d, --DataFieldsMode | |
17 *All | Common | Specify | CompoundID*] [--DetectAromaticity *Yes | No*] | |
18 [-f, --Filter *Yes | No*] [--FingerprintsLabel *text*] [--fold *Yes | | |
19 No*] [--FoldedSize *number*] [-h, --help] [-i, --IgnoreHydrogens *Yes | | |
20 No*] [-k, --KeepLargestComponent *Yes | No*] [-m, --mode *PathLengthBits | |
21 | PathLengthCount*] [--MinPathLength *number*] [--MaxPathLength | |
22 *number*] [-n, --NumOfBitsToSetPerPath *number*] [--OutDelim *comma | | |
23 tab | semicolon*] [--output *SD | FP | text | all*] [-q, --quote *Yes | | |
24 No*] [-r, --root *RootName*] [-p, --PathMode *AtomPathsWithoutRings | | |
25 AtomPathsWithRings | AllAtomPathsWithoutRings | AllAtomPathsWithRings*] | |
26 [-s, --size *number*] [-u, --UseBondSymbols *Yes | No*] | |
27 [--UsePerlCoreRandom *Yes | No*] [--UseUniquePaths *Yes | No*] | |
28 [-o, --overwrite] [-v, --VectorStringFormat | |
29 *IDsAndValuesString | IDsAndValuesPairsString | ValuesAndIDsString | | |
30 ValuesAndIDsPairsString*] [-w, --WorkingDir dirname] SDFile(s)... | |
31 | |
32 DESCRIPTION | |
33 Generate atom path length fingerprints for *SDFile(s)* and create | |
34 appropriate SD, FP or CSV/TSV text file(s) containing fingerprints | |
35 bit-vector or vector strings corresponding to molecular fingerprints. | |
36 | |
37 Multiple SDFile names are separated by spaces. The valid file extensions | |
38 are *.sdf* and *.sd*. All other file names are ignored. All the SD files | |
39 in the current directory can be specified either by **.sdf* or the current | |
40 directory name. | |
41 | |
42 The current release of MayaChemTools supports generation of path length | |
43 fingerprints corresponding to the following -a, --AtomIdentifierType values: | |
44 | |
45 AtomicInvariantsAtomTypes, DREIDINGAtomTypes, EStateAtomTypes, | |
46 FunctionalClassAtomTypes, MMFF94AtomTypes, SLogPAtomTypes, | |
47 SYBYLAtomTypes, TPSAAtomTypes, UFFAtomTypes | |
48 | |
49 Based on the values specified for -p, --PathMode, --MinPathLength and | |
50 --MaxPathLength, all appropriate atom paths are generated for each atom | |
51 in the molecule and collected in a list and the list is filtered to | |
52 remove any structurally duplicate paths as indicated by the value of | |
53 --UseUniquePaths option. | |
54 | |
55 For each atom path in the filtered atom paths list, an atom path string | |
56 is created using value of -a, --AtomIdentifierType and specified values | |
57 to use for a particular atom identifier type. Value of -u, | |
58 --UseBondSymbols controls whether bond order symbols are used during | |
59 generation of atom path string. For each atom path, only | |
60 lexicographically smaller atom path strings are kept. | |
61 | |
62 For *PathLengthBits* value of -m, --mode option, each atom path is | |
63 hashed to a 32 bit unsigned integer key using TextUtil::HashCode | |
64 function. Using the hash key as a seed for a random number generator, a | |
65 random integer value between 0 and -s, --size is used to set corresponding | |
66 bits in the fingerprint bit-vector string. Value of | |
67 --NumOfBitsToSetPerPath option controls the number of times a random | |
68 number is generated to set corresponding bits. | |
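
The hash-then-seed scheme described above can be sketched in a few lines. This is an illustration only: `zlib.crc32` stands in for TextUtil::HashCode, Python's `random.Random` stands in for the Perl random number generator, and the function name `set_path_bits` is hypothetical, so the bit positions will not match those produced by the script.

```python
import random
import zlib


def set_path_bits(path_strings, size=1024, bits_per_path=1):
    """Return the sorted bit positions set for a collection of atom path strings."""
    bits = set()
    for path in path_strings:
        # Assumption: any deterministic 32-bit hash can play the role of
        # TextUtil::HashCode for the purpose of this sketch.
        key = zlib.crc32(path.encode("ascii")) & 0xFFFFFFFF
        rng = random.Random(key)  # the hash key seeds the generator
        for _ in range(bits_per_path):
            bits.add(rng.randrange(size))  # bit position in [0, size)
    return sorted(bits)


# Atom path strings listed for urea in the OPTIONS section of this document.
urea_paths = ["C", "N", "O", "C=O", "CN", "NC=O", "NCN"]
urea_bits = set_path_bits(urea_paths, size=1024, bits_per_path=1)
```

Because the seed depends only on the path string, the same path always sets the same bits, which is what makes fingerprints comparable across molecules.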
69 | |
70 For *PathLengthCount* value of -m, --mode option, the number of times | |
71 an atom path appears is tracked and a fingerprints count-string | |
72 corresponding to count of atom paths is generated. | |
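
As a rough sketch of the counting mode, the occurrences of each atom path string can be tallied and written out as alternating ID and value tokens. The function name is hypothetical, and the plain alphabetical ordering used here is an assumption; the script's own ordering of path IDs may differ.

```python
from collections import Counter


def path_count_pairs_string(path_strings):
    """Count atom path strings and join them as 'ID value ID value ...'."""
    counts = Counter(path_strings)
    # Assumption: sort IDs alphabetically just to get a stable output.
    return " ".join(f"{path} {counts[path]}" for path in sorted(counts))


pairs = path_count_pairs_string(["C", "N", "N", "O", "C=O", "CN", "CN"])
```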
73 | |
74 Example of *SD* file containing path length fingerprints string data: | |
75 | |
76 ... ... | |
77 ... ... | |
78 $$$$ | |
79 ... ... | |
80 ... ... | |
81 ... ... | |
82 41 44 0 0 0 0 0 0 0 0999 V2000 | |
83 -3.3652 1.4499 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 | |
84 ... ... | |
85 2 3 1 0 0 0 0 | |
86 ... ... | |
87 M END | |
88 > <CmpdID> | |
89 Cmpd1 | |
90 | |
91 > <PathLengthFingerprints> | |
92 FingerprintsBitVector;PathLengthBits:AtomicInvariantsAtomTypes:MinLengt | |
93 h1:MaxLength8;1024;HexadecimalString;Ascending;9c8460989ec8a49913991a66 | |
94 03130b0a19e8051c89184414953800cc2151082844a201042800130860308e8204d4028 | |
95 00831048940e44281c00060449a5000ac80c894114e006321264401600846c050164462 | |
96 08190410805000304a10205b0100e04c0038ba0fad0209c0ca8b1200012268b61c0026a | |
97 aa0660a11014a011d46 | |
98 | |
99 $$$$ | |
100 ... ... | |
101 ... ... | |
102 | |
103 Example of *FP* file containing path length fingerprints string data: | |
104 | |
105 # | |
106 # Package = MayaChemTools 7.4 | |
107 # ReleaseDate = Oct 21, 2010 | |
108 # | |
109 # TimeStamp = Mon Mar 7 15:14:01 2011 | |
110 # | |
111 # FingerprintsStringType = FingerprintsBitVector | |
112 # | |
113 # Description = PathLengthBits:AtomicInvariantsAtomTypes:MinLength1:... | |
114 # Size = 1024 | |
115 # BitStringFormat = HexadecimalString | |
116 # BitsOrder = Ascending | |
117 # | |
118 Cmpd1 9c8460989ec8a49913991a6603130b0a19e8051c89184414953800cc21510... | |
119 Cmpd2 000000249400840040100042011001001980410c000000001010088001120... | |
120 ... ... | |
121 ... .. | |
122 | |
123 Example of CSV *Text* file containing path length fingerprints string | |
124 data: | |
125 | |
126 "CompoundID","PathLengthFingerprints" | |
127 "Cmpd1","FingerprintsBitVector;PathLengthBits:AtomicInvariantsAtomTypes | |
128 :MinLength1:MaxLength8;1024;HexadecimalString;Ascending;9c8460989ec8a4 | |
129 9913991a6603130b0a19e8051c89184414953800cc2151082844a20104280013086030 | |
130 8e8204d402800831048940e44281c00060449a5000ac80c894114e006321264401..." | |
131 ... ... | |
132 ... ... | |
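
The bit-vector strings in the examples above are semicolon-delimited, so a reader can be sketched as a single split. The field names below are hypothetical labels inferred from the FP file header shown earlier (Description, Size, BitStringFormat, BitsOrder); the sketch assumes the whole string has already been joined onto one line.

```python
from typing import NamedTuple


class BitVectorString(NamedTuple):
    kind: str
    description: str
    size: int
    bit_string_format: str
    bits_order: str
    data: str


def parse_bit_vector_string(text):
    """Split a FingerprintsBitVector string into its six fields."""
    kind, description, size, fmt, order, data = text.strip().split(";", 5)
    if kind != "FingerprintsBitVector":
        raise ValueError(f"unexpected fingerprints type: {kind}")
    return BitVectorString(kind, description, int(size), fmt, order, data)


# Shortened, made-up example: 16 bits instead of 1024.
fp = parse_bit_vector_string(
    "FingerprintsBitVector;PathLengthBits:AtomicInvariantsAtomTypes:"
    "MinLength1:MaxLength8;16;HexadecimalString;Ascending;9c84"
)
```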
133 | |
134 The current release of MayaChemTools generates the following types of | |
135 path length fingerprints bit-vector and vector strings: | |
136 | |
137 FingerprintsBitVector;PathLengthBits:AtomicInvariantsAtomTypes:MinLeng | |
138 th1:MaxLength8;1024;BinaryString;Ascending;001000010011010101011000110 | |
139 0100010101011000101001011100110001000010001001101000001001001001001000 | |
140 0010110100000111001001000001001010100100100000000011000000101001011100 | |
141 0010000001000101010100000100111100110111011011011000000010110111001101 | |
142 0101100011000000010001000011000010100011101100001000001000100000000... | |
143 | |
144 FingerprintsBitVector;PathLengthBits:AtomicInvariantsAtomTypes:MinLeng | |
145 th1:MaxLength8;1024;HexadecimalString;Ascending;48caa1315d82d91122b029 | |
146 42861c9409a4208182d12015509767bd0867653604481a8b1288000056090583603078 | |
147 9cedae54e26596889ab121309800900490515224208421502120a0dd9200509723ae89 | |
148 00024181b86c0122821d4e4880c38620dab280824b455404009f082003d52c212b4e6d | |
149 6ea05280140069c780290c43 | |
150 | |
151 FingerprintsVector;PathLengthCount:AtomicInvariantsAtomTypes:MinLength | |
152 1:MaxLength8;432;NumericalValues;IDsAndValuesPairsString;C.X1.BO1.H3 2 | |
153 C.X2.BO2.H2 4 C.X2.BO3.H1 14 C.X3.BO3.H1 3 C.X3.BO4 10 F.X1.BO1 1 N.X | |
154 2.BO2.H1 1 N.X3.BO3 1 O.X1.BO1.H1 3 O.X1.BO2 2 C.X1.BO1.H3C.X3.BO3.H1 | |
155 2 C.X2.BO2.H2C.X2.BO2.H2 1 C.X2.BO2.H2C.X3.BO3.H1 4 C.X2.BO2.H2C.X3.BO | |
156 4 1 C.X2.BO2.H2N.X3.BO3 1 C.X2.BO3.H1:C.X2.BO3.H1 10 C.X2.BO3.H1:C.... | |
157 | |
158 FingerprintsVector;PathLengthCount:DREIDINGAtomTypes:MinLength1:MaxLen | |
159 gth8;410;NumericalValues;IDsAndValuesPairsString;C_2 2 C_3 9 C_R 22 F_ | |
160 1 N_3 1 N_R 1 O_2 2 O_3 3 C_2=O_2 2 C_2C_3 1 C_2C_R 1 C_2N_3 1 C_2O_3 | |
161 1 C_3C_3 7 C_3C_R 1 C_3N_R 1 C_3O_3 2 C_R:C_R 21 C_R:N_R 2 C_RC_R 2 C | |
162 _RF_ 1 C_RN_3 1 C_2C_3C_3 1 C_2C_R:C_R 2 C_2N_3C_R 1 C_3C_2=O_2 1 C_3C | |
163 _2O_3 1 C_3C_3C_3 5 C_3C_3C_R 2 C_3C_3N_R 1 C_3C_3O_3 4 C_3C_R:C_R ... | |
164 | |
165 FingerprintsVector;PathLengthCount:EStateAtomTypes:MinLength1:MaxLengt | |
166 h8;454;NumericalValues;IDsAndValuesPairsString;aaCH 14 aasC 8 aasN 1 d | |
167 O 2 dssC 2 sCH3 2 sF 1 sOH 3 ssCH2 4 ssNH 1 sssCH 3 aaCH:aaCH 10 aaCH: | |
168 aasC 8 aasC:aasC 3 aasC:aasN 2 aasCaasC 2 aasCdssC 1 aasCsF 1 aasCssNH | |
169 1 aasCsssCH 1 aasNssCH2 1 dO=dssC 2 dssCsOH 1 dssCssCH2 1 dssCssNH 1 | |
170 sCH3sssCH 2 sOHsssCH 2 ssCH2ssCH2 1 ssCH2sssCH 4 aaCH:aaCH:aaCH 6 a... | |
171 | |
172 FingerprintsVector;PathLengthCount:FunctionalClassAtomTypes:MinLength1 | |
173 :MaxLength8;404;NumericalValues;IDsAndValuesPairsString;Ar 22 Ar.HBA 1 | |
174 HBA 2 HBA.HBD 3 HBD 1 Hal 1 NI 1 None 10 Ar.HBA:Ar 2 Ar.HBANone 1 Ar: | |
175 Ar 21 ArAr 2 ArHBD 1 ArHal 1 ArNone 2 HBA.HBDNI 1 HBA.HBDNone 2 HBA=NI | |
176 1 HBA=None 1 HBDNone 1 NINone 1 NoneNone 7 Ar.HBA:Ar:Ar 2 Ar.HBA:ArAr | |
177 1 Ar.HBA:ArNone 1 Ar.HBANoneNone 1 Ar:Ar.HBA:Ar 1 Ar:Ar.HBANone 2 ... | |
178 | |
179 FingerprintsVector;PathLengthCount:MMFF94AtomTypes:MinLength1:MaxLengt | |
180 h8;463;NumericalValues;IDsAndValuesPairsString;C5A 2 C5B 2 C=ON 1 CB 1 | |
181 8 COO 1 CR 9 F 1 N5 1 NC=O 1 O=CN 1 O=CO 1 OC=O 1 OR 2 C5A:C5B 2 C5A:N | |
182 5 2 C5ACB 1 C5ACR 1 C5B:C5B 1 C5BC=ON 1 C5BCB 1 C=ON=O=CN 1 C=ONNC=O 1 | |
183 CB:CB 18 CBF 1 CBNC=O 1 COO=O=CO 1 COOCR 1 COOOC=O 1 CRCR 7 CRN5 1 CR | |
184 OR 2 C5A:C5B:C5B 2 C5A:C5BC=ON 1 C5A:C5BCB 1 C5A:N5:C5A 1 C5A:N5CR ... | |
185 | |
186 FingerprintsVector;PathLengthCount:SLogPAtomTypes:MinLength1:MaxLength | |
187 8;518;NumericalValues;IDsAndValuesPairsString;C1 5 C10 1 C11 1 C14 1 C | |
188 18 14 C20 4 C21 2 C22 1 C5 2 CS 2 F 1 N11 1 N4 1 O10 1 O2 3 O9 1 C10C1 | |
189 1 C10N11 1 C11C1 2 C11C21 1 C14:C18 2 C14F 1 C18:C18 10 C18:C20 4 C18 | |
190 :C22 2 C1C5 1 C1CS 4 C20:C20 1 C20:C21 1 C20:N11 1 C20C20 2 C21:C21 1 | |
191 C21:N11 1 C21C5 1 C22N4 1 C5=O10 1 C5=O9 1 C5N4 1 C5O2 1 CSO2 2 C10... | |
192 | |
193 FingerprintsVector;PathLengthCount:SYBYLAtomTypes:MinLength1:MaxLength | |
194 8;412;NumericalValues;IDsAndValuesPairsString;C.2 2 C.3 9 C.ar 22 F 1 | |
195 N.am 1 N.ar 1 O.2 1 O.3 2 O.co2 2 C.2=O.2 1 C.2=O.co2 1 C.2C.3 1 C.2C. | |
196 ar 1 C.2N.am 1 C.2O.co2 1 C.3C.3 7 C.3C.ar 1 C.3N.ar 1 C.3O.3 2 C.ar:C | |
197 .ar 21 C.ar:N.ar 2 C.arC.ar 2 C.arF 1 C.arN.am 1 C.2C.3C.3 1 C.2C.ar:C | |
198 .ar 2 C.2N.amC.ar 1 C.3C.2=O.co2 1 C.3C.2O.co2 1 C.3C.3C.3 5 C.3C.3... | |
199 | |
200 FingerprintsVector;PathLengthCount:TPSAAtomTypes:MinLength1:MaxLength8 | |
201 ;331;NumericalValues;IDsAndValuesPairsString;N21 1 N7 1 None 34 O3 2 O | |
202 4 3 N21:None 2 N21None 1 N7None 2 None:None 21 None=O3 2 NoneNone 13 N | |
203 oneO4 3 N21:None:None 2 N21:NoneNone 2 N21NoneNone 1 N7None:None 2 N7N | |
204 one=O3 1 N7NoneNone 1 None:N21:None 1 None:N21None 2 None:None:None 20 | |
205 None:NoneNone 12 NoneN7None 1 NoneNone=O3 2 NoneNoneNone 8 NoneNon... | |
206 | |
207 FingerprintsVector;PathLengthCount:UFFAtomTypes:MinLength1:MaxLength8; | |
208 410;NumericalValues;IDsAndValuesPairsString;C_2 2 C_3 9 C_R 22 F_ 1 N_ | |
209 3 1 N_R 1 O_2 2 O_3 3 C_2=O_2 2 C_2C_3 1 C_2C_R 1 C_2N_3 1 C_2O_3 1 C_ | |
210 3C_3 7 C_3C_R 1 C_3N_R 1 C_3O_3 2 C_R:C_R 21 C_R:N_R 2 C_RC_R 2 C_RF_ | |
211 1 C_RN_3 1 C_2C_3C_3 1 C_2C_R:C_R 2 C_2N_3C_R 1 C_3C_2=O_2 1 C_3C_2O_3 | |
212 1 C_3C_3C_3 5 C_3C_3C_R 2 C_3C_3N_R 1 C_3C_3O_3 4 C_3C_R:C_R 1 C_3... | |
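
The *IDsAndValuesPairsString* vector strings listed above can be read back into a mapping from atom path string to count. This is a minimal sketch with a hypothetical function name; it assumes the string has been joined onto one line (the listings above are wrapped for display) and that the third field is the number of values.

```python
def parse_pairs_vector_string(text):
    """Parse a FingerprintsVector string in IDsAndValuesPairsString format."""
    kind, description, num_values, value_type, fmt, data = text.strip().split(";", 5)
    if kind != "FingerprintsVector" or fmt != "IDsAndValuesPairsString":
        raise ValueError("only FingerprintsVector pairs strings are handled")
    tokens = data.split()
    if len(tokens) % 2 != 0:
        raise ValueError("expected alternating ID and value tokens")
    # Tokens alternate: path ID, count, path ID, count, ...
    counts = {tokens[i]: int(tokens[i + 1]) for i in range(0, len(tokens), 2)}
    return description, int(num_values), counts


# Shortened, made-up example built from the first DREIDING entries above.
description, num_values, counts = parse_pairs_vector_string(
    "FingerprintsVector;PathLengthCount:DREIDINGAtomTypes:MinLength1:"
    "MaxLength8;4;NumericalValues;IDsAndValuesPairsString;"
    "C_2 2 C_3 9 C_R 22 C_2=O_2 2"
)
```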
213 | |
214 OPTIONS | |
215 --AromaticityModel *MDLAromaticityModel | TriposAromaticityModel | | |
216 MMFFAromaticityModel | ChemAxonBasicAromaticityModel | | |
217 ChemAxonGeneralAromaticityModel | DaylightAromaticityModel | | |
218 MayaChemToolsAromaticityModel* | |
219 Specify aromaticity model to use during detection of aromaticity. | |
220 Possible values in the current release are: *MDLAromaticityModel, | |
221 TriposAromaticityModel, MMFFAromaticityModel, | |
222 ChemAxonBasicAromaticityModel, ChemAxonGeneralAromaticityModel, | |
223 DaylightAromaticityModel or MayaChemToolsAromaticityModel*. Default | |
224 value: *MayaChemToolsAromaticityModel*. | |
225 | |
226 The supported aromaticity model names along with model specific | |
227 control parameters are defined in AromaticityModelsData.csv, which | |
228 is distributed with the current release and is available under | |
229 lib/data directory. Molecule.pm module retrieves data from this file | |
230 during class instantiation and makes it available to method | |
231 DetectAromaticity for detecting aromaticity corresponding to a | |
232 specific model. | |
233 | |
234 This option is ignored during *No* value of --DetectAromaticity | |
235 option. | |
236 | |
248 -a, --AtomIdentifierType *AtomicInvariantsAtomTypes | DREIDINGAtomTypes | |
249 | EStateAtomTypes | FunctionalClassAtomTypes | MMFF94AtomTypes | | |
250 SLogPAtomTypes | SYBYLAtomTypes | TPSAAtomTypes | UFFAtomTypes* | |
251 Specify atom identifier type to use during generation of atom path | |
252 strings corresponding to path length fingerprints. Possible values | |
253 in the current release are: *AtomicInvariantsAtomTypes, | |
254 DREIDINGAtomTypes, EStateAtomTypes, FunctionalClassAtomTypes, | |
255 MMFF94AtomTypes, SLogPAtomTypes, SYBYLAtomTypes, TPSAAtomTypes, | |
256 UFFAtomTypes*. Default value: *AtomicInvariantsAtomTypes*. | |
257 | |
258 --AtomicInvariantsToUse *"AtomicInvariant1,AtomicInvariant2..."* | |
259 This value is used during *AtomicInvariantsAtomTypes* value of -a, | |
260 --AtomIdentifierType option. It's a list of comma separated valid | |
261 atomic invariant atom types. | |
262 | |
263 Possible values for atomic invariants are: *AS, X, BO, LBO, SB, DB, | |
264 TB, H, Ar, RA, FC, MN, SM*. Default value: *AS*. | |
265 | |
266 The atomic invariants abbreviations correspond to: | |
267 | |
268 AS = Atom symbol corresponding to element symbol | |
269 | |
270 X<n> = Number of non-hydrogen atom neighbors or heavy atoms | |
271 BO<n> = Sum of bond orders to non-hydrogen atom neighbors or heavy atoms | |
272 LBO<n> = Largest bond order of non-hydrogen atom neighbors or heavy atoms | |
273 SB<n> = Number of single bonds to non-hydrogen atom neighbors or heavy atoms | |
274 DB<n> = Number of double bonds to non-hydrogen atom neighbors or heavy atoms | |
275 TB<n> = Number of triple bonds to non-hydrogen atom neighbors or heavy atoms | |
276 H<n> = Number of implicit and explicit hydrogens for atom | |
277 Ar = Aromatic annotation indicating whether atom is aromatic | |
278 RA = Ring atom annotation indicating whether atom is in a ring | |
279 FC<+n/-n> = Formal charge assigned to atom | |
280 MN<n> = Mass number indicating isotope other than most abundant isotope | |
281 SM<n> = Spin multiplicity of atom. Possible values: 1 (singlet), 2 (doublet) or | |
282 3 (triplet) | |
283 | |
284 Atom type generated by AtomTypes::AtomicInvariantsAtomTypes class | |
285 corresponds to: | |
286 | |
287 AS.X<n>.BO<n>.LBO<n>.<SB><n>.<DB><n>.<TB><n>.H<n>.Ar.RA.FC<+n/-n>.MN<n>.SM<n> | |
288 | |
289 Except for AS which is a required atomic invariant in atom types, | |
290 all other atomic invariants are optional. Atom type specification | |
291 doesn't include atomic invariants with zero or undefined values. | |
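
The rule above (AS is always present, other invariants appear in a fixed order, and zero or undefined values are dropped) can be sketched as follows. The function and its dictionary-based input are hypothetical; the real assignment is done by the AtomTypes::AtomicInvariantsAtomTypes class.

```python
# Order of the optional invariants, taken from the atom type pattern above.
INVARIANT_ORDER = ["X", "BO", "LBO", "SB", "DB", "TB", "H", "Ar", "RA", "FC", "MN", "SM"]


def atomic_invariants_atom_type(symbol, invariants, to_use):
    """Build an atom type string such as 'C.X2.BO3.H1' from invariant values."""
    parts = [symbol]  # AS is the required invariant
    for name in INVARIANT_ORDER:
        if name not in to_use:
            continue
        value = invariants.get(name, 0)
        if not value:
            continue  # zero or undefined values are left out
        if name in ("Ar", "RA"):
            parts.append(name)  # plain annotations carry no number
        elif name == "FC":
            parts.append(f"FC{value:+d}")  # signed formal charge
        else:
            parts.append(f"{name}{value}")
    return ".".join(parts)


aromatic_ch = atomic_invariants_atom_type("C", {"X": 2, "BO": 3, "H": 1}, {"AS", "X", "BO", "H"})
```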
292 | |
293 In addition to usage of abbreviations for specifying atomic | |
294 invariants, the following descriptive words are also allowed: | |
295 | |
296 X : NumOfNonHydrogenAtomNeighbors or NumOfHeavyAtomNeighbors | |
297 BO : SumOfBondOrdersToNonHydrogenAtoms or SumOfBondOrdersToHeavyAtoms | |
298 LBO : LargestBondOrderToNonHydrogenAtoms or LargestBondOrderToHeavyAtoms | |
299 SB : NumOfSingleBondsToNonHydrogenAtoms or NumOfSingleBondsToHeavyAtoms | |
300 DB : NumOfDoubleBondsToNonHydrogenAtoms or NumOfDoubleBondsToHeavyAtoms | |
301 TB : NumOfTripleBondsToNonHydrogenAtoms or NumOfTripleBondsToHeavyAtoms | |
302 H : NumOfImplicitAndExplicitHydrogens | |
303 Ar : Aromatic | |
304 RA : RingAtom | |
305 FC : FormalCharge | |
306 MN : MassNumber | |
307 SM : SpinMultiplicity | |
308 | |
309 Examples: | |
310 | |
311 Benzene: Using value of *AS* for --AtomicInvariantsToUse, *Yes* for | |
312 -u, --UseBondSymbols, and *AllAtomPathsWithRings* for -p, --PathMode, | |
313 atom path strings generated are: | |
314 | |
315 C C:C C:C:C C:C:C:C C:C:C:C:C C:C:C:C:C:C C:C:C:C:C:C:C | |
316 | |
317 And using *AS,X,BO* for --AtomicInvariantsToUse generates following | |
318 atom path strings: | |
319 | |
320 C.X2.BO3 C.X2.BO3:C.X2.BO3 C.X2.BO3:C.X2.BO3:C.X2.BO3 | |
321 C.X2.BO3:C.X2.BO3:C.X2.BO3:C.X2.BO3 | |
322 C.X2.BO3:C.X2.BO3:C.X2.BO3:C.X2.BO3:C.X2.BO3 | |
323 C.X2.BO3:C.X2.BO3:C.X2.BO3:C.X2.BO3:C.X2.BO3:C.X2.BO3 | |
324 C.X2.BO3:C.X2.BO3:C.X2.BO3:C.X2.BO3:C.X2.BO3:C.X2.BO3:C.X2.BO3 | |
325 | |
326 Urea: Using value of *AS* for --AtomicInvariantsToUse, *Yes* for | |
327 -u, --UseBondSymbols, and *AllAtomPathsWithRings* for -p, --PathMode, | |
328 atom path strings are: | |
329 | |
330 C N O C=O CN NC=O NCN | |
331 | |
332 And using *AS,X,BO* for --AtomicInvariantsToUse generates following | |
333 atom path strings: | |
334 | |
335 C.X3.BO4 N.X1.BO1 O.X1.BO2 C.X3.BO4=O.X1.BO2 | |
336 C.X3.BO4N.X1.BO1 N.X1.BO1C.X3.BO4=O.X1.BO2 | |
337 N.X1.BO1C.X3.BO4N.X1.BO1 | |
338 | |
339 --FunctionalClassesToUse *"FunctionalClass1,FunctionalClass2..."* | |
340 This value is used during *FunctionalClassAtomTypes* value of -a, | |
341 --AtomIdentifierType option. It's a list of comma separated valid | |
342 functional classes. | |
343 | |
344 Possible values for atom functional classes are: *Ar, CA, H, HBA, | |
345 HBD, Hal, NI, PI, RA*. Default value [ Ref 24 ]: | |
346 *HBD,HBA,PI,NI,Ar,Hal*. | |
347 | |
348 The functional class abbreviations correspond to: | |
349 | |
350 HBD: HydrogenBondDonor | |
351 HBA: HydrogenBondAcceptor | |
352 PI : PositivelyIonizable | |
353 NI : NegativelyIonizable | |
354 Ar : Aromatic | |
355 Hal : Halogen | |
356 H : Hydrophobic | |
357 RA : RingAtom | |
358 CA : ChainAtom | |
359 | |
360 Functional class atom type specification for an atom corresponds to: | |
361 | |
362 Ar.CA.H.HBA.HBD.Hal.NI.PI.RA | |
363 | |
364 *AtomTypes::FunctionalClassAtomTypes* module is used to assign | |
365 functional class atom types. It uses following definitions [ Ref | |
366 60-61, Ref 65-66 ]: | |
367 | |
368 HydrogenBondDonor: NH, NH2, OH | |
369 HydrogenBondAcceptor: N[!H], O | |
370 PositivelyIonizable: +, NH2 | |
371 NegativelyIonizable: -, C(=O)OH, S(=O)OH, P(=O)OH | |
372 | |
373 --BitsOrder *Ascending | Descending* | |
374 Bits order to use during generation of fingerprints bit-vector | |
375 string for *PathLengthBits* value of -m, --mode option. Possible | |
376 values: *Ascending, Descending*. Default: *Ascending*. | |
377 | |
378 *Ascending* bit order corresponds to the first bit in each byte being | |
379 the lowest bit, as opposed to the highest bit. | |
380 | |
381 Internally, bits are stored in *Ascending* order using Perl vec | |
382 function. Regardless of machine order, big-endian or little-endian, | |
383 vec function always considers first string byte as the lowest byte | |
384 and first bit within each byte as the lowest bit. | |
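
The difference between the two bit orders can be illustrated by writing the same set bits as a binary string in both ways. This sketch assumes that *Descending* simply reverses the bits within each byte, which mirrors the difference between Perl's `b*` and `B*` unpack templates; the function name is hypothetical and `size` is assumed to be a multiple of 8.

```python
def bits_to_binary_string(bit_positions, size, order="Ascending"):
    """Render set bit positions as a string of 0s and 1s in the given bit order."""
    bits = ["0"] * size
    for pos in bit_positions:
        bits[pos] = "1"
    if order == "Descending":
        # Highest bit of each byte first: reverse every group of 8 bits.
        bits = [b for i in range(0, size, 8) for b in reversed(bits[i:i + 8])]
    return "".join(bits)


ascending = bits_to_binary_string([0, 9], 16, "Ascending")
descending = bits_to_binary_string([0, 9], 16, "Descending")
```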
385 | |
386 -b, --BitStringFormat *BinaryString | HexadecimalString* | |
387 Format of fingerprints bit-vector string data in output SD, FP or | |
388 CSV/TSV text file(s) specified by --output used during | |
389 *PathLengthBits* value of -m, --mode option. Possible values: | |
390 *BinaryString, HexadecimalString*. Default value: | |
391 *HexadecimalString*. | |
392 | |
393 *BinaryString* corresponds to an ASCII string containing 1s and 0s. | |
394 *HexadecimalString* contains bit values in ASCII hexadecimal format. | |
395 | |
396 Examples: | |
397 | |
398 FingerprintsBitVector;PathLengthBits:AtomicInvariantsAtomTypes:MinLeng | |
399 th1:MaxLength8;1024;BinaryString;Ascending;001000010011010101011000110 | |
400 0100010101011000101001011100110001000010001001101000001001001001001000 | |
401 0010110100000111001001000001001010100100100000000011000000101001011100 | |
402 0010000001000101010100000100111100110111011011011000000010110111001101 | |
403 0101100011000000010001000011000010100011101100001000001000100000000... | |
404 | |
405 FingerprintsBitVector;PathLengthBits:AtomicInvariantsAtomTypes:MinLeng | |
406 th1:MaxLength8;1024;HexadecimalString;Ascending;48caa1315d82d91122b029 | |
407 42861c9409a4208182d12015509767bd0867653604481a8b1288000056090583603078 | |
408 9cedae54e26596889ab121309800900490515224208421502120a0dd9200509723ae89 | |
409 00024181b86c0122821d4e4880c38620dab280824b455404009f082003d52c212b4e6d | |
410 6ea05280140069c780290c43 | |
411 | |
412 --CompoundID *DataFieldName or LabelPrefixString* | |
413 This value is --CompoundIDMode specific and indicates how compound | |
414 ID is generated. | |
415 | |
416 For *DataField* value of --CompoundIDMode option, it corresponds to | |
417 datafield label name whose value is used as compound ID; otherwise, | |
418 it's a prefix string used for generating compound IDs like | |
419 LabelPrefixString<Number>. Default value, *Cmpd*, generates compound | |
420 IDs which look like Cmpd<Number>. | |
421 | |
422 Examples for *DataField* value of --CompoundIDMode: | |
423 | |
424 MolID | |
425 ExtReg | |
426 | |
427 Examples for *LabelPrefix* or *MolNameOrLabelPrefix* value of | |
428 --CompoundIDMode: | |
429 | |
430 Compound | |
431 | |
432 The value specified above generates compound IDs which correspond to | |
433 Compound<Number> instead of default value of Cmpd<Number>. | |
434 | |
435 --CompoundIDLabel *text* | |
436 Specify compound ID column label for FP or CSV/TSV text file(s) used | |
437 during *CompoundID* value of --DataFieldsMode option. Default: | |
438 *CompoundID*. | |
439 | |
440 --CompoundIDMode *DataField | MolName | LabelPrefix | | |
441 MolNameOrLabelPrefix* | |
442 Specify how to generate compound IDs and write to FP or CSV/TSV text | |
443 file(s) along with generated fingerprints for *FP | text | all* | |
444 values of --output option: use a *SDFile(s)* datafield value; use | |
445 molname line from *SDFile(s)*; generate a sequential ID with | |
446 specific prefix; use combination of both MolName and LabelPrefix | |
447 with usage of LabelPrefix values for empty molname lines. | |
448 | |
449 Possible values: *DataField | MolName | LabelPrefix | | |
450 MolNameOrLabelPrefix*. Default: *LabelPrefix*. | |
451 | |
452 For *MolNameOrLabelPrefix* value of --CompoundIDMode, molname line | |
453 in *SDFile(s)* takes precedence over sequential compound IDs | |
454 generated using *LabelPrefix* and only empty molname values are | |
455 replaced with sequential compound IDs. | |
456 | |
457 This is only used for *CompoundID* value of --DataFieldsMode option. | |
458 | |
459 --DataFields *"FieldLabel1,FieldLabel2,... "* | |
460 Comma delimited list of *SDFile(s)* data fields to extract and | |
461 write to CSV/TSV text file(s) along with generated fingerprints for | |
462 *text | all* values of --output option. | |
463 | |
464 This is only used for *Specify* value of --DataFieldsMode option. | |
465 | |
466 Examples: | |
467 | |
468 Extreg | |
469 MolID,CompoundName | |
470 | |
471 -d, --DataFieldsMode *All | Common | Specify | CompoundID* | |
472 Specify how data fields in *SDFile(s)* are transferred to output | |
473 CSV/TSV text file(s) along with generated fingerprints for *text | | |
474 all* values of --output option: transfer all SD data fields; transfer | |
475 SD data fields common to all compounds; extract specified data | |
476 fields; generate a compound ID using molname line, a compound | |
477 prefix, or a combination of both. Possible values: *All | Common | | |
478 Specify | CompoundID*. Default value: *CompoundID*. | |
479 | |
480 --DetectAromaticity *Yes | No* | |
481 Detect aromaticity before generating fingerprints. Possible values: | |
482 *Yes or No*. Default value: *Yes*. | |
483 | |
484 *No* --DetectAromaticity forces usage of atom and bond aromaticity | |
485 values from *SDFile(s)* and skips the step which detects and assigns | |
486 aromaticity. | |
487 | |
488 *No* --DetectAromaticity value is only allowed during | |
489 *AtomicInvariantsAtomTypes* value of -a, --AtomIdentifierType | |
490 option; for all other -a, --AtomIdentifierType values, it | |
491 must be *Yes*. | |
492 | |
493 -f, --Filter *Yes | No* | |
494 Specify whether to check and filter compound data in SDFile(s). | |
495 Possible values: *Yes or No*. Default value: *Yes*. | |
496 | |
497 By default, compound data is checked before calculating fingerprints | |
498 and compounds containing atom data corresponding to non-element | |
499 symbols or no atom data are ignored. | |
500 | |
501 --FingerprintsLabel *text* | |
502 SD data label or text file column label to use for fingerprints | |
503 string in output SD or CSV/TSV text file(s) specified by --output. | |
504 Default value: *PathLengthFingerprints*. | |
505 | |
506 --fold *Yes | No* | |
507 Fold fingerprints to increase bit density during *PathLengthBits* | |
508 value of -m, --mode option. Possible values: *Yes or No*. Default | |
509 value: *No*. | |
510 | |
511 --FoldedSize *number* | |
512 Size of folded fingerprint during *PathLengthBits* value of -m, | |
513 --mode option. Default value: *256*. Valid values correspond to any | |
514 positive integer which is less than -s, --size and meets the | |
515 criteria for its value. | |
516 | |
517 Examples: | |
518 | |
519 128 | |
520 512 | |
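
The option text does not say how folding is carried out. One common scheme, shown here purely as an assumption, maps each set bit onto its position modulo the folded size, so that bits from the discarded part of the fingerprint are OR-ed into the retained part. The function name is hypothetical.

```python
def fold_bits(bit_positions, size, folded_size):
    """Fold set bit positions of a size-bit fingerprint down to folded_size bits."""
    if not 0 < folded_size < size:
        raise ValueError("folded size must be a positive integer less than size")
    # Assumption: modulo folding; colliding bits collapse into a single set bit.
    return sorted({pos % folded_size for pos in bit_positions})


folded = fold_bits([3, 259, 1000], size=1024, folded_size=256)
```

Folding raises the bit density, as the --fold description notes, at the cost of more collisions between unrelated paths.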
521 | |
522 -h, --help | |
523 Print this help message | |
524 | |
525 -i, --IgnoreHydrogens *Yes | No* | |
526 Ignore hydrogens during fingerprints generation. Possible values: | |
527 *Yes or No*. Default value: *Yes*. | |
528 | |
529 For *No* value of -i, --IgnoreHydrogens, any explicit hydrogens are | |
530 also used for generation of atoms path lengths and fingerprints; | |
531 implicit hydrogens are still ignored. | |
532 | |
533 -k, --KeepLargestComponent *Yes | No* | |
534 Generate fingerprints for only the largest component in molecule. | |
535 Possible values: *Yes or No*. Default value: *Yes*. | |
536 | |
537 For molecules containing multiple connected components, fingerprints | |
538 can be generated in two different ways: use all connected components | |
539 or just the largest connected component. By default, all atoms | |
540 except for the largest connected component are deleted before | |
541 generation of fingerprints. | |
542 | |
543 -m, --mode *PathLengthBits | PathLengthCount* | |
544 Specify type of path length fingerprints to generate for molecules | |
545 in *SDFile(s)*. Possible values: *PathLengthBits, PathLengthCount*. | |
546 Default value: *PathLengthBits*. | |
547 | |
548 For *PathLengthBits* value of -m, --mode option, a fingerprint | |
549 bit-vector string containing zeros and ones is generated and for | |
550 *PathLengthCount* value, a fingerprint vector string corresponding | |
551 to number of atom paths is generated. | |
552 | |
553 --MinPathLength *number* | |
554 Minimum atom path length to include in fingerprints. Default value: | |
555 *1*. Valid values: positive integers less than --MaxPathLength. | |
556 Path length of 1 corresponds to a path containing only one atom. | |
557 | |
558 --MaxPathLength *number* | |
559 Maximum atom path length to include in fingerprints. Default value: | |
560 *8*. Valid values: positive integers greater than | |
561 --MinPathLength. | |
562 | |
563 -n, --NumOfBitsToSetPerPath *number* | |
564 Number of bits to set per path during generation of fingerprints | |
565 bit-vector string for *PathLengthBits* value of -m, --mode option. | |
566 Default value: *1*. Valid values: positive integers. | |
567 | |
568 --OutDelim *comma | tab | semicolon* | |
569 Delimiter for output CSV/TSV text file(s). Possible values: *comma, | |
570 tab, or semicolon*. Default value: *comma*. | |
571 | |
572 --output *SD | FP | text | all* | |
573 Type of output files to generate. Possible values: *SD, FP, text, or | |
574 all*. Default value: *text*. | |
575 | |
576 -o, --overwrite | |
577 Overwrite existing files. | |
578 | |
579 -p, --PathMode *AtomPathsWithoutRings | AtomPathsWithRings | | |
580 AllAtomPathsWithoutRings | AllAtomPathsWithRings* | |
581 Specify type of atom paths to use for generating path length | |
582 fingerprints for molecules in *SDFile(s)*. Possible | |
583 values: *AtomPathsWithoutRings, AtomPathsWithRings, | |
584 AllAtomPathsWithoutRings, AllAtomPathsWithRings*. Default value: | |
585 *AllAtomPathsWithRings*. | |
586 | |
587 For molecules with no rings, first two and last two options are | |
588 equivalent and generate same set of atom paths starting from each | |
589 atom with length between --MinPathLength and --MaxPathLength. | |
590 However, all these four options can result in the same set of final | |
591 atom paths for molecules containing fused, bridged or spiro rings. | |
592 | |
593 For molecules containing rings, atom paths starting from each atom | |
594 can be traversed in four different ways: | |
595 | |
596 *AtomPathsWithoutRings* - Atom paths containing no rings and without | |
597 sharing of bonds in traversed paths. | |
598 | |
599 *AtomPathsWithRings* - Atom paths containing rings and without any | |
600 sharing of bonds in traversed paths. | |
601 | |
602 *AllAtomPathsWithoutRings* - All possible atom paths containing no | |
603 rings and without any sharing of bonds in traversed paths. | |
604 | |
605 *AllAtomPathsWithRings* - All possible atom paths containing rings | |
606 and with sharing of bonds in traversed paths. | |
607 | |
608 Atom path traversal is terminated at the ring atom. | |
609 | |
610 Based on values specified for -p, --PathMode, --MinPathLength | |
611 and --MaxPathLength, all appropriate atom paths are generated for | |
612 each atom in the molecule and collected in a list. | |
613 | |
614 For each atom path in the filtered atom paths list, an atom path | |
615 string is created using value of -a, --AtomIdentifierType and | |
616 specified values to use for a particular atom identifier type. Value | |
617 of -u, --UseBondSymbols controls whether bond order symbols are used | |
618 during generation of atom path string. Atom symbol corresponds to | |
619 element symbol and characters used to represent bond order are: *1 - | |
620 None; 2 - '='; 3 - '#'; 1.5 or aromatic - ':'; others: bond order | |
621 value*. By default, bond symbols are included in atom path strings. | |
622 Exclusion of bond symbols in atom path strings results in | |
623 fingerprints which correspond purely to atom paths without | |
624 considering bonds. | |
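
The bond symbol mapping above, together with the rule from the DESCRIPTION section that only the lexicographically smaller atom path string is kept, can be sketched as follows. The function names are hypothetical; a path is given as a list of atom identifiers plus the bond orders between consecutive atoms.

```python
# Bond order to symbol, as listed above; other orders fall back to their value.
BOND_SYMBOLS = {1: "", 2: "=", 3: "#", 1.5: ":"}


def _build(atoms, bond_orders, use_bond_symbols):
    """Join atom identifiers, optionally placing a bond symbol before each next atom."""
    out = [atoms[0]]
    for atom, order in zip(atoms[1:], bond_orders):
        symbol = BOND_SYMBOLS.get(order, str(order)) if use_bond_symbols else ""
        out.append(symbol + atom)
    return "".join(out)


def atom_path_string(atoms, bond_orders, use_bond_symbols=True):
    """Return the lexicographically smaller of the forward and reverse path strings."""
    if len(bond_orders) != len(atoms) - 1:
        raise ValueError("need exactly one bond order between consecutive atoms")
    forward = _build(atoms, bond_orders, use_bond_symbols)
    backward = _build(atoms[::-1], bond_orders[::-1], use_bond_symbols)
    return min(forward, backward)


# The O=C-N path in urea, traversed starting from the oxygen.
urea_path = atom_path_string(["O", "C", "N"], [2, 1])
```

Traversing the same three atoms from either end gives the same string, which matches the `NC=O` entry in the urea example earlier in this document.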
625 | |
626 --UseUniquePaths controls whether structurally duplicate atom | |
627 path strings are removed from the list. | |
628 | |
629 For *PathLengthBits* value of -m, --mode option, each atom path is | |
630 hashed to a 32 bit unsigned integer key using TextUtil::HashCode | |
631 function. Using the hash key as a seed for a random number | |
632 generator, a random integer value between 0 and --size is used to | |
633 set corresponding bits in the fingerprint bit-vector string. Value | |
634 of -n, --NumOfBitsToSetPerPath option controls the number of times a | |
635 random number is generated to set corresponding bits. | |
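
A minimal Python sketch of the hash-then-seed idea described above. Two stand-ins are assumptions of this example: CRC-32 replaces TextUtil::HashCode (whose algorithm is not given here) and Python's built-in generator replaces the Perl or MathUtil generator, so the bit positions will not match the script's output. The function names and the default of one bit per path are also illustrative.

```python
import random
import zlib


def set_path_bits(path_strings, size=1024, bits_per_path=1):
    """Return the sorted bit positions set for the given atom path strings."""
    bits = set()
    for path in path_strings:
        # Stand-in 32 bit unsigned hash (assumption): CRC-32, not TextUtil::HashCode.
        key = zlib.crc32(path.encode("utf-8")) & 0xFFFFFFFF
        # Stand-in generator (assumption): Python's random.Random seeded with the key.
        rng = random.Random(key)
        for _ in range(bits_per_path):
            # Draw an integer in [0, size) and mark that bit as set.
            bits.add(rng.randrange(size))
    return sorted(bits)


def to_binary_string(bits, size=1024):
    """Render the set bit positions as a string of '0' and '1' characters."""
    chars = ["0"] * size
    for position in bits:
        chars[position] = "1"
    return "".join(chars)
```

Because the generator is seeded from the path string's hash, the same path always sets the same bits, which is what makes fingerprints from different molecules comparable.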
636 | |
637 For *PathLengthCount* value of -m, --mode option, the number of | |
638 times an atom path appears is tracked and a fingerprints | |
639 count-string corresponding to count of atom paths is generated. | |
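
The counting step can be sketched in Python as below. The function name is hypothetical, and the alphabetical ordering is an arbitrary choice for the example; the layout of alternating path strings and counts is modeled on the IDsAndValuesPairsString samples shown for -v, --VectorStringFormat.

```python
from collections import Counter


def path_count_pairs_string(path_strings):
    """Count each atom path string and emit 'path count' pairs separated by spaces."""
    counts = Counter(path_strings)
    # Alphabetical order is an assumption of this sketch, not the script's ordering.
    return " ".join(f"{path} {counts[path]}" for path in sorted(counts))
```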
640 | |
641 For molecules containing rings, combination of -p, --PathMode and | |
642 --UseBondSymbols allows generation of up to 8 different types of | |
643 atom path length strings: | |
644 | |
645 AllowSharedBonds AllowRings UseBondSymbols | |
646 | |
647 0 0 1 - AtomPathsNoCyclesWithBondSymbols | |
648 0 1 1 - AtomPathsWithCyclesWithBondSymbols | |
649 | |
650 1 0 1 - AllAtomPathsNoCyclesWithBondSymbols | |
651 1 1 1 - AllAtomPathsWithCyclesWithBondSymbols | |
652 [ DEFAULT ] | |
653 | |
654 0 0 0 - AtomPathsNoCyclesNoBondSymbols | |
655 0 1 0 - AtomPathsWithCyclesNoBondSymbols | |
656 | |
657 1 0 0 - AllAtomPathsNoCyclesNoBondSymbols | |
658 1 1 0 - AllAtomPathsWithCyclesNoBondSymbols | |
659 | |
660 Default atom path length fingerprints generation for molecules | |
661 containing rings with *AllAtomPathsWithRings* value for -p, | |
662 --PathMode, *Yes* value for --UseBondSymbols, *1* value for | |
663 --MinPathLength and *8* value for --MaxPathLength is the most time | |
664 consuming. Combinations of other options can substantially speed up | |
665 fingerprint generation for molecules containing complex ring | |
666 systems. | |
667 | |
668 Additionally, value for option -a, --AtomIdentifierType in | |
669 conjunction with corresponding specified values for atom types | |
670 changes the nature of atom path length strings and the fingerprints. | |
671 | |
672 -q, --quote *Yes | No* | |
673 Put quotes around column values in output CSV/TSV text file(s). | |
674 Possible values: *Yes or No*. Default value: *Yes*. | |
675 | |
676 -r, --root *RootName* | |
677 New file name is generated using the root: <Root>.<Ext>. Default for | |
678 new file names: <SDFileName><PathLengthFP>.<Ext>. The file type | |
679 determines <Ext> value. The sdf, fpf, csv, and tsv <Ext> values are | |
680 used for SD, FP, comma/semicolon, and tab delimited text files, | |
681 respectively. This option is ignored for multiple input files. | |
682 | |
683 -s, --size *number* | |
684 Size of fingerprints. Default value: *1024*. Valid values correspond | |
685 to any positive integer which satisfies the following criteria: | |
686 power of 2, >= 32 and <= 2 ** 32. | |
687 | |
688 Examples: | |
689 | |
690 256 | |
691 512 | |
692 2048 | |
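
The three criteria for a valid size can be checked with a short Python sketch; the function name is hypothetical.

```python
def is_valid_fingerprint_size(size):
    """Check the stated criteria: power of 2, >= 32 and <= 2 ** 32."""
    if not isinstance(size, int):
        return False
    # A positive power of 2 has exactly one bit set, so size & (size - 1) is 0.
    return 32 <= size <= 2 ** 32 and size & (size - 1) == 0
```

The default 1024 and the examples 256, 512 and 2048 all pass; 1000 fails the power of 2 test and 16 fails the lower bound.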
693 | |
694 -u, --UseBondSymbols *Yes | No* | |
695 Specify whether to use bond symbols for atom paths during generation | |
696 of atom path strings. Possible values: *Yes or No*. Default value: | |
697 *Yes*. | |
698 | |
699 *No* value option for -u, --UseBondSymbols allows the generation of | |
700 fingerprints corresponding purely to atoms disregarding all bonds. | |
701 | |
702 --UsePerlCoreRandom *Yes | No* | |
703 Specify whether to use Perl CORE::rand or MayaChemTools | |
704 MathUtil::random function during random number generation for | |
705 setting bits in fingerprints bit-vector strings. Possible values: | |
706 *Yes or No*. Default value: *Yes*. | |
707 | |
708 *No* value option for --UsePerlCoreRandom allows the generation of | |
709 fingerprints bit-vector strings which are same across different | |
710 platforms. | |
711 | |
712 The random number generator implemented in MayaChemTools is a | |
713 variant of linear congruential generator (LCG) as described by | |
714 Miller et al. [ Ref 120 ]. It is also referred to as Lehmer random | |
715 number generator or Park-Miller random number generator. | |
716 | |
717 Unlike Perl's core random number generator function rand, the random | |
718 number generator implemented in MayaChemTools, MathUtil::random, | |
719 generates consistent random values across different platforms for a | |
720 specific random seed and leads to generation of portable | |
721 fingerprints bit-vector strings. | |
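
For orientation, a Lehmer (Park-Miller) generator can be sketched in Python as follows. The constants are those of the published "minimal standard" generator (multiplier 16807, modulus 2 ** 31 - 1); since MathUtil::random is described only as a variant, its exact constants and seed handling may differ, and the seed reduction below is an assumption of this sketch.

```python
MODULUS = 2 ** 31 - 1   # Mersenne prime used by the minimal standard generator
MULTIPLIER = 16807      # 7 ** 5, the original Park-Miller multiplier


def lehmer_stream(seed, count):
    """Return `count` successive integer states of a Lehmer generator."""
    # Map the seed into the valid state range 1 .. MODULUS - 1 (assumed handling).
    state = seed % MODULUS
    if state == 0:
        state = 1
    values = []
    for _ in range(count):
        # Pure integer arithmetic, so every platform produces the same sequence.
        state = (MULTIPLIER * state) % MODULUS
        values.append(state)
    return values


def lehmer_bit_positions(seed, count, size):
    """Map generator states to bit positions in [0, size)."""
    return [value % size for value in lehmer_stream(seed, count)]
```

Starting from seed 1, the first two states are 16807 and 282475249. The portability comes from using only integer multiplication and remainder, with no dependence on a platform's C library generator.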
722 | |
723 --UseUniquePaths *Yes | No* | |
724 Specify whether to use structurally unique atom paths during | |
725 generation of atom path strings. Possible values: *Yes or No*. | |
726 Default value: *Yes*. | |
727 | |
728 *No* value option for --UseUniquePaths allows usage of all atom | |
729 paths generated by -p, --PathMode option value for generation of | |
730 atom path strings leading to duplicate path count during | |
731 *PathLengthCount* value of -m, --mode option. It doesn't change | |
732 fingerprint string generated during *PathLengthBits* value of -m, | |
733 --mode. | |
734 | |
735 For example, during *AllAtomPathsWithRings* value of -p, --PathMode | |
736 option, benzene has 12 linear paths of length 2 and 12 cyclic paths | |
737 of length 7, but only 6 linear paths of length 2 and 1 cyclic path | |
738 of length 7 are structurally unique. | |
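
The benzene numbers can be reproduced with a small Python sketch. It assumes, for illustration only, that two paths are structurally the same when they traverse the same set of bonds; the ring atoms are numbered 0 through 5 and all names are hypothetical.

```python
RING_SIZE = 6  # benzene ring atoms numbered 0..5

# Linear paths of length 2: each atom paired with each of its two ring neighbors.
linear_paths = [
    [atom, (atom + step) % RING_SIZE]
    for atom in range(RING_SIZE)
    for step in (1, -1)
]

# Cyclic paths of length 7: start at each atom, walk the ring in either
# direction, and return to the starting atom.
cyclic_paths = [
    [(atom + step * i) % RING_SIZE for i in range(RING_SIZE + 1)]
    for atom in range(RING_SIZE)
    for step in (1, -1)
]


def bond_set(path):
    """Key a path by the unordered set of bonds it traverses (sketch assumption)."""
    return frozenset(frozenset(pair) for pair in zip(path, path[1:]))


unique_linear = {bond_set(path) for path in linear_paths}
unique_cyclic = {bond_set(path) for path in cyclic_paths}
```

Here linear_paths and cyclic_paths each hold 12 paths, while unique_linear holds 6 keys (one per ring bond) and unique_cyclic holds a single key (the whole ring).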
739 | |
740 -v, --VectorStringFormat *IDsAndValuesString | IDsAndValuesPairsString | | |
741 ValuesAndIDsString | ValuesAndIDsPairsString* | |
742 Format of fingerprints vector string data in output SD, FP or | |
743 CSV/TSV text file(s) specified by --output used during | |
744 *PathLengthCount* value of -m, --mode option. Possible values: | |
745 *IDsAndValuesString | IDsAndValuesPairsString | ValuesAndIDsString | | |
746 ValuesAndIDsPairsString*. Default value: *IDsAndValuesString*. | |
747 | |
748 Examples: | |
749 | |
750 FingerprintsVector;PathLengthCount:AtomicInvariantsAtomTypes:MinLength | |
751 1:MaxLength8;432;NumericalValues;IDsAndValuesPairsString;C.X1.BO1.H3 2 | |
752 C.X2.BO2.H2 4 C.X2.BO3.H1 14 C.X3.BO3.H1 3 C.X3.BO4 10 F.X1.BO1 1 N.X | |
753 2.BO2.H1 1 N.X3.BO3 1 O.X1.BO1.H1 3 O.X1.BO2 2 C.X1.BO1.H3C.X3.BO3.H1 | |
754 2 C.X2.BO2.H2C.X2.BO2.H2 1 C.X2.BO2.H2C.X3.BO3.H1 4 C.X2.BO2.H2C.X3.BO | |
755 4 1 C.X2.BO2.H2N.X3.BO3 1 C.X2.BO3.H1:C.X2.BO3.H1 10 C.X2.BO3.H1:C.... | |
756 | |
757 FingerprintsVector;PathLengthCount:EStateAtomTypes:MinLength1:MaxLengt | |
758 h8;454;NumericalValues;IDsAndValuesPairsString;aaCH 14 aasC 8 aasN 1 d | |
759 O 2 dssC 2 sCH3 2 sF 1 sOH 3 ssCH2 4 ssNH 1 sssCH 3 aaCH:aaCH 10 aaCH: | |
760 aasC 8 aasC:aasC 3 aasC:aasN 2 aasCaasC 2 aasCdssC 1 aasCsF 1 aasCssNH | |
761 1 aasCsssCH 1 aasNssCH2 1 dO=dssC 2 dssCsOH 1 dssCssCH2 1 dssCssNH 1 | |
762 sCH3sssCH 2 sOHsssCH 2 ssCH2ssCH2 1 ssCH2sssCH 4 aaCH:aaCH:aaCH 6 a... | |
763 | |
764 -w, --WorkingDir *DirName* | |
765 Location of working directory. Default: current directory. | |
766 | |
767 EXAMPLES | |
768 To generate path length fingerprints corresponding to all unique paths | |
769 from length 1 through 8 in hexadecimal bit-vector string format of size | |
770 1024 and create a SamplePLFPHex.csv file containing sequential compound | |
771 IDs along with fingerprints bit-vector strings data, type: | |
772 | |
773 % PathLengthFingerprints.pl -o -r SamplePLFPHex Sample.sdf | |
774 | |
775 To generate path length fingerprints corresponding to all unique paths | |
776 from length 1 through 8 in hexadecimal bit-vector string format of size | |
777 1024 and create SamplePLFPHex.sdf, SamplePLFPHex.fpf, and | |
778 SamplePLFPHex.csv files containing sequential compound IDs in CSV file | |
779 along with fingerprints bit-vector strings data, type: | |
780 | |
781 % PathLengthFingerprints.pl --output all -o -r SamplePLFPHex Sample.sdf | |
782 | |
783 To generate path length fingerprints corresponding to all unique paths | |
784 from length 1 through 8 in binary bit-vector string format of size 2048 | |
785 and create a SamplePLFPBin.csv file containing sequential compound IDs | |
786 along with fingerprints bit-vector strings data, type: | |
787 | |
788 % PathLengthFingerprints.pl --BitStringFormat BinaryString --size 2048 | |
789 -o -r SamplePLFPBin Sample.sdf | |
790 | |
791 To generate path length fingerprints corresponding to count of all | |
792 unique paths from length 1 through 8 in IDsAndValuesString format and | |
793 create a SamplePLFPCount.csv file containing sequential compound IDs | |
794 along with fingerprints vector strings data, type: | |
795 | |
796 % PathLengthFingerprints.pl -m PathLengthCount -o -r SamplePLFPCount | |
797 Sample.sdf | |
798 | |
799 To generate path length fingerprints corresponding to count of all | |
800 unique paths from length 1 through 8 in IDsAndValuesString format using | |
801 E-state atom types and create a SamplePLFPCount.csv file containing | |
802 sequential compound IDs along with fingerprints vector strings data, | |
803 type: | |
804 | |
805 % PathLengthFingerprints.pl -m PathLengthCount --AtomIdentifierType | |
806 EStateAtomTypes -o -r SamplePLFPCount Sample.sdf | |
807 | |
808 To generate path length fingerprints corresponding to count of all | |
809 unique paths from length 1 through 8 in IDsAndValuesString format using | |
810 SLogP atom types and create a SamplePLFPCount.csv file containing | |
811 sequential compound IDs along with fingerprints vector strings data, | |
812 type: | |
813 | |
814 % PathLengthFingerprints.pl -m PathLengthCount --AtomIdentifierType | |
815 SLogPAtomTypes -o -r SamplePLFPCount Sample.sdf | |
816 | |
817 To generate path length fingerprints corresponding to count of all | |
818 unique paths from length 1 through 8 in ValuesAndIDsPairsString format and | |
819 create a SamplePLFPCount.csv file containing sequential compound IDs | |
820 along with fingerprints vector strings data, type: | |
821 | |
822 % PathLengthFingerprints.pl -m PathLengthCount --VectorStringFormat | |
823 ValuesAndIDsPairsString -o -r SamplePLFPCount Sample.sdf | |
824 | |
825 To generate path length fingerprints corresponding to count of all | |
826 unique paths from length 1 through 8 in IDsAndValuesString format using | |
827 AS,X,BO as atomic invariants and create a SamplePLFPCount.csv file | |
828 containing sequential compound IDs along with fingerprints vector | |
829 strings data, type: | |
830 | |
831 % PathLengthFingerprints.pl -m PathLengthCount --AtomIdentifierType | |
832 AtomicInvariantsAtomTypes --AtomicInvariantsToUse "AS,X,BO" -o | |
833 -r SamplePLFPCount Sample.sdf | |
834 | |
835 To generate path length fingerprints corresponding to count of all paths | |
836 from length 1 through 8 in IDsAndValuesString format and create a | |
837 SamplePLFPCount.csv file containing compound IDs from MolName line along | |
838 with fingerprints vector strings data, type: | |
839 | |
840 % PathLengthFingerprints.pl -m PathLengthCount --UseUniquePaths No | |
841 -o --CompoundIDMode MolName -r SamplePLFPCount | |
842 Sample.sdf | |
843 | |
844 To generate path length fingerprints corresponding to all unique paths | |
845 from length 1 through 8 in hexadecimal bit-vector string format of size | |
846 512 after folding and create SamplePLFPHex.sdf, SamplePLFPHex.fpf, and | |
847 SamplePLFPHex.csv files containing sequential compound IDs along with | |
848 fingerprints bit-vector strings data, type: | |
849 | |
850 % PathLengthFingerprints.pl --output all --Fold Yes --FoldedSize 512 | |
851 -o -r SamplePLFPHex Sample.sdf | |
852 | |
853 To generate path length fingerprints corresponding to all unique paths | |
854 from length 1 through 8 containing no rings and without sharing of bonds | |
855 in hexadecimal bit-vector string format of size 1024 and create a | |
856 SamplePLFPHex.csv file containing sequential compound IDs along with | |
857 fingerprints bit-vector strings data and all data fields, type: | |
858 | |
859 % PathLengthFingerprints.pl -p AtomPathsWithoutRings --DataFieldsMode All | |
860 -o -r SamplePLFPHex Sample.sdf | |
861 | |
862 To generate path length fingerprints corresponding to all unique paths | |
863 from length 1 through 8 containing rings and without sharing of bonds in | |
864 hexadecimal bit-vector string format of size 1024 and create a | |
865 SamplePLFPHex.tsv file containing compound IDs derived from combination | |
866 of molecule name line and an explicit compound prefix along with | |
867 fingerprints bit-vector strings data, type: | |
868 | |
869 % PathLengthFingerprints.pl -p AtomPathsWithRings --DataFieldsMode | |
870 CompoundID --CompoundIDMode MolNameOrLabelPrefix --CompoundID Cmpd | |
871 --CompoundIDLabel MolID --FingerprintsLabel PathLengthFP --OutDelim Tab | |
872 -r SamplePLFPHex -o Sample.sdf | |
873 | |
874 To generate path length fingerprints corresponding to count of all | |
875 unique paths from length 1 through 8 in IDsAndValuesString format and | |
876 create a SamplePLFPCount.csv file containing sequential compound IDs | |
877 along with fingerprints vector strings data using aromaticity specified | |
878 in SD file, type: | |
879 | |
880 % PathLengthFingerprints.pl -m PathLengthCount --DetectAromaticity No | |
881 -o -r SamplePLFPCount Sample.sdf | |
882 | |
883 To generate path length fingerprints corresponding to all unique paths | |
884 from length 2 through 6 in hexadecimal bit-vector string format of size | |
885 1024 and create a SamplePLFPHex.csv file containing sequential compound | |
886 IDs along with fingerprints bit-vector strings data, type: | |
887 | |
888 % PathLengthFingerprints.pl --MinPathLength 2 --MaxPathLength 6 | |
889 -o -r SamplePLFPHex Sample.sdf | |
890 | |
891 AUTHOR | |
892 Manish Sud <msud@san.rr.com> | |
893 | |
894 SEE ALSO | |
895 InfoFingerprintsFiles.pl, SimilarityMatricesFingerprints.pl, | |
896 AtomNeighborhoodsFingerprints.pl, ExtendedConnectivityFingerprints.pl, | |
897 MACCSKeysFingerprints.pl, TopologicalAtomPairsFingerprints.pl, | |
898 TopologicalAtomTorsionsFingerprints.pl, | |
899 TopologicalPharmacophoreAtomPairsFingerprints.pl, | |
900 TopologicalPharmacophoreAtomTripletsFingerprints.pl | |
901 | |
902 COPYRIGHT | |
903 Copyright (C) 2015 Manish Sud. All rights reserved. | |
904 | |
905 This file is part of MayaChemTools. | |
906 | |
907 MayaChemTools is free software; you can redistribute it and/or modify it | |
908 under the terms of the GNU Lesser General Public License as published by | |
909 the Free Software Foundation; either version 3 of the License, or (at | |
910 your option) any later version. | |
911 |