comparison docs/scripts/txt/SimilarityMatricesFingerprints.txt @ 0:4816e4a8ae95 draft default tip
Uploaded
author | deepakjadmin |
---|---|
date | Wed, 20 Jan 2016 09:23:18 -0500 |
-1:000000000000 | 0:4816e4a8ae95 |
---|---|
1 NAME | |
2 SimilarityMatricesFingerprints.pl - Calculate similarity matrices using | |
3 fingerprints strings data in SD, FP and CSV/TSV text file(s) | |
4 | |
5 SYNOPSIS | |
6 SimilarityMatricesFingerprints.pl SDFile(s) FPFile(s) TextFile(s)... | |
7 | |
8 SimilarityMatricesFingerprints.pl [--alpha *number*] [--beta *number*] | |
9 [-b, --BitVectorComparisonMode *All | "TanimotoSimilarity,[ | |
10 TverskySimilarity, ... ]"*] [-c, --ColMode *ColNum | ColLabel*] | |
11 [--CompoundIDCol *col number | col name*] [--CompoundIDPrefix *text*] | |
12 [--CompoundIDField *DataFieldName*] [--CompoundIDMode *DataField | | |
13 MolName | LabelPrefix | MolNameOrLabelPrefix*] [-d, --detail | |
14 *InfoLevel*] [-f, --fast] [--FingerprintsCol *col number | col name*] | |
15 [--FingerprintsField *FieldLabel*] [-h, --help] [--InDelim *comma | | |
16 semicolon*] [--InputDataMode *LoadInMemory | ScanFile*] [-m, --mode | |
17 *AutoDetect | FingerprintsBitVectorString | FingerprintsVectorString*] | |
18 [--OutDelim *comma | tab | semicolon*] [--OutMatrixFormat | |
19 *RowsAndColumns | IDPairsAndValue*] [--OutMatrixType *FullMatrix | | |
20 UpperTriangularMatrix | LowerTriangularMatrix*] [-o, --overwrite] [-p, | |
21 --precision *number*] [-q, --quote *Yes | No*] [-r, --root *RootName*] | |
22 [-v, --VectorComparisonMode *All | "TanimotoSimilarity, [ | |
23 ManhattanDistance, ...]"*] [--VectorComparisonFormulism *All | | |
24 "AlgebraicForm, [BinaryForm, SetTheoreticForm]"*] [-w, --WorkingDir | |
25 dirname] SDFile(s) FPFile(s) TextFile(s)... | |
26 | |
27 DESCRIPTION | |
28 Calculate similarity matrices using fingerprint bit-vector or vector | |
29 strings data in *SD, FP and CSV/TSV* text file(s) and generate CSV/TSV | |
30 text file(s) containing values for specified similarity and distance | |
31 coefficients. | |
32 | |
33 The scripts SimilarityMatrixSDFiles.pl and SimilarityMatrixTextFiles.pl | |
34 have been removed from the current release of MayaChemTools and their | |
35 functionality merged with this script. | |
36 | |
37 The valid *SDFile* extensions are *.sdf* and *.sd*. All SD files in a | |
38 current directory can be specified either by **.sdf* or the current | |
39 directory name. | |
40 | |
41 The valid *FPFile* extensions are *.fpf* and *.fp*. All FP files in a | |
42 current directory can be specified either by **.fpf* or the current | |
43 directory name. | |
44 | |
45 The valid *TextFile* extensions are *.csv* and *.tsv* for | |
46 comma/semicolon and tab delimited text files respectively. All other | |
47 file names are ignored. All text files in a current directory can be | |
48 specified by **.csv*, **.tsv*, or the current directory name. The | |
49 --indelim option determines the format of *TextFile(s)*. Any file which | |
50 doesn't correspond to the format indicated by --indelim option is | |
51 ignored. | |
52 | |
53 Example of *FP* file containing fingerprints bit-vector string data: | |
54 | |
55 # | |
56 # Package = MayaChemTools 7.4 | |
57 # ReleaseDate = Oct 21, 2010 | |
58 # | |
59 # TimeStamp = Mon Mar 7 15:14:01 2011 | |
60 # | |
61 # FingerprintsStringType = FingerprintsBitVector | |
62 # | |
63 # Description = PathLengthBits:AtomicInvariantsAtomTypes:MinLength1:... | |
64 # Size = 1024 | |
65 # BitStringFormat = HexadecimalString | |
66 # BitsOrder = Ascending | |
67 # | |
68 Cmpd1 9c8460989ec8a49913991a6603130b0a19e8051c89184414953800cc21510... | |
69 Cmpd2 000000249400840040100042011001001980410c000000001010088001120... | |
70 ... ... | |
71 ... .. | |
72 | |
73 Example of *FP* file containing fingerprints vector string data: | |
74 | |
75 # | |
76 # Package = MayaChemTools 7.4 | |
77 # ReleaseDate = Oct 21, 2010 | |
78 # | |
79 # TimeStamp = Mon Mar 7 15:14:01 2011 | |
80 # | |
81 # FingerprintsStringType = FingerprintsVector | |
82 # | |
83 # Description = PathLengthBits:AtomicInvariantsAtomTypes:MinLength1:... | |
84 # VectorStringFormat = IDsAndValuesString | |
85 # VectorValuesType = NumericalValues | |
86 # | |
87 Cmpd1 338;C F N O C:C C:N C=O CC CF CN CO C:C:C C:C:N C:CC C:CF C:CN C: | |
88 N:C C:NC CC:N CC=O CCC CCN CCO CNC NC=O O=CO C:C:C:C C:C:C:N C:C:CC...; | |
89 33 1 2 5 21 2 2 12 1 3 3 20 2 10 2 2 1 2 2 2 8 2 5 1 1 1 19 2 8 2 2 2 2 | |
90 6 2 2 2 2 2 2 2 2 3 2 2 1 4 1 5 1 1 18 6 2 2 1 2 10 2 1 2 1 2 2 2 2 ... | |
91 Cmpd2 103;C N O C=N C=O CC CN CO CC=O CCC CCN CCO CNC N=CN NC=O NCN O=C | |
92 O C CC=O CCCC CCCN CCCO CCNC CNC=N CNC=O CNCN CCCC=O CCCCC CCCCN CC...; | |
93 15 4 4 1 2 13 5 2 2 15 5 3 2 2 1 1 1 2 17 7 6 5 1 1 1 2 15 8 5 7 2 2 2 2 | |
94 1 2 1 1 3 15 7 6 8 3 4 4 3 2 2 1 2 3 14 2 4 7 4 4 4 4 1 1 1 2 1 1 1 ... | |
95 ... ... | |
96 ... ... | |
97 | |
98 Example of *SD* file containing fingerprints bit-vector string data: | |
99 | |
100 ... ... | |
101 ... ... | |
102 $$$$ | |
103 ... ... | |
104 ... ... | |
105 ... ... | |
106 41 44 0 0 0 0 0 0 0 0999 V2000 | |
107 -3.3652 1.4499 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 | |
108 ... ... | |
109 2 3 1 0 0 0 0 | |
110 ... ... | |
111 M END | |
112 > <CmpdID> | |
113 Cmpd1 | |
114 | |
115 > <PathLengthFingerprints> | |
116 FingerprintsBitVector;PathLengthBits:AtomicInvariantsAtomTypes:MinLengt | |
117 h1:MaxLength8;1024;HexadecimalString;Ascending;9c8460989ec8a49913991a66 | |
118 03130b0a19e8051c89184414953800cc2151082844a201042800130860308e8204d4028 | |
119 00831048940e44281c00060449a5000ac80c894114e006321264401600846c050164462 | |
120 08190410805000304a10205b0100e04c0038ba0fad0209c0ca8b1200012268b61c0026a | |
121 aa0660a11014a011d46 | |
122 | |
123 $$$$ | |
124 ... ... | |
125 ... ... | |
126 | |
127 Example of CSV *Text* file containing fingerprints bit-vector string | |
128 data: | |
129 | |
130 "CompoundID","PathLengthFingerprints" | |
131 "Cmpd1","FingerprintsBitVector;PathLengthBits:AtomicInvariantsAtomTypes | |
132 :MinLength1:MaxLength8;1024;HexadecimalString;Ascending;9c8460989ec8a4 | |
133 9913991a6603130b0a19e8051c89184414953800cc2151082844a20104280013086030 | |
134 8e8204d402800831048940e44281c00060449a5000ac80c894114e006321264401..." | |
135 ... ... | |
136 ... ... | |
137 | |
138 The current release of MayaChemTools supports the following types of | |
139 fingerprint bit-vector and vector strings: | |
140 | |
141 FingerprintsVector;AtomNeighborhoods:AtomicInvariantsAtomTypes:MinRadi | |
142 us0:MaxRadius2;41;AlphaNumericalValues;ValuesString;NR0-C.X1.BO1.H3-AT | |
143 C1:NR1-C.X3.BO3.H1-ATC1:NR2-C.X1.BO1.H3-ATC1:NR2-C.X3.BO4-ATC1 NR0-C.X | |
144 1.BO1.H3-ATC1:NR1-C.X3.BO3.H1-ATC1:NR2-C.X1.BO1.H3-ATC1:NR2-C.X3.BO4-A | |
145 TC1 NR0-C.X2.BO2.H2-ATC1:NR1-C.X2.BO2.H2-ATC1:NR1-C.X3.BO3.H1-ATC1:NR2 | |
146 -C.X2.BO2.H2-ATC1:NR2-N.X3.BO3-ATC1:NR2-O.X1.BO1.H1-ATC1 NR0-C.X2.B... | |
147 | |
148 FingerprintsVector;AtomTypesCount:AtomicInvariantsAtomTypes:ArbitraryS | |
149 ize;10;NumericalValues;IDsAndValuesString;C.X1.BO1.H3 C.X2.BO2.H2 C.X2 | |
150 .BO3.H1 C.X3.BO3.H1 C.X3.BO4 F.X1.BO1 N.X2.BO2.H1 N.X3.BO3 O.X1.BO1.H1 | |
151 O.X1.BO2;2 4 14 3 10 1 1 1 3 2 | |
152 | |
153 FingerprintsVector;AtomTypesCount:SLogPAtomTypes:ArbitrarySize;16;Nume | |
154 ricalValues;IDsAndValuesString;C1 C10 C11 C14 C18 C20 C21 C22 C5 CS F | |
155 N11 N4 O10 O2 O9;5 1 1 1 14 4 2 1 2 2 1 1 1 1 3 1 | |
156 | |
157 FingerprintsVector;AtomTypesCount:SLogPAtomTypes:FixedSize;67;OrderedN | |
158 umericalValues;IDsAndValuesString;C1 C2 C3 C4 C5 C6 C7 C8 C9 C10 C11 C | |
159 12 C13 C14 C15 C16 C17 C18 C19 C20 C21 C22 C23 C24 C25 C26 C27 CS N1 N | |
160 2 N3 N4 N5 N6 N7 N8 N9 N10 N11 N12 N13 N14 NS O1 O2 O3 O4 O5 O6 O7 O8 | |
161 O9 O10 O11 O12 OS F Cl Br I Hal P S1 S2 S3 Me1 Me2;5 0 0 0 2 0 0 0 0 1 | |
162 1 0 0 1 0 0 0 14 0 4 2 1 0 0 0 0 0 2 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0... | |
163 | |
164 FingerprintsVector;EStateIndicies:ArbitrarySize;11;NumericalValues;IDs | |
165 AndValuesString;SaaCH SaasC SaasN SdO SdssC SsCH3 SsF SsOH SssCH2 SssN | |
166 H SsssCH;24.778 4.387 1.993 25.023 -1.435 3.975 14.006 29.759 -0.073 3 | |
167 .024 -2.270 | |
168 | |
169 FingerprintsVector;EStateIndicies:FixedSize;87;OrderedNumericalValues; | |
170 ValuesString;0 0 0 0 0 0 0 3.975 0 -0.073 0 0 24.778 -2.270 0 0 -1.435 | |
171 4.387 0 0 0 0 0 0 3.024 0 0 0 0 0 0 0 1.993 0 29.759 25.023 0 0 0 0 1 | |
172 4.006 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | |
173 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | |
174 | |
175 FingerprintsVector;ExtendedConnectivity:AtomicInvariantsAtomTypes:Radi | |
176 us2;60;AlphaNumericalValues;ValuesString;73555770 333564680 352413391 | |
177 666191900 1001270906 1371674323 1481469939 1977749791 2006158649 21414 | |
178 08799 49532520 64643108 79385615 96062769 273726379 564565671 85514103 | |
179 5 906706094 988546669 1018231313 1032696425 1197507444 1331250018 1338 | |
180 532734 1455473691 1607485225 1609687129 1631614296 1670251330 17303... | |
181 | |
182 FingerprintsVector;ExtendedConnectivityCount:AtomicInvariantsAtomTypes | |
183 :Radius2;60;NumericalValues;IDsAndValuesString;73555770 333564680 3524 | |
184 13391 666191900 1001270906 1371674323 1481469939 1977749791 2006158649 | |
185 2141408799 49532520 64643108 79385615 96062769 273726379 564565671...; | |
186 3 2 1 1 14 1 2 10 4 3 1 1 1 1 2 1 2 1 1 1 2 3 1 1 2 1 3 3 8 2 2 2 6 2 | |
187 1 2 1 1 2 1 1 1 2 1 1 2 1 2 1 1 1 1 1 1 1 1 1 2 1 1 | |
188 | |
189 FingerprintsBitVector;ExtendedConnectivityBits:AtomicInvariantsAtomTyp | |
190 es:Radius2;1024;BinaryString;Ascending;0000000000000000000000000000100 | |
191 0000000001010000000110000011000000000000100000000000000000000000100001 | |
192 1000000110000000000000000000000000010011000000000000000000000000010000 | |
193 0000000000000000000000000010000000000000000001000000000000000000000000 | |
194 0000000000010000100001000000000000101000000000000000100000000000000... | |
195 | |
196 FingerprintsVector;ExtendedConnectivity:FunctionalClassAtomTypes:Radiu | |
197 s2;57;AlphaNumericalValues;ValuesString;24769214 508787397 850393286 8 | |
198 62102353 981185303 1231636850 1649386610 1941540674 263599683 32920567 | |
199 1 571109041 639579325 683993318 723853089 810600886 885767127 90326012 | |
200 7 958841485 981022393 1126908698 1152248391 1317567065 1421489994 1455 | |
201 632544 1557272891 1826413669 1983319256 2015750777 2029559552 20404... | |
202 | |
203 FingerprintsVector;ExtendedConnectivity:EStateAtomTypes:Radius2;62;Alp | |
204 haNumericalValues;ValuesString;25189973 528584866 662581668 671034184 | |
205 926543080 1347067490 1738510057 1759600920 2034425745 2097234755 21450 | |
206 44754 96779665 180364292 341712110 345278822 386540408 387387308 50430 | |
207 1706 617094135 771528807 957666640 997798220 1158349170 1291258082 134 | |
208 1138533 1395329837 1420277211 1479584608 1486476397 1487556246 1566... | |
209 | |
210 FingerprintsBitVector;MACCSKeyBits;166;BinaryString;Ascending;00000000 | |
211 0000000000000000000000000000000001001000010010000000010010000000011100 | |
212 0100101010111100011011000100110110000011011110100110111111111111011111 | |
213 11111111111110111000 | |
214 | |
215 FingerprintsBitVector;MACCSKeyBits;322;BinaryString;Ascending;11101011 | |
216 1110011111100101111111000111101100110000000000000011100010000000000000 | |
217 0000000000000000000000000000000000000000000000101000000000000000000000 | |
218 0000000000000000000000000000000000000000000000000000000000000000000000 | |
219 0000000000000000000000000000000000000011000000000000000000000000000000 | |
220 0000000000000000000000000000000000000000 | |
221 | |
222 FingerprintsVector;MACCSKeyCount;166;OrderedNumericalValues;ValuesStri | |
223 ng;0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | |
224 0 0 0 0 0 0 0 1 0 0 3 0 0 0 0 4 0 0 2 0 0 0 0 0 0 0 0 2 0 0 2 0 0 0 0 | |
225 0 0 0 0 1 1 8 0 0 0 1 0 0 1 0 1 0 1 0 3 1 3 1 0 0 0 1 2 0 11 1 0 0 0 | |
226 5 0 0 1 2 0 1 1 0 0 0 0 0 1 1 0 1 1 1 1 0 4 0 0 1 1 0 4 6 1 1 1 2 1 1 | |
227 3 5 2 2 0 5 3 5 1 1 2 5 1 2 1 2 4 8 3 5 5 2 2 0 3 5 4 1 | |
228 | |
229 FingerprintsVector;MACCSKeyCount;322;OrderedNumericalValues;ValuesStri | |
230 ng;14 8 2 0 2 0 4 4 2 1 4 0 0 2 5 10 5 2 1 0 0 2 0 5 13 3 28 5 5 3 0 0 | |
231 0 4 2 1 1 0 1 1 0 0 2 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 22 5 3 0 0 0 1 0 | |
232 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | |
233 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 11 0 2 0 0 0 0 0 0 0 0 0 | |
234 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... | |
235 | |
236 FingerprintsBitVector;PathLengthBits:AtomicInvariantsAtomTypes:MinLeng | |
237 th1:MaxLength8;1024;BinaryString;Ascending;001000010011010101011000110 | |
238 0100010101011000101001011100110001000010001001101000001001001001001000 | |
239 0010110100000111001001000001001010100100100000000011000000101001011100 | |
240 0010000001000101010100000100111100110111011011011000000010110111001101 | |
241 0101100011000000010001000011000010100011101100001000001000100000000... | |
242 | |
243 FingerprintsVector;PathLengthCount:AtomicInvariantsAtomTypes:MinLength | |
244 1:MaxLength8;432;NumericalValues;IDsAndValuesPairsString;C.X1.BO1.H3 2 | |
245 C.X2.BO2.H2 4 C.X2.BO3.H1 14 C.X3.BO3.H1 3 C.X3.BO4 10 F.X1.BO1 1 N.X | |
246 2.BO2.H1 1 N.X3.BO3 1 O.X1.BO1.H1 3 O.X1.BO2 2 C.X1.BO1.H3C.X3.BO3.H1 | |
247 2 C.X2.BO2.H2C.X2.BO2.H2 1 C.X2.BO2.H2C.X3.BO3.H1 4 C.X2.BO2.H2C.X3.BO | |
248 4 1 C.X2.BO2.H2N.X3.BO3 1 C.X2.BO3.H1:C.X2.BO3.H1 10 C.X2.BO3.H1:C.... | |
249 | |
250 FingerprintsVector;PathLengthCount:MMFF94AtomTypes:MinLength1:MaxLengt | |
251 h8;463;NumericalValues;IDsAndValuesPairsString;C5A 2 C5B 2 C=ON 1 CB 1 | |
252 8 COO 1 CR 9 F 1 N5 1 NC=O 1 O=CN 1 O=CO 1 OC=O 1 OR 2 C5A:C5B 2 C5A:N | |
253 5 2 C5ACB 1 C5ACR 1 C5B:C5B 1 C5BC=ON 1 C5BCB 1 C=ON=O=CN 1 C=ONNC=O 1 | |
254 CB:CB 18 CBF 1 CBNC=O 1 COO=O=CO 1 COOCR 1 COOOC=O 1 CRCR 7 CRN5 1 CR | |
255 OR 2 C5A:C5B:C5B 2 C5A:C5BC=ON 1 C5A:C5BCB 1 C5A:N5:C5A 1 C5A:N5CR ... | |
256 | |
257 FingerprintsVector;TopologicalAtomPairs:AtomicInvariantsAtomTypes:MinD | |
258 istance1:MaxDistance10;223;NumericalValues;IDsAndValuesString;C.X1.BO1 | |
259 .H3-D1-C.X3.BO3.H1 C.X2.BO2.H2-D1-C.X2.BO2.H2 C.X2.BO2.H2-D1-C.X3.BO3. | |
260 H1 C.X2.BO2.H2-D1-C.X3.BO4 C.X2.BO2.H2-D1-N.X3.BO3 C.X2.BO3.H1-D1-...; | |
261 2 1 4 1 1 10 8 1 2 6 1 2 2 1 2 1 2 2 1 2 1 5 1 10 12 2 2 1 2 1 9 1 3 1 | |
262 1 1 2 2 1 3 6 1 6 14 2 2 2 3 1 3 1 8 2 2 1 3 2 6 1 2 2 5 1 3 1 23 1... | |
263 | |
264 FingerprintsVector;TopologicalAtomPairs:FunctionalClassAtomTypes:MinDi | |
265 stance1:MaxDistance10;144;NumericalValues;IDsAndValuesString;Ar-D1-Ar | |
266 Ar-D1-Ar.HBA Ar-D1-HBD Ar-D1-Hal Ar-D1-None Ar.HBA-D1-None HBA-D1-NI H | |
267 BA-D1-None HBA.HBD-D1-NI HBA.HBD-D1-None HBD-D1-None NI-D1-None No...; | |
268 23 2 1 1 2 1 1 1 1 2 1 1 7 28 3 1 3 2 8 2 1 1 1 5 1 5 24 3 3 4 2 13 4 | |
269 1 1 4 1 5 22 4 4 3 1 19 1 1 1 1 1 2 2 3 1 1 8 25 4 5 2 3 1 26 1 4 1 ... | |
270 | |
271 FingerprintsVector;TopologicalAtomTorsions:AtomicInvariantsAtomTypes;3 | |
272 3;NumericalValues;IDsAndValuesString;C.X1.BO1.H3-C.X3.BO3.H1-C.X3.BO4- | |
273 C.X3.BO4 C.X1.BO1.H3-C.X3.BO3.H1-C.X3.BO4-N.X3.BO3 C.X2.BO2.H2-C.X2.BO | |
274 2.H2-C.X3.BO3.H1-C.X2.BO2.H2 C.X2.BO2.H2-C.X2.BO2.H2-C.X3.BO3.H1-O...; | |
275 2 2 1 1 2 2 1 1 3 4 4 8 4 2 2 6 2 2 1 2 1 1 2 1 1 2 6 2 4 2 1 3 1 | |
276 | |
277 FingerprintsVector;TopologicalAtomTorsions:EStateAtomTypes;36;Numerica | |
278 lValues;IDsAndValuesString;aaCH-aaCH-aaCH-aaCH aaCH-aaCH-aaCH-aasC aaC | |
279 H-aaCH-aasC-aaCH aaCH-aaCH-aasC-aasC aaCH-aaCH-aasC-sF aaCH-aaCH-aasC- | |
280 ssNH aaCH-aasC-aasC-aasC aaCH-aasC-aasC-aasN aaCH-aasC-ssNH-dssC a...; | |
281 4 4 8 4 2 2 6 2 2 2 4 3 2 1 3 3 2 2 2 1 2 1 1 1 2 1 1 1 1 1 1 1 2 1 1 2 | |
282 | |
283 FingerprintsVector;TopologicalAtomTriplets:AtomicInvariantsAtomTypes:M | |
284 inDistance1:MaxDistance10;3096;NumericalValues;IDsAndValuesString;C.X1 | |
285 .BO1.H3-D1-C.X1.BO1.H3-D1-C.X3.BO3.H1-D2 C.X1.BO1.H3-D1-C.X2.BO2.H2-D1 | |
286 0-C.X3.BO4-D9 C.X1.BO1.H3-D1-C.X2.BO2.H2-D3-N.X3.BO3-D4 C.X1.BO1.H3-D1 | |
287 -C.X2.BO2.H2-D4-C.X2.BO2.H2-D5 C.X1.BO1.H3-D1-C.X2.BO2.H2-D6-C.X3....; | |
288 1 2 2 2 2 2 2 2 8 8 4 8 4 4 2 2 2 2 4 2 2 2 4 2 2 2 2 1 2 2 4 4 4 2 2 | |
289 2 4 4 4 8 4 4 2 4 4 4 2 4 4 2 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2 2 8... | |
290 | |
291 FingerprintsVector;TopologicalAtomTriplets:SYBYLAtomTypes:MinDistance1 | |
292 :MaxDistance10;2332;NumericalValues;IDsAndValuesString;C.2-D1-C.2-D9-C | |
293 .3-D10 C.2-D1-C.2-D9-C.ar-D10 C.2-D1-C.3-D1-C.3-D2 C.2-D1-C.3-D10-C.3- | |
294 D9 C.2-D1-C.3-D2-C.3-D3 C.2-D1-C.3-D2-C.ar-D3 C.2-D1-C.3-D3-C.3-D4 C.2 | |
295 -D1-C.3-D3-N.ar-D4 C.2-D1-C.3-D3-O.3-D2 C.2-D1-C.3-D4-C.3-D5 C.2-D1-C. | |
296 3-D5-C.3-D6 C.2-D1-C.3-D5-O.3-D4 C.2-D1-C.3-D6-C.3-D7 C.2-D1-C.3-D7... | |
297 | |
298 FingerprintsVector;TopologicalPharmacophoreAtomPairs:ArbitrarySize:Min | |
299 Distance1:MaxDistance10;54;NumericalValues;IDsAndValuesString;H-D1-H H | |
300 -D1-NI HBA-D1-NI HBD-D1-NI H-D2-H H-D2-HBA H-D2-HBD HBA-D2-HBA HBA-D2- | |
301 HBD H-D3-H H-D3-HBA H-D3-HBD H-D3-NI HBA-D3-NI HBD-D3-NI H-D4-H H-D4-H | |
302 BA H-D4-HBD HBA-D4-HBA HBA-D4-HBD HBD-D4-HBD H-D5-H H-D5-HBA H-D5-...; | |
303 18 1 2 1 22 12 8 1 2 18 6 3 1 1 1 22 13 6 5 7 2 28 9 5 1 1 1 36 16 10 | |
304 3 4 1 37 10 8 1 35 10 9 3 3 1 28 7 7 4 18 16 12 5 1 2 1 | |
305 | |
306 FingerprintsVector;TopologicalPharmacophoreAtomPairs:FixedSize:MinDist | |
307 ance1:MaxDistance10;150;OrderedNumericalValues;ValuesString;18 0 0 1 0 | |
308 0 0 2 0 0 1 0 0 0 0 22 12 8 0 0 1 2 0 0 0 0 0 0 0 0 18 6 3 1 0 0 0 1 | |
309 0 0 1 0 0 0 0 22 13 6 0 0 5 7 0 0 2 0 0 0 0 0 28 9 5 1 0 0 0 1 0 0 1 0 | |
310 0 0 0 36 16 10 0 0 3 4 0 0 1 0 0 0 0 0 37 10 8 0 0 0 0 1 0 0 0 0 0 0 | |
311 0 35 10 9 0 0 3 3 0 0 1 0 0 0 0 0 28 7 7 4 0 0 0 0 0 0 0 0 0 0 0 18... | |
312 | |
313 FingerprintsVector;TopologicalPharmacophoreAtomTriplets:ArbitrarySize: | |
314 MinDistance1:MaxDistance10;696;NumericalValues;IDsAndValuesString;Ar1- | |
315 Ar1-Ar1 Ar1-Ar1-H1 Ar1-Ar1-HBA1 Ar1-Ar1-HBD1 Ar1-H1-H1 Ar1-H1-HBA1 Ar1 | |
316 -H1-HBD1 Ar1-HBA1-HBD1 H1-H1-H1 H1-H1-HBA1 H1-H1-HBD1 H1-HBA1-HBA1 H1- | |
317 HBA1-HBD1 H1-HBA1-NI1 H1-HBD1-NI1 HBA1-HBA1-NI1 HBA1-HBD1-NI1 Ar1-...; | |
318 46 106 8 3 83 11 4 1 21 5 3 1 2 2 1 1 1 100 101 18 11 145 132 26 14 23 | |
319 28 3 3 5 4 61 45 10 4 16 20 7 5 1 3 4 5 3 1 1 1 1 5 4 2 1 2 2 2 1 1 1 | |
320 119 123 24 15 185 202 41 25 22 17 3 5 85 95 18 11 23 17 3 1 1 6 4 ... | |
321 | |
322 FingerprintsVector;TopologicalPharmacophoreAtomTriplets:FixedSize:MinD | |
323 istance1:MaxDistance10;2692;OrderedNumericalValues;ValuesString;46 106 | |
324 8 3 0 0 83 11 4 0 0 0 1 0 0 0 0 0 0 0 0 21 5 3 0 0 1 2 2 0 0 1 0 0 0 | |
325 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 100 101 18 11 0 0 145 132 26 | |
326 14 0 0 23 28 3 3 0 0 5 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 61 45 10 4 0 | |
327 0 16 20 7 5 1 0 3 4 5 3 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 5 ... | |
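
The fingerprint strings listed above share a semicolon-delimited layout. A minimal Python sketch of splitting one into its fields follows; the function name `parse_fingerprint_string` and the dictionary keys are hypothetical, and the field order is inferred only from the examples above, not taken from the MayaChemTools source.

```python
def parse_fingerprint_string(text):
    """Split one fingerprint string into a dict of named fields.

    Field order is an assumption inferred from the documented examples.
    """
    fields = text.split(";")
    kind = fields[0]

    if kind == "FingerprintsBitVector":
        # Type;Description;Size;BitStringFormat;BitsOrder;BitString
        _, description, size, bit_format, bits_order, bits = fields
        return {
            "type": kind, "description": description, "size": int(size),
            "format": bit_format, "order": bits_order, "bits": bits,
        }

    if kind == "FingerprintsVector":
        # Type;Description;NumValues;ValuesType;VectorStringFormat;data...
        description, count, values_type, vector_format = fields[1:5]
        data = fields[5:]
        if vector_format == "IDsAndValuesString":
            # IDs and values are two separate space-delimited lists.
            ids, values = data[0].split(), data[1].split()
        elif vector_format == "IDsAndValuesPairsString":
            # IDs and values alternate in a single space-delimited list.
            tokens = data[0].split()
            ids, values = tokens[0::2], tokens[1::2]
        elif vector_format == "ValuesString":
            ids, values = None, data[0].split()
        else:
            raise ValueError(f"unsupported vector format: {vector_format}")
        # Values stay as strings because some types are alphanumerical.
        return {
            "type": kind, "description": description, "count": int(count),
            "values_type": values_type, "format": vector_format,
            "ids": ids, "values": values,
        }

    raise ValueError(f"unknown fingerprint string type: {kind}")


example = parse_fingerprint_string(
    "FingerprintsVector;AtomTypesCount:AtomicInvariantsAtomTypes:ArbitraryS"
    "ize;10;NumericalValues;IDsAndValuesString;C.X1.BO1.H3 C.X2.BO2.H2 C.X2"
    ".BO3.H1 C.X3.BO3.H1 C.X3.BO4 F.X1.BO1 N.X2.BO2.H1 N.X3.BO3 O.X1.BO1.H1 "
    "O.X1.BO2;2 4 14 3 10 1 1 1 3 2"
)
```

The sketch does not decode hexadecimal bit strings, since the mapping between hex digits and bit positions depends on the BitsOrder field and is not described here.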
328 | |
329 OPTIONS | |
330 --alpha *number* | |
331 Value of alpha parameter for calculating *Tversky* similarity | |
332 coefficient specified for -b, --BitVectorComparisonMode option. It | |
333 corresponds to weights assigned for bits set to "1" in a pair of | |
334 fingerprint bit-vectors during the calculation of similarity | |
335 coefficient. Possible values: *0 to 1*. Default value: <0.5>. | |
336 | |
337 --beta *number* | |
338 Value of beta parameter for calculating *WeightedTanimoto* and | |
339 *WeightedTversky* similarity coefficients specified for -b, | |
340 --BitVectorComparisonMode option. It is used to weight the | |
341 contributions of bits set to "0" during the calculation of | |
342 similarity coefficients. Possible values: *0 to 1*. Default value of | |
343 <1> makes *WeightedTanimoto* and *WeightedTversky* equivalent to | |
344 *Tanimoto* and *Tversky*. | |
345 | |
346 -b, --BitVectorComparisonMode *All | | |
347 "TanimotoSimilarity,[TverskySimilarity,...]"* | |
348 Specify what similarity coefficients to use for calculating | |
349 similarity matrices for fingerprints bit-vector strings data values | |
350 in *TextFile(s)*: calculate similarity matrices for all supported | |
351 similarity coefficients or specify a comma delimited list of | |
352 similarity coefficients. Possible values: *All | | |
353 "TanimotoSimilarity,[TverskySimilarity,...]"*. Default: | |
354 *TanimotoSimilarity*. | |
355 | |
356 *All* uses complete list of supported similarity coefficients: | |
357 *BaroniUrbaniSimilarity, BuserSimilarity, CosineSimilarity, | |
358 DiceSimilarity, DennisSimilarity, ForbesSimilarity, | |
359 FossumSimilarity, HamannSimilarity, JacardSimilarity, | |
360 Kulczynski1Similarity, Kulczynski2Similarity, MatchingSimilarity, | |
361 McConnaugheySimilarity, OchiaiSimilarity, PearsonSimilarity, | |
362 RogersTanimotoSimilarity, RussellRaoSimilarity, SimpsonSimilarity, | |
363 SkoalSneath1Similarity, SkoalSneath2Similarity, | |
364 SkoalSneath3Similarity, TanimotoSimilarity, TverskySimilarity, | |
365 YuleSimilarity, WeightedTanimotoSimilarity, | |
366 WeightedTverskySimilarity*. These similarity coefficients are | |
367 described below. | |
368 | |
369 For two fingerprint bit-vectors A and B of same size, let: | |
370 | |
371 Na = Number of bits set to "1" in A | |
372 Nb = Number of bits set to "1" in B | |
373 Nc = Number of bits set to "1" in both A and B | |
374 Nd = Number of bits set to "0" in both A and B | |
375 | |
376 Nt = Number of bits set to "1" or "0" in A or B (Size of A or B) | |
377 Nt = Na + Nb - Nc + Nd | |
378 | |
379 Na - Nc = Number of bits set to "1" in A but not in B | |
380 Nb - Nc = Number of bits set to "1" in B but not in A | |
381 | |
382 Then, various similarity coefficients [ Ref. 40 - 42 ] for a pair of | |
383 bit-vectors A and B are defined as follows: | |
384 | |
385 *BaroniUrbaniSimilarity*: ( SQRT( Nc * Nd ) + Nc ) / ( SQRT ( Nc * | |
386 Nd ) + Nc + ( Na - Nc ) + ( Nb - Nc ) ) ( same as Buser ) | |
387 | |
388 *BuserSimilarity*: ( SQRT ( Nc * Nd ) + Nc ) / ( SQRT ( Nc * Nd ) + | |
389 Nc + ( Na - Nc ) + ( Nb - Nc ) ) ( same as BaroniUrbani ) | |
390 | |
391 *CosineSimilarity*: Nc / SQRT ( Na * Nb ) (same as Ochiai) | |
392 | |
393 *DiceSimilarity*: (2 * Nc) / ( Na + Nb ) | |
394 | |
395 *DennisSimilarity*: ( Nc * Nd - ( ( Na - Nc ) * ( Nb - Nc ) ) ) / | |
396 SQRT ( Nt * Na * Nb) | |
397 | |
398 *ForbesSimilarity*: ( Nt * Nc ) / ( Na * Nb ) | |
399 | |
400 *FossumSimilarity*: ( Nt * ( ( Nc - 1/2 ) ** 2 ) ) / ( Na * Nb ) | |
401 | |
402 *HamannSimilarity*: ( ( Nc + Nd ) - ( Na - Nc ) - ( Nb - Nc ) ) / Nt | |
403 | |
404 *JaccardSimilarity*: Nc / ( ( Na - Nc) + ( Nb - Nc ) + Nc ) = Nc / ( | |
405 Na + Nb - Nc ) (same as Tanimoto) | |
406 | |
407 *Kulczynski1Similarity*: Nc / ( ( Na - Nc ) + ( Nb - Nc) ) = Nc / ( | |
408 Na + Nb - 2Nc ) | |
409 | |
410 *Kulczynski2Similarity*: ( ( Nc / 2 ) * ( 2 * Nc + ( Na - Nc ) + ( | |
411 Nb - Nc) ) ) / ( ( Nc + ( Na - Nc ) ) * ( Nc + ( Nb - Nc ) ) ) = 0.5 | |
412 * ( Nc / Na + Nc / Nb ) | |
413 | |
414 *MatchingSimilarity*: ( Nc + Nd ) / Nt | |
415 | |
416 *McConnaugheySimilarity*: ( Nc ** 2 - ( Na - Nc ) * ( Nb - Nc) ) / ( | |
417 Na * Nb ) | |
418 | |
419 *OchiaiSimilarity*: Nc / SQRT ( Na * Nb ) (same as Cosine) | |
420 | |
421 *PearsonSimilarity*: ( ( Nc * Nd ) - ( ( Na - Nc ) * ( Nb - Nc ) ) ) / | |
422 SQRT ( Na * Nb * ( Na - Nc + Nd ) * ( Nb - Nc + Nd ) ) | |
423 | |
424 *RogersTanimotoSimilarity*: ( Nc + Nd ) / ( ( Na - Nc) + ( Nb - Nc) | |
425 + Nt) = ( Nc + Nd ) / ( Na + Nb - 2Nc + Nt) | |
426 | |
427 *RussellRaoSimilarity*: Nc / Nt | |
428 | |
429 *SimpsonSimilarity*: Nc / MIN ( Na, Nb) | |
430 | |
431 *SkoalSneath1Similarity*: Nc / ( Nc + 2 * ( Na - Nc) + 2 * ( Nb - | |
432 Nc) ) = Nc / ( 2 * Na + 2 * Nb - 3 * Nc ) | |
433 | |
434 *SkoalSneath2Similarity*: ( 2 * Nc + 2 * Nd ) / ( Nc + Nd + Nt ) | |
435 | |
436 *SkoalSneath3Similarity*: ( Nc + Nd ) / ( ( Na - Nc ) + ( Nb - Nc ) | |
437 ) = ( Nc + Nd ) / ( Na + Nb - 2 * Nc ) | |
438 | |
439 *TanimotoSimilarity*: Nc / ( ( Na - Nc) + ( Nb - Nc ) + Nc ) = Nc / | |
440 ( Na + Nb - Nc ) (same as Jaccard) | |
441 | |
442 *TverskySimilarity*: Nc / ( alpha * ( Na - Nc ) + ( 1 - alpha) * ( | |
443 Nb - Nc) + Nc ) = Nc / ( alpha * ( Na - Nb ) + Nb) | |
444 | |
445 *YuleSimilarity*: ( ( Nc * Nd ) - ( ( Na - Nc ) * ( Nb - Nc ) ) ) / | |
446 ( ( Nc * Nd ) + ( ( Na - Nc ) * ( Nb - Nc ) ) ) | |
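
As an illustration of the definitions above, the following Python sketch computes Na, Nb, Nc and Nd for two binary strings and evaluates a few of the listed coefficients. It is not the script's implementation: the function names are hypothetical, bit-vectors are assumed to be given as "0"/"1" strings of equal length, and all-zero vectors (which make some denominators zero) are not handled.

```python
from math import sqrt


def bit_counts(a, b):
    """Return (Na, Nb, Nc, Nd) for two equal-length '0'/'1' strings."""
    if len(a) != len(b):
        raise ValueError("bit-vectors must have the same size")
    na = a.count("1")
    nb = b.count("1")
    nc = sum(1 for x, y in zip(a, b) if x == "1" and y == "1")
    nd = sum(1 for x, y in zip(a, b) if x == "0" and y == "0")
    return na, nb, nc, nd


def tanimoto(a, b):
    na, nb, nc, _ = bit_counts(a, b)
    return nc / (na + nb - nc)


def dice(a, b):
    na, nb, nc, _ = bit_counts(a, b)
    return (2 * nc) / (na + nb)


def cosine(a, b):
    na, nb, nc, _ = bit_counts(a, b)
    return nc / sqrt(na * nb)


def tversky(a, b, alpha=0.5):
    na, nb, nc, _ = bit_counts(a, b)
    return nc / (alpha * (na - nb) + nb)


def matching(a, b):
    na, nb, nc, nd = bit_counts(a, b)
    nt = na + nb - nc + nd  # equals the size of A and B
    return (nc + nd) / nt


A = "11010010"
B = "10010110"
# Here Na = 4, Nb = 4, Nc = 3, Nd = 3, so Tanimoto = 3 / 5 = 0.6.
print(tanimoto(A, B), dice(A, B), tversky(A, B, alpha=0.5))
```

With alpha set to 0.5 the Tversky formula reduces algebraically to the Dice formula, which the example values (both 0.75) reflect.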
447 | |
448 Values of Tanimoto/Jaccard and Tversky coefficients depend only on | |
449 bits set to "1" in A or B; bit positions set to "0" in both are | |
450 ignored. In order to take into account all bit positions, modified | |
451 versions of Tanimoto [ Ref. 42 ] and Tversky [ Ref. 43 ] have been developed. | |
452 | |
453 Let: | |
454 | |
455 Na' = Number of bits set to "0" in A | |
456 Nb' = Number of bits set to "0" in B | |
457 Nc' = Number of bits set to "0" in both A and B | |
458 | |
459 Tanimoto': Nc' / ( ( Na' - Nc') + ( Nb' - Nc' ) + Nc' ) = Nc' / ( | |
460 Na' + Nb' - Nc' ) | |
461 | |
462 Tversky': Nc' / ( alpha * ( Na' - Nc' ) + ( 1 - alpha) * ( Nb' - Nc' | |
463 ) + Nc' ) = Nc' / ( alpha * ( Na' - Nb' ) + Nb') | |
464 | |
465 Then: | |
466 | |
467 *WeightedTanimotoSimilarity* = beta * Tanimoto + (1 - beta) * | |
468 Tanimoto' | |
469 | |
470 *WeightedTverskySimilarity* = beta * Tversky + (1 - beta) * Tversky' | |
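
The weighting can be sketched in Python as follows. This is a hedged illustration for the Tanimoto case only, with a hypothetical function name and bit-vectors assumed to be equal-length "0"/"1" strings; it applies the Tanimoto formula once to the "1" bits and once to the "0" bits, then mixes the two results with beta.

```python
def weighted_tanimoto(a, b, beta=1.0):
    """beta * Tanimoto + (1 - beta) * Tanimoto' for '0'/'1' strings."""
    if len(a) != len(b):
        raise ValueError("bit-vectors must have the same size")
    size = len(a)

    # Counts over bits set to "1".
    na, nb = a.count("1"), b.count("1")
    nc = sum(1 for x, y in zip(a, b) if x == "1" and y == "1")

    # Counts over bits set to "0" (Na', Nb', Nc' in the text).
    na0, nb0 = size - na, size - nb
    nc0 = sum(1 for x, y in zip(a, b) if x == "0" and y == "0")

    tanimoto_ones = nc / (na + nb - nc)
    tanimoto_zeros = nc0 / (na0 + nb0 - nc0)
    return beta * tanimoto_ones + (1 - beta) * tanimoto_zeros


A = "11110000"
B = "11000000"
# Tanimoto on "1" bits is 2 / 4 = 0.5; on "0" bits it is 4 / 6.
print(weighted_tanimoto(A, B, beta=1.0))  # plain Tanimoto: 0.5
print(weighted_tanimoto(A, B, beta=0.5))  # midway between the two values
```

A beta of 1 reproduces plain Tanimoto, matching the default described for the --beta option.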
471 | |
472 -c, --ColMode *ColNum | ColLabel* | |
473 Specify how columns are identified in *TextFile(s)*: using column | |
474 number or column label. Possible values: *ColNum or ColLabel*. | |
475 Default value: *ColNum*. | |
476 | |
477 --CompoundIDCol *col number | col name* | |
478 This value is -c, --ColMode mode specific. It specifies input | |
479 *TextFile(s)* column to use for generating compound ID for | |
480 similarity matrices in output *TextFile(s)*. Possible values: *col | |
481 number or col label*. Default value: *first column containing the | |
482 word compoundID in its column label or sequentially generated IDs*. | |
483 | |
484 --CompoundIDPrefix *text* | |
485 Specify compound ID prefix to use during sequential generation of | |
486 compound IDs for input *SDFile(s)* and *TextFile(s)*. Default value: | |
487 *Cmpd*. The default value generates compound IDs which look like | |
488 Cmpd<Number>. | |
489 | |
490 For input *SDFile(s)*, this value is only used during *LabelPrefix | | |
491 MolNameOrLabelPrefix* values of --CompoundIDMode option; otherwise, | |
492 it's ignored. | |
493 | |
494 Examples for *LabelPrefix* or *MolNameOrLabelPrefix* value of | |
495 --CompoundIDMode: | |
496 | |
497 Compound | |
498 | |
499 The value specified above generates compound IDs which correspond | |
500 to Compound<Number> instead of default value of Cmpd<Number>. | |
501 | |
502 --CompoundIDField *DataFieldName* | |
503 Specify input *SDFile(s)* datafield label for generating compound | |
504 IDs. This value is only used during *DataField* value of | |
505 --CompoundIDMode option. | |
506 | |
507 Examples for *DataField* value of --CompoundIDMode: | |
508 | |
509 MolID | |
510 ExtReg | |
511 | |
512 --CompoundIDMode *DataField | MolName | LabelPrefix | | |
513 MolNameOrLabelPrefix* | |
514 Specify how to generate compound IDs from input *SDFile(s)* for | |
515 similarity matrix CSV/TSV text file(s): use a *SDFile(s)* datafield | |
516 value; use molname line from *SDFile(s)*; generate a sequential ID | |
517 with specific prefix; use combination of both MolName and | |
518 LabelPrefix with usage of LabelPrefix values for empty molname | |
519 lines. | |
520 | |
521 Possible values: *DataField | MolName | LabelPrefix | | |
522 MolNameOrLabelPrefix*. Default: *LabelPrefix*. | |
523 | |
524 For *MolNameOrLabelPrefix* value of --CompoundIDMode, molname line | |
525 in *SDFile(s)* takes precedence over sequential compound IDs | |
526 generated using *LabelPrefix* and only empty molname values are | |
527 replaced with sequential compound IDs. | |
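
The four modes can be summarized with a small Python sketch. The function `compound_id` and its parameters are hypothetical and only mirror the behavior described above; how the script numbers molecules or handles a missing data field is not specified here, so the sketch assumes a 1-based molecule index and lets a missing field raise an error.

```python
def compound_id(mode, index, mol_name="", data_fields=None, field=None,
                prefix="Cmpd"):
    """Pick a compound ID for one SD file record.

    mode        -- one of the documented --CompoundIDMode values
    index       -- assumed 1-based position of the molecule in the file
    mol_name    -- contents of the molname line (may be empty)
    data_fields -- dict of SD data field labels to values
    field       -- label given by --CompoundIDField
    prefix      -- text given by --CompoundIDPrefix
    """
    label = f"{prefix}{index}"
    if mode == "DataField":
        return data_fields[field]
    if mode == "MolName":
        return mol_name
    if mode == "LabelPrefix":
        return label
    if mode == "MolNameOrLabelPrefix":
        # The molname line wins; only empty names fall back to the label.
        return mol_name.strip() or label
    raise ValueError(f"unknown CompoundIDMode: {mode}")


print(compound_id("LabelPrefix", 3))                               # Cmpd3
print(compound_id("MolNameOrLabelPrefix", 2, mol_name=""))         # Cmpd2
print(compound_id("MolNameOrLabelPrefix", 2, mol_name="Aspirin"))  # Aspirin
```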
528 | |
529 -d, --detail *InfoLevel* | |
530 Level of information to print about lines being ignored. Default: | |
531 *1*. Possible values: *1, 2 or 3*. | |
532 | |
533 -f, --fast | |
534 In this mode, fingerprints columns specified using --FingerprintsCol | |
535 for *TextFile(s)* and --FingerprintsField for *SDFile(s)* are | |
536 assumed to contain valid fingerprints data and no checking is | |
537 performed before calculating similarity matrices. By default, | |
538 fingerprints data is validated before computing pairwise similarity | |
539 and distance coefficients. | |
540 | |
541 --FingerprintsCol *col number | col name* | |
542 This value is -c, --colmode specific. It specifies fingerprints | |
543 column to use during calculation of similarity matrices for | |
544 *TextFile(s)*. Possible values: *col number or col label*. Default | |
545 value: *first column containing the word Fingerprints in its column | |
546 label*. | |
547 | |
548 --FingerprintsField *FieldLabel* | |
549 Fingerprints field label to use during calculation of similarity | |
550 matrices for *SDFile(s)*. Default value: *first data field label | |
551 containing the word Fingerprints in its label*. | |
552 | |
553 -h, --help | |
554 Print this help message. | |
555 | |
556 --InDelim *comma | semicolon* | |
557 Input delimiter for CSV *TextFile(s)*. Possible values: *comma or | |
558 semicolon*. Default value: *comma*. For TSV files, this option is | |
559 ignored and *tab* is used as a delimiter. | |
560 | |
561 --InputDataMode *LoadInMemory | ScanFile* | |
562 Specify how fingerprints bit-vector or vector strings data from *SD, | |
563 FP and CSV/TSV* fingerprint file(s) is processed: Retrieve, process | |
564 and load all available fingerprints data in memory; Retrieve and | |
565 process data for fingerprints one at a time. Possible values : | |
566 *LoadInMemory | ScanFile*. Default: *LoadInMemory*. | |
567 | |
568 During *LoadInMemory* value of --InputDataMode, fingerprints | |
569 bit-vector or vector strings data from input file is retrieved, | |
570 processed, and loaded into memory all at once as fingerprints | |
571 objects for generation of similarity matrices. | |
572 | |
573 During *ScanFile* value of --InputDataMode, multiple passes over the | |
574 input fingerprints file are performed to retrieve and process | |
575 fingerprints bit-vector or vector strings data one at a time to | |
576 generate fingerprints objects used during generation of similarity | |
577 matrices. A temporary copy of the input fingerprints file is made at | |
578 the start and deleted after generating the matrices. | |
579 | |
580 *ScanFile* value of --InputDataMode allows processing of arbitrarily | |
581 large fingerprints files without any additional memory requirement. | |
582 | |
583 -m, --mode *AutoDetect | FingerprintsBitVectorString | | |
584 FingerprintsVectorString* | |
585 Format of fingerprint strings data in *TextFile(s)*: automatically | |
586 detect format of fingerprints string created by MayaChemTools | |
587 fingerprints generation scripts or explicitly specify its format. | |
588 Possible values: *AutoDetect | FingerprintsBitVectorString | | |
589 FingerprintsVectorString*. Default value: *AutoDetect*. | |
590 | |
591 --OutDelim *comma | tab | semicolon* | |
592 Delimiter for output CSV/TSV text file(s). Possible values: *comma, | |
593 tab, or semicolon*. Default value: *comma*. | |
594 | |
595 --OutMatrixFormat *RowsAndColumns | IDPairsAndValue* | |
596 Specify how similarity or distance values calculated for | |
597 fingerprints vector and bit-vector strings are written to the output | |
598 CSV/TSV text file(s): Generate text files containing rows and | |
599 columns with their labels corresponding to compound IDs and each | |
600 matrix element value corresponding to similarity or distance between | |
601 corresponding compounds; Generate text files containing rows | |
602 containing compoundIDs for two compounds followed by similarity or | |
603 distance value between these compounds. | |
604 | |
605 Possible values: *RowsAndColumns, or IDPairsAndValue*. Default | |
606 value: *RowsAndColumns*. | |
607 | |
608 The value of --OutMatrixFormat in conjunction with --OutMatrixType | |
609 determines type of data written to output files and allows | |
610 generation of up to 6 different output data formats: | |
611 | |
612 OutMatrixFormat OutMatrixType | |
613 | |
614 RowsAndColumns FullMatrix [ DEFAULT ] | |
615 RowsAndColumns UpperTriangularMatrix | |
616 RowsAndColumns LowerTriangularMatrix | |
617 | |
618 IDPairsAndValue FullMatrix | |
619 IDPairsAndValue UpperTriangularMatrix | |
620 IDPairsAndValue LowerTriangularMatrix | |
621 | |
622 Example of data in output file for *RowsAndColumns* | |
623 --OutMatrixFormat value for *FullMatrix* value of --OutMatrixType: | |
624 | |
625 "","Cmpd1","Cmpd2","Cmpd3","Cmpd4","Cmpd5","Cmpd6",... ... | |
626 "Cmpd1","1","0.04","0.25","0.13","0.11","0.2",... ... | |
627 "Cmpd2","0.04","1","0.06","0.05","0.19","0.07",... ... | |
628 "Cmpd3","0.25","0.06","1","0.12","0.22","0.25",... ... | |
629 "Cmpd4","0.13","0.05","0.12","1","0.11","0.13",... ... | |
630 "Cmpd5","0.11","0.19","0.22","0.11","1","0.17",... ... | |
631 "Cmpd6","0.2","0.07","0.25","0.13","0.17","1",... ... | |
632 ... ... .. | |
633 ... ... .. | |
634 ... ... .. | |
635 | |
636 Example of data in output file for *RowsAndColumns* | |
637 --OutMatrixFormat value for *UpperTriangularMatrix* value of | |
638 --OutMatrixType: | |
639 | |
640 "","Cmpd1","Cmpd2","Cmpd3","Cmpd4","Cmpd5","Cmpd6",... ... | |
641 "Cmpd1","1","0.04","0.25","0.13","0.11","0.2",... ... | |
642 "Cmpd2","1","0.06","0.05","0.19","0.07",... ... | |
643 "Cmpd3","1","0.12","0.22","0.25",... ... | |
644 "Cmpd4","1","0.11","0.13",... ... | |
645 "Cmpd5","1","0.17",... ... | |
646 "Cmpd6","1",... ... | |
647 ... ... .. | |
648 ... ... .. | |
649 ... ... .. | |
650 | |
651 Example of data in output file for *RowsAndColumns* | |
652 --OutMatrixFormat value for *LowerTriangularMatrix* value of | |
653 --OutMatrixType: | |
654 | |
655 "","Cmpd1","Cmpd2","Cmpd3","Cmpd4","Cmpd5","Cmpd6",... ... | |
656 "Cmpd1","1" | |
657 "Cmpd2","0.04","1" | |
658 "Cmpd3","0.25","0.06","1" | |
659 "Cmpd4","0.13","0.05","0.12","1" | |
660 "Cmpd5","0.11","0.19","0.22","0.11","1" | |
661 "Cmpd6","0.2","0.07","0.25","0.13","0.17","1" | |
662 ... ... .. | |
663 ... ... .. | |
664 ... ... .. | |
665 | |
666 Example of data in output file for *IDPairsAndValue* | |
667 --OutMatrixFormat value for *FullMatrix* value of --OutMatrixType: | |
668 | |
669 "CmpdID1","CmpdID2","Coefficient Value" | |
670 "Cmpd1","Cmpd1","1" | |
671 "Cmpd1","Cmpd2","0.04" | |
672 "Cmpd1","Cmpd3","0.25" | |
673 "Cmpd1","Cmpd4","0.13" | |
674 ... ... ... | |
675 ... ... ... | |
676 ... ... ... | |
677 "Cmpd2","Cmpd1","0.04" | |
678 "Cmpd2","Cmpd2","1" | |
679 "Cmpd2","Cmpd3","0.06" | |
680 "Cmpd2","Cmpd4","0.05" | |
681 ... ... ... | |
682 ... ... ... | |
683 ... ... ... | |
684 "Cmpd3","Cmpd1","0.25" | |
685 "Cmpd3","Cmpd2","0.06" | |
686 "Cmpd3","Cmpd3","1" | |
687 "Cmpd3","Cmpd4","0.12" | |
688 ... ... ... | |
689 ... ... ... | |
690 ... ... ... | |
691 | |
692 Example of data in output file for *IDPairsAndValue* | |
693 --OutMatrixFormat value for *UpperTriangularMatrix* value of | |
694 --OutMatrixType: | |
695 | |
696 "CmpdID1","CmpdID2","Coefficient Value" | |
697 "Cmpd1","Cmpd1","1" | |
698 "Cmpd1","Cmpd2","0.04" | |
699 "Cmpd1","Cmpd3","0.25" | |
700 "Cmpd1","Cmpd4","0.13" | |
701 ... ... ... | |
702 ... ... ... | |
703 ... ... ... | |
704 "Cmpd2","Cmpd2","1" | |
705 "Cmpd2","Cmpd3","0.06" | |
706 "Cmpd2","Cmpd4","0.05" | |
707 ... ... ... | |
708 ... ... ... | |
709 ... ... ... | |
710 "Cmpd3","Cmpd3","1" | |
711 "Cmpd3","Cmpd4","0.12" | |
712 ... ... ... | |
713 ... ... ... | |
714 ... ... ... | |
715 | |
716 Example of data in output file for *IDPairsAndValue* | |
717 --OutMatrixFormat value for *LowerTriangularMatrix* value of | |
718 --OutMatrixType: | |
719 | |
720 "CmpdID1","CmpdID2","Coefficient Value" | |
721 "Cmpd1","Cmpd1","1" | |
722 "Cmpd2","Cmpd1","0.04" | |
723 "Cmpd2","Cmpd2","1" | |
724 "Cmpd3","Cmpd1","0.25" | |
725 "Cmpd3","Cmpd2","0.06" | |
726 "Cmpd3","Cmpd3","1" | |
727 "Cmpd4","Cmpd1","0.13" | |
728 "Cmpd4","Cmpd2","0.05" | |
729 "Cmpd4","Cmpd3","0.12" | |
730 "Cmpd4","Cmpd4","1" | |
731 ... ... ... | |
732 ... ... ... | |
733 ... ... ... | |
734 | |
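The six combinations of --OutMatrixFormat and --OutMatrixType shown above can be reproduced with a short Python sketch. The function name and its arguments are hypothetical and are not part of the script; the sketch assumes a precomputed square matrix of values and mirrors the example layouts, including the unpadded triangular rows.

```python
def matrix_rows(ids, values, out_format="RowsAndColumns", out_type="FullMatrix"):
    """Return output rows (lists of strings) for one similarity matrix.

    ids    -- compound IDs, in order
    values -- square list of lists; values[i][j] compares ids[i] with ids[j]
    """
    n = len(ids)

    def columns(i):
        # Column indices kept in row i for each matrix type.
        if out_type == "UpperTriangularMatrix":
            return range(i, n)
        if out_type == "LowerTriangularMatrix":
            return range(0, i + 1)
        return range(n)  # FullMatrix

    if out_format == "RowsAndColumns":
        rows = [[""] + list(ids)]
        for i in range(n):
            rows.append([ids[i]] + [str(values[i][j]) for j in columns(i)])
        return rows

    # IDPairsAndValue: one row per retained pair of compounds.
    rows = [["CmpdID1", "CmpdID2", "Coefficient Value"]]
    for i in range(n):
        for j in columns(i):
            rows.append([ids[i], ids[j], str(values[i][j])])
    return rows
```

For three compounds, the full IDPairsAndValue form produces nine data rows, while either triangular form produces six because the diagonal is included.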
735 --OutMatrixType *FullMatrix | UpperTriangularMatrix | | |
736 LowerTriangularMatrix* | |
737 Type of similarity or distance matrix to calculate for fingerprints | |
738 vector and bit-vector strings: Calculate full matrix; Calculate | |
739 lower triangular matrix including diagonal; Calculate upper | |
740 triangular matrix including diagonal. | |
741 | |
742 Possible values: *FullMatrix, UpperTriangularMatrix, or | |
743 LowerTriangularMatrix*. Default value: *FullMatrix*. | |
744 | |
745 The value of --OutMatrixType in conjunction with --OutMatrixFormat | |
746 determines type of data written to output files. | |
747 | |
748 -o, --overwrite | |
749 Overwrite existing files. | |
750 | |
751 -p, --precision *number* | |
752 Precision of calculated values in the output file. Default: up to | |
753 *2* decimal places. Valid values: positive integers. | |
754 | |
755 -q, --quote *Yes | No* | |
756 Put quotes around column values in output CSV/TSV text file(s). | |
757 Possible values: *Yes or No*. Default value: *Yes*. | |
758 | |
759 -r, --root *RootName* | |
760 New file name is generated using the root: | |
761 <Root><BitVectorComparisonMode>.<Ext> or | |
762 <Root><VectorComparisonMode><VectorComparisonFormulism>.<Ext>. The | |
763 csv, and tsv <Ext> values are used for comma/semicolon, and tab | |
764 delimited text files respectively. This option is ignored for | |
765 multiple input files. | |
766 | |
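The naming pattern described for -r, --root can be written out as a small Python sketch. The helper name and its parameters are hypothetical; it only assembles the documented pattern and assumes the formulism part is empty for bit-vector comparisons.

```python
def output_file_name(root, comparison_mode, formulism="", out_delim="comma"):
    """Build <Root><ComparisonMode><Formulism>.<Ext> as described for --root.

    Comma and semicolon delimited files use the csv extension; tab delimited
    files use tsv.
    """
    ext = "tsv" if out_delim == "tab" else "csv"
    return f"{root}{comparison_mode}{formulism}.{ext}"
```

For example, a root of SampleFPCount with TanimotoSimilarity and AlgebraicForm gives SampleFPCountTanimotoSimilarityAlgebraicForm.csv, which matches the file names used in the EXAMPLES section.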
767 -v, --VectorComparisonMode *All | | |
768 "TanimotoSimilarity,[ManhattanDistance,...]"* | |
769 Specify what similarity or distance coefficients to use for | |
770 calculating similarity matrices for fingerprint vector strings data | |
771 values in *TextFile(s)*: calculate similarity matrices for all | |
772 supported similarity and distance coefficients or specify a comma | |
773 delimited list of similarity and distance coefficients. Possible | |
774 values: *All | "TanimotoSimilarity,[ManhattanDistance,..]"*. Default: | |
775 *TanimotoSimilarity*. | |
776 | |
777 The value of -v, --VectorComparisonMode, in conjunction with | |
778 --VectorComparisonFormulism, decides which type of similarity and | |
779 distance coefficient formulism gets used. | |
780 | |
781 *All* uses complete list of supported similarity and distance | |
782 coefficients: *CosineSimilarity, CzekanowskiSimilarity, | |
783 DiceSimilarity, OchiaiSimilarity, JaccardSimilarity, | |
784 SorensonSimilarity, TanimotoSimilarity, CityBlockDistance, | |
785 EuclideanDistance, HammingDistance, ManhattanDistance, | |
786 SoergelDistance*. These similarity and distance coefficients are | |
787 described below. | |
788 | |
789 FingerprintsVector.pm module, used to calculate similarity and | |
790 distance coefficients, provides support to perform comparison | |
791 between vectors containing three different types of values: | |
792 | |
793 Type I: OrderedNumericalValues | |
794 | |
795 . Sizes of the two vectors are the same | |
796 . Vectors contain real values in a specific order. For example: MACCS keys | |
797 count, Topological pharmacophore atom pairs and so on. | |
798 | |
799 Type II: UnorderedNumericalValues | |
800 | |
801 . Size of two vectors might not be same | |
802 . Vectors contain unordered real values identified by value IDs. For example: | |
803 Topological atom pairs, Topological atom torsions and so on. | |
804 | |
805 Type III: AlphaNumericalValues | |
806 | |
807 . Size of two vectors might not be same | |
808 . Vectors contain unordered alphanumerical values. For example: Extended | |
809 connectivity fingerprints, atom neighborhood fingerprints. | |
810 | |
811 Before performing similarity or distance calculations between | |
812 vectors containing UnorderedNumericalValues or AlphaNumericalValues, | |
813 the vectors are transformed into vectors containing unique | |
814 OrderedNumericalValues using value IDs for UnorderedNumericalValues | |
815 and the values themselves for AlphaNumericalValues. | |
816 | |
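The transformation into OrderedNumericalValues described above can be sketched in Python. This is an assumption-laden illustration, not the FingerprintsVector.pm implementation: the function names are hypothetical, the shared ordering is taken to be the sorted union of keys, and a value missing from one vector is taken to contribute 0.

```python
from collections import Counter


def align_unordered(values_a, values_b):
    """Align two {value ID: number} mappings onto one shared, ordered key list."""
    keys = sorted(set(values_a) | set(values_b))
    ordered_a = [values_a.get(key, 0) for key in keys]
    ordered_b = [values_b.get(key, 0) for key in keys]
    return ordered_a, ordered_b


def align_alphanumerical(values_a, values_b):
    """Use the values themselves as keys and their occurrence counts as numbers."""
    return align_unordered(Counter(values_a), Counter(values_b))
```

After alignment both vectors have the same size, so the formulas for OrderedNumericalValues apply directly.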
817 Three forms of similarity and distance calculation between two | |
818 vectors, specified using --VectorComparisonFormulism option, are | |
819 supported: *AlgebraicForm, BinaryForm or SetTheoreticForm*. | |
820 | |
821 For *BinaryForm*, the ordered list of processed final vector values | |
822 containing the value or count of each unique value type is simply | |
823 converted into a binary vector containing 1s and 0s corresponding to | |
824 presence or absence of values before calculating similarity or | |
825 distance between two vectors. | |
826 | |
827 For two fingerprint vectors A and B of same size containing | |
828 OrderedNumericalValues, let: | |
829 | |
830 N = Number of values in A or B | |
831 | |
832 Xa = Values of vector A | |
833 Xb = Values of vector B | |
834 | |
835 Xai = Value of ith element in A | |
836 Xbi = Value of ith element in B | |
837 | |
838 SUM = Sum of i over N values | |
839 | |
840 For SetTheoreticForm of calculation between two vectors, let: | |
841 | |
842 SetIntersectionXaXb = SUM ( MIN ( Xai, Xbi ) ) | |
843 SetDifferenceXaXb = SUM ( Xai ) + SUM ( Xbi ) - SUM ( MIN ( Xai, Xbi ) ) | |
844 | |
845 For BinaryForm of calculation between two vectors, let: | |
846 | |
847 Na = Number of bits set to "1" in A = SUM ( Xai ) | |
848 Nb = Number of bits set to "1" in B = SUM ( Xbi ) | |
849 Nc = Number of bits set to "1" in both A and B = SUM ( Xai * Xbi ) | |
850 Nd = Number of bits set to "0" in both A and B | |
851 = SUM ( 1 - Xai - Xbi + Xai * Xbi) | |
852 | |
853 N = Number of bits set to "1" or "0" in A or B = Size of A or B = Na + Nb - Nc + Nd | |
854 | |
855 Additionally, for BinaryForm various values also correspond to: | |
856 | |
857 Na = | Xa | | |
858 Nb = | Xb | | |
859 Nc = | SetIntersectionXaXb | | |
860 Nd = N - | SetDifferenceXaXb | | |
861 | |
862 | SetDifferenceXaXb | = N - Nd = Na + Nb - Nc + Nd - Nd = Na + Nb - Nc | |
863 = | Xa | + | Xb | - | SetIntersectionXaXb | | |
864 | |
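The BinaryForm quantities defined above can be checked numerically with a minimal Python sketch. The function name is hypothetical; the sketch assumes two vectors of equal size and treats any non-zero value as a bit set to "1".

```python
def binary_counts(xa, xb):
    """Return (Na, Nb, Nc, Nd) for two equal-size vectors after binarization."""
    a = [1 if value else 0 for value in xa]
    b = [1 if value else 0 for value in xb]
    na = sum(a)                                         # bits set in A
    nb = sum(b)                                         # bits set in B
    nc = sum(x * y for x, y in zip(a, b))               # bits set in both
    nd = sum(1 - x - y + x * y for x, y in zip(a, b))   # bits clear in both
    return na, nb, nc, nd
```

For A = (2, 0, 1, 0) and B = (1, 3, 0, 0) this gives Na = 2, Nb = 2, Nc = 1, Nd = 1, so Na + Nb - Nc + Nd = 4, the size of either vector, and Na + Nb - Nc = 3 = N - Nd.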
865 Various similarity and distance coefficients [ Ref 40, Ref 62, Ref | |
866 64 ] for a pair of vectors A and B in *AlgebraicForm, BinaryForm and | |
867 SetTheoreticForm* are defined as follows: | |
868 | |
869 CityBlockDistance: ( same as HammingDistance and ManhattanDistance) | |
870 | |
871 *AlgebraicForm*: SUM ( ABS ( Xai - Xbi ) ) | |
872 | |
873 *BinaryForm*: ( Na - Nc ) + ( Nb - Nc ) = Na + Nb - 2 * Nc | |
874 | |
875 *SetTheoreticForm*: | SetDifferenceXaXb | - | SetIntersectionXaXb | | |
876 = SUM ( Xai ) + SUM ( Xbi ) - 2 * ( SUM ( MIN ( Xai, Xbi ) ) ) | |
877 | |
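A minimal Python sketch of the three CityBlockDistance forms above, assuming equal-size vectors of non-negative values. The function names are illustrative only and are not taken from MayaChemTools.

```python
def city_block_algebraic(xa, xb):
    """SUM ( ABS ( Xai - Xbi ) )"""
    return sum(abs(a - b) for a, b in zip(xa, xb))


def city_block_binary(xa, xb):
    """Na + Nb - 2 * Nc, counting any non-zero value as a set bit."""
    na = sum(1 for a in xa if a)
    nb = sum(1 for b in xb if b)
    nc = sum(1 for a, b in zip(xa, xb) if a and b)
    return na + nb - 2 * nc


def city_block_set_theoretic(xa, xb):
    """SUM ( Xai ) + SUM ( Xbi ) - 2 * SUM ( MIN ( Xai, Xbi ) )"""
    return sum(xa) + sum(xb) - 2 * sum(min(a, b) for a, b in zip(xa, xb))
```

For non-negative values the algebraic and set theoretic forms agree, because ABS ( a - b ) equals a + b - 2 * MIN ( a, b ); the binary form differs whenever a count exceeds 1.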
878 CosineSimilarity: ( same as OchiaiSimilarity) | |
879 | |
880 *AlgebraicForm*: SUM ( Xai * Xbi ) / SQRT ( SUM ( Xai ** 2) * SUM ( | |
881 Xbi ** 2) ) | |
882 | |
883 *BinaryForm*: Nc / SQRT ( Na * Nb) | |
884 | |
885 *SetTheoreticForm*: | SetIntersectionXaXb | / SQRT ( |Xa| * |Xb| ) = | |
886 SUM ( MIN ( Xai, Xbi ) ) / SQRT ( SUM ( Xai ) * SUM ( Xbi ) ) | |
887 | |
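The three CosineSimilarity forms above, written as a Python sketch under the assumption that both vectors have equal size, hold non-negative values, and are not all zeros (otherwise the denominators vanish). The function names are illustrative.

```python
import math


def cosine_algebraic(xa, xb):
    """SUM ( Xai * Xbi ) / SQRT ( SUM ( Xai ** 2 ) * SUM ( Xbi ** 2 ) )"""
    dot = sum(a * b for a, b in zip(xa, xb))
    return dot / math.sqrt(sum(a * a for a in xa) * sum(b * b for b in xb))


def cosine_binary(xa, xb):
    """Nc / SQRT ( Na * Nb ), counting any non-zero value as a set bit."""
    na = sum(1 for a in xa if a)
    nb = sum(1 for b in xb if b)
    nc = sum(1 for a, b in zip(xa, xb) if a and b)
    return nc / math.sqrt(na * nb)


def cosine_set_theoretic(xa, xb):
    """SUM ( MIN ( Xai, Xbi ) ) / SQRT ( SUM ( Xai ) * SUM ( Xbi ) )"""
    common = sum(min(a, b) for a, b in zip(xa, xb))
    return common / math.sqrt(sum(xa) * sum(xb))
```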
888 CzekanowskiSimilarity: ( same as DiceSimilarity and | |
889 SorensonSimilarity) | |
890 | |
891 *AlgebraicForm*: ( 2 * ( SUM ( Xai * Xbi ) ) ) / ( SUM ( Xai ** 2) + | |
892 SUM ( Xbi **2 ) ) | |
893 | |
894 *BinaryForm*: 2 * Nc / ( Na + Nb ) | |
895 | |
896 *SetTheoreticForm*: 2 * | SetIntersectionXaXb | / ( |Xa| + |Xb| ) = | |
897 2 * ( SUM ( MIN ( Xai, Xbi ) ) ) / ( SUM ( Xai ) + SUM ( Xbi ) ) | |
898 | |
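CzekanowskiSimilarity, DiceSimilarity and SorensonSimilarity share one set of formulas, so a single Python sketch covers all three. It assumes equal-size, non-negative, non-empty vectors; the function names are illustrative.

```python
def dice_algebraic(xa, xb):
    """2 * SUM ( Xai * Xbi ) / ( SUM ( Xai ** 2 ) + SUM ( Xbi ** 2 ) )"""
    dot = sum(a * b for a, b in zip(xa, xb))
    return 2 * dot / (sum(a * a for a in xa) + sum(b * b for b in xb))


def dice_binary(xa, xb):
    """2 * Nc / ( Na + Nb ), counting any non-zero value as a set bit."""
    na = sum(1 for a in xa if a)
    nb = sum(1 for b in xb if b)
    nc = sum(1 for a, b in zip(xa, xb) if a and b)
    return 2 * nc / (na + nb)


def dice_set_theoretic(xa, xb):
    """2 * SUM ( MIN ( Xai, Xbi ) ) / ( SUM ( Xai ) + SUM ( Xbi ) )"""
    common = sum(min(a, b) for a, b in zip(xa, xb))
    return 2 * common / (sum(xa) + sum(xb))
```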
899 DiceSimilarity: ( same as CzekanowskiSimilarity and | |
900 SorensonSimilarity) | |
901 | |
902 *AlgebraicForm*: ( 2 * ( SUM ( Xai * Xbi ) ) ) / ( SUM ( Xai ** 2) + | |
903 SUM ( Xbi **2 ) ) | |
904 | |
905 *BinaryForm*: 2 * Nc / ( Na + Nb ) | |
906 | |
907 *SetTheoreticForm*: 2 * | SetIntersectionXaXb | / ( |Xa| + |Xb| ) = | |
908 2 * ( SUM ( MIN ( Xai, Xbi ) ) ) / ( SUM ( Xai ) + SUM ( Xbi ) ) | |
909 | |
910 EuclideanDistance: | |
911 | |
912 *AlgebraicForm*: SQRT ( SUM ( ( ( Xai - Xbi ) ** 2 ) ) ) | |
913 | |
914 *BinaryForm*: SQRT ( ( Na - Nc ) + ( Nb - Nc ) ) = SQRT ( Na + Nb - | |
915 2 * Nc ) | |
916 | |
917 *SetTheoreticForm*: SQRT ( | SetDifferenceXaXb | - | | |
918 SetIntersectionXaXb | ) = SQRT ( SUM ( Xai ) + SUM ( Xbi ) - 2 * ( | |
919 SUM ( MIN ( Xai, Xbi ) ) ) ) | |
920 | |
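The EuclideanDistance forms above can be sketched in Python as follows, assuming equal-size vectors of non-negative values. The function names are illustrative.

```python
import math


def euclidean_algebraic(xa, xb):
    """SQRT ( SUM ( ( Xai - Xbi ) ** 2 ) )"""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(xa, xb)))


def euclidean_binary(xa, xb):
    """SQRT ( Na + Nb - 2 * Nc ), counting any non-zero value as a set bit."""
    na = sum(1 for a in xa if a)
    nb = sum(1 for b in xb if b)
    nc = sum(1 for a, b in zip(xa, xb) if a and b)
    return math.sqrt(na + nb - 2 * nc)


def euclidean_set_theoretic(xa, xb):
    """SQRT ( SUM ( Xai ) + SUM ( Xbi ) - 2 * SUM ( MIN ( Xai, Xbi ) ) )"""
    common = sum(min(a, b) for a, b in zip(xa, xb))
    return math.sqrt(sum(xa) + sum(xb) - 2 * common)
```

Unlike CityBlockDistance, the algebraic and set theoretic forms generally differ here for count vectors, since the set theoretic form takes the square root of the city block distance rather than of the sum of squared differences.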
921 HammingDistance: ( same as CityBlockDistance and ManhattanDistance) | |
922 | |
923 *AlgebraicForm*: SUM ( ABS ( Xai - Xbi ) ) | |
924 | |
925 *BinaryForm*: ( Na - Nc ) + ( Nb - Nc ) = Na + Nb - 2 * Nc | |
926 | |
927 *SetTheoreticForm*: | SetDifferenceXaXb | - | SetIntersectionXaXb | | |
928 = SUM ( Xai ) + SUM ( Xbi ) - 2 * ( SUM ( MIN ( Xai, Xbi ) ) ) | |
929 | |
930 JaccardSimilarity: ( same as TanimotoSimilarity) | |
931 | |
932 *AlgebraicForm*: SUM ( Xai * Xbi ) / ( SUM ( Xai ** 2 ) + SUM ( Xbi | |
933 ** 2 ) - SUM ( Xai * Xbi ) ) | |
934 | |
935 *BinaryForm*: Nc / ( ( Na - Nc ) + ( Nb - Nc ) + Nc ) = Nc / ( Na + | |
936 Nb - Nc ) | |
937 | |
938 *SetTheoreticForm*: | SetIntersectionXaXb | / | SetDifferenceXaXb | | |
939 = SUM ( MIN ( Xai, Xbi ) ) / ( SUM ( Xai ) + SUM ( Xbi ) - SUM ( MIN | |
940 ( Xai, Xbi ) ) ) | |
941 | |
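JaccardSimilarity and TanimotoSimilarity use the same formulas; a Python sketch of the three forms, assuming equal-size, non-negative, non-empty vectors, with illustrative function names:

```python
def tanimoto_algebraic(xa, xb):
    """SUM ( Xai * Xbi ) / ( SUM ( Xai ** 2 ) + SUM ( Xbi ** 2 ) - SUM ( Xai * Xbi ) )"""
    dot = sum(a * b for a, b in zip(xa, xb))
    return dot / (sum(a * a for a in xa) + sum(b * b for b in xb) - dot)


def tanimoto_binary(xa, xb):
    """Nc / ( Na + Nb - Nc ), counting any non-zero value as a set bit."""
    na = sum(1 for a in xa if a)
    nb = sum(1 for b in xb if b)
    nc = sum(1 for a, b in zip(xa, xb) if a and b)
    return nc / (na + nb - nc)


def tanimoto_set_theoretic(xa, xb):
    """SUM ( MIN ( Xai, Xbi ) ) / ( SUM ( Xai ) + SUM ( Xbi ) - SUM ( MIN ( Xai, Xbi ) ) )"""
    common = sum(min(a, b) for a, b in zip(xa, xb))
    return common / (sum(xa) + sum(xb) - common)
```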
942 ManhattanDistance: ( same as CityBlockDistance and HammingDistance) | |
943 | |
944 *AlgebraicForm*: SUM ( ABS ( Xai - Xbi ) ) | |
945 | |
946 *BinaryForm*: ( Na - Nc ) + ( Nb - Nc ) = Na + Nb - 2 * Nc | |
947 | |
948 *SetTheoreticForm*: | SetDifferenceXaXb | - | SetIntersectionXaXb | | |
949 = SUM ( Xai ) + SUM ( Xbi ) - 2 * ( SUM ( MIN ( Xai, Xbi ) ) ) | |
950 | |
951 OchiaiSimilarity: ( same as CosineSimilarity) | |
952 | |
953 *AlgebraicForm*: SUM ( Xai * Xbi ) / SQRT ( SUM ( Xai ** 2) * SUM ( | |
954 Xbi ** 2) ) | |
955 | |
956 *BinaryForm*: Nc / SQRT ( Na * Nb) | |
957 | |
958 *SetTheoreticForm*: | SetIntersectionXaXb | / SQRT ( |Xa| * |Xb| ) = | |
959 SUM ( MIN ( Xai, Xbi ) ) / SQRT ( SUM ( Xai ) * SUM ( Xbi ) ) | |
960 | |
961 SorensonSimilarity: ( same as CzekanowskiSimilarity and | |
962 DiceSimilarity) | |
963 | |
964 *AlgebraicForm*: ( 2 * ( SUM ( Xai * Xbi ) ) ) / ( SUM ( Xai ** 2) + | |
965 SUM ( Xbi **2 ) ) | |
966 | |
967 *BinaryForm*: 2 * Nc / ( Na + Nb ) | |
968 | |
969 *SetTheoreticForm*: 2 * | SetIntersectionXaXb | / ( |Xa| + |Xb| ) = | |
970 2 * ( SUM ( MIN ( Xai, Xbi ) ) ) / ( SUM ( Xai ) + SUM ( Xbi ) ) | |
971 | |
972 SoergelDistance: | |
973 | |
974 *AlgebraicForm*: SUM ( ABS ( Xai - Xbi ) ) / SUM ( MAX ( Xai, Xbi ) | |
975 ) | |
976 | |
977 *BinaryForm*: 1 - Nc / ( Na + Nb - Nc ) = ( Na + Nb - 2 * Nc ) / ( | |
978 Na + Nb - Nc ) | |
979 | |
980 *SetTheoreticForm*: ( | SetDifferenceXaXb | - | SetIntersectionXaXb | |
981 | ) / | SetDifferenceXaXb | = ( SUM ( Xai ) + SUM ( Xbi ) - 2 * ( | |
982 SUM ( MIN ( Xai, Xbi ) ) ) ) / ( SUM ( Xai ) + SUM ( Xbi ) - SUM ( | |
983 MIN ( Xai, Xbi ) ) ) | |
984 | |
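A Python sketch of the SoergelDistance forms above, assuming equal-size, non-negative, non-empty vectors. The function names are illustrative.

```python
def soergel_algebraic(xa, xb):
    """SUM ( ABS ( Xai - Xbi ) ) / SUM ( MAX ( Xai, Xbi ) )"""
    numerator = sum(abs(a - b) for a, b in zip(xa, xb))
    return numerator / sum(max(a, b) for a, b in zip(xa, xb))


def soergel_binary(xa, xb):
    """( Na + Nb - 2 * Nc ) / ( Na + Nb - Nc ), any non-zero value is a set bit."""
    na = sum(1 for a in xa if a)
    nb = sum(1 for b in xb if b)
    nc = sum(1 for a, b in zip(xa, xb) if a and b)
    return (na + nb - 2 * nc) / (na + nb - nc)


def soergel_set_theoretic(xa, xb):
    """( SUM ( Xai ) + SUM ( Xbi ) - 2 * SUM ( MIN ) ) / ( SUM ( Xai ) + SUM ( Xbi ) - SUM ( MIN ) )"""
    common = sum(min(a, b) for a, b in zip(xa, xb))
    total = sum(xa) + sum(xb)
    return (total - 2 * common) / (total - common)
```

In the binary and set theoretic forms the result equals 1 minus the corresponding Tanimoto value, as the BinaryForm expression above states explicitly.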
985 TanimotoSimilarity: ( same as JaccardSimilarity) | |
986 | |
987 *AlgebraicForm*: SUM ( Xai * Xbi ) / ( SUM ( Xai ** 2 ) + SUM ( Xbi | |
988 ** 2 ) - SUM ( Xai * Xbi ) ) | |
989 | |
990 *BinaryForm*: Nc / ( ( Na - Nc ) + ( Nb - Nc ) + Nc ) = Nc / ( Na + | |
991 Nb - Nc ) | |
992 | |
993 *SetTheoreticForm*: | SetIntersectionXaXb | / | SetDifferenceXaXb | | |
994 = SUM ( MIN ( Xai, Xbi ) ) / ( SUM ( Xai ) + SUM ( Xbi ) - SUM ( MIN | |
995 ( Xai, Xbi ) ) ) | |
996 | |
997 --VectorComparisonFormulism *All | | |
998 "AlgebraicForm,[BinaryForm,SetTheoreticForm]"* | |
999 Specify fingerprints vector comparison formulism to use for | |
1000 calculation of similarity and distance coefficients during -v, | |
1001 --VectorComparisonMode: use all supported comparison formulisms or | |
1002 specify a comma delimited list. Possible values: *All | | |
1003 "AlgebraicForm,[BinaryForm,SetTheoreticForm]"*. Default value: | |
1004 *AlgebraicForm*. | |
1005 | |
1006 *All* uses all three forms of supported vector comparison formulism | |
1007 for values of -v, --VectorComparisonMode option. | |
1008 | |
1009 For fingerprint vector strings containing AlphaNumericalValues data | |
1010 values - ExtendedConnectivityFingerprints, | |
1011 AtomNeighborhoodsFingerprints and so on - all three formulisms result | |
1012 in the same value during similarity and distance calculations. | |
1013 | |
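One way to see why the three formulisms coincide for such data is sketched below in Python. It rests on an assumption not stated above: that each alphanumerical value occurs at most once per fingerprint, so the transformed vectors contain only 0s and 1s. Under that assumption the element-wise terms used by the algebraic, binary and set theoretic forms are identical. The function name is hypothetical.

```python
from itertools import product


def identities_hold_for_binary_values():
    """Check the element-wise identities behind the three forms for 0/1 values."""
    for x, y in product((0, 1), repeat=2):
        if x * y != min(x, y):                     # product term vs. intersection
            return False
        if x ** 2 != x or y ** 2 != y:             # squared term vs. plain sum
            return False
        if abs(x - y) != x + y - 2 * min(x, y):    # city block term
            return False
        if (x - y) ** 2 != abs(x - y):             # Euclidean term
            return False
        if max(x, y) != x + y - min(x, y):         # Soergel denominator term
            return False
    return True
```

If a value could occur more than once, the counts would exceed 1 and these identities would no longer hold in general.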
1014 -w, --WorkingDir *DirName* | |
1015 Location of working directory. Default: current directory. | |
1016 | |
1017 EXAMPLES | |
1018 To generate a similarity matrix corresponding to Tanimoto similarity | |
1019 coefficient for fingerprints bit-vector strings data corresponding to | |
1020 supported fingerprints in text file present in a column name containing | |
1021 Fingerprint substring by loading all fingerprints data into memory and | |
1022 create a SampleFPHexTanimotoSimilarity.csv file containing compound IDs | |
1023 retrieved from column name containing CompoundID substring, type: | |
1024 | |
1025 % SimilarityMatricesFingerprints.pl -o SampleFPHex.csv | |
1026 | |
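Output written in the default *RowsAndColumns* and *FullMatrix* layout can be read back with a few lines of Python. This is a sketch under assumptions: the function name is hypothetical, the text is taken to be comma delimited with quoted values as in the examples shown earlier, and triangular layouts are not handled.

```python
import csv
import io


def read_rows_and_columns(text):
    """Parse a full RowsAndColumns matrix into {row ID: {column ID: value}}."""
    rows = list(csv.reader(io.StringIO(text)))
    column_ids = rows[0][1:]          # first cell of the header row is empty
    matrix = {}
    for row in rows[1:]:
        row_id, cells = row[0], row[1:]
        matrix[row_id] = {col: float(cell) for col, cell in zip(column_ids, cells)}
    return matrix
```

For a file on disk, the same function can be applied to the result of reading the file into a string.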
1027 To generate a similarity matrix corresponding to Tanimoto similarity | |
1028 coefficient for fingerprints bit-vector strings data corresponding to | |
1029 supported fingerprints in SD File present in a data field with | |
1030 Fingerprint substring in its label by loading all fingerprints data into | |
1031 memory and create a SampleFPHexTanimotoSimilarity.csv file containing | |
1032 sequentially generated compound IDs with Cmpd prefix, type: | |
1033 | |
1034 % SimilarityMatricesFingerprints.pl -o SampleFPHex.sdf | |
1035 | |
1036 To generate a similarity matrix corresponding to Tanimoto similarity | |
1037 coefficient for fingerprints bit-vector strings data corresponding to | |
1038 supported fingerprints in FP file by loading all fingerprints data into | |
1039 memory and create a SampleFPHexTanimotoSimilarity.csv file along with | |
1040 compound IDs retrieved from FP file, type: | |
1041 | |
1042 % SimilarityMatricesFingerprints.pl -o SampleFPHex.fpf | |
1043 | |
1044 To generate a lower triangular similarity matrix corresponding to | |
1045 Tanimoto similarity coefficient for fingerprints bit-vector strings data | |
1046 corresponding to supported fingerprints in text file present in a column | |
1047 name containing Fingerprint substring by loading all fingerprints data | |
1048 into memory and create a SampleFPHexTanimotoSimilarity.csv file | |
1049 containing compound IDs retrieved from column name containing CompoundID | |
1050 substring, type: | |
1051 | |
1052 % SimilarityMatricesFingerprints.pl -o --InputDataMode LoadInMemory | |
1053 --OutMatrixFormat RowsAndColumns --OutMatrixType LowerTriangularMatrix | |
1054 SampleFPHex.csv | |
1055 | |
1056 To generate an upper triangular similarity matrix corresponding to | |
1057 Tanimoto similarity coefficient for fingerprints bit-vector strings data | |
1058 corresponding to supported fingerprints in text file present in a column | |
1059 name containing Fingerprint substring by loading all fingerprints data | |
1060 into memory and create a SampleFPHexTanimotoSimilarity.csv file in | |
1061 IDPairsAndValue format containing compound IDs retrieved from column | |
1062 name containing CompoundID substring, type: | |
1063 | |
1064 % SimilarityMatricesFingerprints.pl -o --InputDataMode LoadInMemory | |
1065 --OutMatrixFormat IDPairsAndValue --OutMatrixType UpperTriangularMatrix | |
1066 SampleFPHex.csv | |
1067 | |
1068 To generate a full similarity matrix corresponding to Tanimoto | |
1069 similarity coefficient for fingerprints bit-vector strings data | |
1070 corresponding to supported fingerprints in text file present in a column | |
1071 name containing Fingerprint substring by scanning file without loading | |
1072 all fingerprints data into memory and create a | |
1073 SampleFPHexTanimotoSimilarity.csv file containing compound IDs retrieved | |
1074 from column name containing CompoundID substring, type: | |
1075 | |
1076 % SimilarityMatricesFingerprints.pl -o --InputDataMode ScanFile | |
1077 --OutMatrixFormat RowsAndColumns --OutMatrixType FullMatrix | |
1078 SampleFPHex.csv | |
1079 | |
1080 To generate a lower triangular similarity matrix corresponding to | |
1081 Tanimoto similarity coefficient for fingerprints bit-vector strings data | |
1082 corresponding to supported fingerprints in text file present in a column | |
1083 name containing Fingerprint substring by scanning file without loading | |
1084 all fingerprints data into memory and create a | |
1085 SampleFPHexTanimotoSimilarity.csv file in IDPairsAndValue format | |
1086 containing compound IDs retrieved from column name containing CompoundID | |
1087 substring, type: | |
1088 | |
1089 % SimilarityMatricesFingerprints.pl -o --InputDataMode ScanFile | |
1090 --OutMatrixFormat IDPairsAndValue --OutMatrixType LowerTriangularMatrix | |
1091 SampleFPHex.csv | |
1092 | |
1093 To generate a similarity matrix corresponding to Tanimoto similarity | |
1094 coefficient using algebraic formulism for fingerprints vector strings | |
1095 data corresponding to supported fingerprints in text file present in a | |
1096 column name containing Fingerprint substring and create a | |
1097 SampleFPCountTanimotoSimilarityAlgebraicForm.csv file containing | |
1098 compound IDs retrieved from column name containing CompoundID substring, | |
1099 type: | |
1100 | |
1101 % SimilarityMatricesFingerprints.pl -o SampleFPCount.csv | |
1102 | |
1103 To generate a similarity matrix corresponding to Tanimoto similarity | |
1104 coefficient using algebraic formulism for fingerprints vector strings | |
1105 data corresponding to supported fingerprints in SD file present in a | |
1106 data field with Fingerprint substring in its label and create a | |
1107 SampleFPCountTanimotoSimilarityAlgebraicForm.csv file containing | |
1108 sequentially generated compound IDs with Cmpd prefix, type: | |
1109 | |
1110 % SimilarityMatricesFingerprints.pl -o SampleFPCount.sdf | |
1111 | |
1112 To generate a similarity matrix corresponding to Tanimoto similarity | |
1113 coefficient using algebraic formulism for fingerprints vector strings | |
1114 data corresponding to supported fingerprints in FP file and create a | |
1115 SampleFPCountTanimotoSimilarityAlgebraicForm.csv file along with | |
1116 compound IDs retrieved from FP file, type: | |
1117 | |
1118 % SimilarityMatricesFingerprints.pl -o SampleFPCount.fpf | |
1119 | |
1120 To generate a similarity matrix corresponding to Tanimoto similarity | |
1121 coefficient for fingerprints bit-vector strings data corresponding to | |
1122 supported fingerprints in text file present in a column name containing | |
1123 Fingerprint substring and create a SampleFPHexTanimotoSimilarity.csv | |
1124 file in IDPairsAndValue format containing compound IDs retrieved from | |
1125 column name containing CompoundID substring, type: | |
1126 | |
1127 % SimilarityMatricesFingerprints.pl --OutMatrixFormat IDPairsAndValue -o | |
1128 SampleFPHex.csv | |
1129 | |
1130 To generate a similarity matrix corresponding to Tanimoto similarity | |
1131 coefficient for fingerprints bit-vector strings data corresponding to | |
1132 supported fingerprints in SD file present in a data field with | |
1133 Fingerprint substring in its label and create a | |
1134 SampleFPHexTanimotoSimilarity.csv file in IDPairsAndValue format | |
1135 containing sequentially generated compound IDs with Cmpd prefix, type: | |
1136 | |
1137 % SimilarityMatricesFingerprints.pl --OutMatrixFormat IDPairsAndValue -o | |
1138 SampleFPHex.sdf | |
1139 | |
1140 To generate a similarity matrix corresponding to Tanimoto similarity | |
1141 coefficient for fingerprints bit-vector strings data corresponding to | |
1142 supported fingerprints in FP file and create a | |
1143 SampleFPHexTanimotoSimilarity.csv file in IDPairsAndValue format along | |
1144 with compound IDs retrieved from FP file, type: | |
1145 | |
1146 % SimilarityMatricesFingerprints.pl --OutMatrixFormat IDPairsAndValue -o | |
1147 SampleFPHex.fpf | |
1148 | |
1149 To generate a similarity matrix corresponding to Tanimoto similarity | |
1150 coefficient for fingerprints bit-vector strings data corresponding to | |
1151 supported fingerprints in SD file present in a data field with | |
1152 Fingerprint substring in its label and create a | |
1153 SampleFPHexTanimotoSimilarity.csv file containing compound IDs from mol | |
1154 name line, type: | |
1155 | |
1156 % SimilarityMatricesFingerprints.pl --CompoundIDMode MolName -o | |
1157 SampleFPHex.sdf | |
1158 | |
1159 To generate a similarity matrix corresponding to Tanimoto similarity | |
1160 coefficient for fingerprints bit-vector strings data corresponding to | |
1161 supported fingerprints in SD file present in a data field with Fingerprint | |
1162 substring in its label and create a SampleFPBinTanimotoSimilarity.csv | |
1163 file containing compound IDs from data field name Mol_ID, type: | |
1164 | |
1165 % SimilarityMatricesFingerprints.pl --CompoundIDMode DataField | |
1166 --CompoundIDField Mol_ID -o SampleFPBin.sdf | |
1167 | |
1168 To generate similarity matrices corresponding to Buser, Dice and | |
1169 Tanimoto similarity coefficient for fingerprints bit-vector strings data | |
1170 corresponding to supported fingerprints present in a column name | |
1171 containing Fingerprint substring and create | |
1172 SampleFPBin[CoefficientName]Similarity.csv files containing compound IDs | |
1173 retrieved from column name containing CompoundID substring, type: | |
1174 | |
1175 % SimilarityMatricesFingerprints.pl -b "BuserSimilarity,DiceSimilarity,TanimotoSimilarity" | |
1176 -o SampleFPBin.csv | |
1177 | |
1178 To generate similarity matrices corresponding to Buser, Dice and | |
1179 Tanimoto similarity coefficient for fingerprints bit-vector strings data | |
1180 corresponding to supported fingerprints present in a data field with | |
1181 Fingerprint substring in its label and create | |
1182 SampleFPBin[CoefficientName]Similarity.csv files containing sequentially | |
1183 generated compound IDs with Cmpd prefix, type: | |
1184 | |
1185 % SimilarityMatricesFingerprints.pl -b "BuserSimilarity,DiceSimilarity,TanimotoSimilarity" | |
1186 -o SampleFPBin.sdf | |
1187 | |
1188 To generate similarity matrices corresponding to CityBlock distance and | |
1189 Tanimoto similarity coefficients using algebraic formulism for | |
1190 fingerprints vector strings data corresponding to supported fingerprints | |
1191 present in a column name containing Fingerprint substring and create | |
1192 SampleFPCount[CoefficientName]AlgebraicForm.csv files containing | |
1193 compound IDs retrieved from column name containing CompoundID substring, | |
1194 type: | |
1195 | |
1196 % SimilarityMatricesFingerprints.pl -v "CityBlockDistance,TanimotoSimilarity" | |
1197 -o SampleFPCount.csv | |
1198 | |
1199 To generate similarity matrices corresponding to CityBlock distance and | |
1200 Tanimoto similarity coefficients using algebraic formulism for | |
1201 fingerprints vector strings data corresponding to supported fingerprints | |
1202 present in a data field with Fingerprint substring in its label and | |
1203 create SampleFPCount[CoefficientName]AlgebraicForm.csv files containing | |
1204 sequentially generated compound IDs with Cmpd prefix, type: | |
1205 | |
1206 % SimilarityMatricesFingerprints.pl -v "CityBlockDistance,TanimotoSimilarity" | |
1207 -o SampleFPCount.sdf | |
1208 | |
1209 To generate similarity matrices corresponding to CityBlock distance and | |
1210 Tanimoto similarity coefficients using binary formulism for fingerprints | |
1211 vector strings data corresponding to supported fingerprints present in a | |
1212 column name containing Fingerprint substring and create | |
1213 SampleFPCount[CoefficientName]BinaryForm.csv files containing compound IDs | |
1214 retrieved from column name containing CompoundID substring, type: | |
1215 | |
1216 % SimilarityMatricesFingerprints.pl -v "CityBlockDistance,TanimotoSimilarity" | |
1217 --VectorComparisonFormulism BinaryForm -o | |
1218 SampleFPCount.csv | |
1219 | |
1220 To generate similarity matrices corresponding to CityBlock distance and | |
1221 Tanimoto similarity coefficients using binary formulism for fingerprints | |
1222 vector strings data corresponding to supported fingerprints present in a | |
1223 data field with Fingerprint substring in its label and create | |
1224 SampleFPCount[CoefficientName]BinaryForm.csv files containing sequentially | |
1225 generated compound IDs with Cmpd prefix, type: | |
1226 | |
1227 % SimilarityMatricesFingerprints.pl -v "CityBlockDistance,TanimotoSimilarity" | |
1228 --VectorComparisonFormulism BinaryForm -o | |
1229 SampleFPCount.sdf | |
1230 | |
1231 To generate similarity matrices corresponding to CityBlock distance and | |
1232 Tanimoto similarity coefficients using all supported comparison | |
1233 formulisms for fingerprints vector strings data corresponding to | |
1234 supported fingerprints present in a column name containing Fingerprint | |
1235 substring and create SampleFPCount[CoefficientName][FormulismName].csv | |
1236 files containing compound IDs retrieved from column name containing | |
1237 CompoundID substring, type: | |
1238 | |
1239 % SimilarityMatricesFingerprints.pl -v "CityBlockDistance,TanimotoSimilarity" | |
1240 --VectorComparisonFormulism All -o SampleFPCount.csv | |
1241 | |
1242 To generate similarity matrices corresponding to CityBlock distance and | |
1243 Tanimoto similarity coefficients using all supported comparison | |
1244 formulisms for fingerprints vector strings data corresponding to | |
1245 supported fingerprints present in a data field with Fingerprint | |
1246 substring in its label and create | |
1247 SampleFPCount[CoefficientName][FormulismName].csv files containing | |
1248 sequentially generated compound IDs with Cmpd prefix, type: | |
1249 | |
1250 % SimilarityMatricesFingerprints.pl -v "CityBlockDistance,TanimotoSimilarity" | |
1251 --VectorComparisonFormulism All -o SampleFPCount.sdf | |
1252 | |
1253 To generate similarity matrices corresponding to all available | |
1254 similarity coefficients for fingerprints bit-vector strings data | |
1255 corresponding to supported fingerprints present in a column name | |
1256 containing Fingerprint substring and create | |
1257 SampleFPHex[CoefficientName].csv files containing compound IDs retrieved | |
1258 from column name containing CompoundID substring, type: | |
1259 | |
1260 % SimilarityMatricesFingerprints.pl -m AutoDetect --BitVectorComparisonMode | |
1261 All --alpha 0.5 --beta 0.5 -o SampleFPHex.csv | |
1262 | |
1263 To generate similarity matrices corresponding to all available | |
1264 similarity coefficients for fingerprints bit-vector strings data | |
1265 corresponding to supported fingerprints present in a data field with | |
1266 Fingerprint substring in its label and create | |
1267 SampleFPHex[CoefficientName].csv files containing sequentially generated | |
1268 compound IDs with Cmpd prefix, type: | |
1269 | |
1270 % SimilarityMatricesFingerprints.pl -m AutoDetect --BitVectorComparisonMode | |
1271 All --alpha 0.5 --beta 0.5 -o SampleFPHex.sdf | |
1272 | |
1273 To generate similarity matrices corresponding to all available | |
1274 similarity and distance coefficients using all comparison formulisms for | |
1275 fingerprints vector strings data corresponding to supported fingerprints | |
1276 present in a column name containing Fingerprint substring and create | |
1277 SampleFPCount[CoefficientName][FormulismName].csv files containing | |
1278 compound IDs retrieved from column name containing CompoundID substring, | |
1279 type: | |
1280 | |
1281 % SimilarityMatricesFingerprints.pl -m AutoDetect --VectorComparisonMode | |
1282 All --VectorComparisonFormulism All -o SampleFPCount.csv | |
1283 | |
1284 To generate similarity matrices corresponding to all available | |
1285 similarity and distance coefficients using all comparison formulisms for | |
1286 fingerprints vector strings data corresponding to supported fingerprints | |
1287 present in a data field with Fingerprint substring in its label and | |
1288 create SampleFPCount[CoefficientName][FormulismName].csv files | |
1289 containing sequentially generated compound IDs with Cmpd prefix, type: | |
1290 | |
1291 % SimilarityMatricesFingerprints.pl -m AutoDetect --VectorComparisonMode | |
1292 All --VectorComparisonFormulism All -o SampleFPCount.sdf | |
1293 | |
1294 To generate a similarity matrix corresponding to Tanimoto similarity | |
1295 coefficient for fingerprints bit-vector strings data corresponding to | |
1296 supported fingerprints present in column number 2 and create a | |
1297 SampleFPHexTanimotoSimilarity.csv file containing compound IDs retrieved | |
1298 from column number 1, type: | |
1299 | |
1300 % SimilarityMatricesFingerprints.pl --ColMode ColNum --CompoundIDCol 1 | |
1301 --FingerprintsCol 2 -o SampleFPHex.csv | |
1302 | |
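As a rough illustration of what a Tanimoto similarity matrix over bit-vector fingerprints contains, the following Python sketch computes Nc / (Na + Nb - Nc) for every pair of compounds, where Na and Nb are the numbers of set bits in each fingerprint and Nc is the number of bits set in both. It is not part of MayaChemTools and does not reproduce the script's file parsing: the hexadecimal strings, the compound IDs, and the function names tanimoto and similarity_matrix are assumptions made for this example, and the value returned for two empty bit vectors is an arbitrary convention.

```python
def tanimoto(a, b):
    """Tanimoto coefficient Nc / (Na + Nb - Nc) for two bit vectors held as ints."""
    na = bin(a).count("1")       # bits set in a
    nb = bin(b).count("1")       # bits set in b
    nc = bin(a & b).count("1")   # bits set in both
    denom = na + nb - nc
    # Assumption: report 0.0 when both vectors are empty.
    return nc / denom if denom else 0.0


def similarity_matrix(fingerprints):
    """Return {(id_i, id_j): similarity} for a full (square) matrix.

    `fingerprints` maps a compound ID to a hexadecimal bit-vector string
    (hypothetical input layout, chosen only for this sketch).
    """
    bits = {cid: int(hexstr, 16) for cid, hexstr in fingerprints.items()}
    return {(i, j): tanimoto(bits[i], bits[j]) for i in bits for j in bits}


# Made-up sample data: two columns, compound ID and hex fingerprint.
sample = {"Cmpd1": "f0", "Cmpd2": "3c", "Cmpd3": "f0"}
matrix = similarity_matrix(sample)
```

The dictionary keyed by ID pairs resembles an IDPairsAndValue layout; keeping only the pairs on or above the diagonal would give the upper triangular form.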
1303 To generate a similarity matrix corresponding to Tanimoto similarity | |
1304 coefficient for fingerprints bit-vector strings data corresponding to | |
1305 supported fingerprints present in the data field named Fingerprints and | |
1306 create a SampleFPHexTanimotoSimilarity.csv file containing compound IDs | |
1307 present in the data field named Mol_ID, type: | |
1308 | |
1309 % SimilarityMatricesFingerprints.pl --FingerprintsField Fingerprints | |
1310 --CompoundIDMode DataField --CompoundIDField Mol_ID -o SampleFPHex.sdf | |
1311 | |
1312 To generate a similarity matrix corresponding to Tversky similarity | |
1313 coefficient for fingerprints bit-vector strings data corresponding to | |
1314 supported fingerprints present in a column named Fingerprints and create | |
1315 a SampleFPHexTverskySimilarity.tsv file containing compound IDs | |
1316 retrieved from the column named CompoundID, type: | |
1317 | |
1318 % SimilarityMatricesFingerprints.pl --BitVectorComparisonMode | |
1319 TverskySimilarity --alpha 0.5 --ColMode ColLabel --CompoundIDCol | |
1320 CompoundID --FingerprintsCol Fingerprints --OutDelim Tab --quote No | |
1321 -o SampleFPHex.csv | |
1322 | |
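To show how a weighting parameter such as alpha changes a Tversky-style coefficient, here is a minimal Python sketch of the commonly cited textbook Tversky index, Nc / (alpha * (Na - Nc) + beta * (Nb - Nc) + Nc). This is an assumption for illustration only: the script's own definition of TverskySimilarity may be parameterized differently, so consult its option descriptions for the formula actually used. The function name tversky and the sample bit vectors are hypothetical.

```python
def tversky(a, b, alpha, beta):
    """Textbook Tversky index for two bit vectors held as ints.

    Assumption: this is the general literature form, not necessarily
    the formula implemented by SimilarityMatricesFingerprints.pl.
    """
    na = bin(a).count("1")       # bits set in a
    nb = bin(b).count("1")       # bits set in b
    nc = bin(a & b).count("1")   # bits set in both
    denom = alpha * (na - nc) + beta * (nb - nc) + nc
    # Assumption: report 0.0 when the denominator vanishes.
    return nc / denom if denom else 0.0


# With alpha = beta = 1 the index equals the Tanimoto coefficient;
# with alpha = beta = 0.5 it equals the Dice coefficient.
as_tanimoto = tversky(0xF0, 0x3C, 1.0, 1.0)
as_dice = tversky(0xF0, 0x3C, 0.5, 0.5)
```

Unequal alpha and beta make the measure asymmetric, which is why swapping the two fingerprints can change the value.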
1323 To generate a similarity matrix corresponding to Tanimoto similarity | |
1324 coefficient for fingerprints bit-vector strings data corresponding to | |
1325 supported fingerprints present in a data field with Fingerprint | |
1326 substring in its label and create a SampleFPHexTanimotoSimilarity.csv | |
1327 file containing compound IDs from molname line or sequentially generated | |
1328 compound IDs with Mol prefix, type: | |
1329 | |
1330 % SimilarityMatricesFingerprints.pl --CompoundIDMode MolNameOrLabelPrefix | |
1331 --CompoundIDPrefix Mol -o SampleFPHex.sdf | |
1332 | |
1333 To generate a similarity matrix corresponding to Tanimoto similarity | |
1334 coefficient for fingerprints bit-vector strings data corresponding to | |
1335 supported fingerprints present in a data field with Fingerprint | |
1336 substring in its label and create a SampleFPHexTanimotoSimilarity.tsv | |
1337 file containing sequentially generated compound IDs with Cmpd prefix, | |
1338 type: | |
1339 | |
1340 % SimilarityMatricesFingerprints.pl --OutDelim Tab --quote No -o SampleFPHex.sdf | |
1341 | |
1342 AUTHOR | |
1343 Manish Sud <msud@san.rr.com> | |
1344 | |
1345 SEE ALSO | |
1346 InfoFingerprintsFiles.pl, SimilaritySearchingFingerprints.pl, | |
1347 AtomNeighborhoodsFingerprints.pl, ExtendedConnectivityFingerprints.pl, | |
1348 MACCSKeysFingerprints.pl, PathLengthFingerprints.pl, | |
1349 TopologicalAtomPairsFingerprints.pl, | |
1350 TopologicalAtomTorsionsFingerprints.pl, | |
1351 TopologicalPharmacophoreAtomPairsFingerprints.pl, | |
1352 TopologicalPharmacophoreAtomTripletsFingerprints.pl | |
1353 | |
1354 COPYRIGHT | |
1355 Copyright (C) 2015 Manish Sud. All rights reserved. | |
1356 | |
1357 This file is part of MayaChemTools. | |
1358 | |
1359 MayaChemTools is free software; you can redistribute it and/or modify it | |
1360 under the terms of the GNU Lesser General Public License as published by | |
1361 the Free Software Foundation; either version 3 of the License, or (at | |
1362 your option) any later version. | |
1363 |