comparison docs/scripts/html/SimilaritySearchingFingerprints.html @ 0:4816e4a8ae95 draft default tip

Uploaded
author deepakjadmin
date Wed, 20 Jan 2016 09:23:18 -0500
parents
children
-1:000000000000 0:4816e4a8ae95
1 <html>
2 <head>
3 <title>MayaChemTools:Documentation:SimilaritySearchingFingerprints.pl</title>
4 <meta http-equiv="content-type" content="text/html;charset=utf-8">
5 <link rel="stylesheet" type="text/css" href="../../css/MayaChemTools.css">
6 </head>
7 <body leftmargin="20" rightmargin="20" topmargin="10" bottommargin="10">
8 <br/>
9 <center>
10 <a href="http://www.mayachemtools.org" title="MayaChemTools Home"><img src="../../images/MayaChemToolsLogo.gif" border="0" alt="MayaChemTools"></a>
11 </center>
12 <br/>
13 <div class="DocNav">
14 <table width="100%" border=0 cellpadding=0 cellspacing=2>
15 <tr align="left" valign="top"><td width="33%" align="left"><a href="./SimilarityMatricesFingerprints.html" title="SimilarityMatricesFingerprints.html">Previous</a>&nbsp;&nbsp;<a href="./index.html" title="Table of Contents">TOC</a>&nbsp;&nbsp;<a href="./SortSDFiles.html" title="SortSDFiles.html">Next</a></td><td width="34%" align="middle"><strong>SimilaritySearchingFingerprints.pl</strong></td><td width="33%" align="right"><a href="././code/SimilaritySearchingFingerprints.html" title="View source code">Code</a>&nbsp;|&nbsp;<a href="./../pdf/SimilaritySearchingFingerprints.pdf" title="PDF US Letter Size">PDF</a>&nbsp;|&nbsp;<a href="./../pdfgreen/SimilaritySearchingFingerprints.pdf" title="PDF US Letter Size with narrow margins: www.changethemargins.com">PDFGreen</a>&nbsp;|&nbsp;<a href="./../pdfa4/SimilaritySearchingFingerprints.pdf" title="PDF A4 Size">PDFA4</a>&nbsp;|&nbsp;<a href="./../pdfa4green/SimilaritySearchingFingerprints.pdf" title="PDF A4 Size with narrow margins: www.changethemargins.com">PDFA4Green</a></td></tr>
16 </table>
17 </div>
18 <p>
19 </p>
20 <h2>NAME</h2>
21 <p>SimilaritySearchingFingerprints.pl - Perform similarity search using fingerprints strings data in SD, FP and CSV/TSV text file(s)</p>
22 <p>
23 </p>
24 <h2>SYNOPSIS</h2>
25 <p>SimilaritySearchingFingerprints.pl ReferenceFPFile DatabaseFPFile</p>
26 <p>SimilaritySearchingFingerprints.pl [<strong>--alpha</strong> <em>number</em>] [<strong>--beta</strong> <em>number</em>]
27 [<strong>-b, --BitVectorComparisonMode</strong> <em>TanimotoSimilarity | TverskySimilarity | ...</em>]
28 [<strong>--DatabaseColMode</strong> <em>ColNum | ColLabel</em>] [<strong>--DatabaseCompoundIDCol</strong> <em>col number | col name</em>]
29 [<strong>--DatabaseCompoundIDPrefix</strong> <em>text</em>] [<strong>--DatabaseCompoundIDField</strong> <em>DataFieldName</em>]
30 [<strong>--DatabaseCompoundIDMode</strong> <em>DataField | MolName | LabelPrefix | MolNameOrLabelPrefix</em>]
31 [<strong>--DatabaseDataCols</strong> <em>&quot;DataColNum1, DataColNum2,... &quot; | &quot;DataColLabel1, DataColLabel2,... &quot;</em>]
32 [<strong>--DatabaseDataColsMode</strong> <em>All | Specify | CompoundID</em>] [<strong>--DatabaseDataFields</strong> <em>&quot;FieldLabel1, FieldLabel2,... &quot;</em>]
33 [<strong>--DatabaseDataFieldsMode</strong> <em>All | Common | Specify | CompoundID</em>]
34 [<strong>--DatabaseFingerprintsCol</strong> <em>col number | col name</em>] [<strong>--DatabaseFingerprintsField</strong> <em>FieldLabel</em>]
35 [<strong>--DistanceCutoff</strong> <em>number</em>] [<strong>-d, --detail</strong> <em>InfoLevel</em>] [<strong>-f, --fast</strong>]
36 [<strong>--FingerprintsMode</strong> <em>AutoDetect | FingerprintsBitVectorString | FingerprintsVectorString</em>]
37 [<strong>-g, --GroupFusionRule</strong> <em>Max, Mean, Median, Min, Sum, Euclidean</em>] [<strong>--GroupFusionApplyCutoff</strong> <em>Yes | No</em>]
38 [<strong>-h, --help</strong>] [<strong>--InDelim</strong> <em>comma | semicolon</em>] [<strong>-k, --KNN</strong> <em>all | number</em>]
39 [<strong>-m, --mode</strong> <em>IndividualReference | MultipleReferences</em>]
40 [<strong>-n, --NumOfSimilarMolecules</strong> <em>number</em>] [<strong>--OutDelim</strong> <em>comma | tab | semicolon</em>]
41 [<strong>--output</strong> <em>SD | text | both</em>] [<strong>-o, --overwrite</strong>]
42 [<strong>-p, --PercentSimilarMolecules</strong> <em>number</em>] [<strong>--precision</strong> <em>number</em>] [<strong>-q, --quote</strong> <em>Yes | No</em>]
43 [<strong>--ReferenceColMode</strong> <em>ColNum | ColLabel</em>] [<strong>--ReferenceCompoundIDCol</strong> <em>col number | col name</em>]
44 [<strong>--ReferenceCompoundIDPrefix</strong> <em>text</em>] [<strong>--ReferenceCompoundIDField</strong> <em>DataFieldName</em>]
45 [<strong>--ReferenceCompoundIDMode</strong> <em>DataField | MolName | LabelPrefix | MolNameOrLabelPrefix</em>]
46 [<strong>--ReferenceFingerprintsCol</strong> <em>col number | col name</em>] [<strong>--ReferenceFingerprintsField</strong> <em>FieldLabel</em>]
47 [<strong>-r, --root</strong> <em>RootName</em>] [<strong>-s, --SearchMode</strong> <em>SimilaritySearch | DissimilaritySearch</em>]
48 [<strong>--SimilarCountMode</strong> <em>NumOfSimilar | PercentSimilar</em>] [<strong>--SimilarityCutoff</strong> <em>number</em>]
49 [<strong>-v, --VectorComparisonMode</strong> <em>TanimotoSimilarity | ... | ManhattanDistance | ...</em>]
50 [<strong>--VectorComparisonFormulism</strong> <em>AlgebraicForm | BinaryForm | SetTheoreticForm</em>]
51 [<strong>-w, --WorkingDir</strong> dirname] ReferenceFingerprintsFile DatabaseFingerprintsFile</p>
52 <p>
53 </p>
54 <h2>DESCRIPTION</h2>
55 <p>Perform molecular similarity search [ Ref 94-113 ] using fingerprint bit-vector or vector strings
56 data in <em>SD, FP, or CSV/TSV text</em> files corresponding to <em>ReferenceFingerprintsFile</em> and
57 <em>DatabaseFingerprintsFile</em>, and generate SD and CSV/TSV text file(s) containing database
58 molecules which are similar to reference molecule(s). The reference molecules are also referred
59 to as query or seed molecules and database molecules as target molecules in the literature.</p>
60 <p>The current release of MayaChemTools supports two similarity search modes:
61 <em>IndividualReference or MultipleReferences</em>. With the default value <em>MultipleReferences</em> of the <strong>-m, --mode</strong>
62 option, reference molecules are considered as a set and <strong>-g, --GroupFusionRule</strong> is used to calculate
63 the similarity of a database molecule against the set of reference molecules. The group fusion rule is also
64 referred to as data fusion or consensus scoring in the literature. With the <em>IndividualReference</em>
65 value of the <strong>-m, --mode</strong> option, reference molecules are treated as individual molecules and each reference
66 molecule is compared against a database molecule by itself to identify similar molecules.</p>
67 <p>A molecular dissimilarity search can also be performed using the <em>DissimilaritySearch</em> value of the
68 <strong>-s, --SearchMode</strong> option. During a dissimilarity search, or when a distance comparison coefficient
69 is used in a similarity search, the meaning of the fingerprints comparison value is automatically reversed
70 as shown below:</p>
71 <div class="OptionsBox">
72 SearchMode ComparisonCoefficient ResultsSort ComparisonValues</div>
73 <div class="OptionsBox">
74 Similarity SimilarityCoefficient Descending Higher value implies
75 high similarity
76 <br/> Similarity DistanceCoefficient Ascending Lower value implies
77 high similarity</div>
78 <div class="OptionsBox">
79 Dissimilarity SimilarityCoefficient Ascending Lower value implies
80 high dissimilarity
81 <br/> Dissimilarity DistanceCoefficient Descending Higher value implies
82 high dissimilarity</div>
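<p>The table above reduces to one rule: results are sorted in descending order exactly when the search mode and the coefficient type agree. The following is a minimal Python sketch of that rule, not the script's own code; the function names <em>sort_descending</em> and <em>rank_results</em> and the coefficient type labels are hypothetical.</p>

```python
def sort_descending(search_mode: str, coefficient_type: str) -> bool:
    """Return True when results should be sorted in descending order.

    search_mode: "SimilaritySearch" or "DissimilaritySearch"
    coefficient_type: "SimilarityCoefficient" or "DistanceCoefficient"
    (the second pair of labels is an assumption used for illustration)
    """
    is_similarity_search = search_mode == "SimilaritySearch"
    is_similarity_coefficient = coefficient_type == "SimilarityCoefficient"
    # Descending for Similarity/Similarity and Dissimilarity/Distance,
    # ascending for the two mixed combinations.
    return is_similarity_search == is_similarity_coefficient


def rank_results(results, search_mode, coefficient_type):
    """Sort (compound_id, comparison_value) pairs so the best hit comes first."""
    return sorted(
        results,
        key=lambda pair: pair[1],
        reverse=sort_descending(search_mode, coefficient_type),
    )
```

<p>With this rule the first entry of the ranked list is always the best match for the chosen search mode, whichever kind of coefficient produced the values.</p>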
83 <p>With the <em>IndividualReference</em> value of the <strong>-m, --mode</strong> option for similarity search, the fingerprints bit-vector
84 or vector string of each reference molecule is compared with database molecules using the specified
85 similarity or distance coefficient to identify the most similar molecules for each reference molecule.
86 Based on the value of <strong>--SimilarCountMode</strong>, up to <strong>-n, --NumOfSimilarMolecules</strong> or <strong>-p,
87 --PercentSimilarMolecules</strong> at the specified <strong>--SimilarityCutoff</strong> or <strong>--DistanceCutoff</strong> are
88 identified for each reference molecule.</p>
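<p>A minimal sketch of the selection step for a single reference molecule, assuming a similarity coefficient and the <em>NumOfSimilar</em> count mode. The function name <em>select_similar</em>, the inclusive cutoff comparison, and the sample compound IDs are assumptions made for illustration, not details taken from the script.</p>

```python
def select_similar(scores, num_of_similar, similarity_cutoff):
    """Pick up to num_of_similar database molecules for one reference.

    scores: dict mapping database compound ID -> similarity value
            against a single reference molecule.
    """
    # Keep only molecules at or above the cutoff (inclusive: an assumption).
    kept = [(cid, value) for cid, value in scores.items() if value >= similarity_cutoff]
    # Higher similarity first, then truncate to the requested count.
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:num_of_similar]


# Hypothetical similarity values of four database molecules to one reference.
example_scores = {"Cmpd1": 0.875, "Cmpd2": 0.25, "Cmpd3": 0.75, "Cmpd4": 0.5}
top_hits = select_similar(example_scores, num_of_similar=2, similarity_cutoff=0.5)
```

<p>For a distance coefficient the comparison and the sort order would both be reversed, in line with the table above.</p>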
89 <p>With the <em>MultipleReferences</em> value of the <strong>-m, --mode</strong> option for similarity search, all reference molecules
90 are considered as a set and <strong>-g, --GroupFusionRule</strong> is used to calculate similarity of a database
91 molecule against reference molecules set either using all reference molecules or number of k-nearest
92 neighbors (k-NN) to a database molecule specified using <strong>-k, --kNN</strong>. The fingerprints bit-vector
93 or vector string of each reference molecule in a set is compared with a database molecule using
94 a similarity or distance coefficient specified via <strong>-b, --BitVectorComparisonMode</strong> or <strong>-v,
95 --VectorComparisonMode</strong>. With the <em>Yes</em> value of <strong>--GroupFusionApplyCutoff</strong>, reference molecules whose comparison values with a database
96 molecule fall outside the specified <strong>--SimilarityCutoff</strong> or <strong>--DistanceCutoff</strong> are ignored.
97 The specified <strong>-g, --GroupFusionRule</strong> is applied to
98 <strong>-k, --kNN</strong> reference molecules to calculate final similarity value between a database molecule
99 and reference molecules set.</p>
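<p>The group fusion step for one database molecule can be sketched as follows, assuming similarity coefficients (higher means more similar). This is an illustration only: the function name <em>fuse</em>, the order in which the cutoff and k-NN selection are applied, and the definition of the <em>Euclidean</em> rule as the square root of the sum of squares are assumptions, not statements about the script's implementation.</p>

```python
import math
import statistics

# Rule names follow the values listed for -g, --GroupFusionRule.
FUSION_RULES = {
    "Max": max,
    "Min": min,
    "Sum": sum,
    "Mean": statistics.mean,
    "Median": statistics.median,
    # Assumed definition: square root of the sum of squared values.
    "Euclidean": lambda values: math.sqrt(sum(v * v for v in values)),
}


def fuse(values, rule="Max", k=None, cutoff=None):
    """Fuse similarity values of one database molecule against a reference set.

    values: similarity of the database molecule to each reference molecule.
    k:      number of nearest reference molecules to use (None means all).
    cutoff: drop values below this similarity (None means no cutoff).
    Returns None when no reference molecule survives the filters.
    """
    if cutoff is not None:
        values = [v for v in values if v >= cutoff]
    # Nearest neighbors are the references with the highest similarity.
    values = sorted(values, reverse=True)
    if k is not None:
        values = values[:k]
    if not values:
        return None
    return FUSION_RULES[rule](values)
```

<p>For example, <em>fuse([0.25, 0.75, 0.5], rule="Mean", k=2)</em> averages the two nearest reference values, 0.75 and 0.5.</p>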
100 <p>The input fingerprints <em>SD, FP, or Text (CSV/TSV)</em> files for <em>ReferenceFingerprintsFile</em> and
101 <em>DatabaseFingerprintsFile</em> must contain valid fingerprint bit-vector or vector strings data corresponding to
102 the same type of fingerprints.</p>
103 <p>The valid fingerprints <em>SDFile</em> extensions are <em>.sdf</em> and <em>.sd</em>. The valid fingerprints <em>FPFile</em>
104 extensions are <em>.fpf</em> and <em>.fp</em>. The valid fingerprints <em>TextFile (CSV/TSV)</em> extensions are
105 <em>.csv</em> and <em>.tsv</em> for comma/semicolon and tab delimited text files respectively. The <strong>--indelim</strong>
106 option determines the format of <em>TextFile</em>. Any file which doesn't correspond to the format indicated
107 by <strong>--indelim</strong> option is ignored.</p>
108 <p>Example of <em>FP</em> file containing fingerprints bit-vector string data:</p>
109 <div class="OptionsBox">
110 #
111 <br/> # Package = MayaChemTools 7.4
112 <br/> # ReleaseDate = Oct 21, 2010
113 <br/> #
114 <br/> # TimeStamp = Mon Mar 7 15:14:01 2011
115 <br/> #
116 <br/> # FingerprintsStringType = FingerprintsBitVector
117 <br/> #
118 <br/> # Description = PathLengthBits:AtomicInvariantsAtomTypes:MinLength1:...
119 <br/> # Size = 1024
120 <br/> # BitStringFormat = HexadecimalString
121 <br/> # BitsOrder = Ascending
122 <br/> #
123 <br/> Cmpd1 9c8460989ec8a49913991a6603130b0a19e8051c89184414953800cc21510...
124 <br/> Cmpd2 000000249400840040100042011001001980410c000000001010088001120...
125 <br/> ... ...
126 <br/> ... ..</div>
127 <p>Example of <em>FP</em> file containing fingerprints vector string data:</p>
128 <div class="OptionsBox">
129 #
130 <br/> # Package = MayaChemTools 7.4
131 <br/> # ReleaseDate = Oct 21, 2010
132 <br/> #
133 <br/> # TimeStamp = Mon Mar 7 15:14:01 2011
134 <br/> #
135 <br/> # FingerprintsStringType = FingerprintsVector
136 <br/> #
137 <br/> # Description = PathLengthBits:AtomicInvariantsAtomTypes:MinLength1:...
138 <br/> # VectorStringFormat = IDsAndValuesString
139 <br/> # VectorValuesType = NumericalValues
140 <br/> #
141 <br/> Cmpd1 338;C F N O C:C C:N C=O CC CF CN CO C:C:C C:C:N C:CC C:CF C:CN C:
142 <br/> N:C C:NC CC:N CC=O CCC CCN CCO CNC NC=O O=CO C:C:C:C C:C:C:N C:C:CC...;
143 <br/> 33 1 2 5 21 2 2 12 1 3 3 20 2 10 2 2 1 2 2 2 8 2 5 1 1 1 19 2 8 2 2 2 2
144 <br/> 6 2 2 2 2 2 2 2 2 3 2 2 1 4 1 5 1 1 18 6 2 2 1 2 10 2 1 2 1 2 2 2 2 ...
145 <br/> Cmpd2 103;C N O C=N C=O CC CN CO CC=O CCC CCN CCO CNC N=CN NC=O NCN O=C
146 <br/> O C CC=O CCCC CCCN CCCO CCNC CNC=N CNC=O CNCN CCCC=O CCCCC CCCCN CC...;
147 <br/> 15 4 4 1 2 13 5 2 2 15 5 3 2 2 1 1 1 2 17 7 6 5 1 1 1 2 15 8 5 7 2 2 2 2
148 <br/> 1 2 1 1 3 15 7 6 8 3 4 4 3 2 2 1 2 3 14 2 4 7 4 4 4 4 1 1 1 2 1 1 1 ...
149 <br/> ... ...
150 <br/> ... ...</div>
151 <p>Example of <em>SD</em> file containing fingerprints bit-vector string data:</p>
152 <div class="OptionsBox">
153 ... ...
154 <br/> ... ...
155 <br/> $$$$
156 <br/> ... ...
157 <br/> ... ...
158 <br/> ... ...
159 <br/> 41 44 0 0 0 0 0 0 0 0999 V2000
160 -3.3652 1.4499 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
161 <br/> ... ...
162 <br/> 2 3 1 0 0 0 0
163 <br/> ... ...
164 <br/> M END
165 <br/> &gt; &lt;CmpdID&gt;
166 <br/> Cmpd1</div>
167 <div class="OptionsBox">
168 &gt; &lt;PathLengthFingerprints&gt;
169 <br/> FingerprintsBitVector;PathLengthBits:AtomicInvariantsAtomTypes:MinLengt
170 <br/> h1:MaxLength8;1024;HexadecimalString;Ascending;9c8460989ec8a49913991a66
171 <br/> 03130b0a19e8051c89184414953800cc2151082844a201042800130860308e8204d4028
172 <br/> 00831048940e44281c00060449a5000ac80c894114e006321264401600846c050164462
173 <br/> 08190410805000304a10205b0100e04c0038ba0fad0209c0ca8b1200012268b61c0026a
174 <br/> aa0660a11014a011d46</div>
175 <div class="OptionsBox">
176 $$$$
177 <br/> ... ...
178 <br/> ... ...</div>
179 <p>Example of CSV <em>TextFile</em> containing fingerprints bit-vector string data:</p>
180 <div class="OptionsBox">
181 &quot;CompoundID&quot;,&quot;PathLengthFingerprints&quot;
182 <br/> &quot;Cmpd1&quot;,&quot;FingerprintsBitVector;PathLengthBits:AtomicInvariantsAtomTypes
183 <br/> :MinLength1:MaxLength8;1024;HexadecimalString;Ascending;9c8460989ec8a4
184 <br/> 9913991a6603130b0a19e8051c89184414953800cc2151082844a20104280013086030
185 <br/> 8e8204d402800831048940e44281c00060449a5000ac80c894114e006321264401...&quot;
186 <br/> ... ...
187 <br/> ... ...</div>
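<p>The bit-vector strings in the SD and CSV examples above are semicolon-delimited records. A minimal Python sketch of splitting such a record into its fields and counting the set bits follows. The field names, the function names, and the short 16-bit sample string are hypothetical; the sketch also assumes that any padding bits in the data are zero. Counting set bits does not depend on the bit order, so the <em>Ascending</em> field is only recorded, not interpreted.</p>

```python
def parse_bit_vector_string(fp_string: str) -> dict:
    """Split a fingerprints bit-vector string into named fields."""
    fields = fp_string.strip().split(";")
    if len(fields) != 6 or fields[0] != "FingerprintsBitVector":
        raise ValueError("not a fingerprints bit-vector string")
    _, description, size, bit_format, bits_order, data = fields
    return {
        "description": description,
        "size": int(size),
        "format": bit_format,
        "order": bits_order,
        "data": data,
    }


def count_set_bits(parsed: dict) -> int:
    """Count bits set to 1, for hexadecimal or binary string data."""
    if parsed["format"] == "HexadecimalString":
        # Each hex digit encodes four bits; sum the ones per digit.
        return sum(bin(int(digit, 16)).count("1") for digit in parsed["data"])
    if parsed["format"] == "BinaryString":
        return parsed["data"].count("1")
    raise ValueError("unsupported bit string format: " + parsed["format"])


# Shortened, made-up record in the same layout as the examples above.
sample = ("FingerprintsBitVector;PathLengthBits:AtomicInvariantsAtomTypes:"
          "MinLength1:MaxLength8;16;HexadecimalString;Ascending;9c84")
parsed_sample = parse_bit_vector_string(sample)
```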
188 <p>The current release of MayaChemTools supports the following types of fingerprint
189 bit-vector and vector strings:</p>
190 <div class="OptionsBox">
191 FingerprintsVector;AtomNeighborhoods:AtomicInvariantsAtomTypes:MinRadi
192 <br/> us0:MaxRadius2;41;AlphaNumericalValues;ValuesString;NR0-C.X1.BO1.H3-AT
193 <br/> C1:NR1-C.X3.BO3.H1-ATC1:NR2-C.X1.BO1.H3-ATC1:NR2-C.X3.BO4-ATC1 NR0-C.X
194 <br/> 1.BO1.H3-ATC1:NR1-C.X3.BO3.H1-ATC1:NR2-C.X1.BO1.H3-ATC1:NR2-C.X3.BO4-A
195 <br/> TC1 NR0-C.X2.BO2.H2-ATC1:NR1-C.X2.BO2.H2-ATC1:NR1-C.X3.BO3.H1-ATC1:NR2
196 <br/> -C.X2.BO2.H2-ATC1:NR2-N.X3.BO3-ATC1:NR2-O.X1.BO1.H1-ATC1 NR0-C.X2.B...</div>
197 <div class="OptionsBox">
198 FingerprintsVector;AtomTypesCount:AtomicInvariantsAtomTypes:ArbitraryS
199 <br/> ize;10;NumericalValues;IDsAndValuesString;C.X1.BO1.H3 C.X2.BO2.H2 C.X2
200 <br/> .BO3.H1 C.X3.BO3.H1 C.X3.BO4 F.X1.BO1 N.X2.BO2.H1 N.X3.BO3 O.X1.BO1.H1
201 <br/> O.X1.BO2;2 4 14 3 10 1 1 1 3 2</div>
202 <div class="OptionsBox">
203 FingerprintsVector;AtomTypesCount:SLogPAtomTypes:ArbitrarySize;16;Nume
204 <br/> ricalValues;IDsAndValuesString;C1 C10 C11 C14 C18 C20 C21 C22 C5 CS F
205 <br/> N11 N4 O10 O2 O9;5 1 1 1 14 4 2 1 2 2 1 1 1 1 3 1</div>
206 <div class="OptionsBox">
207 FingerprintsVector;AtomTypesCount:SLogPAtomTypes:FixedSize;67;OrderedN
208 <br/> umericalValues;IDsAndValuesString;C1 C2 C3 C4 C5 C6 C7 C8 C9 C10 C11 C
209 <br/> 12 C13 C14 C15 C16 C17 C18 C19 C20 C21 C22 C23 C24 C25 C26 C27 CS N1 N
210 <br/> 2 N3 N4 N5 N6 N7 N8 N9 N10 N11 N12 N13 N14 NS O1 O2 O3 O4 O5 O6 O7 O8
211 <br/> O9 O10 O11 O12 OS F Cl Br I Hal P S1 S2 S3 Me1 Me2;5 0 0 0 2 0 0 0 0 1
212 <br/> 1 0 0 1 0 0 0 14 0 4 2 1 0 0 0 0 0 2 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0...</div>
213 <div class="OptionsBox">
214 FingerprintsVector;EStateIndicies:ArbitrarySize;11;NumericalValues;IDs
215 <br/> AndValuesString;SaaCH SaasC SaasN SdO SdssC SsCH3 SsF SsOH SssCH2 SssN
216 <br/> H SsssCH;24.778 4.387 1.993 25.023 -1.435 3.975 14.006 29.759 -0.073 3
217 <br/> .024 -2.270</div>
218 <div class="OptionsBox">
219 FingerprintsVector;EStateIndicies:FixedSize;87;OrderedNumericalValues;
220 <br/> ValuesString;0 0 0 0 0 0 0 3.975 0 -0.073 0 0 24.778 -2.270 0 0 -1.435
221 <br/> 4.387 0 0 0 0 0 0 3.024 0 0 0 0 0 0 0 1.993 0 29.759 25.023 0 0 0 0 1
222 <br/> 4.006 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
223 <br/> 0 0 0 0 0 0 0 0 0 0 0 0 0 0</div>
224 <div class="OptionsBox">
225 FingerprintsVector;ExtendedConnectivity:AtomicInvariantsAtomTypes:Radi
226 <br/> us2;60;AlphaNumericalValues;ValuesString;73555770 333564680 352413391
227 <br/> 666191900 1001270906 1371674323 1481469939 1977749791 2006158649 21414
228 <br/> 08799 49532520 64643108 79385615 96062769 273726379 564565671 85514103
229 <br/> 5 906706094 988546669 1018231313 1032696425 1197507444 1331250018 1338
230 <br/> 532734 1455473691 1607485225 1609687129 1631614296 1670251330 17303...</div>
231 <div class="OptionsBox">
232 FingerprintsVector;ExtendedConnectivityCount:AtomicInvariantsAtomTypes
233 <br/> :Radius2;60;NumericalValues;IDsAndValuesString;73555770 333564680 3524
234 <br/> 13391 666191900 1001270906 1371674323 1481469939 1977749791 2006158649
235 <br/> 2141408799 49532520 64643108 79385615 96062769 273726379 564565671...;
236 <br/> 3 2 1 1 14 1 2 10 4 3 1 1 1 1 2 1 2 1 1 1 2 3 1 1 2 1 3 3 8 2 2 2 6 2
237 <br/> 1 2 1 1 2 1 1 1 2 1 1 2 1 2 1 1 1 1 1 1 1 1 1 2 1 1</div>
238 <div class="OptionsBox">
239 FingerprintsBitVector;ExtendedConnectivityBits:AtomicInvariantsAtomTyp
240 <br/> es:Radius2;1024;BinaryString;Ascending;0000000000000000000000000000100
241 <br/> 0000000001010000000110000011000000000000100000000000000000000000100001
242 <br/> 1000000110000000000000000000000000010011000000000000000000000000010000
243 <br/> 0000000000000000000000000010000000000000000001000000000000000000000000
244 <br/> 0000000000010000100001000000000000101000000000000000100000000000000...</div>
245 <div class="OptionsBox">
246 FingerprintsVector;ExtendedConnectivity:FunctionalClassAtomTypes:Radiu
247 <br/> s2;57;AlphaNumericalValues;ValuesString;24769214 508787397 850393286 8
248 <br/> 62102353 981185303 1231636850 1649386610 1941540674 263599683 32920567
249 <br/> 1 571109041 639579325 683993318 723853089 810600886 885767127 90326012
250 <br/> 7 958841485 981022393 1126908698 1152248391 1317567065 1421489994 1455
251 <br/> 632544 1557272891 1826413669 1983319256 2015750777 2029559552 20404...</div>
252 <div class="OptionsBox">
253 FingerprintsVector;ExtendedConnectivity:EStateAtomTypes:Radius2;62;Alp
254 <br/> haNumericalValues;ValuesString;25189973 528584866 662581668 671034184
255 <br/> 926543080 1347067490 1738510057 1759600920 2034425745 2097234755 21450
256 <br/> 44754 96779665 180364292 341712110 345278822 386540408 387387308 50430
257 <br/> 1706 617094135 771528807 957666640 997798220 1158349170 1291258082 134
258 <br/> 1138533 1395329837 1420277211 1479584608 1486476397 1487556246 1566...</div>
259 <div class="OptionsBox">
260 FingerprintsBitVector;MACCSKeyBits;166;BinaryString;Ascending;00000000
261 <br/> 0000000000000000000000000000000001001000010010000000010010000000011100
262 <br/> 0100101010111100011011000100110110000011011110100110111111111111011111
263 <br/> 11111111111110111000</div>
264 <div class="OptionsBox">
265 FingerprintsBitVector;MACCSKeyBits;322;BinaryString;Ascending;11101011
266 <br/> 1110011111100101111111000111101100110000000000000011100010000000000000
267 <br/> 0000000000000000000000000000000000000000000000101000000000000000000000
268 <br/> 0000000000000000000000000000000000000000000000000000000000000000000000
269 <br/> 0000000000000000000000000000000000000011000000000000000000000000000000
270 <br/> 0000000000000000000000000000000000000000</div>
271 <div class="OptionsBox">
272 FingerprintsVector;MACCSKeyCount;166;OrderedNumericalValues;ValuesStri
273 <br/> ng;0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
274 <br/> 0 0 0 0 0 0 0 1 0 0 3 0 0 0 0 4 0 0 2 0 0 0 0 0 0 0 0 2 0 0 2 0 0 0 0
275 <br/> 0 0 0 0 1 1 8 0 0 0 1 0 0 1 0 1 0 1 0 3 1 3 1 0 0 0 1 2 0 11 1 0 0 0
276 <br/> 5 0 0 1 2 0 1 1 0 0 0 0 0 1 1 0 1 1 1 1 0 4 0 0 1 1 0 4 6 1 1 1 2 1 1
277 <br/> 3 5 2 2 0 5 3 5 1 1 2 5 1 2 1 2 4 8 3 5 5 2 2 0 3 5 4 1</div>
278 <div class="OptionsBox">
279 FingerprintsVector;MACCSKeyCount;322;OrderedNumericalValues;ValuesStri
280 <br/> ng;14 8 2 0 2 0 4 4 2 1 4 0 0 2 5 10 5 2 1 0 0 2 0 5 13 3 28 5 5 3 0 0
281 <br/> 0 4 2 1 1 0 1 1 0 0 2 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 22 5 3 0 0 0 1 0
282 <br/> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
283 <br/> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 11 0 2 0 0 0 0 0 0 0 0 0
284 <br/> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ...</div>
285 <div class="OptionsBox">
286 FingerprintsBitVector;PathLengthBits:AtomicInvariantsAtomTypes:MinLeng
287 <br/> th1:MaxLength8;1024;BinaryString;Ascending;001000010011010101011000110
288 <br/> 0100010101011000101001011100110001000010001001101000001001001001001000
289 <br/> 0010110100000111001001000001001010100100100000000011000000101001011100
290 <br/> 0010000001000101010100000100111100110111011011011000000010110111001101
291 <br/> 0101100011000000010001000011000010100011101100001000001000100000000...</div>
292 <div class="OptionsBox">
293 FingerprintsVector;PathLengthCount:AtomicInvariantsAtomTypes:MinLength
294 <br/> 1:MaxLength8;432;NumericalValues;IDsAndValuesPairsString;C.X1.BO1.H3 2
295 <br/> C.X2.BO2.H2 4 C.X2.BO3.H1 14 C.X3.BO3.H1 3 C.X3.BO4 10 F.X1.BO1 1 N.X
296 <br/> 2.BO2.H1 1 N.X3.BO3 1 O.X1.BO1.H1 3 O.X1.BO2 2 C.X1.BO1.H3C.X3.BO3.H1
297 <br/> 2 C.X2.BO2.H2C.X2.BO2.H2 1 C.X2.BO2.H2C.X3.BO3.H1 4 C.X2.BO2.H2C.X3.BO
298 <br/> 4 1 C.X2.BO2.H2N.X3.BO3 1 C.X2.BO3.H1:C.X2.BO3.H1 10 C.X2.BO3.H1:C....</div>
299 <div class="OptionsBox">
300 FingerprintsVector;PathLengthCount:MMFF94AtomTypes:MinLength1:MaxLengt
301 <br/> h8;463;NumericalValues;IDsAndValuesPairsString;C5A 2 C5B 2 C=ON 1 CB 1
302 <br/> 8 COO 1 CR 9 F 1 N5 1 NC=O 1 O=CN 1 O=CO 1 OC=O 1 OR 2 C5A:C5B 2 C5A:N
303 <br/> 5 2 C5ACB 1 C5ACR 1 C5B:C5B 1 C5BC=ON 1 C5BCB 1 C=ON=O=CN 1 C=ONNC=O 1
304 <br/> CB:CB 18 CBF 1 CBNC=O 1 COO=O=CO 1 COOCR 1 COOOC=O 1 CRCR 7 CRN5 1 CR
305 <br/> OR 2 C5A:C5B:C5B 2 C5A:C5BC=ON 1 C5A:C5BCB 1 C5A:N5:C5A 1 C5A:N5CR ...</div>
306 <div class="OptionsBox">
307 FingerprintsVector;TopologicalAtomPairs:AtomicInvariantsAtomTypes:MinD
308 <br/> istance1:MaxDistance10;223;NumericalValues;IDsAndValuesString;C.X1.BO1
309 <br/> .H3-D1-C.X3.BO3.H1 C.X2.BO2.H2-D1-C.X2.BO2.H2 C.X2.BO2.H2-D1-C.X3.BO3.
310 <br/> H1 C.X2.BO2.H2-D1-C.X3.BO4 C.X2.BO2.H2-D1-N.X3.BO3 C.X2.BO3.H1-D1-...;
311 <br/> 2 1 4 1 1 10 8 1 2 6 1 2 2 1 2 1 2 2 1 2 1 5 1 10 12 2 2 1 2 1 9 1 3 1
312 <br/> 1 1 2 2 1 3 6 1 6 14 2 2 2 3 1 3 1 8 2 2 1 3 2 6 1 2 2 5 1 3 1 23 1...</div>
313 <div class="OptionsBox">
314 FingerprintsVector;TopologicalAtomPairs:FunctionalClassAtomTypes:MinDi
315 <br/> stance1:MaxDistance10;144;NumericalValues;IDsAndValuesString;Ar-D1-Ar
316 <br/> Ar-D1-Ar.HBA Ar-D1-HBD Ar-D1-Hal Ar-D1-None Ar.HBA-D1-None HBA-D1-NI H
317 <br/> BA-D1-None HBA.HBD-D1-NI HBA.HBD-D1-None HBD-D1-None NI-D1-None No...;
318 <br/> 23 2 1 1 2 1 1 1 1 2 1 1 7 28 3 1 3 2 8 2 1 1 1 5 1 5 24 3 3 4 2 13 4
319 <br/> 1 1 4 1 5 22 4 4 3 1 19 1 1 1 1 1 2 2 3 1 1 8 25 4 5 2 3 1 26 1 4 1 ...</div>
320 <div class="OptionsBox">
321 FingerprintsVector;TopologicalAtomTorsions:AtomicInvariantsAtomTypes;3
322 <br/> 3;NumericalValues;IDsAndValuesString;C.X1.BO1.H3-C.X3.BO3.H1-C.X3.BO4-
323 <br/> C.X3.BO4 C.X1.BO1.H3-C.X3.BO3.H1-C.X3.BO4-N.X3.BO3 C.X2.BO2.H2-C.X2.BO
324 <br/> 2.H2-C.X3.BO3.H1-C.X2.BO2.H2 C.X2.BO2.H2-C.X2.BO2.H2-C.X3.BO3.H1-O...;
325 <br/> 2 2 1 1 2 2 1 1 3 4 4 8 4 2 2 6 2 2 1 2 1 1 2 1 1 2 6 2 4 2 1 3 1</div>
326 <div class="OptionsBox">
327 FingerprintsVector;TopologicalAtomTorsions:EStateAtomTypes;36;Numerica
328 <br/> lValues;IDsAndValuesString;aaCH-aaCH-aaCH-aaCH aaCH-aaCH-aaCH-aasC aaC
329 <br/> H-aaCH-aasC-aaCH aaCH-aaCH-aasC-aasC aaCH-aaCH-aasC-sF aaCH-aaCH-aasC-
330 <br/> ssNH aaCH-aasC-aasC-aasC aaCH-aasC-aasC-aasN aaCH-aasC-ssNH-dssC a...;
331 <br/> 4 4 8 4 2 2 6 2 2 2 4 3 2 1 3 3 2 2 2 1 2 1 1 1 2 1 1 1 1 1 1 1 2 1 1 2</div>
332 <div class="OptionsBox">
333 FingerprintsVector;TopologicalAtomTriplets:AtomicInvariantsAtomTypes:M
334 <br/> inDistance1:MaxDistance10;3096;NumericalValues;IDsAndValuesString;C.X1
335 <br/> .BO1.H3-D1-C.X1.BO1.H3-D1-C.X3.BO3.H1-D2 C.X1.BO1.H3-D1-C.X2.BO2.H2-D1
336 <br/> 0-C.X3.BO4-D9 C.X1.BO1.H3-D1-C.X2.BO2.H2-D3-N.X3.BO3-D4 C.X1.BO1.H3-D1
337 <br/> -C.X2.BO2.H2-D4-C.X2.BO2.H2-D5 C.X1.BO1.H3-D1-C.X2.BO2.H2-D6-C.X3....;
338 <br/> 1 2 2 2 2 2 2 2 8 8 4 8 4 4 2 2 2 2 4 2 2 2 4 2 2 2 2 1 2 2 4 4 4 2 2
339 <br/> 2 4 4 4 8 4 4 2 4 4 4 2 4 4 2 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2 2 8...</div>
340 <div class="OptionsBox">
341 FingerprintsVector;TopologicalAtomTriplets:SYBYLAtomTypes:MinDistance1
342 <br/> :MaxDistance10;2332;NumericalValues;IDsAndValuesString;C.2-D1-C.2-D9-C
343 <br/> .3-D10 C.2-D1-C.2-D9-C.ar-D10 C.2-D1-C.3-D1-C.3-D2 C.2-D1-C.3-D10-C.3-
344 <br/> D9 C.2-D1-C.3-D2-C.3-D3 C.2-D1-C.3-D2-C.ar-D3 C.2-D1-C.3-D3-C.3-D4 C.2
345 <br/> -D1-C.3-D3-N.ar-D4 C.2-D1-C.3-D3-O.3-D2 C.2-D1-C.3-D4-C.3-D5 C.2-D1-C.
346 <br/> 3-D5-C.3-D6 C.2-D1-C.3-D5-O.3-D4 C.2-D1-C.3-D6-C.3-D7 C.2-D1-C.3-D7...</div>
347 <div class="OptionsBox">
348 FingerprintsVector;TopologicalPharmacophoreAtomPairs:ArbitrarySize:Min
349 <br/> Distance1:MaxDistance10;54;NumericalValues;IDsAndValuesString;H-D1-H H
350 <br/> -D1-NI HBA-D1-NI HBD-D1-NI H-D2-H H-D2-HBA H-D2-HBD HBA-D2-HBA HBA-D2-
351 <br/> HBD H-D3-H H-D3-HBA H-D3-HBD H-D3-NI HBA-D3-NI HBD-D3-NI H-D4-H H-D4-H
352 <br/> BA H-D4-HBD HBA-D4-HBA HBA-D4-HBD HBD-D4-HBD H-D5-H H-D5-HBA H-D5-...;
353 <br/> 18 1 2 1 22 12 8 1 2 18 6 3 1 1 1 22 13 6 5 7 2 28 9 5 1 1 1 36 16 10
354 <br/> 3 4 1 37 10 8 1 35 10 9 3 3 1 28 7 7 4 18 16 12 5 1 2 1</div>
355 <div class="OptionsBox">
356 FingerprintsVector;TopologicalPharmacophoreAtomPairs:FixedSize:MinDist
357 <br/> ance1:MaxDistance10;150;OrderedNumericalValues;ValuesString;18 0 0 1 0
358 <br/> 0 0 2 0 0 1 0 0 0 0 22 12 8 0 0 1 2 0 0 0 0 0 0 0 0 18 6 3 1 0 0 0 1
359 <br/> 0 0 1 0 0 0 0 22 13 6 0 0 5 7 0 0 2 0 0 0 0 0 28 9 5 1 0 0 0 1 0 0 1 0
360 <br/> 0 0 0 36 16 10 0 0 3 4 0 0 1 0 0 0 0 0 37 10 8 0 0 0 0 1 0 0 0 0 0 0
361 <br/> 0 35 10 9 0 0 3 3 0 0 1 0 0 0 0 0 28 7 7 4 0 0 0 0 0 0 0 0 0 0 0 18...</div>
362 <div class="OptionsBox">
363 FingerprintsVector;TopologicalPharmacophoreAtomTriplets:ArbitrarySize:
364 <br/> MinDistance1:MaxDistance10;696;NumericalValues;IDsAndValuesString;Ar1-
365 <br/> Ar1-Ar1 Ar1-Ar1-H1 Ar1-Ar1-HBA1 Ar1-Ar1-HBD1 Ar1-H1-H1 Ar1-H1-HBA1 Ar1
366 <br/> -H1-HBD1 Ar1-HBA1-HBD1 H1-H1-H1 H1-H1-HBA1 H1-H1-HBD1 H1-HBA1-HBA1 H1-
367 <br/> HBA1-HBD1 H1-HBA1-NI1 H1-HBD1-NI1 HBA1-HBA1-NI1 HBA1-HBD1-NI1 Ar1-...;
368 <br/> 46 106 8 3 83 11 4 1 21 5 3 1 2 2 1 1 1 100 101 18 11 145 132 26 14 23
369 <br/> 28 3 3 5 4 61 45 10 4 16 20 7 5 1 3 4 5 3 1 1 1 1 5 4 2 1 2 2 2 1 1 1
370 <br/> 119 123 24 15 185 202 41 25 22 17 3 5 85 95 18 11 23 17 3 1 1 6 4 ...</div>
371 <div class="OptionsBox">
372 FingerprintsVector;TopologicalPharmacophoreAtomTriplets:FixedSize:MinD
373 <br/> istance1:MaxDistance10;2692;OrderedNumericalValues;ValuesString;46 106
374 <br/> 8 3 0 0 83 11 4 0 0 0 1 0 0 0 0 0 0 0 0 21 5 3 0 0 1 2 2 0 0 1 0 0 0
375 <br/> 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 100 101 18 11 0 0 145 132 26
376 <br/> 14 0 0 23 28 3 3 0 0 5 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 61 45 10 4 0
377 <br/> 0 16 20 7 5 1 0 3 4 5 3 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 5 ...</div>
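<p>The vector strings listed above share one semicolon-delimited layout, with the last one or two fields holding the IDs and values. A minimal Python sketch of reading the three value layouts that appear in the examples (<em>ValuesString</em>, <em>IDsAndValuesString</em>, <em>IDsAndValuesPairsString</em>) follows. The function name is hypothetical, values are kept as text because some types are alphanumerical, and treating the third field as the number of values is an assumption. The sample is the <em>AtomTypesCount:AtomicInvariantsAtomTypes</em> example from the list, re-joined into a single line.</p>

```python
def parse_vector_string(fp_string: str):
    """Return (description, ids, values) for a fingerprints vector string."""
    fields = fp_string.strip().split(";")
    if len(fields) < 6 or fields[0] != "FingerprintsVector":
        raise ValueError("not a fingerprints vector string")
    description, count, layout = fields[1], int(fields[2]), fields[4]
    if layout == "IDsAndValuesString":
        # IDs and values are stored in two separate fields.
        if len(fields) != 7:
            raise ValueError("expected separate ID and value fields")
        ids, values = fields[5].split(), fields[6].split()
    elif layout == "IDsAndValuesPairsString":
        # IDs and values alternate within one field.
        tokens = fields[5].split()
        ids, values = tokens[0::2], tokens[1::2]
    elif layout == "ValuesString":
        ids, values = [], fields[5].split()
    else:
        raise ValueError("unsupported vector string layout: " + layout)
    if len(values) != count:
        raise ValueError("value count does not match the size field")
    return description, ids, values


sample_vector = (
    "FingerprintsVector;AtomTypesCount:AtomicInvariantsAtomTypes:ArbitrarySize;"
    "10;NumericalValues;IDsAndValuesString;"
    "C.X1.BO1.H3 C.X2.BO2.H2 C.X2.BO3.H1 C.X3.BO3.H1 C.X3.BO4 F.X1.BO1 "
    "N.X2.BO2.H1 N.X3.BO3 O.X1.BO1.H1 O.X1.BO2;2 4 14 3 10 1 1 1 3 2"
)
description, ids, values = parse_vector_string(sample_vector)
counts_by_id = dict(zip(ids, (int(v) for v in values)))
```

<p>The truncated examples ending in "..." cannot be parsed this way, since their value counts no longer match the size field.</p>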
378 <p>
379 </p>
380 <h2>OPTIONS</h2>
381 <dl>
382 <dt><strong><strong>--alpha</strong> <em>number</em></strong></dt>
383 <dd>
384 <p>Value of alpha parameter for calculating <em>Tversky</em> similarity coefficient specified for
385 <strong>-b, --BitVectorComparisonMode</strong> option. It corresponds to weights assigned for bits set
386 to &quot;1&quot; in a pair of fingerprint bit-vectors during the calculation of similarity coefficient. Possible
387 values: <em>0 to 1</em>. Default value: &lt;0.5&gt;.</p>
388 </dd>
389 <dt><strong><strong>--beta</strong> <em>number</em></strong></dt>
390 <dd>
391 <p>Value of beta parameter for calculating <em>WeightedTanimoto</em> and <em>WeightedTversky</em>
392 similarity coefficients specified for <strong>-b, --BitVectorComparisonMode</strong> option. It is used to
393 weight the contributions of bits set to &quot;1&quot; (weight beta) relative to bits set to &quot;0&quot; (weight 1 - beta) during the calculation of similarity coefficients. Possible
394 values: <em>0 to 1</em>. The default value of &lt;1&gt; makes <em>WeightedTanimoto</em> and <em>WeightedTversky</em>
395 equivalent to <em>Tanimoto</em> and <em>Tversky</em>.</p>
396 </dd>
397 <dt><strong><strong>-b, --BitVectorComparisonMode</strong> <em>TanimotoSimilarity | TverskySimilarity | ...</em></strong></dt>
398 <dd>
399 <p>Specify what similarity coefficient to use for calculating similarity between fingerprints bit-vector
400 string data values in <em>ReferenceFingerprintsFile</em> and <em>DatabaseFingerprintsFile</em> during similarity
401 search. Possible values: <em>TanimotoSimilarity | TverskySimilarity | ...</em>. Default: <em>TanimotoSimilarity</em></p>
402 <p>The current release supports the following similarity coefficients: <em>BaroniUrbaniSimilarity, BuserSimilarity,
403 CosineSimilarity, DiceSimilarity, DennisSimilarity, ForbesSimilarity, FossumSimilarity, HamannSimilarity, JacardSimilarity,
404 Kulczynski1Similarity, Kulczynski2Similarity, MatchingSimilarity, McConnaugheySimilarity, OchiaiSimilarity,
405 PearsonSimilarity, RogersTanimotoSimilarity, RussellRaoSimilarity, SimpsonSimilarity, SkoalSneath1Similarity,
406 SkoalSneath2Similarity, SkoalSneath3Similarity, TanimotoSimilarity, TverskySimilarity, YuleSimilarity,
407 WeightedTanimotoSimilarity, WeightedTverskySimilarity</em>. These similarity coefficients are described below.</p>
408 <p>For two fingerprint bit-vectors A and B of same size, let:</p>
409 <div class="OptionsBox">
410 Na = Number of bits set to &quot;1&quot; in A
411 <br/> Nb = Number of bits set to &quot;1&quot; in B
412 <br/> Nc = Number of bits set to &quot;1&quot; in both A and B
413 <br/> Nd = Number of bits set to &quot;0&quot; in both A and B</div>
414 <div class="OptionsBox">
415 Nt = Number of bits set to &quot;1&quot; or &quot;0&quot; in A or B (Size of A or B)
416 <br/> Nt = Na + Nb - Nc + Nd</div>
417 <div class="OptionsBox">
418 Na - Nc = Number of bits set to &quot;1&quot; in A but not in B
419 <br/> Nb - Nc = Number of bits set to &quot;1&quot; in B but not in A</div>
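<p>As an illustration, the counts defined above can be computed from two equal-length binary strings as in the following Python sketch. The function name <em>bit_counts</em> and the four-bit sample vectors are made up for the example.</p>

```python
def bit_counts(a: str, b: str):
    """Return (Na, Nb, Nc, Nd, Nt) for two binary strings of equal size."""
    if len(a) != len(b):
        raise ValueError("bit-vectors must have the same size")
    na = a.count("1")
    nb = b.count("1")
    nc = sum(1 for x, y in zip(a, b) if x == "1" and y == "1")
    nd = sum(1 for x, y in zip(a, b) if x == "0" and y == "0")
    nt = len(a)
    # Consistency check taken from the definition Nt = Na + Nb - Nc + Nd.
    assert nt == na + nb - nc + nd
    return na, nb, nc, nd, nt


na, nb, nc, nd, nt = bit_counts("1100", "1010")
```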
420 <p>Then, various similarity coefficients [ Ref. 40 - 42 ] for a pair of bit-vectors A and B are
421 defined as follows:</p>
422 <p><em>BaroniUrbaniSimilarity</em>: ( SQRT( Nc * Nd ) + Nc ) / ( SQRT ( Nc * Nd ) + Nc + ( Na - Nc ) + ( Nb - Nc ) ) ( same as Buser )</p>
423 <p><em>BuserSimilarity</em>: ( SQRT ( Nc * Nd ) + Nc ) / ( SQRT ( Nc * Nd ) + Nc + ( Na - Nc ) + ( Nb - Nc ) ) ( same as BaroniUrbani )</p>
424 <p><em>CosineSimilarity</em>: Nc / SQRT ( Na * Nb ) (same as Ochiai)</p>
425 <p><em>DiceSimilarity</em>: (2 * Nc) / ( Na + Nb )</p>
426 <p><em>DennisSimilarity</em>: ( Nc * Nd - ( ( Na - Nc ) * ( Nb - Nc ) ) ) / SQRT ( Nt * Na * Nb)</p>
427 <p><em>ForbesSimilarity</em>: ( Nt * Nc ) / ( Na * Nb )</p>
428 <p><em>FossumSimilarity</em>: ( Nt * ( ( Nc - 1/2 ) ** 2 ) ) / ( Na * Nb )</p>
429 <p><em>HamannSimilarity</em>: ( ( Nc + Nd ) - ( Na - Nc ) - ( Nb - Nc ) ) / Nt</p>
430 <p><em>JaccardSimilarity</em>: Nc / ( ( Na - Nc) + ( Nb - Nc ) + Nc ) = Nc / ( Na + Nb - Nc ) (same as Tanimoto)</p>
431 <p><em>Kulczynski1Similarity</em>: Nc / ( ( Na - Nc ) + ( Nb - Nc) ) = Nc / ( Na + Nb - 2Nc )</p>
432 <p><em>Kulczynski2Similarity</em>: ( ( Nc / 2 ) * ( 2 * Nc + ( Na - Nc ) + ( Nb - Nc) ) ) / ( ( Nc + ( Na - Nc ) ) * ( Nc + ( Nb - Nc ) ) ) = 0.5 * ( Nc / Na + Nc / Nb )</p>
433 <p><em>MatchingSimilarity</em>: ( Nc + Nd ) / Nt</p>
434 <p><em>McConnaugheySimilarity</em>: ( Nc ** 2 - ( Na - Nc ) * ( Nb - Nc) ) / ( Na * Nb )</p>
435 <p><em>OchiaiSimilarity</em>: Nc / SQRT ( Na * Nb ) (same as Cosine)</p>
436 <p><em>PearsonSimilarity</em>: ( ( Nc * Nd ) - ( ( Na - Nc ) * ( Nb - Nc ) ) ) / SQRT ( Na * Nb * ( Na - Nc + Nd ) * ( Nb - Nc + Nd ) )</p>
437 <p><em>RogersTanimotoSimilarity</em>: ( Nc + Nd ) / ( ( Na - Nc) + ( Nb - Nc) + Nt) = ( Nc + Nd ) / ( Na + Nb - 2Nc + Nt)</p>
438 <p><em>RussellRaoSimilarity</em>: Nc / Nt</p>
439 <p><em>SimpsonSimilarity</em>: Nc / MIN ( Na, Nb)</p>
440 <p><em>SkoalSneath1Similarity</em>: Nc / ( Nc + 2 * ( Na - Nc) + 2 * ( Nb - Nc) ) = Nc / ( 2 * Na + 2 * Nb - 3 * Nc )</p>
441 <p><em>SkoalSneath2Similarity</em>: ( 2 * Nc + 2 * Nd ) / ( Nc + Nd + Nt )</p>
442 <p><em>SkoalSneath3Similarity</em>: ( Nc + Nd ) / ( ( Na - Nc ) + ( Nb - Nc ) ) = ( Nc + Nd ) / ( Na + Nb - 2 * Nc )</p>
443 <p><em>TanimotoSimilarity</em>: Nc / ( ( Na - Nc) + ( Nb - Nc ) + Nc ) = Nc / ( Na + Nb - Nc ) (same as Jaccard)</p>
444 <p><em>TverskySimilarity</em>: Nc / ( alpha * ( Na - Nc ) + ( 1 - alpha) * ( Nb - Nc) + Nc ) = Nc / ( alpha * ( Na - Nb ) + Nb)</p>
445 <p><em>YuleSimilarity</em>: ( ( Nc * Nd ) - ( ( Na - Nc ) * ( Nb - Nc ) ) ) / ( ( Nc * Nd ) + ( ( Na - Nc ) * ( Nb - Nc ) ) )</p>
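<p>As a minimal sketch of how the counts and a few of the coefficients listed above fit together, the following Python code assumes that each bit-vector is available as a plain string of &quot;0&quot; and &quot;1&quot; characters. The function names are illustrative only and are not part of MayaChemTools; all-zero bit-vectors, which would cause a division by zero, are not handled.</p>

```python
import math

def bit_counts(a, b):
    """Return (Na, Nb, Nc, Nd, Nt) for two equal-length strings of '0'/'1'."""
    if len(a) != len(b):
        raise ValueError("bit-vectors must have the same size")
    na = a.count("1")
    nb = b.count("1")
    nc = sum(1 for x, y in zip(a, b) if x == "1" and y == "1")
    nd = sum(1 for x, y in zip(a, b) if x == "0" and y == "0")
    return na, nb, nc, nd, len(a)

def tanimoto(a, b):
    na, nb, nc, _, _ = bit_counts(a, b)
    return nc / (na + nb - nc)

def dice(a, b):
    na, nb, nc, _, _ = bit_counts(a, b)
    return 2 * nc / (na + nb)

def cosine(a, b):
    na, nb, nc, _, _ = bit_counts(a, b)
    return nc / math.sqrt(na * nb)

def tversky(a, b, alpha):
    # Simplified form: Nc / ( alpha * ( Na - Nb ) + Nb )
    na, nb, nc, _, _ = bit_counts(a, b)
    return nc / (alpha * (na - nb) + nb)
```

<p>With this formulation, an alpha value of 0.5 makes the Tversky coefficient equal to the Dice coefficient, which offers a quick consistency check.</p>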
446 <p>Values of Tanimoto/Jaccard and Tversky coefficients depend only on those bits which
447 are set to &quot;1&quot; in A and B. In order to take into account all bit positions, modified versions
448 of Tanimoto [ Ref. 42 ] and Tversky [ Ref. 43 ] have been developed.</p>
449 <p>Let:</p>
450 <div class="OptionsBox">
451 Na' = Number of bits set to &quot;0&quot; in A
452 <br/> Nb' = Number of bits set to &quot;0&quot; in B
453 <br/> Nc' = Number of bits set to &quot;0&quot; in both A and B</div>
454 <p>Tanimoto': Nc' / ( ( Na' - Nc') + ( Nb' - Nc' ) + Nc' ) = Nc' / ( Na' + Nb' - Nc' )</p>
455 <p>Tversky': Nc' / ( alpha * ( Na' - Nc' ) + ( 1 - alpha) * ( Nb' - Nc' ) + Nc' ) = Nc' / ( alpha * ( Na' - Nb' ) + Nb')</p>
456 <p>Then:</p>
457 <p><em>WeightedTanimotoSimilarity</em> = beta * Tanimoto + (1 - beta) * Tanimoto'</p>
458 <p><em>WeightedTverskySimilarity</em> = beta * Tversky + (1 - beta) * Tversky'</p>
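<p>The weighted Tanimoto definition above can be sketched in Python as follows, assuming bit-vectors are plain strings of &quot;0&quot; and &quot;1&quot; characters. The helper name <em>tanimoto_for</em> is hypothetical; it evaluates the same Tanimoto expression over either the &quot;1&quot; bits or the &quot;0&quot; bits.</p>

```python
def tanimoto_for(a, b, bit):
    """Tanimoto evaluated over positions holding `bit` ('1' or '0')."""
    na = a.count(bit)
    nb = b.count(bit)
    nc = sum(1 for x, y in zip(a, b) if x == bit and y == bit)
    return nc / (na + nb - nc)

def weighted_tanimoto(a, b, beta):
    # beta * Tanimoto + (1 - beta) * Tanimoto'
    on_bits = tanimoto_for(a, b, "1")
    off_bits = tanimoto_for(a, b, "0")
    return beta * on_bits + (1 - beta) * off_bits
```

<p>A beta value of 1 reduces the result to the ordinary Tanimoto coefficient, and a value of 0 uses only the bits set to &quot;0&quot;.</p>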
459 </dd>
460 <dt><strong><strong>--DatabaseColMode</strong> <em>ColNum | ColLabel</em></strong></dt>
461 <dd>
462 <p>Specify how columns are identified in database fingerprints <em>TextFile</em>: using column
463 number or column label. Possible values: <em>ColNum or ColLabel</em>. Default value: <em>ColNum</em>.</p>
464 </dd>
465 <dt><strong><strong>--DatabaseCompoundIDCol</strong> <em>col number | col name</em></strong></dt>
466 <dd>
467 <p>This value is <strong>--DatabaseColMode</strong> mode specific. It specifies column to use for retrieving compound
468 ID from database fingerprints <em>TextFile</em> during similarity and dissimilarity search for output SD and
469 CSV/TSV text files. Possible values: <em>col number or col label</em>. Default value: <em>first column containing
470 the word compoundID in its column label or sequentially generated IDs</em>.</p>
471 <p>This is only used for <em>CompoundID</em> value of <strong>--DatabaseDataColsMode</strong> option.</p>
472 </dd>
473 <dt><strong><strong>--DatabaseCompoundIDPrefix</strong> <em>text</em></strong></dt>
474 <dd>
475 <p>Specify compound ID prefix to use during sequential generation of compound IDs for database fingerprints
476 <em>SDFile</em> and <em>TextFile</em>. Default value: <em>Cmpd</em>. The default value generates compound IDs which look
477 like Cmpd&lt;Number&gt;.</p>
478 <p>For database fingerprints <em>SDFile</em>, this value is only used during <em>LabelPrefix | MolNameOrLabelPrefix</em>
479 values of <strong>--DatabaseCompoundIDMode</strong> option; otherwise, it's ignored.</p>
480 <p>Examples for <em>LabelPrefix</em> or <em>MolNameOrLabelPrefix</em> value of <strong>--DatabaseCompoundIDMode</strong>:</p>
481 <div class="OptionsBox">
482 Compound</div>
483 <p>The value specified above generates compound IDs which correspond to Compound&lt;Number&gt;
484 instead of default value of Cmpd&lt;Number&gt;.</p>
485 </dd>
486 <dt><strong><strong>--DatabaseCompoundIDField</strong> <em>DataFieldName</em></strong></dt>
487 <dd>
488 <p>Specify database fingerprints <em>SDFile</em> datafield label for generating compound IDs. This value is
489 only used during <em>DataField</em> value of <strong>--DatabaseCompoundIDMode</strong> option.</p>
490 <p>Examples for <em>DataField</em> value of <strong>--DatabaseCompoundIDMode</strong>:</p>
491 <div class="OptionsBox">
492 MolID
493 <br/> ExtReg</div>
494 </dd>
495 <dt><strong><strong>--DatabaseCompoundIDMode</strong> <em>DataField | MolName | LabelPrefix | MolNameOrLabelPrefix</em></strong></dt>
496 <dd>
497 <p>Specify how to generate compound IDs from database fingerprints <em>SDFile</em> during similarity and
498 dissimilarity search for output SD and CSV/TSV text files: use a <em>SDFile</em> datafield value; use
499 molname line from <em>SDFile</em>; generate a sequential ID with specific prefix; use combination of both
500 MolName and LabelPrefix with usage of LabelPrefix values for empty molname lines.</p>
501 <p>Possible values: <em>DataField | MolName | LabelPrefix | MolNameOrLabelPrefix</em>.
502 Default: <em>LabelPrefix</em>.</p>
503 <p>For <em>MolNameOrLabelPrefix</em> value of <strong>--DatabaseCompoundIDMode</strong>, molname line in <em>SDFile</em> takes
504 precedence over sequential compound IDs generated using <em>LabelPrefix</em> and only empty molname
505 values are replaced with sequential compound IDs.</p>
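<p>The precedence rule described above can be illustrated with a short Python sketch. It assumes that the molname lines have already been read into a list and that the sequence number is the 1-based position of the molecule in the file; both are assumptions made for illustration, and the function name is hypothetical.</p>

```python
def compound_ids(mol_names, prefix="Cmpd"):
    """Use the molname line when present; otherwise fall back to prefix + number."""
    ids = []
    for number, name in enumerate(mol_names, start=1):
        name = name.strip()
        ids.append(name if name else f"{prefix}{number}")
    return ids
```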
506 <p>This is only used for <em>CompoundID</em> value of <strong>--DatabaseDataFieldsMode</strong> option.</p>
507 </dd>
508 <dt><strong><strong>--DatabaseDataCols</strong> <em>&quot;DataColNum1,DataColNum2,... &quot; | &quot;DataColLabel1,DataColLabel2,... &quot;</em></strong></dt>
509 <dd>
510 <p>This value is <strong>--DatabaseColMode</strong> mode specific. It is a comma delimited list of database fingerprints
511 <em>TextFile</em> data column numbers or labels to extract and write to SD and CSV/TSV text files along with
512 other information for <em>SD | text | both</em> values of <strong>--output</strong> option.</p>
513 <p>This is only used for <em>Specify</em> value of <strong>--DatabaseDataColsMode</strong> option.</p>
514 <p>Examples:</p>
515 <div class="OptionsBox">
516 1,2,3
517 <br/> CompoundName,MolWt</div>
518 </dd>
519 <dt><strong><strong>--DatabaseDataColsMode</strong> <em>All | Specify | CompoundID</em></strong></dt>
520 <dd>
521 <p>Specify how data columns from database fingerprints <em>TextFile</em> are transferred to output SD and
522 CSV/TSV text files along with other information for <em>SD | text | both</em> values of <strong>--output</strong> option:
523 transfer all data columns; extract specified data columns; generate a compound ID using database compound
524 prefix. Possible values: <em>All | Specify | CompoundID</em>. Default value: <em>CompoundID</em>.</p>
525 </dd>
526 <dt><strong><strong>--DatabaseDataFields</strong> <em>&quot;FieldLabel1,FieldLabel2,... &quot;</em></strong></dt>
527 <dd>
528 <p>Comma delimited list of database fingerprints <em>SDFile</em> data fields to extract and write to SD
529 and CSV/TSV text files along with other information for <em>SD | text | both</em> values of
530 <strong>--output</strong> option.</p>
531 <p>This is only used for <em>Specify</em> value of <strong>--DatabaseDataFieldsMode</strong> option.</p>
532 <p>Examples:</p>
533 <div class="OptionsBox">
534 Extreg
535 <br/> MolID,CompoundName</div>
536 </dd>
537 <dt><strong><strong>--DatabaseDataFieldsMode</strong> <em>All | Common | Specify | CompoundID</em></strong></dt>
538 <dd>
539 <p>Specify how data fields from database fingerprints <em>SDFile</em> are transferred to output SD and
540 CSV/TSV text files along with other information for <em>SD | text | both</em> values of <strong>--output</strong>
541 option: transfer all SD data fields; transfer SD data fields common to all compounds; extract
542 specified data fields; generate a compound ID using molname line, a compound prefix, or a
543 combination of both. Possible values: <em>All | Common | Specify | CompoundID</em>. Default value:
544 <em>CompoundID</em>.</p>
545 </dd>
546 <dt><strong><strong>--DatabaseFingerprintsCol</strong> <em>col number | col name</em></strong></dt>
547 <dd>
548 <p>This value is <strong>--DatabaseColMode</strong> specific. It specifies fingerprints column to use during similarity
549 and dissimilarity search for database fingerprints <em>TextFile</em>. Possible values: <em>col number or col label</em>.
550 Default value: <em>first column containing the word Fingerprints in its column label</em>.</p>
551 </dd>
552 <dt><strong><strong>--DatabaseFingerprintsField</strong> <em>FieldLabel</em></strong></dt>
553 <dd>
554 <p>Fingerprints field label to use during similarity and dissimilarity search for database fingerprints <em>SDFile</em>.
555 Default value: <em>first data field label containing the word Fingerprints in its label</em>.</p>
556 </dd>
557 <dt><strong><strong>--DistanceCutoff</strong> <em>number</em></strong></dt>
558 <dd>
559 <p>Distance cutoff value to use during comparison of distance value between a pair of database
560 and reference molecule calculated by distance comparison methods for fingerprints vector
561 string data values. Possible values: <em>Any valid number</em>. Default value: <em>10</em>.</p>
562 <p>The comparison value between a pair of database and reference molecule must meet the cutoff
563 criterion as shown below:</p>
564 <div class="OptionsBox">
565 SearchMode CutoffCriterion ComparisonValues</div>
566 <div class="OptionsBox">
567 Similarity &lt;= Lower value implies high similarity
568 <br/> Dissimilarity &gt;= Higher value implies high dissimilarity</div>
569 <p>This option is only used during distance coefficients values of <strong>-v, --VectorComparisonMode</strong>
570 option.</p>
571 <p>This option is ignored during <em>No</em> value of <strong>--GroupFusionApplyCutoff</strong> for <em>MultipleReferences</em>
572 <strong>-m, --mode</strong>.</p>
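<p>The distance cutoff criterion tabulated above amounts to a single comparison whose direction depends on the search mode. The following Python sketch illustrates this; the function name is hypothetical and the default cutoff simply mirrors the documented default of 10.</p>

```python
def meets_distance_cutoff(distance, cutoff=10.0, search_mode="SimilaritySearch"):
    """Apply the distance cutoff criterion to one comparison value."""
    if search_mode == "SimilaritySearch":
        return cutoff >= distance      # keep pairs at or under the cutoff
    if search_mode == "DissimilaritySearch":
        return distance >= cutoff      # keep pairs at or over the cutoff
    raise ValueError(f"unknown search mode: {search_mode}")
```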
573 </dd>
574 <dt><strong><strong>-d, --detail</strong> <em>InfoLevel</em></strong></dt>
575 <dd>
576 <p>Level of information to print about lines being ignored. Default: <em>1</em>. Possible values:
577 <em>1, 2 or 3</em>.</p>
578 </dd>
579 <dt><strong><strong>-f, --fast</strong></strong></dt>
580 <dd>
581 <p>In this mode, fingerprints columns specified using <strong>--FingerprintsCol</strong> for reference and database
582 fingerprints <em>TextFile(s)</em>, and <strong>--FingerprintsField</strong> for reference and database fingerprints <em>SDFile(s)</em>
583 are assumed to contain valid fingerprints data and no checking is performed before performing similarity
584 and dissimilarity search. By default, fingerprints data is validated before computing pairwise similarity and
585 distance coefficients.</p>
586 </dd>
587 <dt><strong><strong>--FingerprintsMode</strong> <em>AutoDetect | FingerprintsBitVectorString | FingerprintsVectorString</em></strong></dt>
588 <dd>
589 <p>Format of fingerprint strings data in reference and database fingerprints <em>SD, FP, or Text (CSV/TSV)</em>
590 files: automatically detect format of fingerprints string created by MayaChemTools fingerprints
591 generation scripts or explicitly specify its format. Possible values: <em>AutoDetect | FingerprintsBitVectorString |
592 FingerprintsVectorString</em>. Default value: <em>AutoDetect</em>.</p>
593 </dd>
594 <dt><strong><strong>-g, --GroupFusionRule</strong> <em>Max, Min, Mean, Median, Sum, Euclidean</em></strong></dt>
595 <dd>
596 <p>Specify what group fusion [ Ref 94-97, Ref 100, Ref 105 ] rule to use for calculating similarity of
597 a database molecule against a set of reference molecules during <em>MultipleReferences</em> value of
598 similarity search <strong>-m, --mode</strong>. Possible values: <em>Max, Min, Mean, Median, Sum, Euclidean</em>. Default
599 value: <em>Max</em>. <em>Mean</em> value corresponds to average or arithmetic mean. The group fusion rule is
600 also referred to as data fusion or consensus scoring in the literature.</p>
601 <p>For a reference molecules set and a database molecule, let:</p>
602 <div class="OptionsBox">
603 N = Number of reference molecules in a set</div>
604 <div class="OptionsBox">
605 i = ith reference reference molecule in a set
606 <br/> n = Nth reference reference molecule in a set</div>
607 <div class="OptionsBox">
608 d = dth database molecule</div>
609 <div class="OptionsBox">
610 Cid = Fingerprints comparison value between ith reference and dth database
611 molecule - similarity/dissimilarity comparison using similarity or
612 distance coefficient</div>
613 <p>Then, various group fusion rules to calculate fused similarity between a database molecule and
614 reference molecules set are defined as follows:</p>
615 <p><strong>Max</strong>: MAX ( C1d, C2d, ..., Cid, ..., Cnd )</p>
616 <p><strong>Min</strong>: MIN ( C1d, C2d, ..., Cid, ..., Cnd )</p>
617 <p><strong>Mean</strong>: SUM ( C1d, C2d, ..., Cid, ..., Cnd ) / N</p>
618 <p><strong>Median</strong>: MEDIAN ( C1d, C2d, ..., Cid, ..., Cnd )</p>
619 <p><strong>Sum</strong>: SUM ( C1d, C2d, ..., Cid, ..., Cnd )</p>
620 <p><strong>Euclidean</strong>: SQRT( SUM( C1d ** 2, C2d ** 2, ..., Cid ** 2, ..., Cnd ** 2) )</p>
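<p>The six fusion rules can be written compactly in Python. This is an illustrative sketch only: it assumes the comparison values between one database molecule and each reference molecule are already available as a list of numbers, and the names <em>FUSION_RULES</em> and <em>fuse</em> are hypothetical.</p>

```python
import math
import statistics

# Each rule maps the list of comparison values to a single fused value.
FUSION_RULES = {
    "Max": max,
    "Min": min,
    "Mean": statistics.mean,
    "Median": statistics.median,
    "Sum": sum,
    "Euclidean": lambda values: math.sqrt(sum(v ** 2 for v in values)),
}

def fuse(values, rule="Max"):
    """Fuse comparison values for one database molecule using the named rule."""
    return FUSION_RULES[rule](values)
```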
621 <p>The fingerprints bit-vector or vector string of each reference molecule in a set is compared
622 with a database molecule using a similarity or distance coefficient specified via <strong>-b,
623 --BitVectorComparisonMode</strong> or <strong>-v, --VectorComparisonMode</strong>. The reference molecules
624 whose comparison values with a database molecule fall outside specified <strong>--SimilarityCutoff</strong>
625 or <strong>--DistanceCutoff</strong> are ignored during <em>Yes</em> value of <strong>--GroupFusionApplyCutoff</strong>. The
626 specified <strong>-g, --GroupFusionRule</strong> is applied to <strong>-k, --kNN</strong> reference molecules to calculate
627 final fused similarity value between a database molecule and reference molecules set.</p>
628 <p>During dissimilarity search or usage of distance comparison coefficient in similarity search,
629 the meaning of fingerprints comparison value is automatically reversed as shown below:</p>
630 <div class="OptionsBox">
631 SearchMode ComparisonCoefficient ComparisonValues</div>
632 <div class="OptionsBox">
633 Similarity SimilarityCoefficient Higher value implies high similarity
634 <br/> Similarity DistanceCoefficient Lower value implies high similarity</div>
635 <div class="OptionsBox">
636 Dissimilarity SimilarityCoefficient Lower value implies high
637 dissimilarity
638 <br/> Dissimilarity DistanceCoefficient Higher value implies high
639 dissimilarity</div>
640 <p>Consequently, <em>Max</em> implies highest and lowest comparison value for usage of similarity and
641 distance coefficient respectively during similarity search. And it corresponds to lowest and highest
642 comparison value for usage of similarity and distance coefficient respectively during dissimilarity
643 search. During <em>Min</em> fusion rule, the highest and lowest comparison values are appropriately
644 reversed.</p>
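<p>The four combinations of search mode and coefficient type shown above reduce to one rule: higher comparison values are preferred exactly when the search mode and the coefficient type agree. A Python sketch of that rule follows; the function names and the string labels for the coefficient type are assumptions made for illustration.</p>

```python
def higher_is_better(search_mode, coefficient_kind):
    """True when larger comparison values rank first for this combination."""
    similarity_search = search_mode == "SimilaritySearch"
    similarity_coefficient = coefficient_kind == "SimilarityCoefficient"
    return similarity_search == similarity_coefficient

def rank(values, search_mode, coefficient_kind):
    """Order comparison values from best to worst match for the search."""
    return sorted(values, reverse=higher_is_better(search_mode, coefficient_kind))
```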
645 </dd>
646 <dt><strong><strong>--GroupFusionApplyCutoff</strong> <em>Yes | No</em></strong></dt>
647 <dd>
648 <p>Specify whether to apply <strong>--SimilarityCutoff</strong> or <strong>--DistanceCutoff</strong> values during application
649 of <strong>-g, --GroupFusionRule</strong> to reference molecules set. Possible values: <em>Yes or No</em>. Default
650 value: <em>Yes</em>.</p>
651 <p>During <em>Yes</em> value of <strong>--GroupFusionApplyCutoff</strong>, the reference molecules whose comparison
652 values with a database molecule fall outside specified <strong>--SimilarityCutoff</strong> or <strong>--DistanceCutoff</strong>
653 are not used to calculate final fused similarity value between a database molecule and reference
654 molecules set.</p>
655 </dd>
656 <dt><strong><strong>-h, --help</strong></strong></dt>
657 <dd>
658 <p>Print this help message.</p>
659 </dd>
660 <dt><strong><strong>--InDelim</strong> <em>comma | semicolon</em></strong></dt>
661 <dd>
662 <p>Input delimiter for reference and database fingerprints CSV <em>TextFile(s)</em>. Possible values:
663 <em>comma or semicolon</em>. Default value: <em>comma</em>. For TSV files, this option is ignored
664 and <em>tab</em> is used as a delimiter.</p>
665 </dd>
666 <dt><strong><strong>-k, --kNN</strong> <em>all | number</em></strong></dt>
667 <dd>
668 <p>Number of k-nearest neighbors (k-NN) reference molecules to use during <strong>-g, --GroupFusionRule</strong>
669 for calculating similarity of a database molecule against a set of reference molecules. Possible values:
670 <em>all | positive integers</em>. Default: <em>all</em>.</p>
671 <p>After ranking similarity values between a database molecule and reference molecules during
672 <em>MultipleReferences</em> value of similarity search <strong>-m, --mode</strong> option, the top <strong>-k, --kNN</strong> reference
673 molecules are selected and used during <strong>-g, --GroupFusionRule</strong>.</p>
674 <p>This option is <strong>-s, --SearchMode</strong> dependent: It corresponds to dissimilar molecules during
675 <em>DissimilaritySearch</em> value of <strong>-s, --SearchMode</strong> option.</p>
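<p>As an illustration of the k-NN step, the following Python sketch ranks similarity values for one database molecule, keeps the top k, and fuses them. It assumes a similarity search with a similarity coefficient (so higher values rank first) and uses the <em>Mean</em> rule purely as an example; the function name is hypothetical.</p>

```python
import statistics

def knn_fused_similarity(similarities, k="all"):
    """Rank values, keep the k nearest reference molecules, then take the mean."""
    ranked = sorted(similarities, reverse=True)
    if k != "all":
        ranked = ranked[:k]
    return statistics.mean(ranked)
```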
676 </dd>
677 <dt><strong><strong>-m, --mode</strong> <em>IndividualReference | MultipleReferences</em></strong></dt>
678 <dd>
679 <p>Specify how to treat reference molecules in <em>ReferenceFingerprintsFile</em> during similarity search:
680 Treat each reference molecule individually during similarity search or perform similarity
681 search by treating multiple reference molecules as a set. Possible values: <em>IndividualReference
682 | MultipleReferences</em>. Default value: <em>MultipleReferences</em>.</p>
683 <p>During <em>IndividualReference</em> value of <strong>-m, --mode</strong> for similarity search, fingerprints bit-vector
684 or vector string of each reference molecule is compared with database molecules using specified
685 similarity or distance coefficients to identify most similar molecules for each reference molecule.
686 Based on value of <strong>--SimilarCountMode</strong>, up to <strong>-n, --NumOfSimilarMolecules</strong> or <strong>-p,
687 --PercentSimilarMolecules</strong> at specified <strong>--SimilarityCutoff</strong> or <strong>--DistanceCutoff</strong> are
688 identified for each reference molecule.</p>
689 <p>During <em>MultipleReferences</em> value of <strong>-m, --mode</strong> for similarity search, all reference molecules
690 are considered as a set and <strong>-g, --GroupFusionRule</strong> is used to calculate similarity of a database
691 molecule against reference molecules set either using all reference molecules or number of k-nearest
692 neighbors (k-NN) to a database molecule specified using <strong>-k, --kNN</strong>. The fingerprints bit-vector
693 or vector string of each reference molecule in a set is compared with a database molecule using
694 a similarity or distance coefficient specified via <strong>-b, --BitVectorComparisonMode</strong> or <strong>-v,
695 --VectorComparisonMode</strong>. The reference molecules whose comparison values with a database
696 molecule fall outside specified <strong>--SimilarityCutoff</strong> or <strong>--DistanceCutoff</strong> are ignored. The
697 specified <strong>-g, --GroupFusionRule</strong> is applied to the remaining <strong>-k, --kNN</strong> reference molecules to calculate
698 final similarity value between a database molecule and reference molecules set.</p>
699 <p>The meaning of similarity and distance is automatically reversed during <em>DissimilaritySearch</em> value
700 of <strong>-s, --SearchMode</strong> along with appropriate handling of <strong>--SimilarityCutoff</strong> or
701 <strong>--DistanceCutoff</strong> values.</p>
702 </dd>
703 <dt><strong><strong>-n, --NumOfSimilarMolecules</strong> <em>number</em></strong></dt>
704 <dd>
705 <p>Maximum number of most similar database molecules to find for each reference molecule or set of
706 reference molecules based on <em>IndividualReference</em> or <em>MultipleReferences</em> value of similarity
707 search <strong>-m, --mode</strong> option. Default: <em>10</em>. Valid values: positive integers.</p>
708 <p>This option is ignored during <em>PercentSimilar</em> value of <strong>--SimilarCountMode</strong> option.</p>
709 <p>This option is <strong>-s, --SearchMode</strong> dependent: It corresponds to dissimilar molecules during
710 <em>DissimilaritySearch</em> value of <strong>-s, --SearchMode</strong> option.</p>
711 </dd>
712 <dt><strong><strong>--OutDelim</strong> <em>comma | tab | semicolon</em></strong></dt>
713 <dd>
714 <p>Delimiter for output CSV/TSV text file. Possible values: <em>comma, tab, or semicolon</em>.
715 Default value: <em>comma</em>.</p>
716 </dd>
717 <dt><strong><strong>--output</strong> <em>SD | text | both</em></strong></dt>
718 <dd>
719 <p>Type of output files to generate. Possible values: <em>SD, text, or both</em>. Default value: <em>text</em>.</p>
720 </dd>
721 <dt><strong><strong>-o, --overwrite</strong></strong></dt>
722 <dd>
723 <p>Overwrite existing files.</p>
724 </dd>
725 <dt><strong><strong>-p, --PercentSimilarMolecules</strong> <em>number</em></strong></dt>
726 <dd>
727 <p>Maximum percent of most similar database molecules to find for each reference molecule or set of
728 reference molecules based on <em>IndividualReference</em> or <em>MultipleReferences</em> value of similarity
729 search <strong>-m, --mode</strong> option. Default: <em>1</em> percent of database molecules. Valid values: non-zero values
730 between <em>0 and 100</em>.</p>
731 <p>This option is ignored during <em>NumOfSimilar</em> value of <strong>--SimilarCountMode</strong> option.</p>
732 <p>During <em>PercentSimilar</em> value of <strong>--SimilarCountMode</strong> option, the number of molecules
733 in <em>DatabaseFingerprintsFile</em> is counted and the number of similar molecules corresponds to
734 <strong>--PercentSimilarMolecules</strong> of the total number of database molecules.</p>
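<p>The conversion from a percentage to a molecule count can be sketched in Python as shown below. How the script rounds a fractional result is not stated here, so rounding to the nearest integer (using Python's built-in <em>round</em>) and a minimum of one molecule are both assumptions made for illustration; the function name is hypothetical.</p>

```python
def similar_molecule_count(database_size, percent=1.0):
    """Number of database molecules corresponding to a percentage of the total."""
    if not (percent > 0 and 100 >= percent):
        raise ValueError("percent must be greater than 0 and at most 100")
    # Assumption: round to nearest integer and never return fewer than one.
    return max(1, round(database_size * percent / 100))
```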
735 <p>This option is <strong>-s, --SearchMode</strong> dependent: It corresponds to dissimilar molecules during
736 <em>DissimilaritySearch</em> value of <strong>-s, --SearchMode</strong> option.</p>
737 </dd>
738 <dt><strong><strong>--precision</strong> <em>number</em></strong></dt>
739 <dd>
740 <p>Precision of calculated similarity values for comparison and generating output files. Default: up to <em>2</em>
741 decimal places. Valid values: positive integers.</p>
742 </dd>
743 <dt><strong><strong>-q, --quote</strong> <em>Yes | No</em></strong></dt>
744 <dd>
745 <p>Put quote around column values in output CSV/TSV text file. Possible values:
746 <em>Yes or No</em>. Default value: <em>Yes</em>.</p>
747 </dd>
748 <dt><strong><strong>--ReferenceColMode</strong> <em>ColNum | ColLabel</em></strong></dt>
749 <dd>
750 <p>Specify how columns are identified in reference fingerprints <em>TextFile</em>: using column
751 number or column label. Possible values: <em>ColNum or ColLabel</em>. Default value: <em>ColNum</em>.</p>
752 </dd>
753 <dt><strong><strong>--ReferenceCompoundIDCol</strong> <em>col number | col name</em></strong></dt>
754 <dd>
755 <p>This value is <strong>--ReferenceColMode</strong> mode specific. It specifies column to use for retrieving compound
756 ID from reference fingerprints <em>TextFile</em> during similarity and dissimilarity search for output SD and CSV/TSV
757 text files. Possible values: <em>col number or col label</em>. Default value: <em>first column containing the word compoundID
758 in its column label or sequentially generated IDs</em>.</p>
759 </dd>
760 <dt><strong><strong>--ReferenceCompoundIDPrefix</strong> <em>text</em></strong></dt>
761 <dd>
762 <p>Specify compound ID prefix to use during sequential generation of compound IDs for reference fingerprints
763 <em>SDFile</em> and <em>TextFile</em>. Default value: <em>Cmpd</em>. The default value generates compound IDs which look
764 like Cmpd&lt;Number&gt;.</p>
765 <p>For reference fingerprints <em>SDFile</em>, this value is only used during <em>LabelPrefix | MolNameOrLabelPrefix</em>
766 values of <strong>--ReferenceCompoundIDMode</strong> option; otherwise, it's ignored.</p>
767 <p>Examples for <em>LabelPrefix</em> or <em>MolNameOrLabelPrefix</em> value of <strong>--ReferenceCompoundIDMode</strong>:</p>
768 <div class="OptionsBox">
769 Compound</div>
770 <p>The value specified above generates compound IDs which correspond to Compound&lt;Number&gt;
771 instead of default value of Cmpd&lt;Number&gt;.</p>
772 </dd>
773 <dt><strong><strong>--ReferenceCompoundIDField</strong> <em>DataFieldName</em></strong></dt>
774 <dd>
775 <p>Specify reference fingerprints <em>SDFile</em> datafield label for generating compound IDs.
776 This value is only used during <em>DataField</em> value of <strong>--ReferenceCompoundIDMode</strong> option.</p>
777 <p>Examples for <em>DataField</em> value of <strong>--ReferenceCompoundIDMode</strong>:</p>
778 <div class="OptionsBox">
779 MolID
780 <br/> ExtReg</div>
781 </dd>
782 <dt><strong><strong>--ReferenceCompoundIDMode</strong> <em>DataField | MolName | LabelPrefix | MolNameOrLabelPrefix</em></strong></dt>
783 <dd>
784 <p>Specify how to generate compound IDs from reference fingerprints <em>SDFile</em> during similarity and
785 dissimilarity search for output SD and CSV/TSV text files: use a <em>SDFile</em> datafield value; use
786 molname line from <em>SDFile</em>; generate a sequential ID with specific prefix; use combination of both
787 MolName and LabelPrefix with usage of LabelPrefix values for empty molname lines.</p>
788 <p>Possible values: <em>DataField | MolName | LabelPrefix | MolNameOrLabelPrefix</em>.
789 Default: <em>LabelPrefix</em>.</p>
790 <p>For <em>MolNameOrLabelPrefix</em> value of <strong>--ReferenceCompoundIDMode</strong>, molname line in <em>SDFile</em>
791 takes precedence over sequential compound IDs generated using <em>LabelPrefix</em> and only empty molname
792 values are replaced with sequential compound IDs.</p>
793 </dd>
794 <dt><strong><strong>--ReferenceFingerprintsCol</strong> <em>col number | col name</em></strong></dt>
795 <dd>
796 <p>This value is <strong>--ReferenceColMode</strong> specific. It specifies fingerprints column to use during similarity
797 and dissimilarity search for reference fingerprints <em>TextFile</em>. Possible values: <em>col number or col label</em>.
798 Default value: <em>first column containing the word Fingerprints in its column label</em>.</p>
799 </dd>
800 <dt><strong><strong>--ReferenceFingerprintsField</strong> <em>FieldLabel</em></strong></dt>
801 <dd>
802 <p>Fingerprints field label to use during similarity and dissimilarity search for reference fingerprints <em>SDFile</em>.
803 Default value: <em>first data field label containing the word Fingerprints in its label</em>.</p>
804 </dd>
805 <dt><strong><strong>-r, --root</strong> <em>RootName</em></strong></dt>
806 <dd>
807 <p>New file name is generated using the root: &lt;Root&gt;.&lt;Ext&gt;. Default for new file name:
808 &lt;ReferenceFileName&gt;SimilaritySearching.&lt;Ext&gt;. The output file type determines &lt;Ext&gt;
809 value. The sdf, csv, and tsv &lt;Ext&gt; values are used for SD, comma/semicolon, and tab delimited
810 text files respectively.</p>
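<p>The naming scheme above can be sketched in Python as follows. It is assumed here that &lt;ReferenceFileName&gt; means the reference file name without its directory and extension; that interpretation, and the function name, are illustrative and not confirmed by this documentation.</p>

```python
import os

def output_file_names(reference_file, output="text", out_delim="comma", root=None):
    """Build output file names from a root and the requested output type."""
    if root is None:
        # Assumption: strip directory and extension from the reference file name.
        base = os.path.splitext(os.path.basename(reference_file))[0]
        root = base + "SimilaritySearching"
    names = []
    if output in ("SD", "both"):
        names.append(root + ".sdf")
    if output in ("text", "both"):
        names.append(root + (".tsv" if out_delim == "tab" else ".csv"))
    return names
```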
811 </dd>
812 <dt><strong><strong>-s, --SearchMode</strong> <em>SimilaritySearch | DissimilaritySearch</em></strong></dt>
813 <dd>
814 <p>Specify how to find molecules from database molecules for individual reference molecules or
815 set of reference molecules: Find similar molecules or dissimilar molecules from database molecules.
816 Possible values: <em>SimilaritySearch | DissimilaritySearch</em>. Default value: <em>SimilaritySearch</em>.</p>
817 <p>During <em>DissimilaritySearch</em> value of <strong>-s, --SearchMode</strong> option, the meaning of the following
818 options is switched and they correspond to dissimilar molecules instead of similar molecules:
819 <strong>--SimilarCountMode</strong>, <strong>-n, --NumOfSimilarMolecules</strong>, <strong>--PercentSimilarMolecules</strong>,
820 <strong>-k, --kNN</strong>.</p>
821 </dd>
822 <dt><strong><strong>--SimilarCountMode</strong> <em>NumOfSimilar | PercentSimilar</em></strong></dt>
823 <dd>
824 <p>Specify method used to count similar molecules found from database molecules for individual
825 reference molecules or set of reference molecules: Find number of similar molecules or percent
826 of similar molecules from database molecules. Possible values: <em>NumOfSimilar | PercentSimilar</em>.
827 Default value: <em>NumOfSimilar</em>.</p>
828 <p>The values for number of similar molecules and percent similar molecules are specified
829 using options <strong>-n, --NumOfSimilarMolecules</strong> and <strong>-p, --PercentSimilarMolecules</strong>.</p>
830 <p>This option is <strong>-s, --SearchMode</strong> dependent: It corresponds to dissimilar molecules during
831 <em>DissimilaritySearch</em> value of <strong>-s, --SearchMode</strong> option.</p>
832 </dd>
833 <dt><strong><strong>--SimilarityCutoff</strong> <em>number</em></strong></dt>
834 <dd>
835 <p>Similarity cutoff value to use during comparison of similarity value between a pair of database
836 and reference molecules calculated by similarity comparison methods for fingerprints bit-vector
837 or vector strings data values. Possible values: <em>Any valid number</em>. Default value: <em>0.75</em>.</p>
838 <p>The comparison value between a pair of database and reference molecule must meet the cutoff
839 criterion as shown below:</p>
840 <div class="OptionsBox">
841 SearchMode CutoffCriterion ComparisonValues</div>
842 <div class="OptionsBox">
843 Similarity &gt;= Higher value implies high similarity
844 <br/> Dissimilarity &lt;= Lower value implies high dissimilarity</div>
845 <p>This option is ignored during <em>No</em> value of <strong>--GroupFusionApplyCutoff</strong> for <em>MultipleReferences</em>
846 <strong>-m, --mode</strong>.</p>
847 <p>This option is <strong>-s, --SearchMode</strong> dependent: It corresponds to dissimilar molecules during
848 <em>DissimilaritySearch</em> value of <strong>-s, --SearchMode</strong> option.</p>
849 </dd>
850 <dt><strong><strong>-v, --VectorComparisonMode</strong> <em>SupportedSimilarityName | SupportedDistanceName</em></strong></dt>
851 <dd>
852 <p>Specify what similarity or distance coefficient to use for calculating similarity between fingerprint
853 vector strings data values in <em>ReferenceFingerprintsFile</em> and <em>DatabaseFingerprintsFile</em> during
854 similarity search. Possible values: <em>TanimotoSimilarity | ... | ManhattanDistance | ...</em>. Default
855 value: <em>TanimotoSimilarity</em>.</p>
856 <p>The value of <strong>-v, --VectorComparisonMode</strong>, in conjunction with <strong>--VectorComparisonFormulism</strong>,
857 decides which type of similarity and distance coefficient formulism gets used.</p>
858 <p>The current release supports the following similarity and distance coefficients: <em>CosineSimilarity,
859 CzekanowskiSimilarity, DiceSimilarity, OchiaiSimilarity, JaccardSimilarity, SorensonSimilarity, TanimotoSimilarity,
860 CityBlockDistance, EuclideanDistance, HammingDistance, ManhattanDistance, SoergelDistance</em>. These
861 similarity and distance coefficients are described below.</p>
862 <p><strong>FingerprintsVector.pm</strong> module, used to calculate similarity and distance coefficients,
863 provides support to perform comparison between vectors containing three different types of
864 values:</p>
865 <p>Type I: OrderedNumericalValues</p>
866 <div class="OptionsBox">
867 . Sizes of the two vectors are the same
868 <br/> . Vectors contain real values in a specific order. For example: MACCS keys
869 count, Topological pharmacophore atom pairs and so on.</div>
870 <p>Type II: UnorderedNumericalValues</p>
871 <div class="OptionsBox">
872 . Sizes of the two vectors might not be the same
873 <br/> . Vectors contain unordered real values identified by value IDs. For example:
874 Topological atom pairs, Topological atom torsions and so on.</div>
875 <p>Type III: AlphaNumericalValues</p>
876 <div class="OptionsBox">
877 . Sizes of the two vectors might not be the same
878 <br/> . Vectors contain unordered alphanumerical values. For example: Extended
879 connectivity fingerprints, atom neighborhood fingerprints.</div>
880 <p>Before performing similarity or distance calculations between vectors containing UnorderedNumericalValues
881 or AlphaNumericalValues, the vectors are transformed into vectors containing unique OrderedNumericalValues
882 using value IDs for UnorderedNumericalValues and the values themselves for AlphaNumericalValues.</p>
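<p>The transformation described above can be pictured with a minimal Python sketch. It is an illustration only, not the code used by <strong>FingerprintsVector.pm</strong>: the function names are hypothetical, and filling a missing value ID with a count of 0 is an assumption made here so that both vectors end up with the same length and order.</p>

```python
from collections import Counter


def align_unordered(ids_a, counts_a, ids_b, counts_b):
    """Align two ID-keyed count vectors onto the sorted union of their value IDs.

    Assumption: an ID that is absent from one vector contributes a count of 0.
    """
    map_a = dict(zip(ids_a, counts_a))
    map_b = dict(zip(ids_b, counts_b))
    all_ids = sorted(set(map_a) | set(map_b))
    xa = [map_a.get(value_id, 0) for value_id in all_ids]
    xb = [map_b.get(value_id, 0) for value_id in all_ids]
    return all_ids, xa, xb


def align_alphanumerical(values_a, values_b):
    """Use each distinct alphanumerical value as its own ID and count occurrences."""
    count_a, count_b = Counter(values_a), Counter(values_b)
    all_values = sorted(set(count_a) | set(count_b))
    xa = [count_a[value] for value in all_values]
    xb = [count_b[value] for value in all_values]
    return all_values, xa, xb


if __name__ == "__main__":
    print(align_unordered(["C-N", "C-O"], [2, 1], ["C-O", "N-O"], [3, 4]))
    print(align_alphanumerical(["x1", "x2"], ["x2", "x3"]))
```

<p>Once both vectors share the same ordered set of IDs, they can be compared element by element like OrderedNumericalValues.</p>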
883 <p>Three forms of similarity and distance calculation between two vectors, specified using <strong>--VectorComparisonFormulism</strong>
884 option, are supported: <em>AlgebraicForm, BinaryForm or SetTheoreticForm</em>.</p>
885 <p>For <em>BinaryForm</em>, the ordered list of processed final vector values containing the value or
886 count of each unique value type is simply converted into a binary vector containing 1s and 0s
887 corresponding to presence or absence of values before calculating similarity or distance between
888 two vectors.</p>
889 <p>For two fingerprint vectors A and B of the same size containing OrderedNumericalValues, let:</p>
890 <div class="OptionsBox">
891 N = Number of values in A or B</div>
892 <div class="OptionsBox">
893 Xa = Values of vector A
894 <br/> Xb = Values of vector B</div>
895 <div class="OptionsBox">
896 Xai = Value of ith element in A
897 <br/> Xbi = Value of ith element in B</div>
898 <div class="OptionsBox">
899 SUM = Sum of i over N values</div>
900 <p>For SetTheoreticForm of calculation between two vectors, let:</p>
901 <div class="OptionsBox">
902 SetIntersectionXaXb = SUM ( MIN ( Xai, Xbi ) )
903 <br/> SetDifferenceXaXb = SUM ( Xai ) + SUM ( Xbi ) - SUM ( MIN ( Xai, Xbi ) )</div>
904 <p>For BinaryForm of calculation between two vectors, let:</p>
905 <div class="OptionsBox">
906 Na = Number of bits set to &quot;1&quot; in A = SUM ( Xai )
907 <br/> Nb = Number of bits set to &quot;1&quot; in B = SUM ( Xbi )
908 <br/> Nc = Number of bits set to &quot;1&quot; in both A and B = SUM ( Xai * Xbi )
909 <br/> Nd = Number of bits set to &quot;0&quot; in both A and B
910 = SUM ( 1 - Xai - Xbi + Xai * Xbi)</div>
911 <div class="OptionsBox">
912 N = Number of bits set to &quot;1&quot; or &quot;0&quot; in A or B = Size of A or B = Na + Nb - Nc + Nd</div>
913 <p>Additionally, for BinaryForm various values also correspond to:</p>
914 <div class="OptionsBox">
915 Na = | Xa |
916 <br/> Nb = | Xb |
917 <br/> Nc = | SetIntersectionXaXb |
918 <br/> Nd = N - | SetDifferenceXaXb |</div>
919 <div class="OptionsBox">
920 | SetDifferenceXaXb | = N - Nd = Na + Nb - Nc + Nd - Nd = Na + Nb - Nc
921 = | Xa | + | Xb | - | SetIntersectionXaXb |</div>
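<p>As a rough check of the BinaryForm quantities defined above, the following Python sketch converts two equal-length vectors to presence/absence bits and computes Na, Nb, Nc and Nd with the stated sums. The function name is hypothetical, and treating any non-zero value as a set bit is an assumption about how the conversion to 1s and 0s is carried out.</p>

```python
def binary_counts(xa, xb):
    """Return (Na, Nb, Nc, Nd) for two equal-length vectors.

    Assumption: any non-zero value counts as a bit set to 1.
    """
    if len(xa) != len(xb):
        raise ValueError("vectors must have the same size")
    a = [1 if x else 0 for x in xa]
    b = [1 if x else 0 for x in xb]
    na = sum(a)                                   # bits set to 1 in A
    nb = sum(b)                                   # bits set to 1 in B
    nc = sum(ai * bi for ai, bi in zip(a, b))     # bits set to 1 in both
    nd = sum(1 - ai - bi + ai * bi for ai, bi in zip(a, b))  # bits 0 in both
    return na, nb, nc, nd


if __name__ == "__main__":
    na, nb, nc, nd = binary_counts([2, 0, 1, 0, 3], [1, 0, 0, 0, 4])
    print(na, nb, nc, nd)
    # N = Na + Nb - Nc + Nd should equal the vector size
    print(na + nb - nc + nd)
```

<p>For the sample vectors the identity N = Na + Nb - Nc + Nd recovers the vector size, which is a quick way to validate the four counts.</p>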
922 <p>Various similarity and distance coefficients [ Ref 40, Ref 62, Ref 64 ] for a pair of vectors A and B
923 in <em>AlgebraicForm, BinaryForm and SetTheoreticForm</em> are defined as follows:</p>
924 <p><strong>CityBlockDistance</strong>: ( same as HammingDistance and ManhattanDistance)</p>
925 <p><em>AlgebraicForm</em>: SUM ( ABS ( Xai - Xbi ) )</p>
926 <p><em>BinaryForm</em>: ( Na - Nc ) + ( Nb - Nc ) = Na + Nb - 2 * Nc</p>
927 <p><em>SetTheoreticForm</em>: | SetDifferenceXaXb | - | SetIntersectionXaXb | = SUM ( Xai ) + SUM ( Xbi ) - 2 * ( SUM ( MIN ( Xai, Xbi ) ) )</p>
928 <p><strong>CosineSimilarity</strong>: ( same as OchiaiSimilarity)</p>
929 <p><em>AlgebraicForm</em>: SUM ( Xai * Xbi ) / SQRT ( SUM ( Xai ** 2) * SUM ( Xbi ** 2) )</p>
930 <p><em>BinaryForm</em>: Nc / SQRT ( Na * Nb)</p>
931 <p><em>SetTheoreticForm</em>: | SetIntersectionXaXb | / SQRT ( |Xa| * |Xb| ) = SUM ( MIN ( Xai, Xbi ) ) / SQRT ( SUM ( Xai ) * SUM ( Xbi ) )</p>
932 <p><strong>CzekanowskiSimilarity</strong>: ( same as DiceSimilarity and SorensonSimilarity)</p>
933 <p><em>AlgebraicForm</em>: ( 2 * ( SUM ( Xai * Xbi ) ) ) / ( SUM ( Xai ** 2) + SUM ( Xbi **2 ) )</p>
934 <p><em>BinaryForm</em>: 2 * Nc / ( Na + Nb )</p>
935 <p><em>SetTheoreticForm</em>: 2 * | SetIntersectionXaXb | / ( |Xa| + |Xb| ) = 2 * ( SUM ( MIN ( Xai, Xbi ) ) ) / ( SUM ( Xai ) + SUM ( Xbi ) )</p>
936 <p><strong>DiceSimilarity</strong>: ( same as CzekanowskiSimilarity and SorensonSimilarity)</p>
937 <p><em>AlgebraicForm</em>: ( 2 * ( SUM ( Xai * Xbi ) ) ) / ( SUM ( Xai ** 2) + SUM ( Xbi **2 ) )</p>
938 <p><em>BinaryForm</em>: 2 * Nc / ( Na + Nb )</p>
939 <p><em>SetTheoreticForm</em>: 2 * | SetIntersectionXaXb | / ( |Xa| + |Xb| ) = 2 * ( SUM ( MIN ( Xai, Xbi ) ) ) / ( SUM ( Xai ) + SUM ( Xbi ) )</p>
940 <p><strong>EuclideanDistance</strong>:</p>
941 <p><em>AlgebraicForm</em>: SQRT ( SUM ( ( ( Xai - Xbi ) ** 2 ) ) )</p>
942 <p><em>BinaryForm</em>: SQRT ( ( Na - Nc ) + ( Nb - Nc ) ) = SQRT ( Na + Nb - 2 * Nc )</p>
943 <p><em>SetTheoreticForm</em>: SQRT ( | SetDifferenceXaXb | - | SetIntersectionXaXb | ) = SQRT ( SUM ( Xai ) + SUM ( Xbi ) - 2 * ( SUM ( MIN ( Xai, Xbi ) ) ) )</p>
944 <p><strong>HammingDistance</strong>: ( same as CityBlockDistance and ManhattanDistance)</p>
945 <p><em>AlgebraicForm</em>: SUM ( ABS ( Xai - Xbi ) )</p>
946 <p><em>BinaryForm</em>: ( Na - Nc ) + ( Nb - Nc ) = Na + Nb - 2 * Nc</p>
947 <p><em>SetTheoreticForm</em>: | SetDifferenceXaXb | - | SetIntersectionXaXb | = SUM ( Xai ) + SUM ( Xbi ) - 2 * ( SUM ( MIN ( Xai, Xbi ) ) )</p>
948 <p><strong>JaccardSimilarity</strong>: ( same as TanimotoSimilarity)</p>
949 <p><em>AlgebraicForm</em>: SUM ( Xai * Xbi ) / ( SUM ( Xai ** 2 ) + SUM ( Xbi ** 2 ) - SUM ( Xai * Xbi ) )</p>
950 <p><em>BinaryForm</em>: Nc / ( ( Na - Nc ) + ( Nb - Nc ) + Nc ) = Nc / ( Na + Nb - Nc )</p>
951 <p><em>SetTheoreticForm</em>: | SetIntersectionXaXb | / | SetDifferenceXaXb | = SUM ( MIN ( Xai, Xbi ) ) / ( SUM ( Xai ) + SUM ( Xbi ) - SUM ( MIN ( Xai, Xbi ) ) )</p>
952 <p><strong>ManhattanDistance</strong>: ( same as CityBlockDistance and HammingDistance)</p>
953 <p><em>AlgebraicForm</em>: SUM ( ABS ( Xai - Xbi ) )</p>
954 <p><em>BinaryForm</em>: ( Na - Nc ) + ( Nb - Nc ) = Na + Nb - 2 * Nc</p>
955 <p><em>SetTheoreticForm</em>: | SetDifferenceXaXb | - | SetIntersectionXaXb | = SUM ( Xai ) + SUM ( Xbi ) - 2 * ( SUM ( MIN ( Xai, Xbi ) ) )</p>
956 <p><strong>OchiaiSimilarity</strong>: ( same as CosineSimilarity)</p>
957 <p><em>AlgebraicForm</em>: SUM ( Xai * Xbi ) / SQRT ( SUM ( Xai ** 2) * SUM ( Xbi ** 2) )</p>
958 <p><em>BinaryForm</em>: Nc / SQRT ( Na * Nb)</p>
959 <p><em>SetTheoreticForm</em>: | SetIntersectionXaXb | / SQRT ( |Xa| * |Xb| ) = SUM ( MIN ( Xai, Xbi ) ) / SQRT ( SUM ( Xai ) * SUM ( Xbi ) )</p>
960 <p><strong>SorensonSimilarity</strong>: ( same as CzekanowskiSimilarity and DiceSimilarity)</p>
961 <p><em>AlgebraicForm</em>: ( 2 * ( SUM ( Xai * Xbi ) ) ) / ( SUM ( Xai ** 2) + SUM ( Xbi **2 ) )</p>
962 <p><em>BinaryForm</em>: 2 * Nc / ( Na + Nb )</p>
963 <p><em>SetTheoreticForm</em>: 2 * | SetIntersectionXaXb | / ( |Xa| + |Xb| ) = 2 * ( SUM ( MIN ( Xai, Xbi ) ) ) / ( SUM ( Xai ) + SUM ( Xbi ) )</p>
964 <p><strong>SoergelDistance</strong>:</p>
965 <p><em>AlgebraicForm</em>: SUM ( ABS ( Xai - Xbi ) ) / SUM ( MAX ( Xai, Xbi ) )</p>
966 <p><em>BinaryForm</em>: 1 - Nc / ( Na + Nb - Nc ) = ( Na + Nb - 2 * Nc ) / ( Na + Nb - Nc )</p>
967 <p><em>SetTheoreticForm</em>: ( | SetDifferenceXaXb | - | SetIntersectionXaXb | ) / | SetDifferenceXaXb | = ( SUM ( Xai ) + SUM ( Xbi ) - 2 * ( SUM ( MIN ( Xai, Xbi ) ) ) ) / ( SUM ( Xai ) + SUM ( Xbi ) - SUM ( MIN ( Xai, Xbi ) ) )</p>
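<p>For non-negative values, SUM ( MAX ( Xai, Xbi ) ) equals SUM ( Xai ) + SUM ( Xbi ) - SUM ( MIN ( Xai, Xbi ) ), and ABS ( Xai - Xbi ) equals Xai + Xbi - 2 * MIN ( Xai, Xbi ), so the AlgebraicForm and SetTheoreticForm of SoergelDistance given above should agree on count vectors. The Python sketch below checks this numerically; the function names are hypothetical and it does not guard against all-zero vectors.</p>

```python
import math


def soergel_algebraic(xa, xb):
    """SUM(ABS(Xai - Xbi)) / SUM(MAX(Xai, Xbi))."""
    numerator = sum(abs(a - b) for a, b in zip(xa, xb))
    denominator = sum(max(a, b) for a, b in zip(xa, xb))
    return numerator / denominator


def soergel_set_theoretic(xa, xb):
    """(SUM(Xai) + SUM(Xbi) - 2*SUM(MIN)) / (SUM(Xai) + SUM(Xbi) - SUM(MIN))."""
    intersection = sum(min(a, b) for a, b in zip(xa, xb))
    total = sum(xa) + sum(xb)
    return (total - 2 * intersection) / (total - intersection)


if __name__ == "__main__":
    xa = [2, 0, 1, 0, 3]
    xb = [1, 0, 0, 0, 4]
    print(soergel_algebraic(xa, xb), soergel_set_theoretic(xa, xb))
```

<p>The same set-theoretic expression is 1 minus the SetTheoreticForm of JaccardSimilarity, mirroring the BinaryForm relation 1 - Nc / ( Na + Nb - Nc ).</p>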
968 <p><strong>TanimotoSimilarity</strong>: ( same as JaccardSimilarity)</p>
969 <p><em>AlgebraicForm</em>: SUM ( Xai * Xbi ) / ( SUM ( Xai ** 2 ) + SUM ( Xbi ** 2 ) - SUM ( Xai * Xbi ) )</p>
970 <p><em>BinaryForm</em>: Nc / ( ( Na - Nc ) + ( Nb - Nc ) + Nc ) = Nc / ( Na + Nb - Nc )</p>
971 <p><em>SetTheoreticForm</em>: | SetIntersectionXaXb | / | SetDifferenceXaXb | = SUM ( MIN ( Xai, Xbi ) ) / ( SUM ( Xai ) + SUM ( Xbi ) - SUM ( MIN ( Xai, Xbi ) ) )</p>
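<p>To make the difference between the three forms concrete, here is a minimal Python sketch of TanimotoSimilarity written directly from the formulas above. The function names are hypothetical, the BinaryForm conversion assumes any non-zero value is a set bit, and empty or all-zero vectors are not handled.</p>

```python
import math


def tanimoto_algebraic(xa, xb):
    """SUM(Xai*Xbi) / (SUM(Xai**2) + SUM(Xbi**2) - SUM(Xai*Xbi))."""
    ab = sum(a * b for a, b in zip(xa, xb))
    return ab / (sum(a * a for a in xa) + sum(b * b for b in xb) - ab)


def tanimoto_binary(xa, xb):
    """Nc / (Na + Nb - Nc) after converting values to presence/absence bits."""
    a = [1 if x else 0 for x in xa]
    b = [1 if x else 0 for x in xb]
    nc = sum(ai * bi for ai, bi in zip(a, b))
    return nc / (sum(a) + sum(b) - nc)


def tanimoto_set_theoretic(xa, xb):
    """SUM(MIN) / (SUM(Xai) + SUM(Xbi) - SUM(MIN))."""
    intersection = sum(min(a, b) for a, b in zip(xa, xb))
    return intersection / (sum(xa) + sum(xb) - intersection)


if __name__ == "__main__":
    counts_a, counts_b = [2, 0, 1, 0, 3], [1, 0, 0, 0, 4]
    bits_a, bits_b = [1, 0, 1, 0, 1], [1, 0, 0, 0, 1]
    for f in (tanimoto_algebraic, tanimoto_binary, tanimoto_set_theoretic):
        print(f.__name__, f(counts_a, counts_b), f(bits_a, bits_b))
```

<p>On count vectors the three forms generally give different values, while on vectors that already contain only 1s and 0s they coincide.</p>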
972 </dd>
973 <dt><strong><strong>--VectorComparisonFormulism</strong> <em>AlgebraicForm | BinaryForm | SetTheoreticForm</em></strong></dt>
974 <dd>
975 <p>Specify fingerprints vector comparison formulism to use for calculating similarity and distance
976 coefficients during <strong>-v, --VectorComparisonMode</strong>. Possible values: <em>AlgebraicForm | BinaryForm |
977 SetTheoreticForm</em>. Default value: <em>AlgebraicForm</em>.</p>
978 <p>For fingerprint vector strings containing <strong>AlphaNumericalValues</strong> data values - <strong>ExtendedConnectivityFingerprints</strong>,
979 <strong>AtomNeighborhoodsFingerprints</strong> and so on - all three formulisms result in the same value during similarity and distance
980 calculations.</p>
981 </dd>
982 <dt><strong><strong>-w, --WorkingDir</strong> <em>DirName</em></strong></dt>
983 <dd>
984 <p>Location of working directory. Default: current directory.</p>
985 </dd>
986 </dl>
987 <p>
988 </p>
989 <h2>EXAMPLES</h2>
990 <p>To perform similarity search using Tanimoto coefficient by treating all reference molecules as a set
991 to find 10 most similar database molecules with application of Max group fusion rule and similarity
992 cutoff to supported fingerprints strings data in SD fingerprints files present in data fields with
993 Fingerprint substring in their labels, and create a ReferenceFPHexSimilaritySearching.csv file containing
994 sequentially generated database compound IDs with Cmpd prefix, type:</p>
995 <div class="ExampleBox">
996 % SimilaritySearchingFingerprints.pl -o ReferenceSampleFPHex.sdf
997 DatabaseSampleFPHex.sdf</div>
998 <p>To perform similarity search using Tanimoto coefficient by treating all reference molecules as a set
999 to find 10 most similar database molecules with application of Max group fusion rule and similarity
1000 cutoff to supported fingerprints strings data in FP fingerprints files, and create a
1001 SimilaritySearchResults.csv file containing database compound IDs retrieved from FP file, type:</p>
1002 <div class="ExampleBox">
1003 % SimilaritySearchingFingerprints.pl -r SimilaritySearchResults -o
1004 ReferenceSampleFPBin.fpf DatabaseSampleFPBin.fpf</div>
1005 <p>To perform similarity search using Tanimoto coefficient by treating all reference molecules as a set
1006 to find 10 most similar database molecules with application of Max group fusion rule and
1007 similarity cutoff to supported fingerprints strings data in text fingerprints files present in columns
1008 with names containing Fingerprint substring, and create a ReferenceFPHexSimilaritySearching.csv
1009 file containing database compound IDs retrieved from the column with a name containing CompoundID substring or
1010 sequentially generated compound IDs, type:</p>
1011 <div class="ExampleBox">
1012 % SimilaritySearchingFingerprints.pl -o ReferenceSampleFPCount.csv
1013 DatabaseSampleFPCount.csv</div>
1014 <p>To perform similarity search using Tanimoto coefficient by treating reference molecules as individual molecules
1015 to find 10 most similar database molecules for each reference molecule with application of similarity cutoff to
1016 supported fingerprints strings data in SD fingerprints files present in data fields with Fingerprint substring
1017 in their labels, and create a ReferenceFPHexSimilaritySearching.csv file containing sequentially generated
1018 reference and database compound IDs with Cmpd prefix, type:</p>
1019 <div class="ExampleBox">
1020 % SimilaritySearchingFingerprints.pl -mode IndividualReference -o
1021 ReferenceSampleFPHex.sdf DatabaseSampleFPHex.sdf</div>
1022 <p>To perform similarity search using Tanimoto coefficient by treating reference molecules as individual molecules
1023 to find 10 most similar database molecules for each reference molecule with application of similarity cutoff to
1024 supported fingerprints strings data in FP fingerprints files, and create a ReferenceFPHexSimilaritySearching.csv
1025 file containing reference and database compound IDs retrieved from FP file, type:</p>
1026 <div class="ExampleBox">
1027 % SimilaritySearchingFingerprints.pl -mode IndividualReference -o
1028 ReferenceSampleFPHex.fpf DatabaseSampleFPHex.fpf</div>
1029 <p>To perform similarity search using Tanimoto coefficient by treating reference molecules as individual molecules
1030 to find 10 most similar database molecules for each reference molecule with application of similarity cutoff to
1031 supported fingerprints strings data in text fingerprints files present in columns with names containing Fingerprint
1032 substring, and create a ReferenceFPHexSimilaritySearching.csv file containing reference and
1033 database compound IDs retrieved from the column with a name containing CompoundID substring or sequentially generated
1034 compound IDs, type:</p>
1035 <div class="ExampleBox">
1036 % SimilaritySearchingFingerprints.pl -mode IndividualReference -o
1037 ReferenceSampleFPHex.csv DatabaseSampleFPHex.csv</div>
1038 <p>To perform dissimilarity search using Tanimoto coefficient by treating all reference molecules as a set
1039 to find 10 most dissimilar database molecules with application of Max group fusion rule and similarity
1040 cutoff to supported fingerprints strings data in SD fingerprints files present in data fields with
1041 Fingerprint substring in their labels, and create a ReferenceFPHexSimilaritySearching.csv file containing
1042 sequentially generated database compound IDs with Cmpd prefix, type:</p>
1043 <div class="ExampleBox">
1044 % SimilaritySearchingFingerprints.pl --mode MultipleReferences --SearchMode
1045 DissimilaritySearch -o ReferenceSampleFPHex.sdf DatabaseSampleFPHex.sdf</div>
1046 <p>To perform similarity search using CityBlock distance by treating reference molecules as individual molecules
1047 to find 10 most similar database molecules for each reference molecule with application of distance cutoff
1048 to supported vector fingerprints strings data in SD fingerprints files present in data fields with Fingerprint
1049 substring in their labels, and create a ReferenceFPHexSimilaritySearching.csv file containing sequentially generated
1050 reference and database compound IDs with Cmpd prefix, type:</p>
1051 <div class="ExampleBox">
1052 % SimilaritySearchingFingerprints.pl -mode IndividualReference
1053 --VectorComparisonMode CityBlockDistance --VectorComparisonFormulism
1054 AlgebraicForm --DistanceCutoff 10 -o
1055 ReferenceSampleFPCount.sdf DatabaseSampleFPCount.sdf</div>
1056 <p>To perform similarity search using Tanimoto coefficient by treating all reference molecules as a set
1057 to find 100 most similar database molecules with application of Mean group fusion rule to top 10
1058 reference molecules within similarity cutoff of 0.75 to supported fingerprints strings data in FP fingerprints
1059 files, and create a ReferenceFPHexSimilaritySearching.csv file containing database compound IDs retrieved
1060 from FP file, type:</p>
1061 <div class="ExampleBox">
1062 % SimilaritySearchingFingerprints.pl --mode MultipleReferences --SearchMode
1063 SimilaritySearch --BitVectorComparisonMode TanimotoSimilarity
1064 --GroupFusionRule Mean --GroupFusionApplyCutoff Yes --kNN 10
1065 --SimilarityCutoff 0.75 --SimilarCountMode NumOfSimilar
1066 --NumOfSimilarMolecules 100 -o
1067 ReferenceSampleFPHex.fpf DatabaseSampleFPHex.fpf</div>
1068 <p>To perform similarity search using Tanimoto coefficient by treating reference molecules as individual molecules
1069 to find 2 percent of most similar database molecules for each reference molecule with application of similarity
1070 cutoff of 0.85 to supported fingerprints strings data in text fingerprints files present in specific columns and
1071 create a ReferenceFPHexSimilaritySearching.csv file containing reference and database compoundIDs retrieved
1072 from specific columns, type:</p>
1073 <div class="ExampleBox">
1074 % SimilaritySearchingFingerprints.pl --mode IndividualReference --SearchMode
1075 SimilaritySearch --BitVectorComparisonMode TanimotoSimilarity
1076 --ReferenceColMode ColLabel --ReferenceFingerprintsCol Fingerprints
1077 --ReferenceCompoundIDCol CompoundID --DatabaseColMode ColLabel
1078 --DatabaseCompoundIDCol CompoundID --DatabaseFingerprintsCol
1079 Fingerprints --SimilarityCutoff 0.85 --SimilarCountMode PercentSimilar
1080 --PercentSimilarMolecules 2 -o
1081 ReferenceSampleFPHex.csv DatabaseSampleFPHex.csv</div>
1082 <p>To perform similarity search using Tanimoto coefficient by treating reference molecules as individual molecules
1083 to find top 50 most similar database molecules for each reference molecule with application of similarity
1084 cutoff of 0.85 to supported fingerprints strings data in SD fingerprints files present in specific data fields and
1085 create both ReferenceFPHexSimilaritySearching.csv and ReferenceFPHexSimilaritySearching.sdf files containing
1086 reference and database compoundIDs retrieved from specific data fields, type:</p>
1087 <div class="ExampleBox">
1088 % SimilaritySearchingFingerprints.pl --mode IndividualReference --SearchMode
1089 SimilaritySearch --BitVectorComparisonMode TanimotoSimilarity
1090 --ReferenceFingerprintsField Fingerprints
1091 --DatabaseFingerprintsField Fingerprints
1092 --ReferenceCompoundIDMode DataField --ReferenceCompoundIDField CmpdID
1093 --DatabaseCompoundIDMode DataField --DatabaseCompoundIDField CmpdID
1094 --SimilarityCutoff 0.85 --SimilarCountMode NumOfSimilar
1095 --NumOfSimilarMolecules 50 --output both -o
1096 ReferenceSampleFPHex.sdf DatabaseSampleFPHex.sdf</div>
1097 <p>To perform similarity search using Tanimoto coefficient by treating reference molecules as individual molecules
1098 to find 1 percent of most similar database molecules for each reference molecule with application of similarity
1099 cutoff to supported fingerprints strings data in SD fingerprints files present in specific data field labels, and create
1100 both ReferenceFPHexSimilaritySearching.csv and ReferenceFPHexSimilaritySearching.sdf files containing reference and
1101 database compound IDs retrieved from specific data field labels along with other specific data for database
1102 molecules, type:</p>
1103 <div class="ExampleBox">
1104 % SimilaritySearchingFingerprints.pl --mode IndividualReference --SearchMode
1105 SimilaritySearch --BitVectorComparisonMode TanimotoSimilarity
1106 --ReferenceFingerprintsField Fingerprints
1107 --DatabaseFingerprintsField Fingerprints
1108 --ReferenceCompoundIDMode DataField --ReferenceCompoundIDField CmpdID
1109 --DatabaseCompoundIDMode DataField --DatabaseCompoundIDField CmpdID
1110 --DatabaseDataFieldsMode Specify --DatabaseDataFields &quot;TPSA,SLogP&quot;
1111 --SimilarityCutoff 0.75 --SimilarCountMode PercentSimilar
1112 --PercentSimilarMolecules 1 --output both --OutDelim comma --quote Yes
1113 --precision 3 -o ReferenceSampleFPHex.sdf DatabaseSampleFPHex.sdf</div>
1114 <p>
1115 </p>
1116 <h2>AUTHOR</h2>
1117 <p><a href="mailto:msud@san.rr.com">Manish Sud</a></p>
1118 <p>
1119 </p>
1120 <h2>SEE ALSO</h2>
1121 <p><a href="./InfoFingerprintsFiles.html">InfoFingerprintsFiles.pl</a>,&nbsp;<a href="./SimilarityMatricesFingerprints.html">SimilarityMatricesFingerprints.pl</a>,&nbsp;<a href="./AtomNeighborhoodsFingerprints.html">AtomNeighborhoodsFingerprints.pl</a>,&nbsp;
1122 <a href="./ExtendedConnectivityFingerprints.html">ExtendedConnectivityFingerprints.pl</a>,&nbsp;<a href="./MACCSKeysFingerprints.html">MACCSKeysFingerprints.pl</a>,&nbsp;<a href="./PathLengthFingerprints.html">PathLengthFingerprints.pl</a>,&nbsp;
1123 <a href="./TopologicalAtomPairsFingerprints.html">TopologicalAtomPairsFingerprints.pl</a>,&nbsp;<a href="./TopologicalAtomTorsionsFingerprints.html">TopologicalAtomTorsionsFingerprints.pl</a>,&nbsp;
1124 <a href="./TopologicalPharmacophoreAtomPairsFingerprints.html">TopologicalPharmacophoreAtomPairsFingerprints.pl</a>,&nbsp;<a href="./TopologicalPharmacophoreAtomTripletsFingerprints.html">TopologicalPharmacophoreAtomTripletsFingerprints.pl</a>
1125 </p>
1126 <p>
1127 </p>
1128 <h2>COPYRIGHT</h2>
1129 <p>Copyright (C) 2015 Manish Sud. All rights reserved.</p>
1130 <p>This file is part of MayaChemTools.</p>
1131 <p>MayaChemTools is free software; you can redistribute it and/or modify it under
1132 the terms of the GNU Lesser General Public License as published by the Free
1133 Software Foundation; either version 3 of the License, or (at your option)
1134 any later version.</p>
1135 <p>&nbsp;</p><p>&nbsp;</p><div class="DocNav">
1136 <table width="100%" border=0 cellpadding=0 cellspacing=2>
1137 <tr align="left" valign="top"><td width="33%" align="left"><a href="./SimilarityMatricesFingerprints.html" title="SimilarityMatricesFingerprints.html">Previous</a>&nbsp;&nbsp;<a href="./index.html" title="Table of Contents">TOC</a>&nbsp;&nbsp;<a href="./SortSDFiles.html" title="SortSDFiles.html">Next</a></td><td width="34%" align="middle"><strong>March 29, 2015</strong></td><td width="33%" align="right"><strong>SimilaritySearchingFingerprints.pl</strong></td></tr>
1138 </table>
1139 </div>
1140 <br />
1141 <center>
1142 <img src="../../images/h2o2.png">
1143 </center>
1144 </body>
1145 </html>