view mayachemtools/docs/scripts/man1/SimilarityMatricesFingerprints.1 @ 0:73ae111cf86f draft
Uploaded
author | deepakjadmin |
---|---|
date | Wed, 20 Jan 2016 11:55:01 -0500 |
parents | |
children |
.\" Automatically generated by Pod::Man 2.25 (Pod::Simple 3.22) .\" .\" Standard preamble: .\" ======================================================================== .de Sp \" Vertical space (when we can't use .PP) .if t .sp .5v .if n .sp .. .de Vb \" Begin verbatim text .ft CW .nf .ne \\$1 .. .de Ve \" End verbatim text .ft R .fi .. .\" Set up some character translations and predefined strings. \*(-- will .\" give an unbreakable dash, \*(PI will give pi, \*(L" will give a left .\" double quote, and \*(R" will give a right double quote. \*(C+ will .\" give a nicer C++. Capital omega is used to do unbreakable dashes and .\" therefore won't be available. \*(C` and \*(C' expand to `' in nroff, .\" nothing in troff, for use with C<>. .tr \(*W- .ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p' .ie n \{\ . ds -- \(*W- . ds PI pi . if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch . if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\" diablo 12 pitch . ds L" "" . ds R" "" . ds C` "" . ds C' "" 'br\} .el\{\ . ds -- \|\(em\| . ds PI \(*p . ds L" `` . ds R" '' 'br\} .\" .\" Escape single quotes in literal strings from groff's Unicode transform. .ie \n(.g .ds Aq \(aq .el .ds Aq ' .\" .\" If the F register is turned on, we'll generate index entries on stderr for .\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index .\" entries marked with X<> in POD. Of course, you'll have to process the .\" output yourself in some meaningful fashion. .ie \nF \{\ . de IX . tm Index:\\$1\t\\n%\t"\\$2" .. . nr % 0 . rr F .\} .el \{\ . de IX .. .\} .\" .\" Accent mark definitions (@(#)ms.acc 1.5 88/02/08 SMI; from UCB 4.2). .\" Fear. Run. Save yourself. No user-serviceable parts. . \" fudge factors for nroff and troff .if n \{\ . ds #H 0 . ds #V .8m . ds #F .3m . ds #[ \f1 . ds #] \fP .\} .if t \{\ . ds #H ((1u-(\\\\n(.fu%2u))*.13m) . ds #V .6m . ds #F 0 . ds #[ \& . ds #] \& .\} . \" simple accents for nroff and troff .if n \{\ . ds ' \& . 
ds ` \& . ds ^ \& . ds , \& . ds ~ ~ . ds / .\} .if t \{\ . ds ' \\k:\h'-(\\n(.wu*8/10-\*(#H)'\'\h"|\\n:u" . ds ` \\k:\h'-(\\n(.wu*8/10-\*(#H)'\`\h'|\\n:u' . ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'^\h'|\\n:u' . ds , \\k:\h'-(\\n(.wu*8/10)',\h'|\\n:u' . ds ~ \\k:\h'-(\\n(.wu-\*(#H-.1m)'~\h'|\\n:u' . ds / \\k:\h'-(\\n(.wu*8/10-\*(#H)'\z\(sl\h'|\\n:u' .\} . \" troff and (daisy-wheel) nroff accents .ds : \\k:\h'-(\\n(.wu*8/10-\*(#H+.1m+\*(#F)'\v'-\*(#V'\z.\h'.2m+\*(#F'.\h'|\\n:u'\v'\*(#V' .ds 8 \h'\*(#H'\(*b\h'-\*(#H' .ds o \\k:\h'-(\\n(.wu+\w'\(de'u-\*(#H)/2u'\v'-.3n'\*(#[\z\(de\v'.3n'\h'|\\n:u'\*(#] .ds d- \h'\*(#H'\(pd\h'-\w'~'u'\v'-.25m'\f2\(hy\fP\v'.25m'\h'-\*(#H' .ds D- D\\k:\h'-\w'D'u'\v'-.11m'\z\(hy\v'.11m'\h'|\\n:u' .ds th \*(#[\v'.3m'\s+1I\s-1\v'-.3m'\h'-(\w'I'u*2/3)'\s-1o\s+1\*(#] .ds Th \*(#[\s+2I\s-2\h'-\w'I'u*3/5'\v'-.3m'o\v'.3m'\*(#] .ds ae a\h'-(\w'a'u*4/10)'e .ds Ae A\h'-(\w'A'u*4/10)'E . \" corrections for vroff .if v .ds ~ \\k:\h'-(\\n(.wu*9/10-\*(#H)'\s-2\u~\d\s+2\h'|\\n:u' .if v .ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'\v'-.4m'^\v'.4m'\h'|\\n:u' . \" for low resolution devices (crt and lpr) .if \n(.H>23 .if \n(.V>19 \ \{\ . ds : e . ds 8 ss . ds o a . ds d- d\h'-1'\(ga . ds D- D\h'-1'\(hy . ds th \o'bp' . ds Th \o'LP' . ds ae ae . ds Ae AE .\} .rm #[ #] #H #V #F C .\" ======================================================================== .\" .IX Title "SIMILARITYMATRICESFINGERPRINTS 1" .TH SIMILARITYMATRICESFINGERPRINTS 1 "2015-03-29" "perl v5.14.2" "MayaChemTools" .\" For nroff, turn off justification. Always turn off hyphenation; it makes .\" way too many mistakes in technical documents. .if n .ad l .nh .SH "NAME" SimilarityMatricesFingerprints.pl \- Calculate similarity matrices using fingerprints strings data in SD, FP and CSV/TSV text file(s) .SH "SYNOPSIS" .IX Header "SYNOPSIS" SimilarityMatricesFingerprints.pl SDFile(s) FPFile(s) TextFile(s)... 
.PP
SimilarityMatricesFingerprints.pl [\fB\-\-alpha\fR \fInumber\fR] [\fB\-\-beta\fR \fInumber\fR]
[\fB\-b, \-\-BitVectorComparisonMode\fR \fIAll | \*(L"TanimotoSimilarity,[ TverskySimilarity, ... ]\*(R"\fR]
[\fB\-c, \-\-ColMode\fR \fIColNum | ColLabel\fR] [\fB\-\-CompoundIDCol\fR \fIcol number | col name\fR]
[\fB\-\-CompoundIDPrefix\fR \fItext\fR] [\fB\-\-CompoundIDField\fR \fIDataFieldName\fR]
[\fB\-\-CompoundIDMode\fR \fIDataField | MolName | LabelPrefix | MolNameOrLabelPrefix\fR]
[\fB\-d, \-\-detail\fR \fIInfoLevel\fR] [\fB\-f, \-\-fast\fR] [\fB\-\-FingerprintsCol\fR \fIcol number | col name\fR]
[\fB\-\-FingerprintsField\fR \fIFieldLabel\fR] [\fB\-h, \-\-help\fR] [\fB\-\-InDelim\fR \fIcomma | semicolon\fR]
[\fB\-\-InputDataMode\fR \fILoadInMemory | ScanFile\fR]
[\fB\-m, \-\-mode\fR \fIAutoDetect | FingerprintsBitVectorString | FingerprintsVectorString\fR]
[\fB\-\-OutDelim\fR \fIcomma | tab | semicolon\fR] [\fB\-\-OutMatrixFormat\fR \fIRowsAndColumns | IDPairsAndValue\fR]
[\fB\-\-OutMatrixType\fR \fIFullMatrix | UpperTriangularMatrix | LowerTriangularMatrix\fR]
[\fB\-o, \-\-overwrite\fR] [\fB\-p, \-\-precision\fR \fInumber\fR] [\fB\-q, \-\-quote\fR \fIYes | No\fR]
[\fB\-r, \-\-root\fR \fIRootName\fR]
[\fB\-v, \-\-VectorComparisonMode\fR \fIAll | \*(L"TanimotoSimilarity, [ ManhattanDistance, ...]\*(R"\fR]
[\fB\-\-VectorComparisonFormulism\fR \fIAll | \*(L"AlgebraicForm, [BinaryForm, SetTheoreticForm]\*(R"\fR]
[\fB\-w, \-\-WorkingDir\fR dirname] SDFile(s) FPFile(s) TextFile(s)...
.SH "DESCRIPTION"
.IX Header "DESCRIPTION"
Calculate similarity matrices using fingerprint bit-vector or vector strings data in \fI\s-1SD\s0, \s-1FP\s0 and \s-1CSV/TSV\s0\fR text file(s) and generate \s-1CSV/TSV\s0 text file(s) containing values for specified similarity and distance coefficients.
.PP
The scripts SimilarityMatrixSDFiles.pl and SimilarityMatrixTextFiles.pl have been removed from the current release of MayaChemTools and their functionality merged with this script.
.PP The valid \fISDFile\fR extensions are \fI.sdf\fR and \fI.sd\fR. All \s-1SD\s0 files in a current directory can be specified either by \fI*.sdf\fR or the current directory name. .PP The valid \fIFPFile\fR extensions are \fI.fpf\fR and \fI.fp\fR. All \s-1FP\s0 files in a current directory can be specified either by \fI*.fpf\fR or the current directory name. .PP The valid \fITextFile\fR extensions are \fI.csv\fR and \fI.tsv\fR for comma/semicolon and tab delimited text files respectively. All other file names are ignored. All text files in a current directory can be specified by \fI*.csv\fR, \fI*.tsv\fR, or the current directory name. The \fB\-\-indelim\fR option determines the format of \fITextFile(s)\fR. Any file which doesn't correspond to the format indicated by \fB\-\-indelim\fR option is ignored. .PP Example of \fI\s-1FP\s0\fR file containing fingerprints bit-vector string data: .PP .Vb 10 \& # \& # Package = MayaChemTools 7.4 \& # ReleaseDate = Oct 21, 2010 \& # \& # TimeStamp = Mon Mar 7 15:14:01 2011 \& # \& # FingerprintsStringType = FingerprintsBitVector \& # \& # Description = PathLengthBits:AtomicInvariantsAtomTypes:MinLength1:... \& # Size = 1024 \& # BitStringFormat = HexadecimalString \& # BitsOrder = Ascending \& # \& Cmpd1 9c8460989ec8a49913991a6603130b0a19e8051c89184414953800cc21510... \& Cmpd2 000000249400840040100042011001001980410c000000001010088001120... \& ... ... \& ... .. .Ve .PP Example of \fI\s-1FP\s0\fR file containing fingerprints vector string data: .PP .Vb 10 \& # \& # Package = MayaChemTools 7.4 \& # ReleaseDate = Oct 21, 2010 \& # \& # TimeStamp = Mon Mar 7 15:14:01 2011 \& # \& # FingerprintsStringType = FingerprintsVector \& # \& # Description = PathLengthBits:AtomicInvariantsAtomTypes:MinLength1:... 
\& # VectorStringFormat = IDsAndValuesString \& # VectorValuesType = NumericalValues \& # \& Cmpd1 338;C F N O C:C C:N C=O CC CF CN CO C:C:C C:C:N C:CC C:CF C:CN C: \& N:C C:NC CC:N CC=O CCC CCN CCO CNC NC=O O=CO C:C:C:C C:C:C:N C:C:CC...; \& 33 1 2 5 21 2 2 12 1 3 3 20 2 10 2 2 1 2 2 2 8 2 5 1 1 1 19 2 8 2 2 2 2 \& 6 2 2 2 2 2 2 2 2 3 2 2 1 4 1 5 1 1 18 6 2 2 1 2 10 2 1 2 1 2 2 2 2 ... \& Cmpd2 103;C N O C=N C=O CC CN CO CC=O CCC CCN CCO CNC N=CN NC=O NCN O=C \& O C CC=O CCCC CCCN CCCO CCNC CNC=N CNC=O CNCN CCCC=O CCCCC CCCCN CC...; \& 15 4 4 1 2 13 5 2 2 15 5 3 2 2 1 1 1 2 17 7 6 5 1 1 1 2 15 8 5 7 2 2 2 2 \& 1 2 1 1 3 15 7 6 8 3 4 4 3 2 2 1 2 3 14 2 4 7 4 4 4 4 1 1 1 2 1 1 1 ... \& ... ... \& ... ... .Ve .PP Example of \fI\s-1SD\s0\fR file containing fingerprints bit-vector string data: .PP .Vb 10 \& ... ... \& ... ... \& $$$$ \& ... ... \& ... ... \& ... ... \& 41 44 0 0 0 0 0 0 0 0999 V2000 \& \-3.3652 1.4499 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 \& ... ... \& 2 3 1 0 0 0 0 \& ... ... \& M END \& > <CmpdID> \& Cmpd1 \& \& > <PathLengthFingerprints> \& FingerprintsBitVector;PathLengthBits:AtomicInvariantsAtomTypes:MinLengt \& h1:MaxLength8;1024;HexadecimalString;Ascending;9c8460989ec8a49913991a66 \& 03130b0a19e8051c89184414953800cc2151082844a201042800130860308e8204d4028 \& 00831048940e44281c00060449a5000ac80c894114e006321264401600846c050164462 \& 08190410805000304a10205b0100e04c0038ba0fad0209c0ca8b1200012268b61c0026a \& aa0660a11014a011d46 \& \& $$$$ \& ... ... \& ... ... .Ve .PP Example of \s-1CSV\s0 \fIText\fR file containing fingerprints bit-vector string data: .PP .Vb 7 \& "CompoundID","PathLengthFingerprints" \& "Cmpd1","FingerprintsBitVector;PathLengthBits:AtomicInvariantsAtomTypes \& :MinLength1:MaxLength8;1024;HexadecimalString;Ascending;9c8460989ec8a4 \& 9913991a6603130b0a19e8051c89184414953800cc2151082844a20104280013086030 \& 8e8204d402800831048940e44281c00060449a5000ac80c894114e006321264401..." \& ... ... \& ... ... 
.Ve .PP The current release of MayaChemTools supports the following types of fingerprint bit-vector and vector strings: .PP .Vb 6 \& FingerprintsVector;AtomNeighborhoods:AtomicInvariantsAtomTypes:MinRadi \& us0:MaxRadius2;41;AlphaNumericalValues;ValuesString;NR0\-C.X1.BO1.H3\-AT \& C1:NR1\-C.X3.BO3.H1\-ATC1:NR2\-C.X1.BO1.H3\-ATC1:NR2\-C.X3.BO4\-ATC1 NR0\-C.X \& 1.BO1.H3\-ATC1:NR1\-C.X3.BO3.H1\-ATC1:NR2\-C.X1.BO1.H3\-ATC1:NR2\-C.X3.BO4\-A \& TC1 NR0\-C.X2.BO2.H2\-ATC1:NR1\-C.X2.BO2.H2\-ATC1:NR1\-C.X3.BO3.H1\-ATC1:NR2 \& \-C.X2.BO2.H2\-ATC1:NR2\-N.X3.BO3\-ATC1:NR2\-O.X1.BO1.H1\-ATC1 NR0\-C.X2.B... \& \& FingerprintsVector;AtomTypesCount:AtomicInvariantsAtomTypes:ArbitraryS \& ize;10;NumericalValues;IDsAndValuesString;C.X1.BO1.H3 C.X2.BO2.H2 C.X2 \& .BO3.H1 C.X3.BO3.H1 C.X3.BO4 F.X1.BO1 N.X2.BO2.H1 N.X3.BO3 O.X1.BO1.H1 \& O.X1.BO2;2 4 14 3 10 1 1 1 3 2 \& \& FingerprintsVector;AtomTypesCount:SLogPAtomTypes:ArbitrarySize;16;Nume \& ricalValues;IDsAndValuesString;C1 C10 C11 C14 C18 C20 C21 C22 C5 CS F \& N11 N4 O10 O2 O9;5 1 1 1 14 4 2 1 2 2 1 1 1 1 3 1 \& \& FingerprintsVector;AtomTypesCount:SLogPAtomTypes:FixedSize;67;OrderedN \& umericalValues;IDsAndValuesString;C1 C2 C3 C4 C5 C6 C7 C8 C9 C10 C11 C \& 12 C13 C14 C15 C16 C17 C18 C19 C20 C21 C22 C23 C24 C25 C26 C27 CS N1 N \& 2 N3 N4 N5 N6 N7 N8 N9 N10 N11 N12 N13 N14 NS O1 O2 O3 O4 O5 O6 O7 O8 \& O9 O10 O11 O12 OS F Cl Br I Hal P S1 S2 S3 Me1 Me2;5 0 0 0 2 0 0 0 0 1 \& 1 0 0 1 0 0 0 14 0 4 2 1 0 0 0 0 0 2 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0... 
\& \& FingerprintsVector;EStateIndicies:ArbitrarySize;11;NumericalValues;IDs \& AndValuesString;SaaCH SaasC SaasN SdO SdssC SsCH3 SsF SsOH SssCH2 SssN \& H SsssCH;24.778 4.387 1.993 25.023 \-1.435 3.975 14.006 29.759 \-0.073 3 \& .024 \-2.270 \& \& FingerprintsVector;EStateIndicies:FixedSize;87;OrderedNumericalValues; \& ValuesString;0 0 0 0 0 0 0 3.975 0 \-0.073 0 0 24.778 \-2.270 0 0 \-1.435 \& 4.387 0 0 0 0 0 0 3.024 0 0 0 0 0 0 0 1.993 0 29.759 25.023 0 0 0 0 1 \& 4.006 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 \& 0 0 0 0 0 0 0 0 0 0 0 0 0 0 \& \& FingerprintsVector;ExtendedConnectivity:AtomicInvariantsAtomTypes:Radi \& us2;60;AlphaNumericalValues;ValuesString;73555770 333564680 352413391 \& 666191900 1001270906 1371674323 1481469939 1977749791 2006158649 21414 \& 08799 49532520 64643108 79385615 96062769 273726379 564565671 85514103 \& 5 906706094 988546669 1018231313 1032696425 1197507444 1331250018 1338 \& 532734 1455473691 1607485225 1609687129 1631614296 1670251330 17303... \& \& FingerprintsVector;ExtendedConnectivityCount:AtomicInvariantsAtomTypes \& :Radius2;60;NumericalValues;IDsAndValuesString;73555770 333564680 3524 \& 13391 666191900 1001270906 1371674323 1481469939 1977749791 2006158649 \& 2141408799 49532520 64643108 79385615 96062769 273726379 564565671...; \& 3 2 1 1 14 1 2 10 4 3 1 1 1 1 2 1 2 1 1 1 2 3 1 1 2 1 3 3 8 2 2 2 6 2 \& 1 2 1 1 2 1 1 1 2 1 1 2 1 2 1 1 1 1 1 1 1 1 1 2 1 1 \& \& FingerprintsBitVector;ExtendedConnectivityBits:AtomicInvariantsAtomTyp \& es:Radius2;1024;BinaryString;Ascending;0000000000000000000000000000100 \& 0000000001010000000110000011000000000000100000000000000000000000100001 \& 1000000110000000000000000000000000010011000000000000000000000000010000 \& 0000000000000000000000000010000000000000000001000000000000000000000000 \& 0000000000010000100001000000000000101000000000000000100000000000000... 
\& \& FingerprintsVector;ExtendedConnectivity:FunctionalClassAtomTypes:Radiu \& s2;57;AlphaNumericalValues;ValuesString;24769214 508787397 850393286 8 \& 62102353 981185303 1231636850 1649386610 1941540674 263599683 32920567 \& 1 571109041 639579325 683993318 723853089 810600886 885767127 90326012 \& 7 958841485 981022393 1126908698 1152248391 1317567065 1421489994 1455 \& 632544 1557272891 1826413669 1983319256 2015750777 2029559552 20404... \& \& FingerprintsVector;ExtendedConnectivity:EStateAtomTypes:Radius2;62;Alp \& haNumericalValues;ValuesString;25189973 528584866 662581668 671034184 \& 926543080 1347067490 1738510057 1759600920 2034425745 2097234755 21450 \& 44754 96779665 180364292 341712110 345278822 386540408 387387308 50430 \& 1706 617094135 771528807 957666640 997798220 1158349170 1291258082 134 \& 1138533 1395329837 1420277211 1479584608 1486476397 1487556246 1566... \& \& FingerprintsBitVector;MACCSKeyBits;166;BinaryString;Ascending;00000000 \& 0000000000000000000000000000000001001000010010000000010010000000011100 \& 0100101010111100011011000100110110000011011110100110111111111111011111 \& 11111111111110111000 \& \& FingerprintsBitVector;MACCSKeyBits;322;BinaryString;Ascending;11101011 \& 1110011111100101111111000111101100110000000000000011100010000000000000 \& 0000000000000000000000000000000000000000000000101000000000000000000000 \& 0000000000000000000000000000000000000000000000000000000000000000000000 \& 0000000000000000000000000000000000000011000000000000000000000000000000 \& 0000000000000000000000000000000000000000 \& \& FingerprintsVector;MACCSKeyCount;166;OrderedNumericalValues;ValuesStri \& ng;0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 \& 0 0 0 0 0 0 0 1 0 0 3 0 0 0 0 4 0 0 2 0 0 0 0 0 0 0 0 2 0 0 2 0 0 0 0 \& 0 0 0 0 1 1 8 0 0 0 1 0 0 1 0 1 0 1 0 3 1 3 1 0 0 0 1 2 0 11 1 0 0 0 \& 5 0 0 1 2 0 1 1 0 0 0 0 0 1 1 0 1 1 1 1 0 4 0 0 1 1 0 4 6 1 1 1 2 1 1 \& 3 5 2 2 0 5 3 5 1 1 2 5 1 2 1 2 4 8 3 5 5 2 2 0 3 5 4 1 \& \& 
FingerprintsVector;MACCSKeyCount;322;OrderedNumericalValues;ValuesStri \& ng;14 8 2 0 2 0 4 4 2 1 4 0 0 2 5 10 5 2 1 0 0 2 0 5 13 3 28 5 5 3 0 0 \& 0 4 2 1 1 0 1 1 0 0 2 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 22 5 3 0 0 0 1 0 \& 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 \& 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 11 0 2 0 0 0 0 0 0 0 0 0 \& 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... \& \& FingerprintsBitVector;PathLengthBits:AtomicInvariantsAtomTypes:MinLeng \& th1:MaxLength8;1024;BinaryString;Ascending;001000010011010101011000110 \& 0100010101011000101001011100110001000010001001101000001001001001001000 \& 0010110100000111001001000001001010100100100000000011000000101001011100 \& 0010000001000101010100000100111100110111011011011000000010110111001101 \& 0101100011000000010001000011000010100011101100001000001000100000000... \& \& FingerprintsVector;PathLengthCount:AtomicInvariantsAtomTypes:MinLength \& 1:MaxLength8;432;NumericalValues;IDsAndValuesPairsString;C.X1.BO1.H3 2 \& C.X2.BO2.H2 4 C.X2.BO3.H1 14 C.X3.BO3.H1 3 C.X3.BO4 10 F.X1.BO1 1 N.X \& 2.BO2.H1 1 N.X3.BO3 1 O.X1.BO1.H1 3 O.X1.BO2 2 C.X1.BO1.H3C.X3.BO3.H1 \& 2 C.X2.BO2.H2C.X2.BO2.H2 1 C.X2.BO2.H2C.X3.BO3.H1 4 C.X2.BO2.H2C.X3.BO \& 4 1 C.X2.BO2.H2N.X3.BO3 1 C.X2.BO3.H1:C.X2.BO3.H1 10 C.X2.BO3.H1:C.... \& \& FingerprintsVector;PathLengthCount:MMFF94AtomTypes:MinLength1:MaxLengt \& h8;463;NumericalValues;IDsAndValuesPairsString;C5A 2 C5B 2 C=ON 1 CB 1 \& 8 COO 1 CR 9 F 1 N5 1 NC=O 1 O=CN 1 O=CO 1 OC=O 1 OR 2 C5A:C5B 2 C5A:N \& 5 2 C5ACB 1 C5ACR 1 C5B:C5B 1 C5BC=ON 1 C5BCB 1 C=ON=O=CN 1 C=ONNC=O 1 \& CB:CB 18 CBF 1 CBNC=O 1 COO=O=CO 1 COOCR 1 COOOC=O 1 CRCR 7 CRN5 1 CR \& OR 2 C5A:C5B:C5B 2 C5A:C5BC=ON 1 C5A:C5BCB 1 C5A:N5:C5A 1 C5A:N5CR ... 
\& \& FingerprintsVector;TopologicalAtomPairs:AtomicInvariantsAtomTypes:MinD \& istance1:MaxDistance10;223;NumericalValues;IDsAndValuesString;C.X1.BO1 \& .H3\-D1\-C.X3.BO3.H1 C.X2.BO2.H2\-D1\-C.X2.BO2.H2 C.X2.BO2.H2\-D1\-C.X3.BO3. \& H1 C.X2.BO2.H2\-D1\-C.X3.BO4 C.X2.BO2.H2\-D1\-N.X3.BO3 C.X2.BO3.H1\-D1\-...; \& 2 1 4 1 1 10 8 1 2 6 1 2 2 1 2 1 2 2 1 2 1 5 1 10 12 2 2 1 2 1 9 1 3 1 \& 1 1 2 2 1 3 6 1 6 14 2 2 2 3 1 3 1 8 2 2 1 3 2 6 1 2 2 5 1 3 1 23 1... \& \& FingerprintsVector;TopologicalAtomPairs:FunctionalClassAtomTypes:MinDi \& stance1:MaxDistance10;144;NumericalValues;IDsAndValuesString;Ar\-D1\-Ar \& Ar\-D1\-Ar.HBA Ar\-D1\-HBD Ar\-D1\-Hal Ar\-D1\-None Ar.HBA\-D1\-None HBA\-D1\-NI H \& BA\-D1\-None HBA.HBD\-D1\-NI HBA.HBD\-D1\-None HBD\-D1\-None NI\-D1\-None No...; \& 23 2 1 1 2 1 1 1 1 2 1 1 7 28 3 1 3 2 8 2 1 1 1 5 1 5 24 3 3 4 2 13 4 \& 1 1 4 1 5 22 4 4 3 1 19 1 1 1 1 1 2 2 3 1 1 8 25 4 5 2 3 1 26 1 4 1 ... \& \& FingerprintsVector;TopologicalAtomTorsions:AtomicInvariantsAtomTypes;3 \& 3;NumericalValues;IDsAndValuesString;C.X1.BO1.H3\-C.X3.BO3.H1\-C.X3.BO4\- \& C.X3.BO4 C.X1.BO1.H3\-C.X3.BO3.H1\-C.X3.BO4\-N.X3.BO3 C.X2.BO2.H2\-C.X2.BO \& 2.H2\-C.X3.BO3.H1\-C.X2.BO2.H2 C.X2.BO2.H2\-C.X2.BO2.H2\-C.X3.BO3.H1\-O...; \& 2 2 1 1 2 2 1 1 3 4 4 8 4 2 2 6 2 2 1 2 1 1 2 1 1 2 6 2 4 2 1 3 1 \& \& FingerprintsVector;TopologicalAtomTorsions:EStateAtomTypes;36;Numerica \& lValues;IDsAndValuesString;aaCH\-aaCH\-aaCH\-aaCH aaCH\-aaCH\-aaCH\-aasC aaC \& H\-aaCH\-aasC\-aaCH aaCH\-aaCH\-aasC\-aasC aaCH\-aaCH\-aasC\-sF aaCH\-aaCH\-aasC\- \& ssNH aaCH\-aasC\-aasC\-aasC aaCH\-aasC\-aasC\-aasN aaCH\-aasC\-ssNH\-dssC a...; \& 4 4 8 4 2 2 6 2 2 2 4 3 2 1 3 3 2 2 2 1 2 1 1 1 2 1 1 1 1 1 1 1 2 1 1 2 \& \& FingerprintsVector;TopologicalAtomTriplets:AtomicInvariantsAtomTypes:M \& inDistance1:MaxDistance10;3096;NumericalValues;IDsAndValuesString;C.X1 \& .BO1.H3\-D1\-C.X1.BO1.H3\-D1\-C.X3.BO3.H1\-D2 C.X1.BO1.H3\-D1\-C.X2.BO2.H2\-D1 \& 0\-C.X3.BO4\-D9 
C.X1.BO1.H3\-D1\-C.X2.BO2.H2\-D3\-N.X3.BO3\-D4 C.X1.BO1.H3\-D1 \& \-C.X2.BO2.H2\-D4\-C.X2.BO2.H2\-D5 C.X1.BO1.H3\-D1\-C.X2.BO2.H2\-D6\-C.X3....; \& 1 2 2 2 2 2 2 2 8 8 4 8 4 4 2 2 2 2 4 2 2 2 4 2 2 2 2 1 2 2 4 4 4 2 2 \& 2 4 4 4 8 4 4 2 4 4 4 2 4 4 2 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2 2 8... \& \& FingerprintsVector;TopologicalAtomTriplets:SYBYLAtomTypes:MinDistance1 \& :MaxDistance10;2332;NumericalValues;IDsAndValuesString;C.2\-D1\-C.2\-D9\-C \& .3\-D10 C.2\-D1\-C.2\-D9\-C.ar\-D10 C.2\-D1\-C.3\-D1\-C.3\-D2 C.2\-D1\-C.3\-D10\-C.3\- \& D9 C.2\-D1\-C.3\-D2\-C.3\-D3 C.2\-D1\-C.3\-D2\-C.ar\-D3 C.2\-D1\-C.3\-D3\-C.3\-D4 C.2 \& \-D1\-C.3\-D3\-N.ar\-D4 C.2\-D1\-C.3\-D3\-O.3\-D2 C.2\-D1\-C.3\-D4\-C.3\-D5 C.2\-D1\-C. \& 3\-D5\-C.3\-D6 C.2\-D1\-C.3\-D5\-O.3\-D4 C.2\-D1\-C.3\-D6\-C.3\-D7 C.2\-D1\-C.3\-D7... \& \& FingerprintsVector;TopologicalPharmacophoreAtomPairs:ArbitrarySize:Min \& Distance1:MaxDistance10;54;NumericalValues;IDsAndValuesString;H\-D1\-H H \& \-D1\-NI HBA\-D1\-NI HBD\-D1\-NI H\-D2\-H H\-D2\-HBA H\-D2\-HBD HBA\-D2\-HBA HBA\-D2\- \& HBD H\-D3\-H H\-D3\-HBA H\-D3\-HBD H\-D3\-NI HBA\-D3\-NI HBD\-D3\-NI H\-D4\-H H\-D4\-H \& BA H\-D4\-HBD HBA\-D4\-HBA HBA\-D4\-HBD HBD\-D4\-HBD H\-D5\-H H\-D5\-HBA H\-D5\-...; \& 18 1 2 1 22 12 8 1 2 18 6 3 1 1 1 22 13 6 5 7 2 28 9 5 1 1 1 36 16 10 \& 3 4 1 37 10 8 1 35 10 9 3 3 1 28 7 7 4 18 16 12 5 1 2 1 \& \& FingerprintsVector;TopologicalPharmacophoreAtomPairs:FixedSize:MinDist \& ance1:MaxDistance10;150;OrderedNumericalValues;ValuesString;18 0 0 1 0 \& 0 0 2 0 0 1 0 0 0 0 22 12 8 0 0 1 2 0 0 0 0 0 0 0 0 18 6 3 1 0 0 0 1 \& 0 0 1 0 0 0 0 22 13 6 0 0 5 7 0 0 2 0 0 0 0 0 28 9 5 1 0 0 0 1 0 0 1 0 \& 0 0 0 36 16 10 0 0 3 4 0 0 1 0 0 0 0 0 37 10 8 0 0 0 0 1 0 0 0 0 0 0 \& 0 35 10 9 0 0 3 3 0 0 1 0 0 0 0 0 28 7 7 4 0 0 0 0 0 0 0 0 0 0 0 18... 
\& \& FingerprintsVector;TopologicalPharmacophoreAtomTriplets:ArbitrarySize: \& MinDistance1:MaxDistance10;696;NumericalValues;IDsAndValuesString;Ar1\- \& Ar1\-Ar1 Ar1\-Ar1\-H1 Ar1\-Ar1\-HBA1 Ar1\-Ar1\-HBD1 Ar1\-H1\-H1 Ar1\-H1\-HBA1 Ar1 \& \-H1\-HBD1 Ar1\-HBA1\-HBD1 H1\-H1\-H1 H1\-H1\-HBA1 H1\-H1\-HBD1 H1\-HBA1\-HBA1 H1\- \& HBA1\-HBD1 H1\-HBA1\-NI1 H1\-HBD1\-NI1 HBA1\-HBA1\-NI1 HBA1\-HBD1\-NI1 Ar1\-...; \& 46 106 8 3 83 11 4 1 21 5 3 1 2 2 1 1 1 100 101 18 11 145 132 26 14 23 \& 28 3 3 5 4 61 45 10 4 16 20 7 5 1 3 4 5 3 1 1 1 1 5 4 2 1 2 2 2 1 1 1 \& 119 123 24 15 185 202 41 25 22 17 3 5 85 95 18 11 23 17 3 1 1 6 4 ... \& \& FingerprintsVector;TopologicalPharmacophoreAtomTriplets:FixedSize:MinD \& istance1:MaxDistance10;2692;OrderedNumericalValues;ValuesString;46 106 \& 8 3 0 0 83 11 4 0 0 0 1 0 0 0 0 0 0 0 0 21 5 3 0 0 1 2 2 0 0 1 0 0 0 \& 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 100 101 18 11 0 0 145 132 26 \& 14 0 0 23 28 3 3 0 0 5 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 61 45 10 4 0 \& 0 16 20 7 5 1 0 3 4 5 3 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 5 ... .Ve .SH "OPTIONS" .IX Header "OPTIONS" .IP "\fB\-\-alpha\fR \fInumber\fR" 4 .IX Item "--alpha number" Value of alpha parameter for calculating \fITversky\fR similarity coefficient specified for \&\fB\-b, \-\-BitVectorComparisonMode\fR option. It corresponds to weights assigned for bits set to \*(L"1\*(R" in a pair of fingerprint bit-vectors during the calculation of similarity coefficient. Possible values: \fI0 to 1\fR. Default value: <0.5>. .IP "\fB\-\-beta\fR \fInumber\fR" 4 .IX Item "--beta number" Value of beta parameter for calculating \fIWeightedTanimoto\fR and \fIWeightedTversky\fR similarity coefficients specified for \fB\-b, \-\-BitVectorComparisonMode\fR option. It is used to weight the contributions of bits set to \*(L"0\*(R" during the calculation of similarity coefficients. Possible values: \fI0 to 1\fR. 
Default value of <1> makes \fIWeightedTanimoto\fR and \fIWeightedTversky\fR equivalent to \fITanimoto\fR and \fITversky\fR.
.ie n .IP "\fB\-b, \-\-BitVectorComparisonMode\fR \fIAll | ""TanimotoSimilarity,[TverskySimilarity,...]""\fR" 4
.el .IP "\fB\-b, \-\-BitVectorComparisonMode\fR \fIAll | ``TanimotoSimilarity,[TverskySimilarity,...]''\fR" 4
.IX Item "-b, --BitVectorComparisonMode All | TanimotoSimilarity,[TverskySimilarity,...]"
Specify which similarity coefficients to use for calculating similarity matrices for fingerprints bit-vector strings data values in \fITextFile(s)\fR: calculate similarity matrices for all supported similarity coefficients or specify a comma delimited list of similarity coefficients. Possible values:
\&\fIAll | "TanimotoSimilarity,[TverskySimilarity,...]"\fR. Default: \fITanimotoSimilarity\fR.
.Sp
\&\fIAll\fR uses the complete list of supported similarity coefficients: \fIBaroniUrbaniSimilarity, BuserSimilarity, CosineSimilarity, DiceSimilarity, DennisSimilarity, ForbesSimilarity, FossumSimilarity, HamannSimilarity, JacardSimilarity, Kulczynski1Similarity, Kulczynski2Similarity, MatchingSimilarity, McConnaugheySimilarity, OchiaiSimilarity, PearsonSimilarity, RogersTanimotoSimilarity, RussellRaoSimilarity, SimpsonSimilarity, SkoalSneath1Similarity, SkoalSneath2Similarity, SkoalSneath3Similarity, TanimotoSimilarity, TverskySimilarity, YuleSimilarity, WeightedTanimotoSimilarity, WeightedTverskySimilarity\fR. These similarity coefficients are described below.
.Sp
For two fingerprint bit-vectors A and B of the same size, let:
.Sp
.Vb 4
\& Na = Number of bits set to "1" in A
\& Nb = Number of bits set to "1" in B
\& Nc = Number of bits set to "1" in both A and B
\& Nd = Number of bits set to "0" in both A and B
\&
\& Nt = Number of bits set to "1" or "0" in A or B (Size of A or B)
\& Nt = Na + Nb \- Nc + Nd
\&
\& Na \- Nc = Number of bits set to "1" in A but not in B
\& Nb \- Nc = Number of bits set to "1" in B but not in A
.Ve
.Sp
Then, various similarity coefficients [ Ref. 40 \- 42 ] for a pair of bit-vectors A and B are defined as follows:
.Sp
\&\fIBaroniUrbaniSimilarity\fR: ( \s-1SQRT\s0 ( Nc * Nd ) + Nc ) / ( \s-1SQRT\s0 ( Nc * Nd ) + Nc + ( Na \- Nc ) + ( Nb \- Nc ) ) ( same as Buser )
.Sp
\&\fIBuserSimilarity\fR: ( \s-1SQRT\s0 ( Nc * Nd ) + Nc ) / ( \s-1SQRT\s0 ( Nc * Nd ) + Nc + ( Na \- Nc ) + ( Nb \- Nc ) ) ( same as BaroniUrbani )
.Sp
\&\fICosineSimilarity\fR: Nc / \s-1SQRT\s0 ( Na * Nb ) (same as Ochiai)
.Sp
\&\fIDiceSimilarity\fR: (2 * Nc) / ( Na + Nb )
.Sp
\&\fIDennisSimilarity\fR: ( Nc * Nd \- ( ( Na \- Nc ) * ( Nb \- Nc ) ) ) / \s-1SQRT\s0 ( Nt * Na * Nb)
.Sp
\&\fIForbesSimilarity\fR: ( Nt * Nc ) / ( Na * Nb )
.Sp
\&\fIFossumSimilarity\fR: ( Nt * ( ( Nc \- 1/2 ) ** 2 ) ) / ( Na * Nb )
.Sp
\&\fIHamannSimilarity\fR: ( ( Nc + Nd ) \- ( Na \- Nc ) \- ( Nb \- Nc ) ) / Nt
.Sp
\&\fIJaccardSimilarity\fR: Nc / ( ( Na \- Nc) + ( Nb \- Nc ) + Nc ) = Nc / ( Na + Nb \- Nc ) (same as Tanimoto)
.Sp
\&\fIKulczynski1Similarity\fR: Nc / ( ( Na \- Nc ) + ( Nb \- Nc) ) = Nc / ( Na + Nb \- 2Nc )
.Sp
\&\fIKulczynski2Similarity\fR: ( ( Nc / 2 ) * ( 2 * Nc + ( Na \- Nc ) + ( Nb \- Nc) ) ) / ( ( Nc + ( Na \- Nc ) ) * ( Nc + ( Nb \- Nc ) ) ) = 0.5 * ( Nc / Na + Nc / Nb )
.Sp
\&\fIMatchingSimilarity\fR: ( Nc + Nd ) / Nt
.Sp
\&\fIMcConnaugheySimilarity\fR: ( Nc ** 2 \- ( Na \- Nc ) * ( Nb \- Nc) ) / ( Na * Nb )
.Sp
\&\fIOchiaiSimilarity\fR: Nc / \s-1SQRT\s0 ( Na * Nb ) (same as Cosine)
.Sp
\&\fIPearsonSimilarity\fR: ( ( Nc * Nd ) \- ( ( Na \- Nc ) * ( Nb \- Nc ) ) ) / \s-1SQRT\s0 ( Na * Nb * ( Na \- Nc + Nd ) * ( Nb \- Nc + Nd ) )
.Sp
\&\fIRogersTanimotoSimilarity\fR: ( Nc + Nd ) / ( ( Na \- Nc) + ( Nb \- Nc) + Nt) = ( Nc + Nd ) / ( Na + Nb \- 2Nc + Nt)
.Sp
\&\fIRussellRaoSimilarity\fR: Nc / Nt
.Sp
\&\fISimpsonSimilarity\fR: Nc / \s-1MIN\s0 ( Na, Nb)
.Sp
\&\fISkoalSneath1Similarity\fR: Nc / ( Nc + 2 * ( Na \- Nc) + 2 * ( Nb \- Nc) ) = Nc / ( 2 * Na + 2 * Nb \- 3 * Nc )
.Sp
\&\fISkoalSneath2Similarity\fR: ( 2 * Nc + 2 * Nd ) / ( Nc + Nd + Nt )
.Sp
\&\fISkoalSneath3Similarity\fR: ( Nc + Nd ) / ( ( Na \- Nc ) + ( Nb \- Nc ) ) = ( Nc + Nd ) / ( Na + Nb \- 2 * Nc )
.Sp
\&\fITanimotoSimilarity\fR: Nc / ( ( Na \- Nc) + ( Nb \- Nc ) + Nc ) = Nc / ( Na + Nb \- Nc ) (same as Jaccard)
.Sp
\&\fITverskySimilarity\fR: Nc / ( alpha * ( Na \- Nc ) + ( 1 \- alpha) * ( Nb \- Nc) + Nc ) = Nc / ( alpha * ( Na \- Nb ) + Nb)
.Sp
\&\fIYuleSimilarity\fR: ( ( Nc * Nd ) \- ( ( Na \- Nc ) * ( Nb \- Nc ) ) ) / ( ( Nc * Nd ) + ( ( Na \- Nc ) * ( Nb \- Nc ) ) )
.Sp
Values of Tanimoto/Jaccard and Tversky coefficients depend only on bits which are set to \*(L"1\*(R" in A or B; bits set to \*(L"0\*(R" in both A and B (Nd) do not contribute. In order to take into account all bit positions, modified versions of Tanimoto [ Ref. 42 ] and Tversky [ Ref. 43 ] have been developed.
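The coefficient definitions above can be sketched in Python as follows. This is an illustrative sketch only, not the MayaChemTools implementation: the function names (bit_counts, tanimoto, tversky, dice, cosine) are hypothetical, the bit-vectors are assumed to be given as binary strings of equal length, and zero denominators (for example, two all-zero vectors) are not handled.

```python
from math import sqrt


def bit_counts(a: str, b: str):
    """Return (Na, Nb, Nc, Nd) for two equal-length binary strings."""
    if len(a) != len(b):
        raise ValueError("bit-vectors must have the same size")
    na = a.count("1")  # bits set to "1" in A
    nb = b.count("1")  # bits set to "1" in B
    nc = sum(1 for x, y in zip(a, b) if x == "1" and y == "1")  # "1" in both
    nd = sum(1 for x, y in zip(a, b) if x == "0" and y == "0")  # "0" in both
    return na, nb, nc, nd


def tanimoto(a: str, b: str) -> float:
    # Nc / ( Na + Nb - Nc )
    na, nb, nc, _ = bit_counts(a, b)
    return nc / (na + nb - nc)


def tversky(a: str, b: str, alpha: float = 0.5) -> float:
    # Nc / ( alpha * ( Na - Nb ) + Nb ); alpha defaults to 0.5 as in --alpha
    na, nb, nc, _ = bit_counts(a, b)
    return nc / (alpha * (na - nb) + nb)


def dice(a: str, b: str) -> float:
    # ( 2 * Nc ) / ( Na + Nb )
    na, nb, nc, _ = bit_counts(a, b)
    return (2 * nc) / (na + nb)


def cosine(a: str, b: str) -> float:
    # Nc / SQRT ( Na * Nb )
    na, nb, nc, _ = bit_counts(a, b)
    return nc / sqrt(na * nb)


if __name__ == "__main__":
    A, B = "11010011", "10011010"  # toy 8-bit vectors, not real fingerprints
    print(bit_counts(A, B))
    print(tanimoto(A, B), tversky(A, B, alpha=1.0), dice(A, B), cosine(A, B))
```

For the toy vectors, Na = 5, Nb = 4, Nc = 3 and Nd = 2, so Nt = Na + Nb - Nc + Nd = 8 matches the vector size. With alpha = 0.5 the Tversky expression reduces to the Dice expression, which follows directly from the two formulas.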
.Sp Let: .Sp .Vb 3 \& Na\*(Aq = Number of bits set to "0" in A \& Nb\*(Aq = Number of bits set to "0" in B \& Nc\*(Aq = Number of bits set to "0" in both A and B .Ve .Sp Tanimoto': Nc' / ( ( Na' \- Nc') + ( Nb' \- Nc' ) + Nc' ) = Nc' / ( Na' + Nb' \- Nc' ) .Sp Tversky': Nc' / ( alpha * ( Na' \- Nc' ) + ( 1 \- alpha) * ( Nb' \- Nc' ) + Nc' ) = Nc' / ( alpha * ( Na' \- Nb' ) + Nb') .Sp Then: .Sp \&\fIWeightedTanimotoSimilarity\fR = beta * Tanimoto + (1 \- beta) * Tanimoto' .Sp \&\fIWeightedTverskySimilarity\fR = beta * Tversky + (1 \- beta) * Tversky' .IP "\fB\-c, \-\-ColMode\fR \fIColNum | ColLabel\fR" 4 .IX Item "-c, --ColMode ColNum | ColLabel" Specify how columns are identified in \fITextFile(s)\fR: using column number or column label. Possible values: \fIColNum or ColLabel\fR. Default value: \fIColNum\fR. .IP "\fB\-\-CompoundIDCol\fR \fIcol number | col name\fR" 4 .IX Item "--CompoundIDCol col number | col name" This value is \fB\-c, \-\-ColMode\fR mode specific. It specifies input \fITextFile(s)\fR column to use for generating compound \s-1ID\s0 for similarity matrices in output \fITextFile(s)\fR. Possible values: \fIcol number or col label\fR. Default value: \fIfirst column containing the word compoundID in its column label or sequentially generated IDs\fR. .IP "\fB\-\-CompoundIDPrefix\fR \fItext\fR" 4 .IX Item "--CompoundIDPrefix text" Specify compound \s-1ID\s0 prefix to use during sequential generation of compound IDs for input \fISDFile(s)\fR and \fITextFile(s)\fR. Default value: \fICmpd\fR. The default value generates compound IDs which look like Cmpd<Number>. .Sp For input \fISDFile(s)\fR, this value is only used during \fILabelPrefix | MolNameOrLabelPrefix\fR values of \fB\-\-CompoundIDMode\fR option; otherwise, it's ignored. 
.Sp
Examples for \fILabelPrefix\fR or \fIMolNameOrLabelPrefix\fR value of \fB\-\-CompoundIDMode\fR:
.Sp
.Vb 1
\& Compound
.Ve
.Sp
The value specified above generates compound IDs of the form Compound<Number> instead of the default Cmpd<Number>.
.IP "\fB\-\-CompoundIDField\fR \fIDataFieldName\fR" 4
.IX Item "--CompoundIDField DataFieldName"
Specify input \fISDFile(s)\fR datafield label for generating compound IDs. This value is only used during \fIDataField\fR value of \fB\-\-CompoundIDMode\fR option.
.Sp
Examples for \fIDataField\fR value of \fB\-\-CompoundIDMode\fR:
.Sp
.Vb 2
\& MolID
\& ExtReg
.Ve
.IP "\fB\-\-CompoundIDMode\fR \fIDataField | MolName | LabelPrefix | MolNameOrLabelPrefix\fR" 4
.IX Item "--CompoundIDMode DataField | MolName | LabelPrefix | MolNameOrLabelPrefix"
Specify how to generate compound IDs from input \fISDFile(s)\fR for similarity matrix \s-1CSV/TSV\s0 text file(s): use a \fISDFile(s)\fR datafield value; use molname line from \fISDFile(s)\fR; generate a sequential \s-1ID\s0 with specific prefix; use combination of both MolName and LabelPrefix with usage of LabelPrefix values for empty molname lines.
.Sp
Possible values: \fIDataField | MolName | LabelPrefix | MolNameOrLabelPrefix\fR. Default: \fILabelPrefix\fR.
.Sp
For \fIMolNameOrLabelPrefix\fR value of \fB\-\-CompoundIDMode\fR, molname line in \fISDFile(s)\fR takes precedence over sequential compound IDs generated using \fILabelPrefix\fR and only empty molname values are replaced with sequential compound IDs.
.IP "\fB\-d, \-\-detail\fR \fIInfoLevel\fR" 4
.IX Item "-d, --detail InfoLevel"
Level of information to print about lines being ignored. Default: \fI1\fR. Possible values:
\&\fI1, 2 or 3\fR.
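The compound ID rules described for --CompoundIDMode, --CompoundIDField and --CompoundIDPrefix can be summarized in a short Python sketch. The function compound_id and its parameters are hypothetical and only illustrate the documented precedence. Two assumptions are not stated in the text above: sequential numbers are taken from the 1-based record position, and a missing data field raises an error.

```python
def compound_id(mode, index, mol_name="", data_fields=None,
                id_field=None, prefix="Cmpd"):
    """Pick a compound ID for the index-th (1-based) SD file record.

    mode        -- one of the documented --CompoundIDMode values
    mol_name    -- contents of the molname line (may be empty)
    data_fields -- dict of SD data field label -> value
    id_field    -- label given by --CompoundIDField (DataField mode only)
    prefix      -- text given by --CompoundIDPrefix (default: Cmpd)
    """
    data_fields = data_fields or {}
    label = f"{prefix}{index}"  # sequential ID, e.g. Cmpd1, Cmpd2, ...

    if mode == "DataField":
        # Assumption: a missing field is treated as an error (KeyError).
        return data_fields[id_field]
    if mode == "MolName":
        return mol_name
    if mode == "LabelPrefix":
        return label
    if mode == "MolNameOrLabelPrefix":
        # The molname line takes precedence; only empty names are replaced.
        return mol_name.strip() or label
    raise ValueError(f"unsupported CompoundIDMode value: {mode}")


if __name__ == "__main__":
    print(compound_id("LabelPrefix", 3))
    print(compound_id("MolNameOrLabelPrefix", 2, mol_name=""))
    print(compound_id("DataField", 1, data_fields={"MolID": "X17"},
                      id_field="MolID"))
```

The prefix only matters for the LabelPrefix and MolNameOrLabelPrefix branches, which mirrors the statement that --CompoundIDPrefix is ignored for the other modes.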
.IP "\fB\-f, \-\-fast\fR" 4
.IX Item "-f, --fast"
In this mode, fingerprints columns specified using \fB\-\-FingerprintsCol\fR for \fITextFile(s)\fR and
\&\fB\-\-FingerprintsField\fR for \fISDFile(s)\fR are assumed to contain valid fingerprints data and no checking is performed before calculating similarity matrices. By default, fingerprints data is validated before computing pairwise similarity and distance coefficients.
.IP "\fB\-\-FingerprintsCol\fR \fIcol number | col name\fR" 4
.IX Item "--FingerprintsCol col number | col name"
This value is \fB\-c, \-\-ColMode\fR specific. It specifies the fingerprints column to use during calculation of similarity matrices for \fITextFile(s)\fR. Possible values: \fIcol number or col label\fR. Default value: \fIfirst column containing the word Fingerprints in its column label\fR.
.IP "\fB\-\-FingerprintsField\fR \fIFieldLabel\fR" 4
.IX Item "--FingerprintsField FieldLabel"
Fingerprints field label to use during calculation of similarity matrices for \fISDFile(s)\fR. Default value: \fIfirst data field label containing the word Fingerprints in its label\fR.
.IP "\fB\-h, \-\-help\fR" 4
.IX Item "-h, --help"
Print this help message.
.IP "\fB\-\-InDelim\fR \fIcomma | semicolon\fR" 4
.IX Item "--InDelim comma | semicolon"
Input delimiter for \s-1CSV\s0 \fITextFile(s)\fR. Possible values: \fIcomma or semicolon\fR. Default value: \fIcomma\fR. For \s-1TSV\s0 files, this option is ignored and \fItab\fR is used as a delimiter.
.IP "\fB\-\-InputDataMode\fR \fILoadInMemory | ScanFile\fR" 4
.IX Item "--InputDataMode LoadInMemory | ScanFile"
Specify how fingerprints bit-vector or vector strings data from \fI\s-1SD\s0, \s-1FP\s0 and \s-1CSV/TSV\s0\fR fingerprint file(s) is processed: retrieve, process and load all available fingerprints data in memory; or retrieve and process data for fingerprints one at a time. Possible values: \fILoadInMemory | ScanFile\fR. Default: \fILoadInMemory\fR.
.Sp With the \fILoadInMemory\fR value of \fB\-\-InputDataMode\fR, fingerprints bit-vector or vector strings data from the input file is retrieved, processed, and loaded into memory all at once as fingerprints objects for generation of similarity matrices. .Sp With the \fIScanFile\fR value of \fB\-\-InputDataMode\fR, multiple passes over the input fingerprints file are performed to retrieve and process fingerprints bit-vector or vector strings data one at a time to generate fingerprints objects used during generation of similarity matrices. A temporary copy of the input fingerprints file is made at the start and deleted after generating the matrices. .Sp \&\fIScanFile\fR value of \fB\-\-InputDataMode\fR allows processing of arbitrarily large fingerprints files without any additional memory requirement. .IP "\fB\-m, \-\-mode\fR \fIAutoDetect | FingerprintsBitVectorString | FingerprintsVectorString\fR" 4 .IX Item "-m, --mode AutoDetect | FingerprintsBitVectorString | FingerprintsVectorString" Format of fingerprint strings data in \fITextFile(s)\fR: automatically detect format of fingerprints string created by MayaChemTools fingerprints generation scripts or explicitly specify its format. Possible values: \fIAutoDetect | FingerprintsBitVectorString | FingerprintsVectorString\fR. Default value: \fIAutoDetect\fR. .IP "\fB\-\-OutDelim\fR \fIcomma | tab | semicolon\fR" 4 .IX Item "--OutDelim comma | tab | semicolon" Delimiter for output \s-1CSV/TSV\s0 text file(s). Possible values: \fIcomma, tab, or semicolon\fR. Default value: \fIcomma\fR.
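.Sp
The difference between the two \fB\-\-InputDataMode\fR values described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the script's code: open_records is a hypothetical zero-argument callable that returns a fresh iterator over (ID, fingerprint) records, standing in for reopening the input file, and both function names are invented for the example.

```python
def load_in_memory_pairs(open_records):
    """LoadInMemory idea: read every record once and keep them all."""
    records = list(open_records())
    for id_a, fp_a in records:
        for id_b, fp_b in records:
            yield id_a, id_b, fp_a, fp_b


def scan_file_pairs(open_records):
    """ScanFile idea: hold two records at a time and re-read the
    source once per matrix row instead of storing everything."""
    for id_a, fp_a in open_records():
        for id_b, fp_b in open_records():
            yield id_a, id_b, fp_a, fp_b


sample = [("Cmpd1", "1100"), ("Cmpd2", "1010")]
opener = lambda: iter(sample)
# Both strategies visit the same pairs in the same order.
print(list(load_in_memory_pairs(opener)) == list(scan_file_pairs(opener)))
```

The scanning variant trades repeated reads for a memory footprint that does not grow with the number of fingerprints.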
.IP "\fB\-\-OutMatrixFormat\fR \fIRowsAndColumns | IDPairsAndValue\fR" 4 .IX Item "--OutMatrixFormat RowsAndColumns | IDPairsAndValue" Specify how similarity or distance values calculated for fingerprints vector and bit-vector strings are written to the output \s-1CSV/TSV\s0 text file(s): generate text files containing rows and columns with their labels corresponding to compound IDs and each matrix element value corresponding to similarity or distance between corresponding compounds; or generate text files with rows containing compound IDs for two compounds followed by the similarity or distance value between these compounds. .Sp Possible values: \fIRowsAndColumns or IDPairsAndValue\fR. Default value: \fIRowsAndColumns\fR. .Sp The value of \fB\-\-OutMatrixFormat\fR in conjunction with \fB\-\-OutMatrixType\fR determines the type of data written to output files and allows generation of up to 6 different output data formats: .Sp .Vb 1 \& OutMatrixFormat OutMatrixType \& \& RowsAndColumns FullMatrix [ DEFAULT ] \& RowsAndColumns UpperTriangularMatrix \& RowsAndColumns LowerTriangularMatrix \& \& IDPairsAndValue FullMatrix \& IDPairsAndValue UpperTriangularMatrix \& IDPairsAndValue LowerTriangularMatrix .Ve .Sp Example of data in output file for \fIRowsAndColumns\fR \fB\-\-OutMatrixFormat\fR value for \&\fIFullMatrix\fR value of \fB\-\-OutMatrixType\fR: .Sp .Vb 10 \& "","Cmpd1","Cmpd2","Cmpd3","Cmpd4","Cmpd5","Cmpd6",... ... \& "Cmpd1","1","0.04","0.25","0.13","0.11","0.2",... ... \& "Cmpd2","0.04","1","0.06","0.05","0.19","0.07",... ... \& "Cmpd3","0.25","0.06","1","0.12","0.22","0.25",... ... \& "Cmpd4","0.13","0.05","0.12","1","0.11","0.13",... ... \& "Cmpd5","0.11","0.19","0.22","0.11","1","0.17",... ... \& "Cmpd6","0.2","0.07","0.25","0.13","0.17","1",... ... \& ... ... .. \& ... ... .. \& ... ... .. 
.Ve .Sp Example of data in output file for \fIRowsAndColumns\fR \fB\-\-OutMatrixFormat\fR value for \&\fIUpperTriangularMatrix\fR value of \fB\-\-OutMatrixType\fR: .Sp .Vb 10 \& "","Cmpd1","Cmpd2","Cmpd3","Cmpd4","Cmpd5","Cmpd6",... ... \& "Cmpd1","1","0.04","0.25","0.13","0.11","0.2",... ... \& "Cmpd2","1","0.06","0.05","0.19","0.07",... ... \& "Cmpd3","1","0.12","0.22","0.25",... ... \& "Cmpd4","1","0.11","0.13",... ... \& "Cmpd5","1","0.17",... ... \& "Cmpd6","1",... ... \& ... ... .. \& ... ... .. \& ... ... .. .Ve .Sp Example of data in output file for \fIRowsAndColumns\fR \fB\-\-OutMatrixFormat\fR value for \&\fILowerTriangularMatrix\fR value of \fB\-\-OutMatrixType\fR: .Sp .Vb 10 \& "","Cmpd1","Cmpd2","Cmpd3","Cmpd4","Cmpd5","Cmpd6",... ... \& "Cmpd1","1" \& "Cmpd2","0.04","1" \& "Cmpd3","0.25","0.06","1" \& "Cmpd4","0.13","0.05","0.12","1" \& "Cmpd5","0.11","0.19","0.22","0.11","1" \& "Cmpd6","0.2","0.07","0.25","0.13","0.17","1" \& ... ... .. \& ... ... .. \& ... ... .. .Ve .Sp Example of data in output file for \fIIDPairsAndValue\fR \fB\-\-OutMatrixFormat\fR value for \fIFullMatrix\fR value of \fB\-\-OutMatrixType\fR: .Sp .Vb 10 \& "CmpdID1","CmpdID2","Coefficient Value" \& "Cmpd1","Cmpd1","1" \& "Cmpd1","Cmpd2","0.04" \& "Cmpd1","Cmpd3","0.25" \& "Cmpd1","Cmpd4","0.13" \& ... ... ... \& ... ... ... \& ... ... ... \& "Cmpd2","Cmpd1","0.04" \& "Cmpd2","Cmpd2","1" \& "Cmpd2","Cmpd3","0.06" \& "Cmpd2","Cmpd4","0.05" \& ... ... ... \& ... ... ... \& ... ... ... \& "Cmpd3","Cmpd1","0.25" \& "Cmpd3","Cmpd2","0.06" \& "Cmpd3","Cmpd3","1" \& "Cmpd3","Cmpd4","0.12" \& ... ... ... \& ... ... ... \& ... ... ... .Ve .Sp Example of data in output file for \fIIDPairsAndValue\fR \fB\-\-OutMatrixFormat\fR value for \fIUpperTriangularMatrix\fR value of \fB\-\-OutMatrixType\fR: .Sp .Vb 10 \& "CmpdID1","CmpdID2","Coefficient Value" \& "Cmpd1","Cmpd1","1" \& "Cmpd1","Cmpd2","0.04" \& "Cmpd1","Cmpd3","0.25" \& "Cmpd1","Cmpd4","0.13" \& ... ... ... \& ... ... ... \& ... ... ... 
\& "Cmpd2","Cmpd2","1" \& "Cmpd2","Cmpd3","0.06" \& "Cmpd2","Cmpd4","0.05" \& ... ... ... \& ... ... ... \& ... ... ... \& "Cmpd3","Cmpd3","1" \& "Cmpd3","Cmpd4","0.12" \& ... ... ... \& ... ... ... \& ... ... ... .Ve .Sp Example of data in output file for \fIIDPairsAndValue\fR \fB\-\-OutMatrixFormat\fR value for \fILowerTriangularMatrix\fR value of \fB\-\-OutMatrixType\fR: .Sp .Vb 10 \& "CmpdID1","CmpdID2","Coefficient Value" \& "Cmpd1","Cmpd1","1" \& "Cmpd2","Cmpd1","0.04" \& "Cmpd2","Cmpd2","1" \& "Cmpd3","Cmpd1","0.25" \& "Cmpd3","Cmpd2","0.06" \& "Cmpd3","Cmpd3","1" \& "Cmpd4","Cmpd1","0.13" \& "Cmpd4","Cmpd2","0.05" \& "Cmpd4","Cmpd3","0.12" \& "Cmpd4","Cmpd4","1" \& ... ... ... \& ... ... ... \& ... ... ... .Ve .IP "\fB\-\-OutMatrixType\fR \fIFullMatrix | UpperTriangularMatrix | LowerTriangularMatrix\fR" 4 .IX Item "--OutMatrixType FullMatrix | UpperTriangularMatrix | LowerTriangularMatrix" Type of similarity or distance matrix to calculate for fingerprints vector and bit-vector strings: calculate full matrix; calculate upper triangular matrix including diagonal; or calculate lower triangular matrix including diagonal. .Sp Possible values: \fIFullMatrix, UpperTriangularMatrix, or LowerTriangularMatrix\fR. Default value: \&\fIFullMatrix\fR. .Sp The value of \fB\-\-OutMatrixType\fR in conjunction with \fB\-\-OutMatrixFormat\fR determines the type of data written to output files. .IP "\fB\-o, \-\-overwrite\fR" 4 .IX Item "-o, --overwrite" Overwrite existing files. .IP "\fB\-p, \-\-precision\fR \fInumber\fR" 4 .IX Item "-p, --precision number" Precision of calculated values in the output file. Default: up to \fI2\fR decimal places. Valid values: positive integers. .IP "\fB\-q, \-\-quote\fR \fIYes | No\fR" 4 .IX Item "-q, --quote Yes | No" Put quotes around column values in output \s-1CSV/TSV\s0 text file(s). Possible values: \&\fIYes or No\fR. Default value: \fIYes\fR.
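.Sp
The six output layouts listed under \fB\-\-OutMatrixFormat\fR and \fB\-\-OutMatrixType\fR can be reproduced with the Python sketch below. It only mirrors the example tables shown above; the function names column_range and layout are hypothetical, and quoting, delimiters and precision are not handled.

```python
def column_range(i, n, matrix_type):
    """Column indices written for row i of an n x n matrix."""
    if matrix_type == "FullMatrix":
        return range(n)
    if matrix_type == "UpperTriangularMatrix":
        return range(i, n)          # diagonal and everything to its right
    if matrix_type == "LowerTriangularMatrix":
        return range(0, i + 1)      # everything up to and including diagonal
    raise ValueError(f"unknown OutMatrixType: {matrix_type}")


def layout(ids, values, matrix_format="RowsAndColumns",
           matrix_type="FullMatrix"):
    """Return output rows (lists of cells), header row first."""
    n = len(ids)
    if matrix_format == "RowsAndColumns":
        rows = [[""] + list(ids)]
        for i in range(n):
            cells = [values[i][j] for j in column_range(i, n, matrix_type)]
            rows.append([ids[i]] + cells)
        return rows
    if matrix_format == "IDPairsAndValue":
        rows = [["CmpdID1", "CmpdID2", "Coefficient Value"]]
        for i in range(n):
            for j in column_range(i, n, matrix_type):
                rows.append([ids[i], ids[j], values[i][j]])
        return rows
    raise ValueError(f"unknown OutMatrixFormat: {matrix_format}")


ids = ["Cmpd1", "Cmpd2", "Cmpd3"]
values = [["1", "0.04", "0.25"],
          ["0.04", "1", "0.06"],
          ["0.25", "0.06", "1"]]
for row in layout(ids, values, "IDPairsAndValue", "LowerTriangularMatrix"):
    print(",".join(row))
```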
.IP "\fB\-r, \-\-root\fR \fIRootName\fR" 4 .IX Item "-r, --root RootName" New file name is generated using the root: <Root><BitVectorComparisonMode>.<Ext> or <Root><VectorComparisonMode><VectorComparisonFormulism>.<Ext>. The csv and tsv <Ext> values are used for comma/semicolon and tab delimited text files respectively. This option is ignored for multiple input files. .ie n .IP "\fB\-v, \-\-VectorComparisonMode\fR \fIAll | ""TanimotoSimilarity,[ManhattanDistance,...]""\fR" 4 .el .IP "\fB\-v, \-\-VectorComparisonMode\fR \fIAll | ``TanimotoSimilarity,[ManhattanDistance,...]''\fR" 4 .IX Item "-v, --VectorComparisonMode All | TanimotoSimilarity,[ManhattanDistance,...]" Specify what similarity or distance coefficients to use for calculating similarity matrices for fingerprint vector strings data values in \fITextFile(s)\fR: calculate similarity matrices for all supported similarity and distance coefficients or specify a comma delimited list of similarity and distance coefficients. Possible values: \fIAll | \*(L"TanimotoSimilarity,[ManhattanDistance,...]\*(R"\fR. Default: \fITanimotoSimilarity\fR. .Sp The value of \fB\-v, \-\-VectorComparisonMode\fR, in conjunction with \fB\-\-VectorComparisonFormulism\fR, decides which type of similarity and distance coefficient formulism gets used. .Sp \&\fIAll\fR uses the complete list of supported similarity and distance coefficients: \fICosineSimilarity, CzekanowskiSimilarity, DiceSimilarity, OchiaiSimilarity, JaccardSimilarity, SorensonSimilarity, TanimotoSimilarity, CityBlockDistance, EuclideanDistance, HammingDistance, ManhattanDistance, SoergelDistance\fR. These similarity and distance coefficients are described below. .Sp \&\fBFingerprintsVector.pm\fR module, used to calculate similarity and distance coefficients, provides support to perform comparison between vectors containing three different types of values: .Sp Type I: OrderedNumericalValues .Sp .Vb 3 \& . Size of two vectors are same \& . 
Vectors contain real values in a specific order. For example: MACCS keys \& count, Topological pharmacophore atom pairs and so on. .Ve .Sp Type \s-1II:\s0 UnorderedNumericalValues .Sp .Vb 3 \& . Size of two vectors might not be same \& . Vectors contain unordered real value identified by value IDs. For example: \& Topological atom pairs, Topological atom torsions and so on .Ve .Sp Type \s-1III:\s0 AlphaNumericalValues .Sp .Vb 3 \& . Size of two vectors might not be same \& . Vectors contain unordered alphanumerical values. For example: Extended \& connectivity fingerprints, atom neighborhood fingerprints. .Ve .Sp Before performing similarity or distance calculations between vectors containing UnorderedNumericalValues or AlphaNumericalValues, the vectors are transformed into vectors containing unique OrderedNumericalValues using value IDs for UnorderedNumericalValues and the values themselves for AlphaNumericalValues. .Sp Three forms of similarity and distance calculation between two vectors, specified using \fB\-\-VectorComparisonFormulism\fR option, are supported: \fIAlgebraicForm, BinaryForm or SetTheoreticForm\fR. .Sp For \fIBinaryForm\fR, the ordered list of processed final vector values containing the value or count of each unique value type is simply converted into a binary vector containing 1s and 0s corresponding to presence or absence of values before calculating similarity or distance between two vectors.
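.Sp
The transformation into OrderedNumericalValues and the BinaryForm conversion described above can be sketched in Python. This is a minimal illustration, not the FingerprintsVector.pm implementation: the function names are hypothetical, a value absent from one vector is assumed to count as 0, the sorted ordering of IDs is an arbitrary choice, and AlphaNumericalValues are assumed to be counted per unique value.

```python
from collections import Counter


def ordered_from_unordered(ids_a, values_a, ids_b, values_b):
    """Align two UnorderedNumericalValues vectors on the union of IDs."""
    by_id_a = dict(zip(ids_a, values_a))
    by_id_b = dict(zip(ids_b, values_b))
    all_ids = sorted(set(by_id_a) | set(by_id_b))
    xa = [by_id_a.get(value_id, 0) for value_id in all_ids]
    xb = [by_id_b.get(value_id, 0) for value_id in all_ids]
    return xa, xb


def ordered_from_alphanumerical(values_a, values_b):
    """AlphaNumericalValues act as their own IDs; the count of each
    unique value becomes the numerical value."""
    count_a, count_b = Counter(values_a), Counter(values_b)
    return ordered_from_unordered(list(count_a), list(count_a.values()),
                                  list(count_b), list(count_b.values()))


def to_binary(x):
    """BinaryForm: 1 where a value is present, 0 where it is absent."""
    return [1 if value else 0 for value in x]


xa, xb = ordered_from_unordered(["C-N", "C-O"], [2, 1],
                                ["C-O", "N-O"], [3, 1])
print(xa, xb)
print(to_binary(xa), to_binary(xb))
```

After this step both vectors have the same size, so the formulas given for OrderedNumericalValues apply to all three vector types.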
.Sp For two fingerprint vectors A and B of same size containing OrderedNumericalValues, let: .Sp .Vb 1 \& N = Number values in A or B \& \& Xa = Values of vector A \& Xb = Values of vector B \& \& Xai = Value of ith element in A \& Xbi = Value of ith element in B \& \& SUM = Sum of i over N values .Ve .Sp For SetTheoreticForm of calculation between two vectors, let: .Sp .Vb 2 \& SetIntersectionXaXb = SUM ( MIN ( Xai, Xbi ) ) \& SetDifferenceXaXb = SUM ( Xai ) + SUM ( Xbi ) \- SUM ( MIN ( Xai, Xbi ) ) .Ve .Sp For BinaryForm of calculation between two vectors, let: .Sp .Vb 5 \& Na = Number of bits set to "1" in A = SUM ( Xai ) \& Nb = Number of bits set to "1" in B = SUM ( Xbi ) \& Nc = Number of bits set to "1" in both A and B = SUM ( Xai * Xbi ) \& Nd = Number of bits set to "0" in both A and B \& = SUM ( 1 \- Xai \- Xbi + Xai * Xbi) \& \& N = Number of bits set to "1" or "0" in A or B = Size of A or B = Na + Nb \- Nc + Nd .Ve .Sp Additionally, for BinaryForm various values also correspond to: .Sp .Vb 4 \& Na = | Xa | \& Nb = | Xb | \& Nc = | SetIntersectionXaXb | \& Nd = N \- | SetDifferenceXaXb | \& \& | SetDifferenceXaXb | = N \- Nd = Na + Nb \- Nc + Nd \- Nd = Na + Nb \- Nc \& = | Xa | + | Xb | \- | SetIntersectionXaXb | .Ve .Sp Various similarity and distance coefficients [ Ref 40, Ref 62, Ref 64 ] for a pair of vectors A and B in \fIAlgebraicForm, BinaryForm and SetTheoreticForm\fR are defined as follows: .Sp \&\fBCityBlockDistance\fR: ( same as HammingDistance and ManhattanDistance) .Sp \&\fIAlgebraicForm\fR: \s-1SUM\s0 ( \s-1ABS\s0 ( Xai \- Xbi ) ) .Sp \&\fIBinaryForm\fR: ( Na \- Nc ) + ( Nb \- Nc ) = Na + Nb \- 2 * Nc .Sp \&\fISetTheoreticForm\fR: | SetDifferenceXaXb | \- | SetIntersectionXaXb | = \s-1SUM\s0 ( Xai ) + \s-1SUM\s0 ( Xbi ) \- 2 * ( \s-1SUM\s0 ( \s-1MIN\s0 ( Xai, Xbi ) ) ) .Sp \&\fBCosineSimilarity\fR: ( same as OchiaiSimilarityCoefficient) .Sp \&\fIAlgebraicForm\fR: \s-1SUM\s0 ( Xai * Xbi ) / \s-1SQRT\s0 ( \s-1SUM\s0 ( Xai ** 2) * \s-1SUM\s0 
( Xbi ** 2) ) .Sp \&\fIBinaryForm\fR: Nc / \s-1SQRT\s0 ( Na * Nb) .Sp \&\fISetTheoreticForm\fR: | SetIntersectionXaXb | / \s-1SQRT\s0 ( |Xa| * |Xb| ) = \s-1SUM\s0 ( \s-1MIN\s0 ( Xai, Xbi ) ) / \s-1SQRT\s0 ( \s-1SUM\s0 ( Xai ) * \s-1SUM\s0 ( Xbi ) ) .Sp \&\fBCzekanowskiSimilarity\fR: ( same as DiceSimilarity and SorensonSimilarity) .Sp \&\fIAlgebraicForm\fR: ( 2 * ( \s-1SUM\s0 ( Xai * Xbi ) ) ) / ( \s-1SUM\s0 ( Xai ** 2) + \s-1SUM\s0 ( Xbi **2 ) ) .Sp \&\fIBinaryForm\fR: 2 * Nc / ( Na + Nb ) .Sp \&\fISetTheoreticForm\fR: 2 * | SetIntersectionXaXb | / ( |Xa| + |Xb| ) = 2 * ( \s-1SUM\s0 ( \s-1MIN\s0 ( Xai, Xbi ) ) ) / ( \s-1SUM\s0 ( Xai ) + \s-1SUM\s0 ( Xbi ) ) .Sp \&\fBDiceSimilarity\fR: ( same as CzekanowskiSimilarity and SorensonSimilarity) .Sp \&\fIAlgebraicForm\fR: ( 2 * ( \s-1SUM\s0 ( Xai * Xbi ) ) ) / ( \s-1SUM\s0 ( Xai ** 2) + \s-1SUM\s0 ( Xbi **2 ) ) .Sp \&\fIBinaryForm\fR: 2 * Nc / ( Na + Nb ) .Sp \&\fISetTheoreticForm\fR: 2 * | SetIntersectionXaXb | / ( |Xa| + |Xb| ) = 2 * ( \s-1SUM\s0 ( \s-1MIN\s0 ( Xai, Xbi ) ) ) / ( \s-1SUM\s0 ( Xai ) + \s-1SUM\s0 ( Xbi ) ) .Sp \&\fBEuclideanDistance\fR: .Sp \&\fIAlgebraicForm\fR: \s-1SQRT\s0 ( \s-1SUM\s0 ( ( ( Xai \- Xbi ) ** 2 ) ) ) .Sp \&\fIBinaryForm\fR: \s-1SQRT\s0 ( ( Na \- Nc ) + ( Nb \- Nc ) ) = \s-1SQRT\s0 ( Na + Nb \- 2 * Nc ) .Sp \&\fISetTheoreticForm\fR: \s-1SQRT\s0 ( | SetDifferenceXaXb | \- | SetIntersectionXaXb | ) = \s-1SQRT\s0 ( \s-1SUM\s0 ( Xai ) + \s-1SUM\s0 ( Xbi ) \- 2 * ( \s-1SUM\s0 ( \s-1MIN\s0 ( Xai, Xbi ) ) ) ) .Sp \&\fBHammingDistance\fR: ( same as CityBlockDistance and ManhattanDistance) .Sp \&\fIAlgebraicForm\fR: \s-1SUM\s0 ( \s-1ABS\s0 ( Xai \- Xbi ) ) .Sp \&\fIBinaryForm\fR: ( Na \- Nc ) + ( Nb \- Nc ) = Na + Nb \- 2 * Nc .Sp \&\fISetTheoreticForm\fR: | SetDifferenceXaXb | \- | SetIntersectionXaXb | = \s-1SUM\s0 ( Xai ) + \s-1SUM\s0 ( Xbi ) \- 2 * ( \s-1SUM\s0 ( \s-1MIN\s0 ( Xai, Xbi ) ) ) .Sp \&\fBJaccardSimilarity\fR: ( same as TanimotoSimilarity) .Sp \&\fIAlgebraicForm\fR: \s-1SUM\s0 ( 
Xai * Xbi ) / ( \s-1SUM\s0 ( Xai ** 2 ) + \s-1SUM\s0 ( Xbi ** 2 ) \- \s-1SUM\s0 ( Xai * Xbi ) ) .Sp \&\fIBinaryForm\fR: Nc / ( ( Na \- Nc ) + ( Nb \- Nc ) + Nc ) = Nc / ( Na + Nb \- Nc ) .Sp \&\fISetTheoreticForm\fR: | SetIntersectionXaXb | / | SetDifferenceXaXb | = \s-1SUM\s0 ( \s-1MIN\s0 ( Xai, Xbi ) ) / ( \s-1SUM\s0 ( Xai ) + \s-1SUM\s0 ( Xbi ) \- \s-1SUM\s0 ( \s-1MIN\s0 ( Xai, Xbi ) ) ) .Sp \&\fBManhattanDistance\fR: ( same as CityBlockDistance and HammingDistance) .Sp \&\fIAlgebraicForm\fR: \s-1SUM\s0 ( \s-1ABS\s0 ( Xai \- Xbi ) ) .Sp \&\fIBinaryForm\fR: ( Na \- Nc ) + ( Nb \- Nc ) = Na + Nb \- 2 * Nc .Sp \&\fISetTheoreticForm\fR: | SetDifferenceXaXb | \- | SetIntersectionXaXb | = \s-1SUM\s0 ( Xai ) + \s-1SUM\s0 ( Xbi ) \- 2 * ( \s-1SUM\s0 ( \s-1MIN\s0 ( Xai, Xbi ) ) ) .Sp \&\fBOchiaiSimilarity\fR: ( same as CosineSimilarity) .Sp \&\fIAlgebraicForm\fR: \s-1SUM\s0 ( Xai * Xbi ) / \s-1SQRT\s0 ( \s-1SUM\s0 ( Xai ** 2) * \s-1SUM\s0 ( Xbi ** 2) ) .Sp \&\fIBinaryForm\fR: Nc / \s-1SQRT\s0 ( Na * Nb) .Sp \&\fISetTheoreticForm\fR: | SetIntersectionXaXb | / \s-1SQRT\s0 ( |Xa| * |Xb| ) = \s-1SUM\s0 ( \s-1MIN\s0 ( Xai, Xbi ) ) / \s-1SQRT\s0 ( \s-1SUM\s0 ( Xai ) * \s-1SUM\s0 ( Xbi ) ) .Sp \&\fBSorensonSimilarity\fR: ( same as CzekanowskiSimilarity and DiceSimilarity) .Sp \&\fIAlgebraicForm\fR: ( 2 * ( \s-1SUM\s0 ( Xai * Xbi ) ) ) / ( \s-1SUM\s0 ( Xai ** 2) + \s-1SUM\s0 ( Xbi **2 ) ) .Sp \&\fIBinaryForm\fR: 2 * Nc / ( Na + Nb ) .Sp \&\fISetTheoreticForm\fR: 2 * | SetIntersectionXaXb | / ( |Xa| + |Xb| ) = 2 * ( \s-1SUM\s0 ( \s-1MIN\s0 ( Xai, Xbi ) ) ) / ( \s-1SUM\s0 ( Xai ) + \s-1SUM\s0 ( Xbi ) ) .Sp \&\fBSoergelDistance\fR: .Sp \&\fIAlgebraicForm\fR: \s-1SUM\s0 ( \s-1ABS\s0 ( Xai \- Xbi ) ) / \s-1SUM\s0 ( \s-1MAX\s0 ( Xai, Xbi ) ) .Sp \&\fIBinaryForm\fR: 1 \- Nc / ( Na + Nb \- Nc ) = ( Na + Nb \- 2 * Nc ) / ( Na + Nb \- Nc ) .Sp \&\fISetTheoreticForm\fR: ( | SetDifferenceXaXb | \- | SetIntersectionXaXb | ) / | SetDifferenceXaXb | = ( \s-1SUM\s0 ( Xai ) + \s-1SUM\s0 ( Xbi ) 
\- 2 * ( \s-1SUM\s0 ( \s-1MIN\s0 ( Xai, Xbi ) ) ) ) / ( \s-1SUM\s0 ( Xai ) + \s-1SUM\s0 ( Xbi ) \- \s-1SUM\s0 ( \s-1MIN\s0 ( Xai, Xbi ) ) ) .Sp \&\fBTanimotoSimilarity\fR: ( same as JaccardSimilarity) .Sp \&\fIAlgebraicForm\fR: \s-1SUM\s0 ( Xai * Xbi ) / ( \s-1SUM\s0 ( Xai ** 2 ) + \s-1SUM\s0 ( Xbi ** 2 ) \- \s-1SUM\s0 ( Xai * Xbi ) ) .Sp \&\fIBinaryForm\fR: Nc / ( ( Na \- Nc ) + ( Nb \- Nc ) + Nc ) = Nc / ( Na + Nb \- Nc ) .Sp \&\fISetTheoreticForm\fR: | SetIntersectionXaXb | / | SetDifferenceXaXb | = \s-1SUM\s0 ( \s-1MIN\s0 ( Xai, Xbi ) ) / ( \s-1SUM\s0 ( Xai ) + \s-1SUM\s0 ( Xbi ) \- \s-1SUM\s0 ( \s-1MIN\s0 ( Xai, Xbi ) ) ) .ie n .IP "\fB\-\-VectorComparisonFormulism\fR \fIAll | ""AlgebraicForm,[BinaryForm,SetTheoreticForm]""\fR" 4 .el .IP "\fB\-\-VectorComparisonFormulism\fR \fIAll | ``AlgebraicForm,[BinaryForm,SetTheoreticForm]''\fR" 4 .IX Item "--VectorComparisonFormulism All | AlgebraicForm,[BinaryForm,SetTheoreticForm]" Specify fingerprints vector comparison formulism to use for calculation of similarity and distance coefficients during \fB\-v, \-\-VectorComparisonMode\fR: use all supported comparison formulisms or specify a comma delimited list of formulisms. Possible values: \fIAll | \*(L"AlgebraicForm,[BinaryForm,SetTheoreticForm]\*(R"\fR. Default value: \fIAlgebraicForm\fR. .Sp \&\fIAll\fR uses all three forms of supported vector comparison formulism for values of \fB\-v, \-\-VectorComparisonMode\fR option. .Sp For fingerprint vector strings containing \fBAlphaNumericalValues\fR data values \- \fBExtendedConnectivityFingerprints\fR, \&\fBAtomNeighborhoodsFingerprints\fR and so on \- all three formulisms result in the same value during similarity and distance calculations. .IP "\fB\-w, \-\-WorkingDir\fR \fIDirName\fR" 4 .IX Item "-w, --WorkingDir DirName" Location of working directory. Default: current directory.
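.Sp
As a worked illustration of the formulas given under \fB\-v, \-\-VectorComparisonMode\fR, the Python sketch below evaluates TanimotoSimilarity and CityBlockDistance in each of the three formulisms for two vectors of OrderedNumericalValues. The function names are hypothetical and the code is not the FingerprintsVector.pm implementation; all-zero vectors, which would make a Tanimoto denominator zero, are not handled.

```python
def _binary(x):
    # Presence/absence vector used by BinaryForm.
    return [1 if value else 0 for value in x]


def tanimoto_similarity(xa, xb, form="AlgebraicForm"):
    if form == "AlgebraicForm":
        dot = sum(a * b for a, b in zip(xa, xb))
        return dot / (sum(a * a for a in xa) + sum(b * b for b in xb) - dot)
    if form == "BinaryForm":
        ba, bb = _binary(xa), _binary(xb)
        na, nb = sum(ba), sum(bb)
        nc = sum(a * b for a, b in zip(ba, bb))
        return nc / (na + nb - nc)
    if form == "SetTheoreticForm":
        common = sum(min(a, b) for a, b in zip(xa, xb))
        return common / (sum(xa) + sum(xb) - common)
    raise ValueError(f"unknown formulism: {form}")


def city_block_distance(xa, xb, form="AlgebraicForm"):
    if form == "AlgebraicForm":
        return sum(abs(a - b) for a, b in zip(xa, xb))
    if form == "BinaryForm":
        ba, bb = _binary(xa), _binary(xb)
        nc = sum(a * b for a, b in zip(ba, bb))
        return sum(ba) + sum(bb) - 2 * nc
    if form == "SetTheoreticForm":
        common = sum(min(a, b) for a, b in zip(xa, xb))
        return sum(xa) + sum(xb) - 2 * common
    raise ValueError(f"unknown formulism: {form}")


xa, xb = [2, 1, 0], [0, 3, 1]
for form in ("AlgebraicForm", "BinaryForm", "SetTheoreticForm"):
    print(form, tanimoto_similarity(xa, xb, form),
          city_block_distance(xa, xb, form))
```

For vectors that already contain only 1s and 0s, the three forms of each coefficient above reduce to the same expression.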
.SH "EXAMPLES" .IX Header "EXAMPLES" To generate a similarity matrix corresponding to Tanimoto similarity coefficient for fingerprints bit-vector strings data corresponding to supported fingerprints in text file present in a column name containing Fingerprint substring by loading all fingerprints data into memory and create a SampleFPHexTanimotoSimilarity.csv file containing compound IDs retrieved from column name containing CompoundID substring, type: .PP .Vb 1 \& % SimilarityMatricesFingerprints.pl \-o SampleFPHex.csv .Ve .PP To generate a similarity matrix corresponding to Tanimoto similarity coefficient for fingerprints bit-vector strings data corresponding to supported fingerprints in \s-1SD\s0 File present in a data field with Fingerprint substring in its label by loading all fingerprints data into memory and create a SampleFPHexTanimotoSimilarity.csv file containing sequentially generated compound IDs with Cmpd prefix, type: .PP .Vb 1 \& % SimilarityMatricesFingerprints.pl \-o SampleFPHex.sdf .Ve .PP To generate a similarity matrix corresponding to Tanimoto similarity coefficient for fingerprints bit-vector strings data corresponding to supported fingerprints in \s-1FP\s0 file by loading all fingerprints data into memory and create a SampleFPHexTanimotoSimilarity.csv file along with compound IDs retrieved from \s-1FP\s0 file, type: .PP .Vb 1 \& % SimilarityMatricesFingerprints.pl \-o SampleFPHex.fpf .Ve .PP To generate a lower triangular similarity matrix corresponding to Tanimoto similarity coefficient for fingerprints bit-vector strings data corresponding to supported fingerprints in text file present in a column name containing Fingerprint substring by loading all fingerprints data into memory and create a SampleFPHexTanimotoSimilarity.csv file containing compound IDs retrieved from column name containing CompoundID substring, type: .PP .Vb 3 \& % SimilarityMatricesFingerprints.pl \-o \-\-InputDataMode LoadInMemory \& \-\-OutMatrixFormat RowsAndColumns 
\-\-OutMatrixType LowerTriangularMatrix \& SampleFPHex.csv .Ve .PP To generate a upper triangular similarity matrix corresponding to Tanimoto similarity coefficient for fingerprints bit-vector strings data corresponding to supported fingerprints in text file present in a column name containing Fingerprint substring by loading all fingerprints data into memory and create a SampleFPHexTanimotoSimilarity.csv file in IDPairsAndValue format containing compound IDs retrieved from column name containing CompoundID substring, type: .PP .Vb 3 \& % SimilarityMatricesFingerprints.pl \-o \-\-InputDataMode LoadInMemory \& \-\-OutMatrixFormat IDPairsAndValue \-\-OutMatrixType UpperTriangularMatrix \& SampleFPHex.csv .Ve .PP To generate a full similarity matrix corresponding to Tanimoto similarity coefficient for fingerprints bit-vector strings data corresponding to supported fingerprints in text file present in a column name containing Fingerprint substring by scanning file without loading all fingerprints data into memory and create a SampleFPHexTanimotoSimilarity.csv file containing compound IDs retrieved from column name containing CompoundID substring, type: .PP .Vb 3 \& % SimilarityMatricesFingerprints.pl \-o \-\-InputDataMode ScanFile \& \-\-OutMatrixFormat RowsAndColumns \-\-OutMatrixType FullMatrix \& SampleFPHex.csv .Ve .PP To generate a lower triangular similarity matrix corresponding to Tanimoto similarity coefficient for fingerprints bit-vector strings data corresponding to supported fingerprints in text file present in a column name containing Fingerprint substring by scanning file without loading all fingerprints data into memory and create a SampleFPHexTanimotoSimilarity.csv file in IDPairsAndValue format containing compound IDs retrieved from column name containing CompoundID substring, type: .PP .Vb 3 \& % SimilarityMatricesFingerprints.pl \-o \-\-InputDataMode ScanFile \& \-\-OutMatrixFormat IDPairsAndValue \-\-OutMatrixType LowerTriangularMatrix \& 
SampleFPHex.csv .Ve .PP To generate a similarity matrix corresponding to Tanimoto similarity coefficient using algebraic formulism for fingerprints vector strings data corresponding to supported fingerprints in text file present in a column name containing Fingerprint substring and create a SampleFPCountTanimotoSimilarityAlgebraicForm.csv file containing compound IDs retrieved from column name containing CompoundID substring, type: .PP .Vb 1 \& % SimilarityMatricesFingerprints.pl \-o SampleFPCount.csv .Ve .PP To generate a similarity matrix corresponding to Tanimoto similarity coefficient using algebraic formulism for fingerprints vector strings data corresponding to supported fingerprints in \s-1SD\s0 file present in a data field with Fingerprint substring in its label and create a SampleFPCountTanimotoSimilarityAlgebraicForm.csv file containing sequentially generated compound IDs with Cmpd prefix, type: .PP .Vb 1 \& % SimilarityMatricesFingerprints.pl \-o SampleFPCount.sdf .Ve .PP To generate a similarity matrix corresponding to Tanimoto similarity coefficient using algebraic formulism vector strings data corresponding to supported fingerprints in \s-1FP\s0 file and create a SampleFPCountTanimotoSimilarityAlgebraicForm.csv file along with compound IDs retrieved from \s-1FP\s0 file, type: .PP .Vb 1 \& % SimilarityMatricesFingerprints.pl \-o SampleFPCount.fpf .Ve .PP To generate a similarity matrix corresponding to Tanimoto similarity coefficient for fingerprints bit-vector strings data corresponding to supported fingerprints in text file present in a column name containing Fingerprint substring and create a SampleFPHexTanimotoSimilarity.csv file in IDPairsAndValue format containing compound IDs retrieved from column name containing CompoundID substring, type: .PP .Vb 2 \& % SimilarityMatricesFingerprints.pl \-\-OutMatrixFormat IDPairsAndValue \-o \& SampleFPHex.csv .Ve .PP To generate a similarity matrix corresponding to Tanimoto similarity coefficient for 
fingerprints bit-vector strings data corresponding to supported fingerprints in \s-1SD\s0 file present in a data field with Fingerprint substring in its label and create a SampleFPHexTanimotoSimilarity.csv file in IDPairsAndValue format containing sequentially generated compound IDs with Cmpd prefix, type: .PP .Vb 2 \& % SimilarityMatricesFingerprints.pl \-\-OutMatrixFormat IDPairsAndValue \-o \& SampleFPHex.sdf .Ve .PP To generate a similarity matrix corresponding to Tanimoto similarity coefficient for fingerprints bit-vector strings data corresponding to supported fingerprints in \s-1FP\s0 file and create a SampleFPHexTanimotoSimilarity.csv file in IDPairsAndValue format along with compound IDs retrieved from \s-1FP\s0 file, type: .PP .Vb 2 \& % SimilarityMatricesFingerprints.pl \-\-OutMatrixFormat IDPairsAndValue \-o \& SampleFPHex.fpf .Ve .PP To generate a similarity matrix corresponding to Tanimoto similarity coefficient for fingerprints bit-vector strings data corresponding to supported fingerprints in \s-1SD\s0 file present in a data field with Fingerprint substring in its label and create a SampleFPHexTanimotoSimilarity.csv file containing compound IDs from mol name line, type: .PP .Vb 2 \& % SimilarityMatricesFingerprints.pl \-\-CompoundIDMode MolName \-o \& SampleFPHex.sdf .Ve .PP To generate a similarity matrix corresponding to Tanimoto similarity coefficient for fingerprints bit-vector strings data corresponding to supported fingerprints present in a data field with Fingerprint substring in its label and create a SampleFPHexTanimotoSimilarity.csv file containing compound IDs from data field name Mol_ID, type: .PP .Vb 2 \& % SimilarityMatricesFingerprints.pl \-\-CompoundIDMode DataField \& \-\-CompoundIDField Mol_ID \-o SampleFPBin.sdf .Ve .PP To generate similarity matrices corresponding to Buser, Dice and Tanimoto similarity coefficient for fingerprints bit-vector strings data corresponding to supported fingerprints present in a column name containing 
Fingerprint substring and create SampleFPBin[CoefficientName]Similarity.csv files containing compound IDs retrieved from column name containing CompoundID substring, type: .PP .Vb 2 \& % SimilarityMatricesFingerprints.pl \-b "BuserSimilarity,DiceSimilarity, \& TanimotoSimilarity" \-o SampleFPBin.csv .Ve .PP To generate similarity matrices corresponding to Buser, Dice and Tanimoto similarity coefficient for fingerprints bit-vector strings data corresponding to supported fingerprints present in a data field with Fingerprint substring in its label and create SampleFPBin[CoefficientName]Similarity.csv files containing sequentially generated compound IDs with Cmpd prefix, type: .PP .Vb 2 \& % SimilarityMatricesFingerprints.pl \-b "BuserSimilarity,DiceSimilarity, \& TanimotoSimilarity" \-o SampleFPBin.sdf .Ve .PP To generate similarity matrices corresponding to CityBlock distance and Tanimoto similarity coefficients using algebraic formulism for fingerprints vector strings data corresponding to supported fingerprints present in a column name containing Fingerprint substring and create SampleFPCount[CoefficientName]AlgebraicForm.csv files containing compound IDs retrieved from column name containing CompoundID substring, type: .PP .Vb 2 \& % SimilarityMatricesFingerprints.pl \-v "CityBlockDistance, \& TanimotoSimilarity" \-o SampleFPCount.csv .Ve .PP To generate similarity matrices corresponding to CityBlock distance and Tanimoto similarity coefficients using algebraic formulism for fingerprints vector strings data corresponding to supported fingerprints present in a data field with Fingerprint substring in its label and create SampleFPCount[CoefficientName]AlgebraicForm.csv files containing sequentially generated compound IDs with Cmpd prefix, type: .PP .Vb 2 \& % SimilarityMatricesFingerprints.pl \-v "CityBlockDistance, \& TanimotoSimilarity" \-o SampleFPCount.sdf .Ve .PP To generate similarity matrices corresponding to CityBlock distance Tanimoto similarity 
coefficients using binary formulism for fingerprints vector strings data corresponding to supported fingerprints present in a column name containing Fingerprint substring and create SampleFPCount[CoefficientName]BinaryForm.csv files containing compound IDs retrieved from column name containing CompoundID substring, type: .PP .Vb 3 \& % SimilarityMatricesFingerprints.pl \-v "CityBlockDistance, \& TanimotoSimilarity" \-\-VectorComparisonFormulism BinaryForm \-o \& SampleFPCount.csv .Ve .PP To generate similarity matrices corresponding to CityBlock distance and Tanimoto similarity coefficients using binary formulism for fingerprints vector strings data corresponding to supported fingerprints present in a data field with Fingerprint substring in its label and create SampleFPCount[CoefficientName]BinaryForm.csv files containing sequentially generated compound IDs with Cmpd prefix, type: .PP .Vb 3 \& % SimilarityMatricesFingerprints.pl \-v "CityBlockDistance, \& TanimotoSimilarity" \-\-VectorComparisonFormulism BinaryForm \-o \& SampleFPCount.sdf .Ve .PP To generate similarity matrices corresponding to CityBlock distance and Tanimoto similarity coefficients using all supported comparison formulisms for fingerprints vector strings data corresponding to supported fingerprints present in a column name containing Fingerprint substring and create SampleFPCount[CoefficientName][FormulismName].csv files containing compound IDs retrieved from column name containing CompoundID substring, type: .PP .Vb 2 \& % SimilarityMatricesFingerprints.pl \-v "CityBlockDistance, \& TanimotoSimilarity" \-\-VectorComparisonFormulism All \-o SampleFPCount.csv .Ve .PP To generate similarity matrices corresponding to CityBlock distance and Tanimoto similarity coefficients using all supported comparison formulisms for fingerprints vector strings data corresponding to supported fingerprints present in a data field with Fingerprint substring in its label and create SampleFPCount[CoefficientName][FormulismName].csv files 
containing sequentially generated compound IDs with Cmpd prefix, type: .PP .Vb 2 \& % SimilarityMatricesFingerprints.pl \-v "CityBlockDistance,TanimotoSimilarity" \& \-\-VectorComparisonFormulism All \-o SampleFPCount.sdf .Ve .PP To generate similarity matrices corresponding to all available similarity coefficients for fingerprints bit-vector strings data corresponding to supported fingerprints present in a column name containing Fingerprint substring and create SampleFPHex[CoefficientName].csv files containing compound IDs retrieved from column name containing CompoundID substring, type: .PP .Vb 2 \& % SimilarityMatricesFingerprints.pl \-m AutoDetect \-\-BitVectorComparisonMode \& All \-\-alpha 0.5 \-\-beta 0.5 \-o SampleFPHex.csv .Ve .PP To generate similarity matrices corresponding to all available similarity coefficients for fingerprints bit-vector strings data corresponding to supported fingerprints present in a data field with Fingerprint substring in its label and create SampleFPHex[CoefficientName].csv files containing sequentially generated compound IDs with Cmpd prefix, type: .PP .Vb 2 \& % SimilarityMatricesFingerprints.pl \-m AutoDetect \-\-BitVectorComparisonMode \& All \-\-alpha 0.5 \-\-beta 0.5 \-o SampleFPHex.sdf .Ve .PP To generate similarity matrices corresponding to all available similarity and distance coefficients using all comparison formulisms for fingerprints vector strings data corresponding to supported fingerprints present in a column name containing Fingerprint substring and create SampleFPCount[CoefficientName][FormulismName].csv files containing compound IDs retrieved from column name containing CompoundID substring, type: .PP .Vb 2 \& % SimilarityMatricesFingerprints.pl \-m AutoDetect \-\-VectorComparisonMode \& All \-\-VectorComparisonFormulism All \-o SampleFPCount.csv .Ve .PP To generate similarity matrices corresponding to all available similarity and distance coefficients using all comparison formulisms for fingerprints vector strings 
data corresponding to supported fingerprints present in a data field with the
Fingerprint substring in its label and create
SampleFPCount[CoefficientName][FormulismName].csv files containing sequentially
generated compound IDs with a Cmpd prefix, type:
.PP
.Vb 2
\&    % SimilarityMatricesFingerprints.pl \-m AutoDetect \-\-VectorComparisonMode
\&      All \-\-VectorComparisonFormulism All \-o SampleFPCount.sdf
.Ve
.PP
To generate a similarity matrix corresponding to the Tanimoto similarity
coefficient for fingerprints bit-vector strings data corresponding to supported
fingerprints present in column number 2 and create a
SampleFPHexTanimotoSimilarity.csv file containing compound IDs retrieved from
column number 1, type:
.PP
.Vb 2
\&    % SimilarityMatricesFingerprints.pl \-\-ColMode ColNum \-\-CompoundIDCol 1
\&      \-\-FingerprintsCol 2 \-o SampleFPHex.csv
.Ve
.PP
To generate a similarity matrix corresponding to the Tanimoto similarity
coefficient for fingerprints bit-vector strings data corresponding to supported
fingerprints present in a data field named Fingerprints and create a
SampleFPHexTanimotoSimilarity.csv file containing compound IDs present in the
data field named Mol_ID, type:
.PP
.Vb 2
\&    % SimilarityMatricesFingerprints.pl \-\-FingerprintsField Fingerprints
\&      \-\-CompoundIDMode DataField \-\-CompoundIDField Mol_ID \-o SampleFPHex.sdf
.Ve
.PP
To generate a similarity matrix corresponding to the Tversky similarity
coefficient for fingerprints bit-vector strings data corresponding to supported
fingerprints present in a column named Fingerprints and create a
SampleFPHexTverskySimilarity.tsv file containing compound IDs retrieved from the
column named CompoundID, type:
.PP
.Vb 4
\&    % SimilarityMatricesFingerprints.pl \-\-BitVectorComparisonMode
\&      TverskySimilarity \-\-alpha 0.5 \-\-ColMode ColLabel \-\-CompoundIDCol
\&      CompoundID \-\-FingerprintsCol Fingerprints \-\-OutDelim Tab \-\-quote No
\&      \-o SampleFPHex.csv
.Ve
.PP
To generate a similarity matrix corresponding to the Tanimoto similarity
coefficient for
fingerprints bit-vector strings data corresponding to supported fingerprints
present in a data field with the Fingerprint substring in its label and create a
SampleFPHexTanimotoSimilarity.csv file containing compound IDs from the molname
line or sequentially generated compound IDs with a Mol prefix, type:
.PP
.Vb 2
\&    % SimilarityMatricesFingerprints.pl \-\-CompoundIDMode MolnameOrLabelPrefix
\&      \-\-CompoundIDPrefix Mol \-o SampleFPHex.sdf
.Ve
.PP
To generate a similarity matrix corresponding to the Tanimoto similarity
coefficient for fingerprints bit-vector strings data corresponding to supported
fingerprints present in a data field with the Fingerprint substring in its label
and create a SampleFPHexTanimotoSimilarity.tsv file containing sequentially
generated compound IDs with a Cmpd prefix, type:
.PP
.Vb 1
\&    % SimilarityMatricesFingerprints.pl \-\-OutDelim Tab \-\-quote No \-o SampleFPHex.sdf
.Ve
.SH "AUTHOR"
.IX Header "AUTHOR"
Manish Sud <msud@san.rr.com>
.SH "SEE ALSO"
.IX Header "SEE ALSO"
InfoFingerprintsFiles.pl, SimilaritySearchingFingerprints.pl,
AtomNeighborhoodsFingerprints.pl, ExtendedConnectivityFingerprints.pl,
MACCSKeysFingerprints.pl, PathLengthFingerprints.pl,
TopologicalAtomPairsFingerprints.pl, TopologicalAtomTorsionsFingerprints.pl,
TopologicalPharmacophoreAtomPairsFingerprints.pl,
TopologicalPharmacophoreAtomTripletsFingerprints.pl
.SH "COPYRIGHT"
.IX Header "COPYRIGHT"
Copyright (C) 2015 Manish Sud. All rights reserved.
.PP
This file is part of MayaChemTools.
.PP
MayaChemTools is free software; you can redistribute it and/or modify it under
the terms of the \s-1GNU\s0 Lesser General Public License as published by the Free
Software Foundation; either version 3 of the License, or (at your option)
any later version.