diff docs/scripts/txt/MACCSKeysFingerprints.txt @ 0:4816e4a8ae95 draft default tip
Uploaded
author | deepakjadmin |
---|---|
date | Wed, 20 Jan 2016 09:23:18 -0500 |
parents | |
children |
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/docs/scripts/txt/MACCSKeysFingerprints.txt Wed Jan 20 09:23:18 2016 -0500 @@ -0,0 +1,1074 @@ +NAME + MACCSKeysFingerprints.pl - Generate MACCS key fingerprints for SD files + +SYNOPSIS + MACCSKeysFingerprints.pl SDFile(s)... + + MACCSKeysFingerprints.pl [--AromaticityModel *AromaticityModelType*] + [--BitsOrder *Ascending | Descending*] [-b, --BitStringFormat + *BinaryString | HexadecimalString*] [--CompoundID *DataFieldName or + LabelPrefixString*] [--CompoundIDLabel *text*] [--CompoundIDMode + *DataField | MolName | LabelPrefix | MolNameOrLabelPrefix*] + [--DataFields *"FieldLabel1,FieldLabel2,..."*] [-d, --DataFieldsMode + *All | Common | Specify | CompoundID*] [-f, --Filter *Yes | No*] + [--FingerprintsLabel *text*] [-h, --help] [-k, --KeepLargestComponent + *Yes | No*] [-m, --mode *MACCSKeyBits | MACCSKeyCount*] [--OutDelim + *comma | tab | semicolon*] [--output *SD | FP | text | all*] [-o, + --overwrite] [-q, --quote *Yes | No*] [-r, --root *RootName*] [-s, + --size *number*] [-v, --VectorStringFormat *ValuesString | + IDsAndValuesString | IDsAndValuesPairsString | ValuesAndIDsString | + ValuesAndIDsPairsString*] [-w, --WorkingDir *DirName*] + +DESCRIPTION + Generate MACCS (Molecular ACCess System) keys fingerprints [ Ref 45-47 ] + for *SDFile(s)* and create appropriate SD, FP or CSV/TSV text file(s) + containing fingerprints bit-vector or vector strings corresponding to + molecular fingerprints. + + Multiple SDFile names are separated by spaces. The valid file extensions + are *.sdf* and *.sd*. All other file names are ignored. All the SD files + in the current directory can be specified either by **.sdf* or the current + directory name. + + For each MACCS keys definition, atoms are processed to determine their + membership in the key and the appropriate molecular fingerprints strings + are generated. An atom can belong to multiple MACCS keys. 
+ + For *MACCSKeyBits* value of -m, --mode option, a fingerprint bit-vector + string containing zeros and ones is generated and for *MACCSKeyCount* + value, a fingerprint vector string containing the count of each MACCS key + [ Ref 45-47 ] is generated. + + *MACCSKeyBits | MACCSKeyCount* values for -m, --mode option along with + two possible *166 | 322* values of -s, --size support generation of + four different types of MACCS keys fingerprint: *MACCS166KeyBits, + MACCS166KeyCount, MACCS322KeyBits, MACCS322KeyCount*. + + Example of *SD* file containing MACCS keys fingerprints string data: + + ... ... + ... ... + $$$$ + ... ... + ... ... + ... ... + 41 44 0 0 0 0 0 0 0 0999 V2000 + -3.3652 1.4499 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 + ... ... + 2 3 1 0 0 0 0 + ... ... + M END + > <CmpdID> + Cmpd1 + + > <MACCSKeysFingerprints> + FingerprintsBitVector;MACCSKeyBits;166;BinaryString;Ascending;000000000 + 00000000000000000000000000000000100100001001000000001001000000001110001 + 00101010111100011011000100110110000011011110100110111111111111011111111 + 11111111110111000 + + $$$$ + ... ... + ... ... + + Example of *FP* file containing MACCS keys fingerprints string data: + + # + # Package = MayaChemTools 7.4 + # Release Date = Oct 21, 2010 + # + # TimeStamp = Fri Mar 11 14:57:24 2011 + # + # FingerprintsStringType = FingerprintsBitVector + # + # Description = MACCSKeyBits + # Size = 166 + # BitStringFormat = BinaryString + # BitsOrder = Ascending + # + Cmpd1 00000000000000000000000000000000000000000100100001001000000001... + Cmpd2 00000000000000000000000010000000001000000010000000001000000000... + ... ... + ... .. 
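The FP layout shown above (comment lines of the form "# Key = Value", followed by one "CompoundID fingerprint" row per molecule) can be read back with a few lines of code. The following is a minimal Python sketch and not part of MayaChemTools: the function name parse_fp_text is hypothetical, the whitespace separator between compound ID and fingerprint is inferred only from the example, and the bit strings in the sample are shortened placeholders rather than real fingerprints.

```python
def parse_fp_text(text):
    """Parse FP-style text into (header, fingerprints).

    Assumptions (inferred from the example, not from a format spec):
    - header lines start with '#' and may contain 'Key = Value'
    - data lines are '<CompoundID><whitespace><fingerprint string>'
    """
    header = {}
    fingerprints = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line == "#":
            continue  # skip blank lines and bare comment markers
        if line.startswith("#"):
            key, sep, value = line[1:].partition("=")
            if sep:  # only keep comment lines that carry a key/value pair
                header[key.strip()] = value.strip()
            continue
        compound_id, fingerprint = line.split(None, 1)
        fingerprints[compound_id] = fingerprint.strip()
    return header, fingerprints


# Shortened, made-up sample in the same layout as the FP example.
SAMPLE_FP = """#
# FingerprintsStringType = FingerprintsBitVector
#
# Description = MACCSKeyBits
# Size = 166
# BitStringFormat = BinaryString
# BitsOrder = Ascending
#
Cmpd1 0000010010
Cmpd2 0010000000
"""

header, fingerprints = parse_fp_text(SAMPLE_FP)
```

With the sample above, header["Size"] is the string "166" and fingerprints maps each compound ID to its bit string; converting the size to an integer or validating the string length is left to the caller.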
+ + Example of CSV *Text* file containing MACCS keys fingerprints string + data: + + "CompoundID","MACCSKeysFingerprints" + "Cmpd1","FingerprintsBitVector;MACCSKeyBits;166;BinaryString;Ascending; + 00000000000000000000000000000000000000000100100001001000000001001000000 + 00111000100101010111100011011000100110110000011011110100110111111111111 + 01111111111111111110111000" + ... ... + ... ... + + The current release of MayaChemTools generates the following types of + MACCS keys fingerprints bit-vector and vector strings: + + FingerprintsBitVector;MACCSKeyBits;166;BinaryString;Ascending;00000000 + 0000000000000000000000000000000001001000010010000000010010000000011100 + 0100101010111100011011000100110110000011011110100110111111111111011111 + 11111111111110111000 + + FingerprintsBitVector;MACCSKeyBits;166;HexadecimalString;Ascending;000 + 000000021210210e845f8d8c60b79dffbffffd1 + + FingerprintsBitVector;MACCSKeyBits;322;BinaryString;Ascending;11101011 + 1110011111100101111111000111101100110000000000000011100010000000000000 + 0000000000000000000000000000000000000000000000101000000000000000000000 + 0000000000000000000000000000000000000000000000000000000000000000000000 + 0000000000000000000000000000000000000011000000000000000000000000000000 + 0000000000000000000000000000000000000000 + + FingerprintsBitVector;MACCSKeyBits;322;HexadecimalString;Ascending;7d7 + e7af3edc000c1100000000000000500000000000000000000000000000000300000000 + 000000000 + + FingerprintsVector;MACCSKeyCount;166;OrderedNumericalValues;ValuesStri + ng;0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 + 0 0 0 0 0 0 0 1 0 0 3 0 0 0 0 4 0 0 2 0 0 0 0 0 0 0 0 2 0 0 2 0 0 0 0 + 0 0 0 0 1 1 8 0 0 0 1 0 0 1 0 1 0 1 0 3 1 3 1 0 0 0 1 2 0 11 1 0 0 0 + 5 0 0 1 2 0 1 1 0 0 0 0 0 1 1 0 1 1 1 1 0 4 0 0 1 1 0 4 6 1 1 1 2 1 1 + 3 5 2 2 0 5 3 5 1 1 2 5 1 2 1 2 4 8 3 5 5 2 2 0 3 5 4 1 + + FingerprintsVector;MACCSKeyCount;322;OrderedNumericalValues;ValuesStri + ng;14 8 2 0 2 0 4 4 2 1 4 0 0 2 5 10 5 2 1 0 
0 2 0 5 13 3 28 5 5 3 0 0 + 0 4 2 1 1 0 1 1 0 0 2 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 22 5 3 0 0 0 1 0 + 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 + 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 11 0 2 0 0 0 0 0 0 0 0 0 + 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... + +OPTIONS + --AromaticityModel *MDLAromaticityModel | TriposAromaticityModel | + MMFFAromaticityModel | ChemAxonBasicAromaticityModel | + ChemAxonGeneralAromaticityModel | DaylightAromaticityModel | + MayaChemToolsAromaticityModel* + Specify aromaticity model to use during detection of aromaticity. + Possible values in the current release are: *MDLAromaticityModel, + TriposAromaticityModel, MMFFAromaticityModel, + ChemAxonBasicAromaticityModel, ChemAxonGeneralAromaticityModel, + DaylightAromaticityModel or MayaChemToolsAromaticityModel*. Default + value: *MayaChemToolsAromaticityModel*. + + The supported aromaticity model names along with model specific + control parameters are defined in AromaticityModelsData.csv, which + is distributed with the current release and is available under + lib/data directory. Molecule.pm module retrieves data from this file + during class instantiation and makes it available to method + DetectAromaticity for detecting aromaticity corresponding to a + specific model. + + --BitsOrder *Ascending | Descending* + Bits order to use during generation of fingerprints bit-vector + string for *MACCSKeyBits* value of -m, --mode option. Possible + values: *Ascending, Descending*. Default: *Ascending*. + + *Ascending* bit order corresponds to the first bit in each byte + being the lowest bit, as opposed to the highest bit. + + Internally, bits are stored in *Ascending* order using Perl vec + function. Regardless of machine order, big-endian or little-endian, + vec function always considers first string byte as the lowest byte + and first bit within each byte as the lowest bit. 
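To make the *Ascending* bit order concrete, the sketch below converts between the BinaryString and HexadecimalString forms. It is a Python illustration, not MayaChemTools code, and rests on an assumption inferred from the example strings in this document: each hexadecimal character encodes four consecutive bits with the first bit as the lowest bit. For instance, the 166-key binary example contains the bits 0100 1000 where the hexadecimal example contains the characters 21. The function names are hypothetical, and *Descending* order is not covered.

```python
def hex_to_binary_ascending(hex_string):
    """Expand each hex character into 4 bits, lowest bit first.

    Assumption: nibble-wise, low-bit-first encoding, inferred from the
    Ascending example strings in this document.
    """
    bits = []
    for ch in hex_string:
        value = int(ch, 16)
        # Emit bit 0 first, then bits 1, 2 and 3.
        bits.append("".join("1" if (value >> i) & 1 else "0" for i in range(4)))
    return "".join(bits)


def binary_to_hex_ascending(bit_string):
    """Inverse of hex_to_binary_ascending; pads the last nibble with zeros."""
    padded = bit_string + "0" * (-len(bit_string) % 4)
    digits = []
    for start in range(0, len(padded), 4):
        nibble = padded[start:start + 4]
        # The first character of the nibble is the lowest bit.
        value = sum(1 << i for i, bit in enumerate(nibble) if bit == "1")
        digits.append(format(value, "x"))
    return "".join(digits)
```

Under this assumption, hex_to_binary_ascending("21") gives "01001000", and the 322-key examples agree as well: the leading bits 1110 1011 correspond to the leading hexadecimal characters 7d.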
+ + -b, --BitStringFormat *BinaryString | HexadecimalString* + Format of fingerprints bit-vector string data in output SD, FP or + CSV/TSV text file(s) specified by --output used during + *MACCSKeyBits* value of -m, --mode option. Possible values: + *BinaryString, HexadecimalString*. Default value: *BinaryString*. + + *BinaryString* corresponds to an ASCII string containing 1s and 0s. + *HexadecimalString* contains bit values in ASCII hexadecimal format. + + Examples: + + FingerprintsBitVector;MACCSKeyBits;166;BinaryString;Ascending;00000000 + 0000000000000000000000000000000001001000010010000000010010000000011100 + 0100101010111100011011000100110110000011011110100110111111111111011111 + 11111111111110111000 + + FingerprintsBitVector;MACCSKeyBits;166;HexadecimalString;Ascending;000 + 000000021210210e845f8d8c60b79dffbffffd1 + + FingerprintsBitVector;MACCSKeyBits;322;BinaryString;Ascending;11101011 + 1110011111100101111111000111101100110000000000000011100010000000000000 + 0000000000000000000000000000000000000000000000101000000000000000000000 + 0000000000000000000000000000000000000000000000000000000000000000000000 + 0000000000000000000000000000000000000011000000000000000000000000000000 + 0000000000000000000000000000000000000000 + + FingerprintsBitVector;MACCSKeyBits;322;HexadecimalString;Ascending;7d7 + e7af3edc000c1100000000000000500000000000000000000000000000000300000000 + 000000000 + + --CompoundID *DataFieldName or LabelPrefixString* + This value is --CompoundIDMode specific and indicates how compound + ID is generated. + + For *DataField* value of --CompoundIDMode option, it corresponds to + datafield label name whose value is used as compound ID; otherwise, + it's a prefix string used for generating compound IDs like + LabelPrefixString<Number>. Default value, *Cmpd*, generates compound + IDs which look like Cmpd<Number>. 
+ + Examples for *DataField* value of --CompoundIDMode: + + MolID + ExtReg + + Examples for *LabelPrefix* or *MolNameOrLabelPrefix* value of + --CompoundIDMode: + + Compound + + The value specified above generates compound IDs which correspond to + Compound<Number> instead of default value of Cmpd<Number>. + + --CompoundIDLabel *text* + Specify compound ID column label for FP or CSV/TSV text file(s) used + during *CompoundID* value of --DataFieldsMode option. Default: + *CompoundID*. + + --CompoundIDMode *DataField | MolName | LabelPrefix | + MolNameOrLabelPrefix* + Specify how to generate compound IDs and write to FP or CSV/TSV text + file(s) along with generated fingerprints for *FP | text | all* + values of --output option: use an *SDFile(s)* datafield value; use + molname line from *SDFile(s)*; generate a sequential ID with + specific prefix; use combination of both MolName and LabelPrefix + with usage of LabelPrefix values for empty molname lines. + + Possible values: *DataField | MolName | LabelPrefix | + MolNameOrLabelPrefix*. Default: *LabelPrefix*. + + For *MolNameOrLabelPrefix* value of --CompoundIDMode, molname line + in *SDFile(s)* takes precedence over sequential compound IDs + generated using *LabelPrefix* and only empty molname values are + replaced with sequential compound IDs. + + This is only used for *CompoundID* value of --DataFieldsMode option. + + --DataFields *"FieldLabel1,FieldLabel2,..."* + Comma delimited list of *SDFile(s)* data fields to extract and + write to CSV/TSV text file(s) along with generated fingerprints for + *text | all* values of --output option. + + This is only used for *Specify* value of --DataFieldsMode option. 
+ + Examples: + + Extreg + MolID,CompoundName + + -d, --DataFieldsMode *All | Common | Specify | CompoundID* + Specify how data fields in *SDFile(s)* are transferred to output + CSV/TSV text file(s) along with generated fingerprints for *text | + all* values of --output option: transfer all SD data fields; transfer + SD data fields common to all compounds; extract specified data + fields; generate a compound ID using molname line, a compound + prefix, or a combination of both. Possible values: *All | Common | + Specify | CompoundID*. Default value: *CompoundID*. + + -f, --Filter *Yes | No* + Specify whether to check and filter compound data in SDFile(s). + Possible values: *Yes or No*. Default value: *Yes*. + + By default, compound data is checked before calculating fingerprints + and compounds containing atom data corresponding to non-element + symbols or no atom data are ignored. + + --FingerprintsLabel *text* + SD data label or text file column label to use for fingerprints + string in output SD or CSV/TSV text file(s) specified by --output. + Default value: *MACCSKeysFingerprints*. + + -h, --help + Print this help message. + + -k, --KeepLargestComponent *Yes | No* + Generate fingerprints for only the largest component in molecule. + Possible values: *Yes or No*. Default value: *Yes*. + + For molecules containing multiple connected components, fingerprints + can be generated in two different ways: use all connected components + or just the largest connected component. By default, all atoms + except for the largest connected component are deleted before + generation of fingerprints. + + -m, --mode *MACCSKeyBits | MACCSKeyCount* + Specify type of MACCS keys [ Ref 45-47 ] fingerprints to generate + for molecules in *SDFile(s)*. Possible values: *MACCSKeyBits, + MACCSKeyCount*. Default value: *MACCSKeyBits*. 
+ + For *MACCSKeyBits* value of -m, --mode option, a fingerprint + bit-vector string containing zeros and ones is generated and for + *MACCSKeyCount* value, a fingerprint vector string containing the + count of each MACCS key is generated. + + *MACCSKeyBits | MACCSKeyCount* values for -m, --mode option along + with two possible *166 | 322* values of -s, --size support + generation of four different types of MACCS keys fingerprint: + *MACCS166KeyBits, MACCS166KeyCount, MACCS322KeyBits, + MACCS322KeyCount*. + + Definition of MACCS keys uses the following atom and bond symbols to + define atom and bond environments: + + Atom symbols for 166 keys [ Ref 47 ]: + + A : Any valid periodic table element symbol + Q : Hetero atoms; any non-C or non-H atom + X : Halogens; F, Cl, Br, I + Z : Others; other than H, C, N, O, Si, P, S, F, Cl, Br, I + + Atom symbols for 322 keys [ Ref 46 ]: + + A : Any valid periodic table element symbol + Q : Hetero atoms; any non-C or non-H atom + X : Others; other than H, C, N, O, Si, P, S, F, Cl, Br, I + Z is neither defined nor used + + Bond types: + + - : Single + = : Double + T : Triple + # : Triple + ~ : Single or double query bond + % : An aromatic query bond + + None : Any bond type; no explicit bond specified + + $ : Ring bond; $ before a bond type specifies ring bond + ! : Chain or non-ring bond; ! before a bond type specifies chain bond + + @ : A ring linkage and the number following it specifies the + atom's position in the line, thus @1 means linked back to the first + atom in the list. + + Aromatic: Kekule or Arom5 + + Kekule: Bonds in 6-membered rings with alternate single/double bonds + or perimeter bonds + Arom5: Bonds in 5-membered rings with two double bonds and a hetero + atom at the apex of the ring. + + MACCS 166 keys [ Ref 45-47 ] are defined as follows: + + Key Description + + 1 ISOTOPE + 2 103 < ATOMIC NO. < 256 + 3 GROUP IVA,VA,VIA PERIODS 4-6 (Ge...) + 4 ACTINIDE + 5 GROUP IIIB,IVB (Sc...) 
+ 6 LANTHANIDE + 7 GROUP VB,VIB,VIIB (V...) + 8 QAAA@1 + 9 GROUP VIII (Fe...) + 10 GROUP IIA (ALKALINE EARTH) + 11 4M RING + 12 GROUP IB,IIB (Cu...) + 13 ON(C)C + 14 S-S + 15 OC(O)O + 16 QAA@1 + 17 CTC + 18 GROUP IIIA (B...) + 19 7M RING + 20 SI + 21 C=C(Q)Q + 22 3M RING + 23 NC(O)O + 24 N-O + 25 NC(N)N + 26 C$=C($A)$A + 27 I + 28 QCH2Q + 29 P + 30 CQ(C)(C)A + 31 QX + 32 CSN + 33 NS + 34 CH2=A + 35 GROUP IA (ALKALI METAL) + 36 S HETEROCYCLE + 37 NC(O)N + 38 NC(C)N + 39 OS(O)O + 40 S-O + 41 CTN + 42 F + 43 QHAQH + 44 OTHER + 45 C=CN + 46 BR + 47 SAN + 48 OQ(O)O + 49 CHARGE + 50 C=C(C)C + 51 CSO + 52 NN + 53 QHAAAQH + 54 QHAAQH + 55 OSO + 56 ON(O)C + 57 O HETEROCYCLE + 58 QSQ + 59 Snot%A%A + 60 S=O + 61 AS(A)A + 62 A$A!A$A + 63 N=O + 64 A$A!S + 65 C%N + 66 CC(C)(C)A + 67 QS + 68 QHQH (&...) + 69 QQH + 70 QNQ + 71 NO + 72 OAAO + 73 S=A + 74 CH3ACH3 + 75 A!N$A + 76 C=C(A)A + 77 NAN + 78 C=N + 79 NAAN + 80 NAAAN + 81 SA(A)A + 82 ACH2QH + 83 QAAAA@1 + 84 NH2 + 85 CN(C)C + 86 CH2QCH2 + 87 X!A$A + 88 S + 89 OAAAO + 90 QHAACH2A + 91 QHAAACH2A + 92 OC(N)C + 93 QCH3 + 94 QN + 95 NAAO + 96 5M RING + 97 NAAAO + 98 QAAAAA@1 + 99 C=C + 100 ACH2N + 101 8M RING + 102 QO + 103 CL + 104 QHACH2A + 105 A$A($A)$A + 106 QA(Q)Q + 107 XA(A)A + 108 CH3AAACH2A + 109 ACH2O + 110 NCO + 111 NACH2A + 112 AA(A)(A)A + 113 Onot%A%A + 114 CH3CH2A + 115 CH3ACH2A + 116 CH3AACH2A + 117 NAO + 118 ACH2CH2A > 1 + 119 N=A + 120 HETEROCYCLIC ATOM > 1 (&...) + 121 N HETEROCYCLE + 122 AN(A)A + 123 OCO + 124 QQ + 125 AROMATIC RING > 1 + 126 A!O!A + 127 A$A!O > 1 (&...) + 128 ACH2AAACH2A + 129 ACH2AACH2A + 130 QQ > 1 (&...) + 131 QH > 1 + 132 OACH2A + 133 A$A!N + 134 X (HALOGEN) + 135 Nnot%A%A + 136 O=A > 1 + 137 HETEROCYCLE + 138 QCH2A > 1 (&...) + 139 OH + 140 O > 3 (&...) + 141 CH3 > 2 (&...) 
+ 142 N > 1 + 143 A$A!O + 144 Anot%A%Anot%A + 145 6M RING > 1 + 146 O > 2 + 147 ACH2CH2A + 148 AQ(A)A + 149 CH3 > 1 + 150 A!A$A!A + 151 NH + 152 OC(C)C + 153 QCH2A + 154 C=O + 155 A!CH2!A + 156 NA(A)A + 157 C-O + 158 C-N + 159 O > 1 + 160 CH3 + 161 N + 162 AROMATIC + 163 6M RING + 164 O + 165 RING + 166 FRAGMENTS + + MACCS 322 keys set as defined in tables 1, 2 and 3 [ Ref 46 ] + include: + + . 26 atom properties of type P, as listed in Table 1 + . 32 one-atom environments, as listed in Table 3 + . 264 atom-bond-atom combinations listed in Table 4 + + Total number of keys in three tables is : 322 + + Atom symbol, X, used for 322 keys [ Ref 46 ] doesn't refer to + Halogens as it does for 166 keys. In order to keep the definition of + 322 keys consistent with the published definitions, the symbol X is + used to imply "others" atoms, but it's internally mapped to symbol Z + as defined for 166 keys during the generation of key values. + + Atom properties-based keys (26): + + Key Description + 1 A(AAA) or AA(A)A - atom with at least three neighbors + 2 Q - heteroatom + 3 Anot%not-A - atom involved in one or more multiple bonds, not aromatic + 4 A(AAAA) or AA(A)(A)A - atom with at least four neighbors + 5 A(QQ) or QA(Q) - atom with at least two heteroatom neighbors + 6 A(QQQ) or QA(Q)Q - atom with at least three heteroatom neighbors + 7 QH - heteroatom with at least one hydrogen attached + 8 CH2(AA) or ACH2A - carbon with at least two single bonds and at least + two hydrogens attached + 9 CH3(A) or ACH3 - carbon with at least one single bond and at least three + hydrogens attached + 10 Halogen + 11 A(-A-A-A) or A-A(-A)-A - atom has at least three single bonds + 12 AAAAAA@1 > 2 - atom is in at least two different six-membered rings + 13 A($A$A$A) or A$A($A)$A - atom has more than two ring bonds + 14 A$A!A$A - atom is at a ring/chain boundary. When a comparison is done + with another atom the path passes through the chain bond. 
+ 15 Anot%A%Anot%A - atom is at an aromatic/nonaromatic boundary. When a + comparison is done with another atom the path + passes through the aromatic bond. + 16 A!A!A - atom with more than one chain bond + 17 A!A$A!A - atom is at a ring/chain boundary. When a comparison is done + with another atom the path passes through the ring bond. + 18 A%Anot%A%A - atom is at an aromatic/nonaromatic boundary. When a + comparison is done with another atom the + path passes through the nonaromatic bond. + 19 HETEROCYCLE - atom is a heteroatom in a ring. + 20 rare properties: atom with five or more neighbors, atom in + four or more rings, or atom types other than + H, C, N, O, S, F, Cl, Br, or I + 21 rare properties: atom has a charge, is an isotope, has two or + more multiple bonds, or has a triple bond. + 22 N - nitrogen + 23 S - sulfur + 24 O - oxygen + 25 A(AA)A(A)A(AA) - atom has two neighbors, each with three or + more neighbors (including the central atom). + 26 CHACH2 - atom has two hydrocarbon (CH2) neighbors + + Atomic environments properties-based keys (32): + + Key Description + 27 C(CC) + 28 C(CCC) + 29 C(CN) + 30 C(CCN) + 31 C(NN) + 32 C(NNC) + 33 C(NNN) + 34 C(CO) + 35 C(CCO) + 36 C(NO) + 37 C(NCO) + 38 C(NNO) + 39 C(OO) + 40 C(COO) + 41 C(NOO) + 42 C(OOO) + 43 Q(CC) + 44 Q(CCC) + 45 Q(CN) + 46 Q(CCN) + 47 Q(NN) + 48 Q(CNN) + 49 Q(NNN) + 50 Q(CO) + 51 Q(CCO) + 52 Q(NO) + 53 Q(CNO) + 54 Q(NNO) + 55 Q(OO) + 56 Q(COO) + 57 Q(NOO) + 58 Q(OOO) + + Note: The first symbol is the central atom, with atoms bonded to the + central atom listed in parentheses. Q is any non-C, non-H atom. If + only two atoms are in parentheses, there is no implication + concerning the other atoms bonded to the central atom. 
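The notation of the one-atom environment keys (27 through 58) can be illustrated with a small matcher. This is a Python sketch under stated assumptions, not the MayaChemTools implementation: it treats an environment such as C(CCN) as "central atom C whose neighbors include at least two C and one N". The note above states this containment reading only for two-atom environments; applying it to three-atom environments as well is an assumption here, and the function names are hypothetical.

```python
import re
from collections import Counter

ELEMENT = r"[A-Z][a-z]?"


def parse_environment(key):
    """Split e.g. 'C(CCN)' into ('C', Counter({'C': 2, 'N': 1}))."""
    match = re.fullmatch(rf"({ELEMENT})\(((?:{ELEMENT})+)\)", key)
    if match is None:
        raise ValueError(f"not a one-atom environment: {key!r}")
    central = match.group(1)
    required = Counter(re.findall(ELEMENT, match.group(2)))
    return central, required


def matches_environment(key, central_symbol, neighbor_symbols):
    """Check whether an atom and its neighbor symbols satisfy a key.

    Q as central atom means any non-C, non-H atom, as in the note above.
    Extra neighbors beyond those listed are allowed (assumed semantics).
    """
    central, required = parse_environment(key)
    if central == "Q":
        if central_symbol in ("C", "H"):
            return False
    elif central != central_symbol:
        return False
    available = Counter(neighbor_symbols)
    return all(available[symbol] >= count for symbol, count in required.items())
```

For example, a carbon bonded to C, C, N and H satisfies C(CCN) and C(CC) under these assumptions, while a carbon can never satisfy any Q(...) key.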
+ + Atom-Bond-Atom properties-based keys: (264) + + Key Description + 59 C-C + 60 C-N + 61 C-O + 62 C-S + 63 C-Cl + 64 C-P + 65 C-F + 66 C-Br + 67 C-Si + 68 C-I + 69 C-X + 70 N-N + 71 N-O + 72 N-S + 73 N-Cl + 74 N-P + 75 N-F + 76 N-Br + 77 N-Si + 78 N-I + 79 N-X + 80 O-O + 81 O-S + 82 O-Cl + 83 O-P + 84 O-F + 85 O-Br + 86 O-Si + 87 O-I + 88 O-X + 89 S-S + 90 S-Cl + 91 S-P + 92 S-F + 93 S-Br + 94 S-Si + 95 S-I + 96 S-X + 97 Cl-Cl + 98 Cl-P + 99 Cl-F + 100 Cl-Br + 101 Cl-Si + 102 Cl-I + 103 Cl-X + 104 P-P + 105 P-F + 106 P-Br + 107 P-Si + 108 P-I + 109 P-X + 110 F-F + 111 F-Br + 112 F-Si + 113 F-I + 114 F-X + 115 Br-Br + 116 Br-Si + 117 Br-I + 118 Br-X + 119 Si-Si + 120 Si-I + 121 Si-X + 122 I-I + 123 I-X + 124 X-X + 125 C=C + 126 C=N + 127 C=O + 128 C=S + 129 C=Cl + 130 C=P + 131 C=F + 132 C=Br + 133 C=Si + 134 C=I + 135 C=X + 136 N=N + 137 N=O + 138 N=S + 139 N=Cl + 140 N=P + 141 N=F + 142 N=Br + 143 N=Si + 144 N=I + 145 N=X + 146 O=O + 147 O=S + 148 O=Cl + 149 O=P + 150 O=F + 151 O=Br + 152 O=Si + 153 O=I + 154 O=X + 155 S=S + 156 S=Cl + 157 S=P + 158 S=F + 159 S=Br + 160 S=Si + 161 S=I + 162 S=X + 163 Cl=Cl + 164 Cl=P + 165 Cl=F + 166 Cl=Br + 167 Cl=Si + 168 Cl=I + 169 Cl=X + 170 P=P + 171 P=F + 172 P=Br + 173 P=Si + 174 P=I + 175 P=X + 176 F=F + 177 F=Br + 178 F=Si + 179 F=I + 180 F=X + 181 Br=Br + 182 Br=Si + 183 Br=I + 184 Br=X + 185 Si=Si + 186 Si=I + 187 Si=X + 188 I=I + 189 I=X + 190 X=X + 191 C#C + 192 C#N + 193 C#O + 194 C#S + 195 C#Cl + 196 C#P + 197 C#F + 198 C#Br + 199 C#Si + 200 C#I + 201 C#X + 202 N#N + 203 N#O + 204 N#S + 205 N#Cl + 206 N#P + 207 N#F + 208 N#Br + 209 N#Si + 210 N#I + 211 N#X + 212 O#O + 213 O#S + 214 O#Cl + 215 O#P + 216 O#F + 217 O#Br + 218 O#Si + 219 O#I + 220 O#X + 221 S#S + 222 S#Cl + 223 S#P + 224 S#F + 225 S#Br + 226 S#Si + 227 S#I + 228 S#X + 229 Cl#Cl + 230 Cl#P + 231 Cl#F + 232 Cl#Br + 233 Cl#Si + 234 Cl#I + 235 Cl#X + 236 P#P + 237 P#F + 238 P#Br + 239 P#Si + 240 P#I + 241 P#X + 242 F#F + 243 F#Br + 244 F#Si + 245 F#I + 
246 F#X + 247 Br#Br + 248 Br#Si + 249 Br#I + 250 Br#X + 251 Si#Si + 252 Si#I + 253 Si#X + 254 I#I + 255 I#X + 256 X#X + 257 C$C + 258 C$N + 259 C$O + 260 C$S + 261 C$Cl + 262 C$P + 263 C$F + 264 C$Br + 265 C$Si + 266 C$I + 267 C$X + 268 N$N + 269 N$O + 270 N$S + 271 N$Cl + 272 N$P + 273 N$F + 274 N$Br + 275 N$Si + 276 N$I + 277 N$X + 278 O$O + 279 O$S + 280 O$Cl + 281 O$P + 282 O$F + 283 O$Br + 284 O$Si + 285 O$I + 286 O$X + 287 S$S + 288 S$Cl + 289 S$P + 290 S$F + 291 S$Br + 292 S$Si + 293 S$I + 294 S$X + 295 Cl$Cl + 296 Cl$P + 297 Cl$F + 298 Cl$Br + 299 Cl$Si + 300 Cl$I + 301 Cl$X + 302 P$P + 303 P$F + 304 P$Br + 305 P$Si + 306 P$I + 307 P$X + 308 F$F + 309 F$Br + 310 F$Si + 311 F$I + 312 F$X + 313 Br$Br + 314 Br$Si + 315 Br$I + 316 Br$X + 317 Si$Si + 318 Si$I + 319 Si$X + 320 I$I + 321 I$X + 322 X$X + + --OutDelim *comma | tab | semicolon* + Delimiter for output CSV/TSV text file(s). Possible values: *comma, + tab, or semicolon*. Default value: *comma*. + + --output *SD | FP | text | all* + Type of output files to generate. Possible values: *SD, FP, text, or + all*. Default value: *text*. + + -o, --overwrite + Overwrite existing files. + + -q, --quote *Yes | No* + Put quotes around column values in output CSV/TSV text file(s). + Possible values: *Yes or No*. Default value: *Yes*. + + -r, --root *RootName* + New file name is generated using the root: <Root>.<Ext>. Default for + new file names: <SDFileName><MACCSKeysFP>.<Ext>. The file type + determines <Ext> value. The sdf, fpf, csv, and tsv <Ext> values are + used for SD, FP, comma/semicolon, and tab delimited text files, + respectively. This option is ignored for multiple input files. + + -s, --size *number* + Size of MACCS keys [ Ref 45-47 ] set to use during fingerprints + generation. Possible values: *166 or 322*. Default value: *166*. 
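The 264 atom-bond-atom keys of the 322 key set (keys 59 through 322, listed under -m, --mode) follow a regular pattern: four bond symbols (-, =, #, $) times the 66 unordered pairs of the eleven atom symbols C, N, O, S, Cl, P, F, Br, Si, I and X. The Python sketch below regenerates that numbering as a cross-check of the table. It only reproduces the listing in this document; it is not how MayaChemTools builds its keys, and the function name is hypothetical.

```python
from itertools import combinations_with_replacement

# Atom and bond symbols in the order they appear in the key table.
ATOM_SYMBOLS = ["C", "N", "O", "S", "Cl", "P", "F", "Br", "Si", "I", "X"]
BOND_SYMBOLS = ["-", "=", "#", "$"]


def atom_bond_atom_keys(first_key=59):
    """Return {key number: description} for the atom-bond-atom keys."""
    keys = {}
    number = first_key
    for bond in BOND_SYMBOLS:
        # Unordered pairs, including pairs of identical symbols: 66 per bond.
        for atom1, atom2 in combinations_with_replacement(ATOM_SYMBOLS, 2):
            keys[number] = f"{atom1}{bond}{atom2}"
            number += 1
    return keys


KEYS = atom_bond_atom_keys()
```

The result contains 264 entries running from key 59 (C-C) to key 322 (X$X), which together with the 26 atom property keys and 32 one-atom environment keys gives the total of 322.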
+ + -v, --VectorStringFormat *ValuesString | IDsAndValuesString | + IDsAndValuesPairsString | ValuesAndIDsString | ValuesAndIDsPairsString* + Format of fingerprints vector string data in output SD, FP or + CSV/TSV text file(s) specified by --output used during + *MACCSKeyCount* value of -m, --mode option. Possible values: + *ValuesString, IDsAndValuesString, IDsAndValuesPairsString, + ValuesAndIDsString, ValuesAndIDsPairsString*. Default value: + *ValuesString*. + + Examples: + + FingerprintsVector;MACCSKeyCount;166;OrderedNumericalValues;ValuesStri + ng;0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 + 0 0 0 0 0 0 0 1 0 0 3 0 0 0 0 4 0 0 2 0 0 0 0 0 0 0 0 2 0 0 2 0 0 0 0 + 0 0 0 0 1 1 8 0 0 0 1 0 0 1 0 1 0 1 0 3 1 3 1 0 0 0 1 2 0 11 1 0 0 0 + 5 0 0 1 2 0 1 1 0 0 0 0 0 1 1 0 1 1 1 1 0 4 0 0 1 1 0 4 6 1 1 1 2 1 1 + 3 5 2 2 0 5 3 5 1 1 2 5 1 2 1 2 4 8 3 5 5 2 2 0 3 5 4 1 + + FingerprintsVector;MACCSKeyCount;322;OrderedNumericalValues;ValuesStri + ng;14 8 2 0 2 0 4 4 2 1 4 0 0 2 5 10 5 2 1 0 0 2 0 5 13 3 28 5 5 3 0 0 + 0 4 2 1 1 0 1 1 0 0 2 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 22 5 3 0 0 0 1 0 + 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 + 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 11 0 2 0 0 0 0 0 0 0 0 0 + 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ... + + -w, --WorkingDir *DirName* + Location of working directory. Default: current directory. 
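The fingerprint strings shown in the option examples above consist of six semicolon-separated fields. The Python sketch below splits such a string into named parts. It is an illustration only: the dictionary keys for the bit-vector case follow the header names in the FP file example, the keys for the vector case (values_type, vector_string_format) are labels made up here, the input is assumed to be a single unwrapped line, and only the *ValuesString* vector format shown in this document is handled.

```python
def parse_fingerprint_string(fp_string):
    """Split a MACCS keys fingerprint string into its six fields."""
    parts = fp_string.strip().split(";")
    if len(parts) != 6:
        raise ValueError(f"expected 6 ';'-separated fields, got {len(parts)}")
    fp_type, description, size, field4, field5, data = parts

    if fp_type == "FingerprintsBitVector":
        return {
            "type": fp_type,
            "description": description,      # e.g. MACCSKeyBits
            "size": int(size),               # 166 or 322
            "bit_string_format": field4,     # BinaryString | HexadecimalString
            "bits_order": field5,            # Ascending | Descending
            "bits": data,
        }
    if fp_type == "FingerprintsVector":
        if field5 != "ValuesString":
            # Other vector formats are not shown in this document.
            raise ValueError(f"unsupported vector string format: {field5}")
        return {
            "type": fp_type,
            "description": description,      # e.g. MACCSKeyCount
            "size": int(size),
            "values_type": field4,           # e.g. OrderedNumericalValues
            "vector_string_format": field5,
            "values": [int(value) for value in data.split()],
        }
    raise ValueError(f"unknown fingerprints string type: {fp_type}")
```

For a count vector, the number of keys present in a molecule is then simply the number of non-zero entries in the returned values list.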
+ +EXAMPLES + To generate MACCS keys fingerprints of size 166 in binary bit-vector + string format and create a SampleMACCS166FPBin.csv file containing + sequential compound IDs along with fingerprints bit-vector strings data, + type: + + % MACCSKeysFingerprints.pl -r SampleMACCS166FPBin -o Sample.sdf + + To generate MACCS keys fingerprints of size 166 in binary bit-vector + string format and create SampleMACCS166FPBin.sdf, + SampleMACCS166FPBin.fpf and SampleMACCS166FPBin.csv files containing + sequential compound IDs in CSV file along with fingerprints bit-vector + strings data, type: + + % MACCSKeysFingerprints.pl --output all -r SampleMACCS166FPBin + -o Sample.sdf + + To generate MACCS keys fingerprints of size 322 in binary bit-vector + string format and create a SampleMACCS322FPBin.csv file containing + sequential compound IDs along with fingerprints bit-vector strings data, + type: + + % MACCSKeysFingerprints.pl --size 322 -r SampleMACCS322FPBin -o Sample.sdf + + To generate MACCS keys fingerprints of size 166 corresponding to count + of keys in ValuesString format and create a SampleMACCS166FPCount.csv + file containing sequential compound IDs along with fingerprints vector + strings data, type: + + % MACCSKeysFingerprints.pl -m MACCSKeyCount -r SampleMACCS166FPCount + -o Sample.sdf + + To generate MACCS keys fingerprints of size 322 corresponding to count + of keys in ValuesString format and create a SampleMACCS322FPCount.csv + file containing sequential compound IDs along with fingerprints vector + strings data, type: + + % MACCSKeysFingerprints.pl -m MACCSKeyCount --size 322 + -r SampleMACCS322FPCount -o Sample.sdf + + To generate MACCS keys fingerprints of size 166 in hexadecimal + bit-vector string format with ascending bits order and create a + SampleMACCS166FPHex.csv file containing compound IDs from MolName along + with fingerprints bit-vector strings data, type: + + % MACCSKeysFingerprints.pl -m MACCSKeyBits --size 166 --BitStringFormat + 
HexadecimalString --BitsOrder Ascending --DataFieldsMode CompoundID + --CompoundIDMode MolName -r SampleMACCS166FPHex -o Sample.sdf + + To generate MACCS keys fingerprints of size 166 corresponding to count + of keys in IDsAndValuesString format and create a + SampleMACCS166FPCount.csv file containing compound IDs from MolName line + along with fingerprints vector strings data, type: + + % MACCSKeysFingerprints.pl -m MACCSKeyCount --size 166 + --VectorStringFormat IDsAndValuesString --DataFieldsMode CompoundID + --CompoundIDMode MolName -r SampleMACCS166FPCount -o Sample.sdf + + To generate MACCS keys fingerprints of size 166 corresponding to count + of keys in IDsAndValuesString format and create a + SampleMACCS166FPCount.csv file containing compound IDs using specified + data field along with fingerprints vector strings data, type: + + % MACCSKeysFingerprints.pl -m MACCSKeyCount --size 166 + --VectorStringFormat IDsAndValuesString --DataFieldsMode CompoundID + --CompoundIDMode DataField --CompoundID Mol_ID -r + SampleMACCS166FPCount -o Sample.sdf + + To generate MACCS keys fingerprints of size 322 corresponding to count + of keys in ValuesString format and create a SampleMACCS322FPCount.tsv + file containing compound IDs derived from combination of molecule name + line and an explicit compound prefix along with fingerprints vector + strings data in a column labeled MACCSKeyCountFP, type: + + % MACCSKeysFingerprints.pl -m MACCSKeyCount --size 322 --DataFieldsMode + CompoundID --CompoundIDMode MolNameOrLabelPrefix --CompoundID Cmpd + --CompoundIDLabel MolID --FingerprintsLabel MACCSKeyCountFP --OutDelim + tab -r SampleMACCS322FPCount -o Sample.sdf + + To generate MACCS keys fingerprints of size 166 corresponding to count + of keys in ValuesString format and create a SampleMACCS166FPCount.csv + file containing specific data fields columns along with fingerprints + vector strings data, type: + + % MACCSKeysFingerprints.pl -m MACCSKeyCount --size 166 + 
--VectorStringFormat ValuesString --DataFieldsMode Specify --DataFields + Mol_ID -r SampleMACCS166FPCount -o Sample.sdf + + To generate MACCS keys fingerprints of size 322 corresponding to count + of keys in ValuesString format and create a SampleMACCS322FPCount.csv + file containing common data fields columns along with fingerprints + vector strings data, type: + + % MACCSKeysFingerprints.pl -m MACCSKeyCount --size 322 + --VectorStringFormat ValuesString --DataFieldsMode Common -r + SampleMACCS322FPCount -o Sample.sdf + + To generate MACCS keys fingerprints of size 166 corresponding to count + of keys in ValuesString format and create SampleMACCS166FPCount.sdf, + SampleMACCS166FPCount.fpf and SampleMACCS166FPCount.csv files containing + all data fields columns in CSV file along with fingerprints vector + strings data, type: + + % MACCSKeysFingerprints.pl -m MACCSKeyCount --size 166 --output all + --VectorStringFormat ValuesString --DataFieldsMode All -r + SampleMACCS166FPCount -o Sample.sdf + +AUTHOR + Manish Sud <msud@san.rr.com> + +SEE ALSO + InfoFingerprintsFiles.pl, SimilarityMatricesFingerprints.pl, + AtomNeighborhoodsFingerprints.pl, ExtendedConnectivityFingerprints.pl, + PathLengthFingerprints.pl, TopologicalAtomPairsFingerprints.pl, + TopologicalAtomTorsionsFingerprints.pl, + TopologicalPharmacophoreAtomPairsFingerprints.pl, + TopologicalPharmacophoreAtomTripletsFingerprints.pl + +COPYRIGHT + Copyright (C) 2015 Manish Sud. All rights reserved. + + This file is part of MayaChemTools. + + MayaChemTools is free software; you can redistribute it and/or modify it + under the terms of the GNU Lesser General Public License as published by + the Free Software Foundation; either version 3 of the License, or (at + your option) any later version. +