| 1. | Wall, L.; Christiansen, T.; Schwartz, R.L. Programming Perl. 2nd edition. O'Reilly Media Inc., September 1996. | |
| 2. | CPAN: Comprehensive Perl Archive Network. | |
| 3. | FSF: Free Software Foundation. | |
| 4. | Knuth, D.E. The art of computer programming. Vol. 1-3. 2nd edition. Addison-Wesley, September 1998. | |
| 5. | Press, W.H.; Teukolsky, S.A.; Vetterling, W.T.; Flannery, B.P. Numerical recipes in C: the art of scientific computing. 2nd edition. Cambridge University Press, 1992. | |
| 6. | Orwant, J.; MacDonald, J.; Hietaniemi, J. Mastering algorithms with Perl. O'Reilly Media Inc., August 1999. | |
| 7. | Data for elements in the periodic table. | |
| 8. | Isotope data for elements in the periodic table. | |
| 9. | Main data source for amino acids. | |
| 10. | PerlMol - Perl modules for molecular chemistry. | |
| 11. | OpenBabel: The open source chemistry toolbox. | |
| 12. | CDK: The chemistry development kit. | |
| 13. | JOELIB. | |
| 14. | CTFile Formats. | |
| 15. | Conway, D. Object oriented Perl. 1st edition. O'Reilly Media Inc., January 2000. | |
| 16. | Friedl, J.E.F. Mastering regular expressions. 3rd edition. O'Reilly Media Inc., August 2006. | |
| 17. | Schulz, G.E.; Schirmer, R.H. Principles of protein structure. Springer-Verlag, January 1997. | |
| 18. | Saenger, W. Principles of nucleic acid structure. Springer-Verlag, 1983. | |
| 19. | Cornish-Bowden, A. Nomenclature for incompletely specified bases in nucleic acid sequences. Nucleic Acids Res. 1985, 13, 3021-3030. | |
| 20. | Clapham, C. A concise Oxford dictionary of mathematics. Oxford University Press, 1990. | |
| 21. | Cook, J.L. Conversion factors. Oxford University Press, 1993. | |
| 22. | Pauling, L. The nature of the chemical bond. 3rd edition. Cornell University Press, June 1960. | |
| 23. | Daylight theory manual. | |
| 24. | Weininger, D. SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. J. Chem. Inf. Comput. Sci. 1988, 28, 31-36. | |
| 25. | Weininger, D.; Weininger, A.; Weininger, J.L. SMILES. 2. Algorithm for generation of unique SMILES notation. J. Chem. Inf. Comput. Sci. 1989, 29, 97-101. | |
| 26. | Weininger, D. SMILES. 3. DEPICT. Graphical depiction of chemical structures. J. Chem. Inf. Comput. Sci. 1990, 30, 237-243. | |
| 27. | OEChem TK manual. | |
| 28. | Parkin, G. Valence, oxidation number, and formal charge: Three related but fundamentally different concepts. J. Chem. Educ. 2006, 83, 791-799. | |
| 29. | Gasteiger, J.; Jochum, C. An algorithm for the perception of synthetically important rings. J. Chem. Inf. Comput. Sci. 1979, 19, 43-47. | |
| 30. | Balducci, R.; Pearlman, R.S. Efficient exact solution of the ring perception problem. J. Chem. Inf. Comput. Sci. 1994, 34, 822-831. | |
| 31. | Hanser, T.; Jauffret, P.; Kaufmann, G. A new algorithm for exhaustive ring perception in a molecular graph. J. Chem. Inf. Comput. Sci. 1996, 36, 1146-1152. | |
| 32. | Cahn, R.S.; Ingold, C.; Prelog, V. Specification of molecular chirality. Angew. Chem. Internat. Edit. 1966, 5, 385-415. | |
| 33. | Prelog, V.; Helmchen, G. Basic principles of the CIP-system and proposals for revision. Angew. Chem. Internat. Edit. 1982, 21, 567-583. | |
| 34. | Mata, P.; Lobo, A.M.; Marshall, C.; Johnson, P.A. The CIP sequence rules: Analysis and proposal for a revision. Tetrahedron: Asymmetry. 1993, 4, 657-668. | |
| 35. | Nourse, J.G.; Carhart, R.E.; Smith, D.H.; Djerassi, C. Exhaustive generation of stereoisomers for structure elucidation. J. Am. Chem. Soc. 1979, 101, 1216-1223. | |
| 36. | Nourse, J.G.; Smith, D.H.; Carhart, R.E.; Djerassi, C. Computer-assisted elucidation of molecular structure with stereochemistry. J. Am. Chem. Soc. 1980, 102, 6289-6295. | |
| 37. | Fused ring systems. | |
| 38. | A hash function for hash table lookup. | |
| 39. | Ralaivola, L.; Swamidass, S.J.; Saigo, H.; Baldi, P. Graph kernels for chemical informatics. Neural Networks. 2005, 18, 1093-1110. | |
| 40. | Willett, P.; Barnard, J.M.; Downs, G.M. Chemical Similarity Searching. J. Chem. Inf. Comput. Sci. 1998, 38, 983-996. | |
| 41. | Holliday, J.D.; Hu, C-Y.; Willett, P. Grouping of coefficients for the calculation of inter-molecular similarity and dissimilarity using 2D fragment bit-strings. Combinatorial Chemistry & High Throughput Screening. 2002, 5, 155-166. | |
| 42. | Fligner, M.; Verducci, J.; Blower, P. A modification of the Jaccard-Tanimoto similarity index for diverse selection of chemical compounds using binary strings. Technometrics. 2002, 44, 110-119. | |
| 43. | Wang, Y.; Bajorath, J. Balancing the influence of molecular complexity in fingerprint similarity searching. J. Chem. Inf. Model. 2008, 48, 75-84. | |
| 44. | Flower, D.R. On the properties of bit string-based measures of chemical similarity. J. Chem. Inf. Comput. Sci. 1998, 38, 379-386. | |
| 45. | The Enkfil.dat and Eksfil.dat files: The keys to understanding MDL keyset technology. | |
| 46. | Durant, J.L.; Leland, B.A.; Henry, D.H.; Nourse, J.G. Reoptimization of MDL Keys for Use in Drug Discovery. J. Chem. Inf. Comput. Sci. 2002, 42, 1273-1280. | |
| 47. | Description of public MACCS keys. | |
| 48. | Morgan, H.L. The generation of a unique machine description for chemical structures - A technique developed at Chemical Abstracts Service. J. Chem. Doc. 1965, 5, 107-112. | |
| 49. | Penny, R.H. A connectivity code for use in describing chemical structures. J. Chem. Doc. 1965, 5, 113-117. | |
| 50. | Adamson, G.W.; Cowell, J.; Lynch, M.F.; McLure, A.H.; Town, W.G.; Yapp, M. Strategic considerations in design of a screening system for substructure searches of chemical structure files. J. Chem. Doc. 1973, 13, 153-157. | |
| 51. | Wipke, W.T.; Krishnan, S.; Ouchi, G.I. Hash functions for rapid storage and retrieval of chemical structures. J. Chem. Inf. Comput. Sci. 1978, 18, 31-. | |
| 52. | Rogers, D.; Hahn, M. Extended-connectivity fingerprints. J. Chem. Inf. Model. 2010, 50, 742-754. | |
| 53. | Faulon, J.-L.; Visco, D.P., Jr.; Pophale, R.S. The Signature Molecular Descriptor. 1. Using extended valence sequences in QSAR and QSPR studies. J. Chem. Inf. Comput. Sci. 2003, 43, 707-720. | |
| 54. | Faulon, J.-L.; Collins, M.J.; Carr, R.D. The signature molecular descriptor. 4. Canonizing molecules using extended valence sequences. J. Chem. Inf. Comput. Sci. 2004, 44, 427-436. | |
| 55. | Bender, A.; Mussa, H.Y.; Glen, R.C.; Reiling, S. Molecular similarity searching using atom environments, information-based feature selection, and a naive Bayesian classifier. J. Chem. Inf. Comput. Sci. 2004, 44, 170-178. | |
| 56. | Bender, A.; Mussa, H.Y.; Glen, R.C.; Reiling, S. Similarity searching of chemical databases using atom environment descriptors (MOLPRINT 2D): Evaluation of performance. J. Chem. Inf. Comput. Sci. 2004, 44, 1708-1718. | |
| 57. | Carhart, R.E.; Smith, D.H.; Venkataraghavan, R. Atom pairs as molecular features in structure-activity studies: Definition and application. J. Chem. Inf. Comput. Sci. 1985, 25, 64-73. | |
| 58. | Nilakantan, R.; Bauman, N.; Dixon, J.S.; Venkataraghavan, R. Topological torsion: A new molecular descriptor for SAR applications. Comparison with other descriptors. J. Chem. Inf. Comput. Sci. 1987, 27, 82-85. | |
| 59. | Langham, J.L.; Jain, A.N. Accurate and interpretable computational modeling of chemical mutagenicity. J. Chem. Inf. Model. 2008, 48, 1833-1839. | |
| 60. | Schneider, G.; Neidhart, W.; Giller, T.; Schmid, G. Scaffold-hopping by topological pharmacophore search: A contribution to virtual screening. Angew. Chem. Int. Ed. 1999, 38, 2894-2896. | |
| 61. | Fechner, U.; Franke, L.; Renner, S.; Schneider, P.; Schneider, G. Comparison of correlation vector methods for ligand-based similarity searching. J. Comput. Aided Mol. Des. 2003, 17, 687-698. | |
| 62. | Fechner, U.; Schneider, G. Evaluation of distance metrics for ligand-based similarity searching. ChemBioChem. 2004, 5, 538-540. | |
| 63. | Downs, G.M.; Willett, P.; Fisanick, W. Similarity searching and clustering of chemical-structure databases using molecular property data. J. Chem. Inf. Comput. Sci. 1994, 34, 1094-1102. | |
| 64. | Chen, X.; Reynolds, C.H. Performance of similarity measures in 2D fragment-based similarity searching: Comparison of structural descriptors and similarity coefficients. J. Chem. Inf. Comput. Sci. 2002, 42, 1407-1414. | |
| 65. | Renner, S.; Fechner, U.; Schneider, G. Alignment-free pharmacophore patterns: A correlation-vector approach. Pharmacophores and pharmacophore searches. 2006. Volume 32. Wiley-VCH. 49-80. | |
| 66. | McGregor, M.J.; Muskal, S.M. Pharmacophore fingerprinting. 1. Application to QSAR and focused library design. J. Chem. Inf. Comput. Sci. 1999, 39, 569-574. | |
| 67. | Floyd, R.W. Algorithm 97: Shortest path. Communications of the ACM. 1962, 5, 345. | |
| 68. | Horvath, D. Topological pharmacophores. Cheminformatics approaches to virtual screening. 2008. RSC Publishing. 44-75. | |
| 69. | Ewing, T.; Baber, C.; Feher, M. Novel 2D fingerprints in ligand-based virtual screening. J. Chem. Inf. Model. 2006, 46, 2423-2431. | |
| 70. | Watson, P. Naive Bayes classification using 2D pharmacophore feature triplet vectors. J. Chem. Inf. Model. 2008, 48, 166-178. | |
| 71. | Bonachera, F.; Parent, B.; Barbosa, F.; Froloff, N.; Horvath, D. Fuzzy tricentric pharmacophore fingerprints. 1. Topological fuzzy pharmacophore triplets and adapted molecular similarity scoring schemes. J. Chem. Inf. Model. 2006, 46, 2457-2477. | |
| 72. | Kearsley, S.K.; Sallamack, S.; Fluder, E.M.; Andose, J.D.; Mosley, R.T.; Sheridan, R.P. Chemical Similarity Using Physiochemical Property Descriptors. J. Chem. Inf. Comput. Sci. 1996, 36, 118-127. | |
| 73. | Filimonov, D.; Poroikov, V.; Borodina, Y.; Gloriozova, T. Chemical similarity assessment through multilevel neighborhoods of atoms: Definition and comparison with the other descriptors. J. Chem. Inf. Comput. Sci. 1999, 39, 666-670. | |
| 74. | RDKit - Cheminformatics and Machine Learning Software. | |
| 75. | Kier, L.B.; Hall, L.H. Electrotopological state indices for atom types: A novel combination of electronic, topological, and valence state information. J. Chem. Inf. Comput. Sci. 1995, 35, 1039-1045. | |
| 76. | Kier, L.B.; Hall, L.H. Molecular structure description - The electrotopological state. Academic Press, 1999. | |
| 77. | Molconn-Z - Program for generation of Molecular Connectivity, Shape, and Information Indices. | |
| 78. | Kier, L.B.; Hall, L.H. The E-State as the basis for molecular structure space definition and structure similarity. J. Chem. Inf. Comput. Sci. 2000, 40, 784-791. | |
| 79. | SYBYL atom types. | |
| 80. | Clark, M.; Cramer III, R.D.; Opdenbosch, N.V. Validation of the general purpose Tripos 5.2 forcefield. J. Comput. Chem. 1989, 10, 982-1012. | |
| 81. | Rappe, A.K.; Casewit, C.J.; Colwell, K.S.; Goddard III, W.A.; Skiff, W.M. UFF, a full periodic table force field for molecular mechanics and molecular dynamics simulations. J. Am. Chem. Soc. 1992, 114, 10024-10035. | |
| 82. | Rappe, A. K. Personal communication. 2009. | |
| 83. | Halgren, T.A. Merck molecular force field. I. Basis, form, scope, parameterization, and performance of MMFF94. J. Comput. Chem. 1996, 17, 490-519. | |
| 84. | Halgren, T.A. Merck molecular force field. II. MMFF94 van der Waals and electrostatic parameters for intermolecular interactions. J. Comput. Chem. 1996, 17, 520-552. | |
| 85. | Halgren, T.A. Merck molecular force field. III. Molecular geometries and vibrational frequencies for MMFF94. J. Comput. Chem. 1996, 17, 553-586. | |
| 86. | Halgren, T.A.; Nachbar, R.B. Merck molecular force field. IV. Conformational energies and geometries for MMFF94. J. Comput. Chem. 1996, 17, 587-615. | |
| 87. | Halgren, T.A. Merck molecular force field. V. Extension of MMFF94 using experimental data, additional computational data, and empirical rules. J. Comput. Chem. 1996, 17, 616-641. | |
| 88. | Mayo, S.L.; Olafson, B.A.; Goddard III, W.A. DREIDING: A Generic Force Field for Molecular Simulations. J. Phys. Chem. 1990, 94, 8897-8909. | |
| 89. | Wildman, S.A.; Crippen, G.M. Prediction of physicochemical parameters by atomic contributions. J. Chem. Inf. Comput. Sci. 1999, 39, 868-873. | |
| 90. | Ertl, P.; Rohde, B.; Selzer, P. Fast calculation of molecular polar surface area as a sum of fragment-based contributions and its application to the prediction of drug transport properties. J. Med. Chem. 2000, 43, 3714-3717. | |
| 91. | Ertl, P. Personal communication. 2010. | |
| 92. | Veber, D.F.; Johnson, S.R.; Cheng, H.Y.; Smith, B.R.; Ward, K.W.; Kopple, K.D. Molecular properties that influence the oral bioavailability of drug candidates. J. Med. Chem. 2002, 45, 2615-2623. | |
| 93. | Lipinski, C.A.; Lombardo, F.; Dominy, B.W.; Feeney, P.J. Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings. Adv. Drug Deliv. Rev. 1997, 23, 3-25. | |
| 94. | Congreve, M.; Carr, R.; Murray, C.; Jhoti, H. A 'rule of three' for fragment-based lead discovery? Drug Discov. Today. 2003, 8, 876-877. | |
| 95. | Zhao, Y.H.; Abraham, M.H.; Zissimos, A.M. Fast calculation of van der Waals volume as a sum of atomic and bond contributions and its application to drug compounds. J. Org. Chem. 2003, 68, 7368-7373. | |
| 96. | Chen, J.; Holliday, J.; Bradshaw, J. A machine learning approach to weighting schemes in the data fusion of similarity coefficients. J. Chem. Inf. Model. 2009, 49, 185-194. | |
| 97. | Williams, C. Reverse fingerprinting, similarity searching by group fusion and fingerprint bit importance. Molecular Diversity. 2006, 10, 311-332. | |
| 98. | Whittle, M.; Gillet, V.J.; Willett, P.; Loesel, J. Analysis of data fusion methods in virtual screening: Similarity and group fusion. J. Chem. Inf. Model. 2006, 46, 2206-2219. | |
| 99. | Hert, J.; Willett, P.; Wilton, D.J.; Acklin, P.; Azzaoui, K.; Jacoby, E.; Schuffenhauer, A. New methods for ligand-based virtual screening: Use of data fusion and machine learning to enhance the effectiveness of similarity searching. J. Chem. Inf. Model. 2006, 46, 462-470. | |
| 100. | Chu, C-W.; Holliday, J.D.; Willett, P. Effect of data standardization on chemical clustering and similarity searching. J. Chem. Inf. Model. 2009, 49, 155-161. | |
| 101. | Arif, S.M.; Holliday, J.D.; Willett, P. Inverse frequency weighting of fragments for similarity-based virtual screening. J. Chem. Inf. Model. 2010, 50, 1340-1349. | |
| 102. | Chen, B.; Mueller, C.; Willett, P. Combination rules for group fusion in similarity-based virtual screening. Mol. Inf. 2010, 29, 533-541. | |
| 103. | Willett, P. Similarity searching using 2D structural fingerprints. Chemoinformatics and Computational Chemical Biology. Methods in Molecular Biology. 2011, 672, 133-158. | |
| 104. | Berglund, A.E.; Head, R.D. PZIM: A method for similarity searching using atom environments and 2D alignment. J. Chem. Inf. Model. 2010, 50, 1790-1795. | |
| 105. | Baldi, P.; Nasr, R. When is chemical similarity significant? The statistical distribution of chemical similarity scores and its extreme values. J. Chem. Inf. Model. 2010, 50, 1205-1222. | |
| 106. | Godden, J.W.; Stahura, F.L.; Bajorath, J. Anatomy of fingerprint search calculations on structurally diverse sets of active compounds. J. Chem. Inf. Model. 2005, 45, 1812-1819. | |
| 107. | Geppert, H.; Horvath, T.; Gartner, T.; Wrobel, S.; Bajorath, J. Support-vector-machine-based ranking significantly improves the effectiveness of similarity searching using 2D fingerprints and multiple reference compounds. J. Chem. Inf. Model. 2008, 48, 742-746. | |
| 108. | Wang, Y.; Geppert, H.; Bajorath, J. Shannon entropy-based fingerprint similarity search strategy. J. Chem. Inf. Model. 2009, 49, 1687-1691. | |
| 109. | Nisius, B.; Bajorath, J. Molecular fingerprint recombination: Generating hybrid fingerprints for similarity searching from different fingerprint types. ChemMedChem. 2009, 4, 1859-1863. | |
| 110. | Vogt, M.; Bajorath, J. Predicting the Performance of Fingerprint Similarity Searching. Chemoinformatics and Computational Chemical Biology. Methods in Molecular Biology. 2011, 672, 159-173. | |
| 111. | Muchmore, S.W.; Debe, D.A.; Metz, J.T.; Brown, S.P.; Martin, Y.; Hajduk, P.H. Application of belief theory to similarity data fusion for use in analog searching and lead hopping. J. Chem. Inf. Model. 2008, 48, 941-948. | |
| 112. | Bender, A.; Jenkins, J.L.; Scheiber, J.; Sukuru, S.C.K.; Glick, M.; Davies, J.W. How similar are similarity searching methods? A principal component analysis of molecular descriptor space. J. Chem. Inf. Model. 2009, 49, 108-119. | |
| 113. | Sastry, M.; Lowrie, J.F.; Dixon, S.L.; Sherman, W. Large-scale systematic analysis of 2D fingerprint methods and parameters to improve virtual screening enrichments. J. Chem. Inf. Model. 2010, 50, 771-784. | |
| 114. | Tiikkainen, P.; Markt, P.; Wolber, G.; Kirchmair, J.; Distinto, S.; Poso, A.; Kallioniemi, O. Critical comparison of virtual screening methods against the MUV data set. J. Chem. Inf. Model. 2009, 49, 2168-2178. | |
| 115. | Venkatraman, V.; Pérez-Nueno, V.I.; Mavridis, L.; Ritchie, D.W. Comprehensive comparison of ligand-based virtual screening tools against the DUD data set reveals limitations of current 3D methods. J. Chem. Inf. Model. 2010, 50, 2079-2093. | |
| 116. | Chemfp - Cheminformatics fingerprints file formats and tools. | |
| 117. | Yan, A.; Gasteiger, J. Prediction of aqueous solubility of organic compounds by topological descriptors. QSAR Comb. Sci. 2003, 22, 821-829. | |
| 118. | Lovering, F.; Bikker, J.; Humblet, C. Escape from flatland: Increasing saturation as an approach to improving clinical success. J. Med. Chem. 2009, 52, 6752-6756. | |
| 119. | Hann, M.M.; Leach, A.R.; Harper, G. Molecular complexity and its impact on the probability of finding leads for drug discovery. J. Chem. Inf. Comput. Sci. 2001, 41, 856-864. | |
| 120. | Schuffenhauer, A.; Brown, N.; Selzer, P.; Ertl, P.; Jacoby, E. Relationships between molecular complexity, biological activity, and structural diversity. J. Chem. Inf. Model. 2006, 46, 525-535. | |
| 121. | Walters, W.P.; Green, J.; Weiss, J.R.; Murcko, M.A. What do medicinal chemists actually make? A 50-year retrospective. J. Med. Chem. 2011, 54, 6405-6416. | |
| 122. | Park, S.K.; Miller, K.W. Random number generators: Good ones are hard to find. Communications of the ACM. 1988, 31, 1192-1201. | |
| 123. | Huang, R.; Southall, N.; Wang, Y.; Yasgar, A.; Shinn, P.; Jadhav, A.; Nguyen, D.T.; Austin, C.P. The NCGC pharmaceutical collection: A comprehensive resource of clinically approved drugs enabling repurposing and chemical genomics. Sci. Transl. Med. 2011, 3, 80ps16. | |
| 124. | Berman, H.M.; Westbrook, J.; Feng, Z.; Gilliland, G.; Bhat, T.N.; Weissig, H.; Shindyalov, I.N.; Bourne, P.E. The Protein Data Bank. Nucleic Acids Research. 2000, 28, 235-242. | |
| 125. | Jmol: An open-source Java viewer for chemical structures in 3D. | |
| 126. | Lloyd, D. What is aromaticity? J. Chem. Inf. Comput. Sci. 1996, 36, 442-447. | |
| 127. | Sayle, R. Cheminformatics toolkits: A personal perspective. | |
| 128. | Dominus, M.J. Higher-order Perl. | |
| 129. | OpenSMILES. | |
| 130. | Vandermeersch, T. OpenSMARTS. | |