Minghu Song, Ph.D.
Computer-Aided Drug Discovery, Chemoinformatics, Artificial Intelligence and Interpretable Machine Learning in Drug Discovery, Large-Scale Biochemical Data Analysis, and Interactive Data Visualization
BS & MS in Chemistry: Xiamen University, 1992-1999
PhD in Organic Chemistry: Rensselaer Polytechnic Institute, 1999-2004
MS in Data Science: UC Berkeley, School of Information, 2015-2017
Research Scientist II, Synta Pharmaceuticals, Lexington, MA 2005-2009
Senior Scientist, Pfizer Inc., Groton, CT 2009-2015
Research Scientist, Yale Center of Molecular Discovery, West Haven, CT 2017-2018
Principal Scientist & CADD group leader, ChemPartner Inc., San Francisco, CA 2019-present
My research focuses on developing novel computational chemistry and chemoinformatics tools to facilitate the design-make-test-analyze (DMTA) cycle in drug discovery, while collaborating with bench biologists and chemists to apply appropriate computer-aided drug discovery approaches to the design of novel therapeutic agents. Currently, we have four ongoing research projects:
- Incorporate the relevant biochemical context or existing physics-driven methods into future machine learning algorithm development for drug discovery: The first wave of advanced machine learning and AI applications in pharmaceutical research (e.g. organic synthesis and molecular property prediction) has emerged in recent years. However, most recent ML methods were originally introduced for image recognition and NLP and do not consider the complexity and context of biological systems. To develop the second generation of machine learning applications for drug discovery, it is crucial to incorporate either biophysics domain knowledge or accurate physics-driven calculations into the development and interpretation of machine learning algorithms.
- Build Data Science products to enable the biochemical Data-to-Knowledge (D2K) transition: The goal of this project is to build a knowledge base of protein-ligand interactions and to develop an automated processing pipeline or API. With it, external researchers can extract pre-calculated features or representations of different protein-ligand binding pockets to build ML models that predict protein-ligand affinity or recognize the binding sites of particular chemical matter, view web-based interactive plots, and submit queries to find binding sites with similar shapes or 3D interaction motifs.
- Large-scale molecular similarity search and benchmarking of fast similarity search implementations with different indexing schemes: Molecular similarity search is an effective computational approach to identify potentially active compounds in a large molecular database. The rapid increase in available chemical space requires faster chemical similarity search tools. We are building a benchmark to compare multiple large-scale similarity search algorithms and evaluate their performance under different indexing schemes in an unbiased fashion. All methods and input/output datasets will be publicly available, so that they can be used to validate new methods in the future.
- Interactive visualization of biochemical space and the application of virtual reality in chemical diversity analyses or structure-based drug design
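As a minimal illustration of the similarity search project above, the sketch below scores a query fingerprint against a small database with the Tanimoto coefficient and returns the top hits by brute-force scan. It is not the group's benchmark code: the function names, the toy 8-bit fingerprints, and the molecule IDs are all hypothetical, and fingerprints are assumed to be stored as plain Python integers. A linear scan like this is the kind of baseline that faster, indexed methods would be compared against.

```python
import heapq


def tanimoto(a: int, b: int) -> float:
    """Tanimoto coefficient of two bit-vector fingerprints stored as ints."""
    union = bin(a | b).count("1")
    if union == 0:
        return 0.0  # two empty fingerprints: define similarity as 0
    return bin(a & b).count("1") / union


def top_k_similar(query: int, database: dict, k: int = 3) -> list:
    """Brute-force scan returning the k best (score, molecule_id) pairs."""
    scored = ((tanimoto(query, fp), mol_id) for mol_id, fp in database.items())
    return heapq.nlargest(k, scored)


# Toy 8-bit fingerprints (hypothetical; real fingerprints have far more bits).
database = {
    "mol_a": 0b10110110,
    "mol_b": 0b10110000,
    "mol_c": 0b01001001,
    "mol_d": 0b11111111,
}
query = 0b10110100

hits = top_k_similar(query, database, k=2)
print(hits)  # [(0.8, 'mol_a'), (0.75, 'mol_b')]
```

Indexing schemes aim to avoid scoring every database entry, for example by skipping fingerprints whose bit counts make the required similarity impossible.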
Details about some of the above research projects can be found at our ML-DDD@CT (Machine Learning in Drug Discovery & Development at CT) group website (https://mlddd-ct.github.io/).
- X. Wang, J. Bi, S. Yu, J. Sun, M. Song, Multiplicative Multitask Feature Learning, Journal of Machine Learning Research, 2016, 17(80):1−33.
- N. Greene, M. Song. Predicting in vivo safety characteristics using physiochemical properties and in vitro assays. Future Medicinal Chemistry, 2011, 3(12), 1503-1511.
- S. Chen, Z. Xia, M. Nagai, R. Lu, E. Kostik, T. Przewloka, M. Song, D. Chimmanamada, D. James, S. Zhang, J. Jiang, M. Ono, K. Koya and L. Sun. Novel indolizine compounds as potent inhibitors of phosphodiesterase IV (PDE4): structure–activity relationship. MedChemComm. 2011, 2, 176-180.
- M. Song, M. Clark, Developing and Evaluation of In Silico Models for hERG Liability Prediction. J. Chem. Inf. Model. 2006, 46(1), 392-400.
- M. Song; C. M. Breneman; N. Sukumar. Three Dimensional Quantitative Structure-Activity Relationship Analyses of Piperidine-based CCR5 Receptor Antagonists. Bioorgan. Med. Chem. 2004, 12, 489-499.
- N. Tugcu, M. Song, C. M. Breneman, N. Sukumar, K. Bennett, S. Cramer. Mobile Phase Salt Type Effects for Protein Retention and Selectivity for Anion Exchange Systems. Anal. Chem. 2003, 75(14), 3563-3572.
- J. Bi; K. Bennett; M. Embrechts; C. Breneman; M. Song. Dimensionality Reduction via Sparse Support Vector Machines. Journal of Machine Learning Research, 2003, 3, 1229-1243.
- M. Song; C. M. Breneman; J. Bi; N. Sukumar; N. Tugcu; K. P. Bennett; S. Cramer. Prediction of Protein Retention Time in Anion-Exchange Chromatography Using Support Vector Machine Regression. J. Chem. Inf. Comput. Sci. 2002, 42(6), 1347-1357.
Patents on the discovery of the HSP90 inhibitor ganetespib (STA-9090), in phase 3 clinical trials:
- Ying; D. James; S. Zhang; J. Chae; T. Przewloka; H. Ng; Z. Demko; D. U. Chimmanamada; C.-W. Lee; Z. Du; K. Foley; M. Song; L. Sun; K. Koya; D. Zhou; S. Qin. Triazole compounds that modulate HSP90 activity, 2007, US Patent No. 8,318,790
- Ying; D. Chimmanamada; J. Burlison; S. Zhang; M. Song; J. Chae; S. Schweizer. Preparation of substituted triazoles, particularly 4,5-diphenyl-4H-1,2,4-triazole-3-carboxamides, that modulate Hsp90 activity. PCT Int. Appl. 2010, WO 2010017545