Abstract

In the era of big data and deep learning, the integration of machine learning (ML) methods with scientific inquiry is one of the most exciting frontiers. These techniques are already being used to create larger and more complex models, speed up experimentation, and identify new paths of discovery. The study of molecular structures has been a particular focal point for ML in science. As part of progress in drug discovery, protein function, and materials science, computational methods are being applied to important tasks in understanding molecular interactions, identifying new molecules, and refining the structures of known molecules. A blend of experimental work and deep learning will bring tremendous advancements in these fields.

This dissertation is composed of four works that investigate deep learning and generative methods in their application to the study of molecular structure, particularly through data collected with Nuclear Magnetic Resonance (NMR) spectroscopy. NMR is a molecular measurement technique in which a molecule is placed in a large magnetic field and perturbed with radio-frequency (RF) waves. The resulting NMR spectrum can be crucial in determining and verifying the structure of a molecule. Within these works, we make significant contributions to the study of molecular structures:

1. New machine learning models for generating molecular conformers and for predicting NMR parameters, with novel architecture elements designed to better represent the physical properties of molecular tasks, leading to state-of-the-art performance.
2. An innovative training method to incorporate multiple sources of data, which allows models to correct for systematic errors in different sources of data.
3. A new dataset for the study and future development of techniques in protein structure generation, including more accurate baselines for studying generated structures.
4. A novel approach to small molecule identification conditioned on NMR spectra, through the use of generative methods operating on 3D coordinates of the atoms.

Throughout, we demonstrate how improvements in ML for science come not just from more advanced ML techniques, but also from the careful design of experiments and data collection that enhance these techniques.
