Abstract

The fusion of artificial intelligence and machine learning techniques with molecular simulation presents a powerful paradigm for understanding and engineering functional molecules and materials. This thesis focuses on data-driven approaches for molecular design and simulation, spanning four themes of study: (i) discovery of self-assembling π-conjugated peptides, (ii) active learning for molecular design, (iii) machine learning-enabled enhanced sampling techniques for molecular simulation, and (iv) methods for generative restoration of atomistic detail into coarse-grained molecules and trajectories. First, the thesis addresses the challenge of designing synthetic π-conjugated peptides, comprising an aromatic core flanked by oligopeptide wings, that self-assemble into elongated nanostructures with emergent optoelectronic properties. High-throughput virtual screening campaigns with experimental validation are performed to discover the peptide sequences and π-core chemistries that best facilitate self-assembly and show the most promise as organic semiconductors with high charge mobility. Second, an active learning platform for molecular design is established to efficiently navigate large chemical spaces and discover high-performing candidates that optimize a target quality metric. This platform is applied both to self-assembling π-conjugated peptides and to the discovery of small organic compounds that selectively permeate cardiolipin-containing membranes. Such molecules could serve as fluorescent dyes and diagnostic tools for quantifying mitochondrial cardiolipin content, which is linked to various degenerative diseases. Third, a machine learning-enabled enhanced sampling technique is developed to accelerate barrier crossings and rare-event transitions in molecular simulation.
This method employs an adaptive sampling workflow that alternates between data-driven estimation, from biased simulation data, of the slowest-evolving dynamical motions, which serve as collective variables, and enhanced sampling simulations in these collective variables that promote exploration of configurational phase space. Lastly, two deep learning approaches are developed for the generative, non-deterministic reintroduction of atomistic detail into coarse-grained molecular representations. One method is designed to maintain temporal coherence in the reconstructed trajectory through frame-by-frame conditioning; the other is transferable to generic Cα protein traces. In conclusion, this dissertation demonstrates the application of data-driven approaches across molecular design and simulation, from the discovery of self-assembling π-conjugated peptides and active learning for molecular design to machine learning-enabled enhanced sampling and generative restoration of atomistic detail in coarse-grained molecules. These findings contribute to the advancement of computational methods for designing functional molecules and understanding complex molecular systems.
