Abstract
The human proteome comprises tens of thousands of proteins, each tailored for a specific function by the selective pressures of evolution. The field of protein design seeks to develop proteins with new or enhanced functions at will, ultimately bypassing the evolutionary clock. In this thesis, we introduce general machine-learning methods for accelerating protein design, with a particular focus on modeling protein structure.
First, we propose an approach for fixed-backbone design (Chapter 2), the problem of designing primary sequence and side-chain rotamers for a given backbone conformation. Whereas classic approaches formulate sequence and rotamer design tasks separately, we offer an approach to solve both simultaneously. To realize this, we develop a deep neural network that effectively leverages backbone coordinates. By exploiting backbone geometry, we efficiently represent atomic microenvironments at the coordinate level and ultimately avoid discrete rotamer sampling. This results in more robust designs and accurate quality estimates for downstream tasks.
Next, we introduce a framework for flexible protein-protein docking (Chapter 3), the task of determining the structure of a protein complex given the unbound structures of its constituent chains. Traditional docking methods are limited by their reliance on empirical physics-based scoring functions, inability to accommodate conformational flexibility, and failure to incorporate binding sites. To address these challenges, we propose an end-to-end approach that can model conformational changes and target specific interactions while significantly reducing computational time. Our method is among the first deep learning approaches to this task; we identify the key determinants underlying its success and provide insights for future research. Finally, we highlight the generality of our approach by extending it to simultaneously dock and co-design the sequence and structure of antibody complementarity-determining regions targeting a specified epitope.