Mapping the Topology of De Novo Binder Fitness Landscapes Using Phage-Assisted Non-Continuous Selection

Lu, Shannon

doi:10.6082/uchicago.15774

2025

Abstract

The development of protein-based affinity reagents plays a critical role in biomedical research, diagnostics, and therapeutic development. However, de novo discovery of high-quality binders remains a major bottleneck, primarily due to poor predictability of selection success and the limited scalability of existing screening technologies. Traditional approaches often rely on trial-and-error experimentation with limited throughput and unclear determinants of success. In this thesis, I present a high-throughput, in vivo platform, Phage-Assisted Non-Continuous Selection (PANCS)-Binders, to systematically investigate the landscape of protein-protein interaction (PPI) discovery. By manipulating three major variables—sampling depth (ranging from 10⁵ to 10¹⁰), target identity (a panel of 96 diverse proteins), and affibody library design (ten distinct libraries varying in composition and mutational architecture)—I define and measure “binder density,” a quantitative metric that reflects how discoverable a given target is within a defined sequence space. Through more than 1,300 individual selections (documented in Supplementary Files 1–4), I find that binder density varies over several orders of magnitude across targets, while remaining largely insensitive to library design. Surprisingly, canonical biochemical and structural properties of targets do not reliably predict selection success, indicating that intrinsic “targetability” is an emergent property not easily captured by simple heuristics. These results underscore the critical importance of target identity in determining discovery outcomes and challenge assumptions about the universality of selection strategies. To enable predictive binder design, I collaborated with machine learning researchers to train neural network models using PANCS-derived sequence landscapes. 
These models integrate both sequence and target information, and show modest but consistent improvements in generalizing binder prediction across unseen targets. Our machine learning framework incorporates triplet contrastive loss and pre-trained protein language model embeddings to better structure the latent space and enhance discrimination between binder and non-binder classes. Together, these findings establish a scalable framework for dissecting and navigating PPI landscapes at unprecedented scale. They offer actionable insight into the underlying constraints of de novo binder discovery and demonstrate the power of combining high-throughput selections with data-driven modeling. This work provides both a resource and a roadmap for the rational engineering of protein-based affinity reagents.