

Abstract

The development of protein-based affinity reagents plays a critical role in biomedical research, diagnostics, and therapeutic development. However, de novo discovery of high-quality binders remains a major bottleneck, primarily because selection success is hard to predict and existing screening technologies scale poorly. Traditional approaches often rely on trial-and-error experimentation with limited throughput and unclear determinants of success. In this thesis, I present a high-throughput, in vivo platform, Phage-Assisted Non-Continuous Selection (PANCS)-Binders, to systematically investigate the landscape of protein-protein interaction (PPI) discovery.

By manipulating three major variables, namely sampling depth (ranging from 10⁵ to 10¹⁰), target identity (a panel of 96 diverse proteins), and affibody library design (ten distinct libraries varying in composition and mutational architecture), I define and measure “binder density,” a quantitative metric that reflects how discoverable a given target is within a defined sequence space. In more than 1,300 individual selections (documented in Supplementary Files 1–4), I find that binder density varies over several orders of magnitude across targets while remaining largely insensitive to library design. Surprisingly, canonical biochemical and structural properties of targets do not reliably predict selection success, indicating that intrinsic “targetability” is an emergent property not easily captured by simple heuristics. These results underscore the critical importance of target identity in determining discovery outcomes and challenge assumptions about the universality of selection strategies.

To enable predictive binder design, I collaborated with machine learning researchers to train neural network models on PANCS-derived sequence landscapes. These models integrate both sequence and target information and show modest but consistent improvements in generalizing binder prediction to unseen targets. Our machine learning framework incorporates a triplet contrastive loss and pre-trained protein language model embeddings to better structure the latent space and sharpen discrimination between binder and non-binder classes.

Together, these findings establish a framework for dissecting and navigating PPI landscapes at unprecedented scale. They offer actionable insight into the underlying constraints of de novo binder discovery and demonstrate the power of combining high-throughput selections with data-driven modeling. This work provides both a resource and a roadmap for the rational engineering of protein-based affinity reagents.
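
For readers unfamiliar with the triplet contrastive loss mentioned above, the following is a minimal sketch of the standard formulation, not the thesis's implementation. The abstract does not state the distance function, the margin, or how triplets are assembled, so the Euclidean distance, the margin of 1.0, the function names, and the toy two-dimensional "embeddings" are all assumptions made for illustration. One plausible reading is that the anchor and positive are embeddings of two binders and the negative is a non-binder.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet loss: max(0, d(a, p) - d(a, n) + margin).

    The loss is zero once the negative is at least `margin` farther from
    the anchor than the positive is; otherwise it grows linearly.
    The margin value here is hypothetical.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


# Toy vectors standing in for protein language model embeddings (hypothetical).
anchor = [0.0, 0.0]    # e.g. a known binder
positive = [0.0, 1.0]  # e.g. another binder, distance 1 from the anchor
negative = [3.0, 4.0]  # e.g. a non-binder, distance 5 from the anchor

# Well-separated triplet: 1 - 5 + 1 is negative, so the loss is clipped to 0.
loss_separated = triplet_loss(anchor, positive, negative)

# Swapping the roles violates the margin: 5 - 1 + 1 = 5.
loss_violated = triplet_loss(anchor, negative, positive)
```

Minimizing this quantity over many triplets pulls same-class embeddings together and pushes different-class embeddings apart, which is the sense in which such a loss can "structure the latent space."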
