Abstract
The proliferation of Compound AI systems, which integrate multiple models and data streams for real-time decision-making, presents significant challenges in achieving low operational latency, maintaining system observability, and ensuring user privacy. Addressing these challenges requires a fundamental shift from optimizing machine learning models or system components in isolation to a holistic approach. This thesis explores data-centric co-design principles that optimize the interplay between machine learning requirements and data management strategies. It makes three contributions: (1) real-time model routing strategies for incoming streaming data that reduce latency while preserving accuracy; (2) a decentralized observability framework that significantly reduces logging overhead and respects data borders; and (3) algorithmic data minimization techniques that limit user identifiability while preserving model utility.