Abstract
Modern machine learning models are increasingly deployed in settings where the classical assumption of independent and identically distributed (i.i.d.) data is violated due to distribution shifts across domains, populations, or time. This dissertation develops new methodologies in statistical learning and optimization that address such shifts through the unifying principle of adaptation—distinguishing between components that remain stable (preserved) and those that vary (adapted) across environments.
Chapter 2 introduces Trans-Glasso, a two-stage procedure for precision matrix estimation under distribution shift. The method exploits a shared sparsity pattern across domains as the preserved structure and adjusts for domain-specific deviations via differential network estimation. We establish non-asymptotic error bounds, prove minimax optimality, and demonstrate the method's practical effectiveness on both simulated and real-world data.
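The two-stage idea of Chapter 2 can be illustrated with a minimal numpy sketch. This is not the Trans-Glasso estimator itself: where the dissertation uses graphical-lasso-type estimation, the sketch below substitutes a ridge-regularized inverse covariance for stage one and an entrywise soft-threshold of the target deviation for stage two. The function names, the `ridge` and `lam` parameters, and this surrogate estimator are all illustrative assumptions.

```python
import numpy as np

def soft_threshold(A, lam):
    """Entrywise soft-thresholding; sparsifies the differential network."""
    return np.sign(A) * np.maximum(np.abs(A) - lam, 0.0)

def trans_precision(source_samples, target_samples, ridge=0.1, lam=0.05):
    """Two-stage sketch: shared precision from pooled sources (preserved),
    plus a sparse target-specific deviation (adapted).
    Ridge-inverse here stands in for graphical-lasso estimation."""
    pooled = np.vstack(source_samples)
    p = pooled.shape[1]
    # Stage 1: preserved structure, estimated from all source domains.
    omega_shared = np.linalg.inv(np.cov(pooled, rowvar=False) + ridge * np.eye(p))
    # Stage 2: adapted component, a sparse deviation for the target domain.
    omega_target = np.linalg.inv(np.cov(target_samples, rowvar=False) + ridge * np.eye(p))
    delta = soft_threshold(omega_target - omega_shared, lam)
    return omega_shared + delta
```

The decomposition mirrors the abstract's preserved/adapted split: the pooled estimate borrows strength across domains, while only the (sparse) deviation is learned from the smaller target sample.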
Chapter 3 presents SMART, a spectral regularization framework for multi-task learning. Assuming that the singular subspaces of the regression matrix are preserved while the projection weights vary across tasks, SMART estimates the target model via a nonconvex optimization problem regularized by source-informed subspaces. The method is supported by theoretical guarantees and achieves strong empirical performance.
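The preserved-subspace assumption behind SMART can be sketched as follows. As a simplification, the sketch fixes the singular subspaces extracted from the source coefficient matrices and re-fits only the projection weights on the target task by least squares, a convex surrogate for the nonconvex subspace-regularized problem described above. The function name `smart_estimate` and the averaging of source matrices are illustrative assumptions, not the dissertation's actual procedure.

```python
import numpy as np

def smart_estimate(B_sources, X_tgt, Y_tgt, rank):
    """Sketch of subspace transfer: preserved singular subspaces,
    adapted projection weights."""
    # Preserved structure: rank-r singular subspaces of the averaged sources.
    U, _, Vt = np.linalg.svd(np.mean(B_sources, axis=0), full_matrices=False)
    U_r, V_r = U[:, :rank], Vt[:rank, :].T
    # Adapted component: projection weights re-fit on the target task
    # inside the fixed subspaces.
    Z = X_tgt @ U_r                                       # (n, r) reduced design
    W, *_ = np.linalg.lstsq(Z, Y_tgt @ V_r, rcond=None)   # (r, r) weights
    return U_r @ W @ V_r.T                                # target coefficients
```

When the target coefficient matrix truly lies in the source subspaces, fitting only the r-by-r weight matrix requires far fewer target samples than estimating the full matrix.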
Chapter 4 introduces an online client sampling method for federated optimization under data heterogeneity. It leverages the slowly varying informativeness of clients as the preserved structure and dynamically adapts the sampling distribution. Formulated as an online learning problem with bandit feedback, the algorithm builds on Online Stochastic Mirror Descent and achieves consistent improvements over uniform sampling in both theory and practice.
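A minimal version of the Chapter 4 update can be sketched with Online Stochastic Mirror Descent under the entropy mirror map (i.e., an exponentiated-gradient step) on the probability simplex, with an importance-weighted bandit estimate of each round's feedback. The per-round informativeness signal `losses`, the exploration `floor`, and the step size `eta` are illustrative assumptions; the dissertation's actual algorithm and tuning are not reproduced here.

```python
import numpy as np

def osmd_sampling(losses, eta=0.1, floor=0.01):
    """Sketch of online client sampling with bandit feedback.
    losses[t, i] is a hypothetical informativeness signal for client i at
    round t; only the sampled client's entry is observed each round."""
    T, K = losses.shape
    p = np.full(K, 1.0 / K)          # sampling distribution over clients
    rng = np.random.default_rng(0)
    picks = []
    for t in range(T):
        q = (1 - floor * K) * p + floor          # mix in an exploration floor
        i = rng.choice(K, p=q)                   # sample one client (bandit)
        picks.append(i)
        g = np.zeros(K)
        g[i] = -losses[t, i] / q[i]              # importance-weighted gradient
        p = p * np.exp(-eta * g)                 # entropy-mirror-map OSMD step
        p /= p.sum()
    return p, picks
```

Because only the sampled client is observed, the importance weight 1/q[i] keeps the gradient estimate unbiased, and the exploration floor bounds its variance; consistently informative clients accumulate sampling probability over time.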
Collectively, these contributions advance a unified framework for learning and optimization under distribution shift by systematically decomposing each problem into preserved and adapted components. This framework enables the principled design of algorithms that are both theoretically grounded and practically effective in the presence of heterogeneous data.