Embed, Preserve, Generate: Advancing Surrogate Models for Scientific Modeling

Ruoxi Jiang

doi:10.6082/uchicago.15091

2025

Abstract

Simulators are fundamental tools for studying complex dynamical systems -- such as those in climate modeling, fluid dynamics, and molecular dynamics -- and accelerating scientific discovery. Yet their computational cost and development complexity have spurred interest in data-driven machine learning surrogates as efficient alternatives. A central challenge lies in decoding the intricate interactions of high-dimensional dynamics, where sensitivity to initial conditions in chaotic systems exacerbates errors in long-horizon forecasting. This thesis advances surrogate modeling through novel algorithm designs that address accuracy, interpretability, and stability. First, for inverse problems in scientific inference, we present Embed and Emulate (E&E), a new simulation-based inference (SBI) method that infers the parameters of physical models so that they fit real observational data. E&E jointly learns a low-dimensional latent embedding of observational data (serving as a summary statistic) and trains a fast emulator within this latent space. This eliminates the need for costly simulations or high-dimensional emulation during inference, enabling efficient parameter estimation and uncertainty quantification. Next, we tackle chaotic dynamics by integrating representation learning with physical constraints. Our method learns latent structures that preserve the statistical properties of the system across diverse environments, so that predictions from noisy observations remain robust under multi-scenario forecasting. We further explore this strategy and propose a novel hierarchical generative model that iteratively generates semantic latent representations. Each model in this series is conditioned on the outputs of the preceding higher-level models, and the final stage generates the image; this structured exploration of the latent space enables coherent generation of high-fidelity images.
Finally, we introduce hierarchical implicit modeling, an autoregressive strategy that dynamically rebalances attention between adjacent time states and abstract representations of future states during rollouts. Drawing inspiration from the stability properties of implicit time-stepping methods in numerical analysis, our approach leverages predictions several steps ahead in time while incorporating spatial abstractions at multiple scales. By progressively conditioning on hierarchical representations of a sequence of states, our model learns to balance fine-scale accuracy with long-term coherence. These results are relevant to diverse domains where reliable long-term forecasting is crucial, from climate science to engineering design.
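
As background for the stability property the abstract refers to, the following minimal sketch contrasts an explicit and an implicit time-stepping scheme on a stiff linear test equation. It illustrates classical numerical analysis only and is not the thesis's hierarchical implicit model; the test equation y' = -lam * y, the step size, and the function names are assumptions chosen for illustration.

```python
def forward_euler(y0, lam, dt, steps):
    """Explicit update: y_{n+1} = y_n + dt * (-lam * y_n)."""
    y = y0
    for _ in range(steps):
        y = y + dt * (-lam * y)
    return y


def backward_euler(y0, lam, dt, steps):
    """Implicit update: y_{n+1} = y_n + dt * (-lam * y_{n+1}).

    For this linear equation the implicit relation can be solved in
    closed form, giving y_{n+1} = y_n / (1 + lam * dt).
    """
    y = y0
    for _ in range(steps):
        y = y / (1.0 + lam * dt)
    return y


# Hypothetical stiff setting: lam * dt = 5, well outside the explicit
# scheme's stability region (which requires lam * dt <= 2).
lam, dt, steps = 50.0, 0.1, 20
explicit = forward_euler(1.0, lam, dt, steps)
implicit = backward_euler(1.0, lam, dt, steps)
print("explicit magnitude:", abs(explicit))
print("implicit magnitude:", abs(implicit))
```

With this step size the explicit iterate grows in magnitude at every step, while the implicit iterate decays as the true solution does. The implicit update depends on the future state, which is the loose analogy the abstract draws when it describes conditioning on representations of states several steps ahead.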