Abstract

Statistical mechanics was historically significant for its ability to link the microscopic descriptions of matter and energy to the macroscopic observations of thermodynamics. Recent cross-disciplinary work has used insights from statistical mechanics to solve the inverse problem (going from observed phenomena to underlying interactions and principles), with applications ranging from protein research to theoretical neuroscience. Inverse statistical physics relies on a key property of information entropy: fitting a distribution to data so that it satisfies the observed constraints but is otherwise maximally entropic yields a probabilistic model of the system that is least biased by unobserved assumptions, and is therefore maximally predictive. These maximum entropy models belong to a wider class of architectures widely used in machine learning, known as energy-based models. When such models are fit to real data from complex, multi-dimensional systems, the learned distribution should ideally generate states representative of the ground truth. In practice, however, this is often not the case; specialized sampling must be performed to generate the desired outputs. For example, energy-based models trained on sequences from evolutionarily related protein families can learn the generic constraints needed to produce novel functional sequences, as validated by in vivo experiments. However, these learned energy functions must be rescaled by a temperature parameter in order to sample such novel functional sequences. Here we use minimal, physically motivated energy-based models to systematically interrogate the differences between the data-generation processes of ground-truth and learned models sampled at varying temperatures. This setting permits a close examination of the surprising ability of temperature tuning of learned energy functions, a poorly understood heuristic used across machine learning, to improve sampling performance. Whether the post-hoc sampling temperature needs to be raised or lowered, and by how much, depends on several factors: the choice of objective function, the amount of training data, and, most importantly, properties of order and disorder inherent to the true system. Crucially, we show that the need to lower the temperature to improve generative performance arises from a tendency of fit models to overestimate the probability mass on excited states when training data are scarce and the ground truth is characterized by a strong preference for producing a few ground states, induced by large "energy gaps" or a low ground-truth "temperature." Additionally, we show in a minimal setting that the temperature-tuning phenomenon may be directly linked to a wide body of empirical evidence for a synergistic cluster of amino acids, or sector, within a protein sequence that determines the sequence's functionality.
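For context, the maximum-entropy construction and the post-hoc temperature rescaling referred to above take the standard Boltzmann form; the sketch below uses generic notation and does not reproduce the specific observables or models of the thesis:

    P(x) = \frac{1}{Z} e^{-E(x)}, \qquad E(x) = \sum_i \lambda_i f_i(x), \qquad Z = \sum_x e^{-E(x)}

    P_T(x) = \frac{1}{Z(T)} e^{-E(x)/T}, \qquad Z(T) = \sum_x e^{-E(x)/T}

Here the Lagrange multipliers \lambda_i enforce the observed constraints on the statistics f_i, and sampling at a post-hoc temperature T < 1 concentrates probability mass on the low-energy (ground) states, while T > 1 spreads it over excited states.

As a concrete illustration of sampling a learned energy function at a rescaled temperature, the following is a minimal Python sketch using single-spin-flip Metropolis moves on an Ising-like pairwise model; the couplings J, fields h, and step counts are hypothetical placeholders, not the models studied in the thesis:

    import numpy as np

    def metropolis_sample(energy, x0, n_steps, T=1.0, rng=None):
        # Draw an approximate sample from P_T(x) ~ exp(-E(x)/T) using
        # single-spin-flip Metropolis moves; `energy` is any learned or
        # ground-truth energy function and T is the post-hoc temperature.
        rng = np.random.default_rng() if rng is None else rng
        x = x0.copy()
        e = energy(x)
        for _ in range(n_steps):
            i = rng.integers(len(x))        # pick one spin to flip
            x_new = x.copy()
            x_new[i] = -x_new[i]
            e_new = energy(x_new)
            # Metropolis acceptance criterion at temperature T
            if rng.random() < np.exp(-(e_new - e) / T):
                x, e = x_new, e_new
        return x

    # Hypothetical Ising-like "learned" energy: random symmetric couplings J and fields h
    L = 20
    rng = np.random.default_rng(0)
    J = rng.normal(scale=0.1, size=(L, L))
    J = (J + J.T) / 2.0
    np.fill_diagonal(J, 0.0)
    h = rng.normal(scale=0.1, size=L)
    energy = lambda x: -0.5 * x @ J @ x - h @ x

    x0 = rng.choice([-1, 1], size=L)
    cold = metropolis_sample(energy, x0, n_steps=5000, T=0.5)  # lowered post-hoc temperature
    hot = metropolis_sample(energy, x0, n_steps=5000, T=1.5)   # raised post-hoc temperature

Lowering T biases the chain toward the model's ground states, while raising T yields more disordered samples; whether either adjustment improves agreement with the ground truth is the question the thesis interrogates.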
