Published March 19, 2025
Version v1
Journal article
Open access
Unsupervised Learning of Progress Coordinates during Weighted Ensemble Simulations: Application to NTL9 Protein Folding
Creators
1. University of Pittsburgh
2. University of Chicago
Description
A major challenge for many rare-event sampling strategies is the identification of progress coordinates that capture the slowest relevant motions. Machine-learning methods that can identify progress coordinates in an unsupervised manner have therefore been of great interest to the simulation community. Here, we developed a general method for identifying progress coordinates "on-the-fly" during weighted ensemble (WE) rare-event sampling via deep learning (DL) of outliers among sampled conformations. Our method identifies outliers in a latent space model of the system's sampled conformations that is periodically trained using a convolutional variational autoencoder. As a proof of principle, we applied our DL-enhanced WE method to simulate the NTL9 protein folding process. To enable rapid tests, our simulations propagated discrete-state synthetic molecular dynamics trajectories using a generative, fine-grained Markov state model. Results revealed that our on-the-fly DL of outliers enhanced the efficiency of WE by more than 3-fold in estimating the folding rate constant. Our efforts are a significant step forward in the unsupervised learning of slow coordinates during rare-event sampling.
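The description above explains the resampling idea only at a high level. The sketch below illustrates one way outlier detection in a latent space could drive WE binning. It is not the authors' implementation. It assumes latent coordinates for each walker are already available from some trained encoder, which in the paper is a convolutional variational autoencoder and is not reproduced here. It scores outliers with a simple k-nearest-neighbour distance, a stand-in chosen for illustration because the paper's actual outlier detector may differ. Outliers go into their own bin. Walkers are then split or merged so each bin holds a fixed count while total probability weight is conserved. The names `Walker`, `knn_outlier_scores`, `assign_bins`, `resample_bin`, `we_resample`, `outlier_fraction` and `target_per_bin` are all hypothetical.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Walker:
    # Hypothetical latent-space coordinates (assumed to come from a trained encoder).
    latent: tuple
    weight: float


def knn_outlier_scores(points, k=3):
    """Mean distance to the k nearest neighbours; larger means more outlying.
    This k-NN score is an illustrative stand-in, not the paper's detector."""
    if len(points) < 2:
        return [0.0] * len(points)
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        nearest = dists[:k]
        scores.append(sum(nearest) / len(nearest))
    return scores


def assign_bins(walkers, outlier_fraction=0.2, k=3):
    """Bin 1 holds the most outlying walkers in latent space; bin 0 holds the rest."""
    scores = knn_outlier_scores([w.latent for w in walkers], k=k)
    n_out = max(1, int(round(outlier_fraction * len(walkers))))
    ranked = sorted(range(len(walkers)), key=lambda i: scores[i], reverse=True)
    outliers = set(ranked[:n_out])
    bins = {0: [], 1: []}
    for i, w in enumerate(walkers):
        bins[1 if i in outliers else 0].append(w)
    return bins


def resample_bin(walkers, target, rng):
    """Split/merge walkers in one bin to `target` walkers, conserving total weight."""
    walkers = list(walkers)
    if not walkers:
        return walkers
    # Split: replace the heaviest walker with two copies carrying half its weight each.
    while len(walkers) < target:
        walkers.sort(key=lambda w: w.weight)
        heavy = walkers.pop()
        half = heavy.weight / 2
        walkers += [Walker(heavy.latent, half), Walker(heavy.latent, half)]
    # Merge: combine the two lightest walkers; the survivor is picked with
    # probability proportional to weight and inherits the pair's total weight.
    while len(walkers) > target:
        walkers.sort(key=lambda w: w.weight)
        a, b = walkers.pop(0), walkers.pop(0)
        total = a.weight + b.weight
        keep = a if rng.random() < a.weight / total else b
        walkers.append(Walker(keep.latent, total))
    return walkers


def we_resample(walkers, target_per_bin=4, seed=0):
    """One hypothetical WE resampling step using latent-space outlier bins."""
    rng = random.Random(seed)
    bins = assign_bins(walkers)
    resampled = []
    for members in bins.values():
        resampled += resample_bin(members, target_per_bin, rng)
    return resampled
```

Splitting the heaviest walker and merging the lightest pair is only one simple resampling rule. Weight is conserved exactly because split copies share the parent's weight and a merged survivor inherits the pair's total. Giving outliers a dedicated bin with a fixed walker count is what lets rarely visited regions of latent space receive more sampling effort.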
Data availability
All input files and scripts needed to run and analyze the WE simulations in this study are provided in the GitHub repository: https://github.com/westpa/DL-enhancedWE and deposited on Zenodo under DOI: 10.5281/zenodo.13387514.
Files (3.1 MB)
leung-et-al-2025-unsupervised-learning-of-progress-coordinates-during-weighted-ensemble-simulations-application-to-ntl9.pdf

| Name | Size | MD5 checksum |
|---|---|---|
| Supporting information | 408.7 kB | d8835fff95abf3ed4dc14385243d802d |
| Article | 2.7 MB | 5f648f523527cbac52387ac198ec73be |
Additional details
Identifiers
- DOI: 10.1021/acs.jctc.4c01136
- Other: oai:uchicago.tind.io:14784
Funding
- National Science Foundation: CHE-2136142
- National Institutes of Health: P01AI165077
- National Institutes of Health: R01 GM1151805
- National Science Foundation: 2139536