Abstract
Molecular dynamics simulations can give atomistic insight into chemical systems and processes. However, key molecular motions often depend on statistically rare events. Quantitatively describing these motions requires the use of enhanced-sampling schemes that increase the probability of seeing rare events in simulations. In this thesis, we use mathematical approaches to analyze enhanced-sampling algorithms and introduce new ones.
We first turn our attention to umbrella sampling, one of the most widely used enhanced-sampling algorithms. In umbrella sampling, we apply forces that bias the system toward the rare events. By combining data from multiple, differently biased simulations, averages against the probability distribution of the unbiased ensemble can be recovered. We analyze this scheme, formally justifying why umbrella sampling works and demonstrating how the scheme scales with certain design choices. We also introduce a new algorithm for recombining the data from separate simulations. This formulation allows for the first rigorous analysis of the error in umbrella sampling.
While umbrella sampling can exponentially accelerate convergence of statistical estimates, the biasing procedure irrevocably alters the system's dynamics. Consequently, umbrella sampling calculations can only produce averages against the system's equilibrium distribution. This prevents direct estimation of dynamical statistics, such as chemical rates or committor probabilities. To address this issue, we introduce a new scheme that connects dynamical statistics to operator-theoretic descriptions of the system's dynamics. Using the operator-theoretic description enables us to avoid the challenging task of directly sampling the ensemble of reactive pathways. Moreover, the scheme requires neither knowledge of a good low-dimensional description of the system nor tight control over the system's dynamics. We also show that dynamical estimates from Markov state models (MSMs) correspond to a specific realization of our scheme.
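To make the MSM special case concrete, the following is a minimal sketch of how a committor can be computed from a row-stochastic MSM transition matrix by solving a linear system. It illustrates the standard MSM calculation only, not the more general scheme introduced in the thesis. The function name `msm_committor`, its arguments, and the five-state toy chain are hypothetical and chosen purely for illustration.

```python
import numpy as np

def msm_committor(P, A, B):
    """Forward committor for a row-stochastic transition matrix P.

    A and B are lists of state indices for the reactant and product sets
    (hypothetical names). Returns q with q = 0 on A and q = 1 on B.
    """
    n = P.shape[0]
    A = np.asarray(A)
    B = np.asarray(B)
    q = np.zeros(n)
    q[B] = 1.0
    # States outside A and B, where the committor is unknown.
    D = np.setdiff1d(np.arange(n), np.concatenate([A, B]))
    # On D the committor satisfies q_i = sum_j P_ij q_j, which rearranges to
    # (I - P_DD) q_D = P_DB 1 because q vanishes on A and equals 1 on B.
    lhs = np.eye(len(D)) - P[np.ix_(D, D)]
    rhs = P[np.ix_(D, B)].sum(axis=1)
    q[D] = np.linalg.solve(lhs, rhs)
    return q

# Toy example (assumed, not from the thesis): a symmetric nearest-neighbor
# walk on five states, with A = {0} and B = {4}.
P = np.zeros((5, 5))
for i in range(1, 4):
    P[i, i - 1] = P[i, i + 1] = 0.5
P[0, 0] = P[0, 1] = 0.5
P[4, 4] = P[4, 3] = 0.5
q = msm_committor(P, A=[0], B=[4])
```

For this symmetric walk the committor increases linearly from state 0 to state 4, which gives a quick sanity check on the linear solve.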
Finally, we turn our attention to the problem of dimensionality reduction. One common approach is to consider the spectral properties of the operators discussed in the previous section. A common scheme for estimating these spectral quantities is the variational approach to conformational dynamics (VAC). Just as our work generalizes the estimation of rates and committors using MSMs, VAC generalizes the calculation of the eigenvalues and eigenvectors of the MSM transition matrix. We analyze VAC schemes and study how the choice of basis set, the amount of sampling, and the choice of lag time affect the approximation of the system's slow modes. Our analysis shows that the output of VAC can be strongly dependent on the lag time and leads to new heuristics for choosing this parameter.
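As an illustration of the kind of calculation VAC involves, here is a minimal sketch that estimates time-lagged correlation matrices from a trajectory of basis-function values and solves the resulting generalized eigenproblem. It is a sketch under stated assumptions, not the implementation analyzed in the thesis: the names `vac`, `basis_traj`, and `lag` are hypothetical, the symmetrized estimator assumes reversible dynamics, and the test trajectory is a synthetic AR(1) process rather than molecular dynamics data.

```python
import numpy as np

def vac(basis_traj, lag):
    """Estimate slow modes from a (T, k) array of basis-function values.

    Returns eigenvalues in descending order and the matching expansion
    coefficients in the basis (one column per mode).
    """
    X = basis_traj[:-lag]
    Y = basis_traj[lag:]
    n = len(X)
    # Symmetrized estimators; the symmetrization assumes reversibility.
    C0 = (X.T @ X + Y.T @ Y) / (2 * n)
    Ct = (X.T @ Y + Y.T @ X) / (2 * n)
    # Solve Ct v = lambda C0 v by whitening with the eigendecomposition of C0,
    # discarding near-null directions for numerical stability.
    s, U = np.linalg.eigh(C0)
    keep = s > 1e-10 * s.max()
    W = U[:, keep] / np.sqrt(s[keep])
    lam, V = np.linalg.eigh(W.T @ Ct @ W)
    order = np.argsort(lam)[::-1]
    return lam[order], (W @ V)[:, order]

# Synthetic test data (an assumption for illustration): an AR(1) process
# x_{t+1} = a x_t + noise, whose lag-1 autocorrelation is a.
rng = np.random.default_rng(0)
a, T = 0.9, 20000
x = np.zeros(T)
for t in range(T - 1):
    x[t + 1] = a * x[t] + rng.normal()

# Basis: the constant function and the coordinate itself.
basis_traj = np.column_stack([np.ones(T), x])
lam, coeffs = vac(basis_traj, lag=1)
```

Because the basis contains the constant function, the leading eigenvalue equals one up to floating-point error, and the second eigenvalue estimates the autocorrelation of the slow coordinate at the chosen lag. Rerunning with different values of `lag` gives a simple way to see how the estimates depend on that parameter.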
A unifying theme in this work is the importance of sampling error. Not only does it motivate the development of new enhanced-sampling schemes, but error analyses also give us insight into existing numerical schemes. In umbrella sampling, error analysis informs simulation design choices and suggests how computational resources may be better allocated. In VAC, it guides parameter choice and helps inform what can be reasonably expected from these schemes. Our work shows how understanding the way error propagates through numerical schemes can lead to improved algorithms for performing molecular dynamics.