Abstract
The hydrophobicity of proteins and similar surfaces, which display chemical heterogeneity at the nanoscale, drives countless aqueous interactions and assemblies. However, predicting how surface chemical patterning influences hydrophobicity remains a challenge. Here, we address this challenge by using molecular simulations and machine learning to characterize and model the hydrophobicity of a diverse library of patterned surfaces, spanning a wide range of sizes, shapes, and chemical compositions. We find that simple models, based only on polar content, are inaccurate, whereas complex neural network models are accurate but challenging to interpret. However, by systematically incorporating chemical correlations between surface groups into our models, we are able to construct a series of minimal models of hydrophobicity, which are both accurate and interpretable. Our models highlight that the number of proximal polar groups is a key determinant of hydrophobicity and that polar neighbors enhance hydrophobicity. Although our minimal models are trained on a particular patch size and shape, their interpretability enables us to generalize them to rectangular patches of all shapes and sizes. We also demonstrate how our models can be used to predict hot-spot locations with the largest marginal contributions to hydrophobicity and to design chemical patterns that have a fixed polar content but vary widely in their hydrophobicity. Our data-driven models and the principles they furnish for modulating hydrophobicity could facilitate the design of novel materials and engineered proteins with stronger interactions or enhanced solubilities.