Deep learning is transforming scientific research, education, healthcare, and more. However, our understanding of how even the simplest neural networks learn to make surprisingly accurate predictions remains limited. For instance, although shallow ReLU neural networks can in principle approximate any continuous function, deeper networks often perform better in practice. Why does this happen? What benefit does depth provide? This thesis describes frameworks for understanding the role of depth in neural networks using ideas from statistical learning theory and function-space perspectives, focusing on the complex interplay between depth, structure in data, regularization, and optimization. It is demonstrated that even though shallow networks can approximate any continuous function, learning a good approximation from finite training data can be significantly easier using a deeper network. In particular, a family of functions is identified that are provably hard to learn without sufficient depth. Moreover, it is also shown that deeper models naturally introduce a bias towards functions with latent low-dimensional structure. Finally, this thesis provides an example of an underdetermined linear inverse problem in which deep neural networks adapt to underlying low-dimensional structure in the data when trained with standard techniques and without explicit guidance, resulting in improved robustness to noise at test time. These results shed light on how deep neural networks generalize well in practice by naturally capturing hidden patterns in data.