
As practitioners advance machine learning, they target new structures in training data, propose new models, and formulate new learning problems that better reflect real-world applications. Each of these advances demands new algorithms. In this thesis, we consider three classes of modern machine learning problems, featuring interactions between novel models, data acquisition methods, and data distributions. For each of these problems, we analyze the relevant geometric structures that govern the difficulty of the learning or optimization problem. We produce new insights that lead to novel algorithms, thereby enabling new applications and advancing machine learning practice.

First, Chapter 2 studies models developed to capture complex structures in data. Many datasets exhibit higher-order structure, such as group memberships or multi-way interactions, that standard graph-based models cannot represent. Recent work replaces graphs with hypergraphs in several key applications. Unfortunately, the deployment of hypergraph-based methods is hindered by a lack of efficient primitives for core optimization problems, such as solving hypergraph Laplacian systems. We develop efficient algorithms for these key optimization problems, enabling scalable hypergraph-based methods for semi-supervised learning and clustering.

Rather than modeling new structure in data that is available up front, Chapters 3 and 4 ask how learners can adapt when data acquisition becomes part of the learning problem. We consider two distinct settings. In Chapter 3, the learner searches for a hidden goal in an unknown environment; it must explore the environment to gather data about the goal's location while minimizing the total travel cost. In Chapter 4, the learner seeks to recover an unknown partition of a set by submitting queries to an oracle, while minimizing the total number of queries. In both settings, the gathered data may contain errors. We develop algorithms that learn efficiently while remaining robust to adversarial errors, and we prove matching lower bounds confirming that the cost incurred by our algorithms is near-optimal.

Chapters 5 and 6 turn to the third class of problems: how contemporary optimization algorithms interact with geometric structure in loss landscapes and data. These chapters study adaptive gradient methods. Default implementations of these algorithms are not rotationally equivariant; rotations in both data and parameter space can dramatically affect their performance. Chapter 5 proposes a data-driven reparameterization method to improve the convergence of adaptive algorithms. We relate the impact of this reparameterization to approximate low-rank structure, an empirically prominent geometry that existing methods fail to exploit, and we empirically evaluate both the reparameterization and the ubiquity of approximate low-rank structure. Chapter 6 focuses on how rotations in data space affect generalization. We prove that arbitrarily small rotations of the training data can drastically change the solutions produced by adaptive algorithms. We then remedy this sensitivity by proving that the proposed reparameterization method makes adaptive gradient methods robust to data rotations.

As datasets, models, problem formulations, and optimizers in machine learning continually evolve, one aim of this thesis is to serve as a stepping stone for future work. In Chapter 7, we highlight avenues for related future research, building on the novel algorithms and analysis developed herein.
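To make these contributions concrete, a few illustrative sketches follow. Each shows a standard formulation under stated assumptions, not necessarily the exact construction used in the thesis.

For the hypergraph setting of Chapter 2, one widely used nonlinear hypergraph energy (an assumption here; the thesis may use a different formulation) generalizes the graph Laplacian quadratic form:

    % For a hypergraph H = (V, E) with hyperedge weights w_e > 0,
    % a standard analogue of the graph energy x^T L x is
    \[
      Q(x) \;=\; \sum_{e \in E} w_e \, \max_{u, v \in e} (x_u - x_v)^2,
    \]
    % which recovers the usual graph form when every hyperedge has
    % exactly two vertices. "Solving a hypergraph Laplacian system"
    % then corresponds to minimizing objectives such as
    % Q(x) + \lambda \|x - b\|_2^2, as arise in semi-supervised
    % label propagation.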
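For the query model of Chapter 4, a minimal noise-free sketch (assuming a hypothetical pairwise oracle same_set(i, j); the thesis's setting additionally tolerates adversarial errors) recovers a partition by comparing each element against one representative per discovered block:

    # Minimal sketch: recover a partition of {0, ..., n-1} using a
    # noise-free same_set oracle. same_set and recover_partition are
    # illustrative names, not the thesis's API.
    def recover_partition(n, same_set):
        blocks = []  # one list of elements per discovered block
        for i in range(n):
            for block in blocks:
                if same_set(i, block[0]):  # one query per existing block
                    block.append(i)
                    break
            else:
                blocks.append([i])  # no block matched: i starts a new one
        return blocks

    # Example with ground-truth partition {0, 2} and {1, 3, 4}.
    truth = {0: 0, 2: 0, 1: 1, 3: 1, 4: 1}
    print(recover_partition(5, lambda i, j: truth[i] == truth[j]))
    # -> [[0, 2], [1, 3, 4]]

With k blocks this uses at most nk queries; the regime of interest in the thesis is precisely when oracle answers can be wrong, which this sketch does not address.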
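For the rotation sensitivity studied in Chapters 5 and 6, a short numpy sketch shows that a diagonal adaptive step (an Adagrad-style stand-in, not the thesis's experiment) does not commute with an orthogonal change of coordinates, whereas plain gradient descent does:

    import numpy as np

    # One Adagrad-style step; the elementwise (diagonal) preconditioner
    # is what breaks rotational equivariance.
    def adagrad_step(grad, accum, lr=0.1, eps=1e-8):
        accum = accum + grad**2
        return lr * grad / (np.sqrt(accum) + eps)

    rng = np.random.default_rng(0)
    g = rng.normal(size=3)                        # a gradient
    Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))  # random orthogonal matrix

    # Under the reparameterization w -> Q w, gradients transform as g -> Q g.
    rotate_after = Q @ adagrad_step(g, np.zeros(3))
    rotate_before = adagrad_step(Q @ g, np.zeros(3))

    # An equivariant method would print 0 here; the diagonal step does not.
    print(np.linalg.norm(rotate_after - rotate_before))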
