Files
Abstract
Clustering and partitioning tasks have found widespread applications across computing. In machine learning, clustering represents the quintessential unsupervised learning task: grouping similar data points to discover structure in data. In operations research and combinatorial optimization, one is often interested in finding bottlenecks in a network, to identify possible weakness and points of failure. In this work, we discuss recent progress in better understanding computational aspects of clustering and partitioning. Our primary goal is establishing formal mathematical guarantees on the performance of clustering algorithms, as well as proving impossibility results to determine the inherent hardness of the problems we consider. In the first part of the thesis, we discuss graph partitioning tasks, focusing on the theory behind finding small vertex separators: few vertices which, when removed, disconnect the graph into large pieces. We design approximation algorithms for this problem, based on rounding natural convex relaxations. We also outline a recently uncovered connection between this problem and the fastest mixing random walk process on a graph with a target stationary distribution. In the second part of this work we discuss some algorithmic results in partitioning hypergraphs. We introduce a new, expressive class of hypergraph cut functions. We then design approximation algorithms for hypergraph generalizations of the minimum conductance cut problem by leveraging and extending techniques from spectral graph theory to the hypergraph regime. We prove our results for all the cut functions in our newly-defined class. In the process, we also improve on a popular primal-dual algorithmic framework for graph partitioning algorithms. Finally, we address the problem of learning partitions in an interactive way, by querying a same-cluster oracle, which determines whether two points belong to the same cluster. In this context we develop and analyze novel error-resistant algorithms, and provide complementary lower bounds, showing that our algorithms achieve optimal query complexity. To this end, we develop a new analytic framework based on modeling this task as a Rényi-Ulam liar game.