Files

Abstract

In the Correlation Clustering problem, we are given a graph with its edges labeled as ``similar" and ``dissimilar" by a noisy binary classifier, and the goal is to produce a clustering of the vertices that matches with the edge labels as much as possible. For a clustering $\calC$ of graph $G$, a similar edge is in disagreement with $\calC$, if its endpoints belong to distinct clusters; and a dissimilar edge is in disagreement with $\calC$ if its endpoints belong to the same cluster. The disagreements vector, $\disagree$, is a vector indexed by the vertices of $G$ such that the $v$-th coordinate $\disagree_v$ equals the weight of all disagreeing edges incident on $v$. The goal is to produce a clustering that minimizes the $\ell_p$ norm of the disagreements vector for $p\geq 1$. Correlation Clustering has been mainly studied under two models where the input graph is (i) complete and unweighted, and (ii) arbitrary and weighted. In this thesis, we introduce a new model of Correlation Clustering that better captures real life instances. In this model the input graph is complete with bounded edge weights. More specifically, every similar edge has weight in the range of $[\alpha\mathbf{w},\mathbf{w}]$ and every dissimilar edge has weight at least $\alpha\mathbf{w}$ (where $\alpha \leq 1$ and $\mathbf{w}>0$ is a scaling parameter). In the first part of the thesis, we study the $\ell_1$ objective (MinDisagree) under this model. We give $3+2\ln\nicefrac{1}{\alpha}$ approximation algorithm and an asymptotically matching integrity gap. In the second part of the thesis, we give an $O\left((\nicefrac{1}{\alpha})^{\nicefrac{1}{2}-\nicefrac{1}{2p}}\cdot \log\nicefrac{1}{\alpha}\right)$ approximation algorithm for the $\ell_p$ objective. Furthermore, we show an almost matching integrality gap.

Details

Actions

PDF

from
to
Export
Download Full History