Format | |
---|---|
BibTeX | |
MARCXML | |
TextMARC | |
MARC | |
DataCite | |
DublinCore | |
EndNote | |
NLM | |
RefWorks | |
RIS |
Files
Abstract
In the Correlation Clustering problem, we are given a graph with its edges labeled as ``similar" and ``dissimilar" by a noisy binary classifier, and the goal is to produce a clustering of the vertices that matches with the edge labels as much as possible. For a clustering \calC of graph G, a similar edge is in disagreement with \calC, if its endpoints belong to distinct clusters; and a dissimilar edge is in disagreement with \calC if its endpoints belong to the same cluster. The disagreements vector, \disagree, is a vector indexed by the vertices of G such that the v-th coordinate \disagreev equals the weight of all disagreeing edges incident on v. The goal is to produce a clustering that minimizes the ℓp norm of the disagreements vector for p≥1.
Correlation Clustering has been mainly studied under two models where the input graph is (i) complete and unweighted, and (ii) arbitrary and weighted. In this thesis, we introduce a new model of Correlation Clustering that better captures real life instances. In this model the input graph is complete with bounded edge weights. More specifically, every similar edge has weight in the range of [αw,w] and every dissimilar edge has weight at least αw (where α≤1 and w>0 is a scaling parameter).
In the first part of the thesis, we study the ℓ1 objective (MinDisagree) under this model. We give 3+2ln\nicefrac1α approximation algorithm and an asymptotically matching integrity gap.
In the second part of the thesis, we give an O((\nicefrac1α)\nicefrac12−\nicefrac12p⋅log\nicefrac1α) approximation algorithm for the ℓp objective. Furthermore, we show an almost matching integrality gap.