# Correlation Clustering with Same-Cluster Queries Bounded by Optimal Cost

@article{Saha2019CorrelationCW,
title={Correlation Clustering with Same-Cluster Queries Bounded by Optimal Cost},
author={Barna Saha and Sanjay Subramanian},
journal={ArXiv},
year={2019},
volume={abs/1908.04976}
}
• Published 14 August 2019
• Computer Science
• ArXiv
Several clustering frameworks with interactive (semi-supervised) queries have been studied in the past. Recently, clustering with same-cluster queries has become popular. An algorithm in this setting has access to an oracle with full knowledge of an optimal clustering, and the algorithm can ask the oracle queries of the form, "Does the optimal clustering put vertices $u$ and $v$ in the same cluster?" Due to its simplicity, this querying model can easily be implemented in real crowd-sourcing…
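The same-cluster query model described in the abstract can be sketched in a few lines. This is a minimal illustration, not the algorithm from the paper: the names `SameClusterOracle` and `recover_clustering` are hypothetical, and the oracle is simulated from a clustering assumed to be known in advance. The naive strategy compares each vertex with one representative of every cluster found so far, so it uses at most $n \cdot k$ queries for $n$ vertices and $k$ clusters.

```python
from typing import Hashable, List, Sequence


class SameClusterOracle:
    """Simulated oracle (assumption: the optimal clustering is given up front)."""

    def __init__(self, optimal_clusters: Sequence[Sequence[Hashable]]) -> None:
        # Map every vertex to the index of its cluster in the optimal clustering.
        self._label = {v: i for i, cluster in enumerate(optimal_clusters) for v in cluster}
        self.queries = 0  # number of same-cluster queries asked so far

    def same_cluster(self, u: Hashable, v: Hashable) -> bool:
        """Answer: does the optimal clustering put u and v in the same cluster?"""
        self.queries += 1
        return self._label[u] == self._label[v]


def recover_clustering(vertices: Sequence[Hashable], oracle: SameClusterOracle) -> List[list]:
    """Naive recovery: test each vertex against one representative per known cluster."""
    clusters: List[list] = []
    for v in vertices:
        for cluster in clusters:
            if oracle.same_cluster(v, cluster[0]):
                cluster.append(v)
                break
        else:
            # No existing cluster matched, so v starts a new one.
            clusters.append([v])
    return clusters


oracle = SameClusterOracle([["a", "b"], ["c"], ["d", "e"]])
result = recover_clustering(["a", "b", "c", "d", "e"], oracle)
```

Here `result` reproduces the hidden clustering and `oracle.queries` records how many questions were asked. The paper's stated focus is on bounding the number of queries by the optimal cost rather than by $n \cdot k$, which this sketch does not attempt.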
## 13 Citations

• Computer Science
WWW
• 2020
This paper devises a correlation clustering algorithm that, given a budget of $Q$ queries, attains a solution whose expected number of disagreements is at most $3\cdot\mathrm{OPT} + O(n^3/Q)$, where $\mathrm{OPT}$ is the optimal cost for the instance.
• Computer Science
NeurIPS
• 2019
This work investigates correlation clustering as an active learning problem: each similarity score can be learned by making a query, and the goal is to minimise both the disagreements and the total number of queries.
• Computer Science
NeurIPS
• 2020
An algorithm is designed that can reconstruct the latent clustering exactly while using only a small number of oracle queries, and can also learn the clusters using low-stretch separators, a class of ellipsoids with additional theoretical guarantees.
• Computer Science, Mathematics
NeurIPS
• 2021
We study an active cluster recovery problem where, given a set of n points and an oracle answering queries like “are these two points in the same cluster?”, the task is to recover exactly all
• Computer Science
Algorithms
• 2021
This work studies clustering algorithms that operate with ordinal or comparison-based queries (operations) and provides several variants of these algorithms using ordinal operations, in particular giving non-trivial trade-offs between the number of high-cost and low-cost operations used.
• Computer Science
Journal of Combinatorial Optimization
• 2022
This paper introduces the Lower bounded correlation clustering problem (LBCorCP) and gives three algorithms for this problem, which can quickly solve some special instances in polynomial time and obtain a smaller approximation ratio.
• Computer Science
NeurIPS
• 2021
This paper proposes a semi-supervised active clustering framework, where the learner is allowed to interact with an oracle, asking for the similarity between a certain set of chosen items, and proves that a few such similarity queries enable a polynomial-time approximation algorithm for an otherwise conjecturally NP-hard problem.
• Computer Science
CIKM
• 2021
This work proposes two algorithms with provable theoretical guarantees and verifies their effectiveness via an extensive set of experiments on both synthetic and real-world data.
• Computer Science, Mathematics
COLT
• 2022
This work develops robust learning methods that tolerate general semi-random noise, obtaining qualitatively the same guarantees as the best possible methods in the fully-random model.
• Computer Science
• 2023
A generic framework for active clustering with queries for pairwise similarities between objects, which can be any positive or negative number, yielding full flexibility in the type of feedback that a user/annotator can provide.

## References

SHOWING 1-10 OF 24 REFERENCES

• Computer Science
NIPS
• 2016
A probabilistic polynomial-time (BPP) algorithm is provided for clustering in a setting where the expert conforms to a center-based clustering with a notion of margin, and a lower bound on the number of queries needed to have a computationally efficient clustering algorithm in this setting is proved.
• Computer Science
NIPS
• 2017
This paper provides the first information-theoretic lower bound on the number of queries for clustering with a noisy oracle in both situations, and designs novel algorithms that closely match this query complexity lower bound, even when the number of clusters is unknown.
• Computer Science
ITCS
• 2018
This paper extends the work of Ashtiani et al. to the approximation setting by showing that a few such same-cluster queries enable a polynomial-time $(1+\epsilon)$-approximation algorithm for the $k$-means problem without any margin assumption on the input dataset.
• Computer Science
LATIN
• 2018
This work obtains a $(1+\epsilon)$-approximation algorithm for any small $\epsilon$ with running time that is polynomial in the input parameters and also in $k$ and $1/\epsilon$, and gives non-trivial upper and lower bounds on the number of same-cluster queries.
• Computer Science
SODA '06
• 2006
This paper focuses on the situation when the number of clusters is stipulated to be a small constant k, and finds that for every k, there is a polynomial time approximation scheme for both maximizing agreements and minimizing disagreements.
• Computer Science
ICALP
• 2018
The number of queries needed for $(1 - \epsilon)$-accuracy in Euclidean $k$-means must depend linearly on the dimension of the underlying Euclidean space, and for finite metric space $k$-means, it must be at least logarithmic in the number of candidate centers.
• Computer Science
ALT
• 2008
A query-based model in which users can provide feedback to a clustering algorithm in a natural way via split and merge requests is introduced and the "clusterability" of different concept classes in this framework is analyzed.
• Computer Science, Mathematics
44th Annual IEEE Symposium on Foundations of Computer Science, 2003. Proceedings.
• 2003
This work considers the problem of clustering a collection of elements based on pairwise judgments of similarity and dissimilarity, and gives a factor 4 approximation for minimization on complete graphs, and a factor O(log n) approximation for general graphs.