Correlation Clustering with Same-Cluster Queries Bounded by Optimal Cost

Barna Saha and Sanjay Subramanian
Several clustering frameworks with interactive (semi-supervised) queries have been studied in the past. Recently, clustering with same-cluster queries has become popular. An algorithm in this setting has access to an oracle with full knowledge of an optimal clustering, and the algorithm can ask the oracle queries of the form, "Does the optimal clustering put vertices $ u $ and $ v $ in the same cluster?" Due to its simplicity, this querying model can easily be implemented in real crowd-sourcing… 
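
The querying model described above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the class name `SameClusterOracle`, the helper `recover_clusters`, and the naive strategy of comparing each vertex against one representative per discovered cluster. The oracle holds a reference clustering and counts how many questions it has answered.

```python
class SameClusterOracle:
    """Hypothetical oracle that knows a reference clustering (vertex -> cluster id)."""

    def __init__(self, optimal_labels):
        self._labels = dict(optimal_labels)
        self.num_queries = 0  # track usage so a query budget can be checked

    def same_cluster(self, u, v):
        """Answer: does the reference clustering put u and v in the same cluster?"""
        self.num_queries += 1
        return self._labels[u] == self._labels[v]


def recover_clusters(vertices, oracle):
    """Naive recovery: compare each vertex with one representative per cluster."""
    clusters = []  # each cluster is a list whose first element is its representative
    for v in vertices:
        for cluster in clusters:
            if oracle.same_cluster(cluster[0], v):
                cluster.append(v)
                break
        else:
            # No existing cluster matched, so v starts a new one.
            clusters.append([v])
    return clusters


# Illustrative data only: five vertices in three reference clusters.
oracle = SameClusterOracle({"a": 0, "b": 0, "c": 1, "d": 1, "e": 2})
clusters = recover_clusters(["a", "b", "c", "d", "e"], oracle)
```

This naive strategy can use up to one query per vertex per cluster, which is why bounding the number of queries is the interesting part of the problem.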

Query-Efficient Correlation Clustering

This paper devises a correlation clustering algorithm that, given a budget of $Q$ queries, attains a solution whose expected number of disagreements is at most $3 \cdot \mathrm{OPT} + O(n^3/Q)$, where $\mathrm{OPT}$ is the optimal cost for the instance.

Correlation Clustering with Adaptive Similarity Queries

This work investigates correlation clustering as an active learning problem: each similarity score can be learned by making a query, and the goal is to minimise both the disagreements and the total number of queries.

Exact Recovery of Mangled Clusters with Same-Cluster Queries

An algorithm is designed that can reconstruct the latent clustering exactly while using only a small number of oracle queries, and can also learn the clusters using low-stretch separators, a class of ellipsoids with additional theoretical guarantees.

On Margin-Based Cluster Recovery with Oracle Queries

We study an active cluster recovery problem where, given a set of n points and an oracle answering queries like “are these two points in the same cluster?”, the task is to recover exactly all

Optimal Clustering in Stable Instances Using Combinations of Exact and Noisy Ordinal Queries

This work studies clustering algorithms that operate with ordinal or comparison-based queries (operations). It provides several variants of these algorithms and, in particular, non-trivial trade-offs between the number of high-cost and low-cost operations used.

Approximation algorithms for the lower bounded correlation clustering problem

This paper introduces the Lower bounded correlation clustering problem (LBCorCP) and gives three algorithms for this problem, which can quickly solve some special instances in polynomial time and obtain a smaller approximation ratio.

Fuzzy Clustering with Similarity Queries

This paper proposes a semi-supervised active clustering framework in which the learner is allowed to interact with an oracle, asking for the similarity between a chosen set of items. It proves that a few such similarity queries enable a polynomial-time approximation algorithm for an otherwise conjecturally NP-hard problem.

Learning to Cluster via Same-Cluster Queries

This work proposes two algorithms with provable theoretical guarantees and verifies their effectiveness via an extensive set of experiments on both synthetic and real-world data.

Clustering with Queries under Semi-Random Noise

This work develops robust learning methods that tolerate general semi-random noise, obtaining qualitatively the same guarantees as the best possible methods in the fully-random model.

Active Learning with Positive and Negative Pairwise Feedback

A generic framework for active clustering with queries for pairwise similarities between objects, which can be any positive or negative number, yielding full flexibility in the type of feedback that a user/annotator can provide.



Clustering with Same-Cluster Queries

A probabilistic polynomial-time (BPP) algorithm is provided for clustering in a setting where the expert conforms to a center-based clustering with a notion of margin, and a lower bound on the number of queries needed to have a computationally efficient clustering algorithm in this setting is proved.

Clustering with Noisy Queries

This paper provides the first information-theoretic lower bound on the number of queries for clustering with a noisy oracle in both situations, and designs novel algorithms that closely match this query complexity lower bound, even when the number of clusters is unknown.

Approximate Clustering with Same-Cluster Queries

This paper extends the work of Ashtiani et al. to the approximation setting by showing that a few such same-cluster queries enable a polynomial-time $(1+\epsilon)$-approximation algorithm for the $k$-means problem without any margin assumption on the input dataset.

Approximate Correlation Clustering Using Same-Cluster Queries

This work obtains a $(1+\epsilon)$-approximation algorithm for any small $\epsilon$, with running time polynomial in the input parameters as well as in $k$ and $1/\epsilon$, and gives non-trivial upper and lower bounds on the number of same-cluster queries.

Correlation clustering with a fixed number of clusters

This paper focuses on the situation when the number of clusters is stipulated to be a small constant k, and finds that for every k, there is a polynomial time approximation scheme for both maximizing agreements and minimizing disagreements.

Semi-Supervised Algorithms for Approximately Optimal and Accurate Clustering

The number of queries needed for $(1 - \epsilon)$-accuracy in Euclidean $k$-means must depend linearly on the dimension of the underlying Euclidean space, and for finite metric space $k$-means, it must be at least logarithmic in the number of candidate centers.

Correlation clustering in general weighted graphs

Clustering with Interactive Feedback

A query-based model in which users can provide feedback to a clustering algorithm in a natural way via split and merge requests is introduced and the "clusterability" of different concept classes in this framework is analyzed.

Clustering with qualitative information

This work considers the problem of clustering a collection of elements based on pairwise judgments of similarity and dissimilarity, and gives a factor 4 approximation for minimization on complete graphs, and a factor O(log n) approximation for general graphs.
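
The minimization objective on complete graphs mentioned above can be made concrete with a small sketch. It assumes the usual reading of the objective: every pair of elements is judged either similar or dissimilar, and a clustering pays one unit for each similar pair it separates and each dissimilar pair it puts together. The function name `disagreements` and the toy data are hypothetical, chosen only for illustration.

```python
from itertools import combinations


def disagreements(labels, similar):
    """Count the pairs on which a clustering disagrees with the pairwise judgments.

    labels:  dict mapping each element to its cluster id.
    similar: set of frozensets {u, v} judged similar; all other pairs are dissimilar.
    """
    cost = 0
    for u, v in combinations(sorted(labels), 2):
        together = labels[u] == labels[v]
        is_similar = frozenset((u, v)) in similar
        if together != is_similar:
            cost += 1  # similar pair split apart, or dissimilar pair merged
    return cost


# Toy instance: a path a-b-c-d of similar pairs; every other pair is dissimilar.
similar = {frozenset(p) for p in [("a", "b"), ("b", "c"), ("c", "d")]}
cost_one = disagreements({"a": 0, "b": 0, "c": 0, "d": 0}, similar)
cost_split = disagreements({"a": 0, "b": 0, "c": 1, "d": 1}, similar)
```

On this toy instance, placing all four elements in one cluster merges three dissimilar pairs, while splitting into {a, b} and {c, d} separates only the similar pair (b, c).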

Going weighted: Parameterized algorithms for cluster editing