• Corpus ID: 24906133

# Clustering with Noisy Queries

@article{Mazumdar2017ClusteringWN,
title={Clustering with Noisy Queries},
author={Arya Mazumdar and Barna Saha},
journal={ArXiv},
year={2017},
volume={abs/1706.07510}
}
• Published 22 June 2017
• Computer Science
• ArXiv
In this paper, we initiate a rigorous theoretical study of clustering with noisy queries (or a faulty oracle). Given a set of $n$ elements, our goal is to recover the true clustering by asking the minimum number of pairwise queries to an oracle. The oracle can answer queries of the form "do elements $u$ and $v$ belong to the same cluster?" The queries can be asked interactively (adaptive queries) or non-adaptively up front, but each answer can be erroneous with probability $p$. In this paper, we…
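The query model in the abstract can be illustrated with a small simulation. This is only a minimal sketch, not the authors' algorithm: the names `make_noisy_oracle` and `empirical_error_rate`, the ground-truth `labels`, and the value `p=0.2` are all hypothetical. It also assumes that answers are cached, so repeating a query returns the same (possibly wrong) answer; the truncated abstract does not state this.

```python
import random
from itertools import combinations


def make_noisy_oracle(labels, p, seed=0):
    """Return a same-cluster oracle whose answers are wrong with probability p.

    Assumption (not stated in the abstract): answers are cached, so asking
    about the same pair twice returns the same, possibly wrong, answer.
    """
    rng = random.Random(seed)
    cache = {}

    def query(u, v):
        # Treat (u, v) and (v, u) as the same unordered pair.
        key = (min(u, v), max(u, v))
        if key not in cache:
            truth = labels[u] == labels[v]
            flip = rng.random() < p
            # XOR: the true answer is inverted exactly when `flip` is True.
            cache[key] = truth != flip
        return cache[key]

    return query


def empirical_error_rate(labels, query):
    """Fraction of all pairs on which the oracle disagrees with the truth."""
    pairs = list(combinations(range(len(labels)), 2))
    wrong = sum(query(u, v) != (labels[u] == labels[v]) for u, v in pairs)
    return wrong / len(pairs)


# Hypothetical ground truth: 60 elements in 3 clusters of 20.
labels = [i % 3 for i in range(60)]
oracle = make_noisy_oracle(labels, p=0.2, seed=1)
print(empirical_error_rate(labels, oracle))
```

Querying every pair, as `empirical_error_rate` does, corresponds to the non-adaptive setting with all $\binom{n}{2}$ queries; an adaptive strategy would instead choose each next pair based on earlier answers.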
65 Citations

### Query Complexity of Clustering with Side Information

• Computer Science
NIPS
• 2017
This paper shows the dramatic power of side information (a similarity matrix) in reducing the query complexity of clustering. The work has an intriguing connection to popular community detection models such as the stochastic block model, significantly generalizes them, and opens up many avenues for interesting future research.

### Top-m Clustering with a Noisy Oracle

• Computer Science
2019 National Conference on Communications (NCC)
• 2019
The goal is to identify the top-$m$ clusters by size using the noisy answers from the oracle. The paper provides an upper bound that is a function of the number of recovered clusters $m$ and the sizes of the top clusters.

### Same-Cluster Querying for Overlapping Clusters

• Computer Science
NeurIPS
• 2019
This paper provides upper bounds (with algorithms) on the sufficient number of queries for the more practical scenario of overlapping clusters, and gives algorithmic results under both arbitrary (worst-case) and statistical modeling assumptions.

### Optimal Clustering with Noisy Queries via Multi-Armed Bandit

• Computer Science
ICML
• 2022
An interesting connection between the problem and multi-armed bandits might provide useful insights for other similar problems, and a new polynomial-time algorithm with $O\left(\frac{n(k+\log n)}{\delta^2} + \mathrm{poly}\left(k, \frac{1}{\delta}, \log n\right)\right)$ queries is proposed.

### Correlation Clustering with Same-Cluster Queries Bounded by Optimal Cost

• Computer Science
ESA
• 2019

### Crowdsourcing Algorithms for Entity Resolution

• Computer Science
Proc. VLDB Endow.
• 2014
This paper considers the problem of designing optimal strategies for asking questions to humans that minimize the expected number of questions asked, and analyzes several strategies that were claimed to be "optimal" for this problem in a recent work but can perform arbitrarily badly in theory.

### Correlation Clustering with Noisy Partial Information

• Computer Science
COLT
• 2015
A semi-random model for the Correlation Clustering problem on arbitrary graphs $G$ is proposed, and two approximation algorithms for Correlation Clustering instances from this model are given.

### Fault-Tolerant Entity Resolution with the Crowd

• Computer Science
ArXiv
• 2015
This paper establishes how to deduce a consistent ER solution from noisy worker answers as part of the data interpretation problem, and focuses on the next-crowdsource problem, which is to find the next task that maximizes the information gain of the ER result for the minimal additional cost.

### Aggregating crowdsourced binary ratings

• Computer Science
WWW
• 2013
This paper obtains bounds on the error rate of the algorithm and shows it is governed by the expansion of the graph, and demonstrates, using several synthetic and real datasets, that the algorithm outperforms the state of the art.