# How to Design Robust Algorithms using Noisy Comparison Oracle

Raghavendra Addanki, Sainyam Galhotra, and Barna Saha. Proc. VLDB Endow., volume 14, pages 1703-1716, 2021.
• Published 12 May 2021
• Computer Science
• Proc. VLDB Endow.
Metric-based comparison operations such as finding the maximum, nearest neighbor, and farthest neighbor are fundamental to clustering techniques such as k-center clustering and agglomerative hierarchical clustering. These techniques rely crucially on accurate estimation of the pairwise distances between records. However, computing exact features of the records and their pairwise distances is often challenging, and sometimes not possible. We circumvent this challenge by leveraging weak…
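To make concrete why maximum, nearest-neighbor, and farthest-neighbor operations matter for k-center clustering, here is a minimal sketch of the classical greedy farthest-first traversal. It assumes exact Euclidean distances are available, so it is not the noisy-oracle method of the paper; the function names `farthest_first_k_center` and `assign_to_centers`, and the choice of index 0 as the first center, are illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def farthest_first_k_center(points: Sequence[Point], k: int) -> List[int]:
    """Greedy k-center: repeatedly add the point farthest from the chosen centers.

    Returns the indices of the chosen centers. Assumes exact distances,
    which is precisely what a noisy comparison oracle does not provide.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    centers = [0]  # assumption: start from an arbitrary point (index 0)
    # nearest[i] holds the distance from point i to its nearest chosen center
    nearest = [math.dist(p, points[0]) for p in points]
    while len(centers) < k:
        # Farthest-neighbor step: a "find maximum" over the current distances
        farthest = max(range(len(points)), key=nearest.__getitem__)
        centers.append(farthest)
        # Nearest-neighbor step: refresh each point's distance to its closest center
        for i, p in enumerate(points):
            nearest[i] = min(nearest[i], math.dist(p, points[farthest]))
    return centers


def assign_to_centers(points: Sequence[Point], centers: Sequence[int]) -> List[int]:
    """Assign every point to the index of its nearest center."""
    return [min(centers, key=lambda c: math.dist(p, points[c])) for p in points]


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0), (5.0, 8.0)]
    chosen = farthest_first_k_center(pts, 3)
    print(chosen, assign_to_centers(pts, chosen))
```

Every iteration is built from one maximum and a batch of minimum comparisons over distances, so an error in any single comparison can redirect the choice of center. That is the dependence on accurate pairwise distances that the abstract describes.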


• Computer Science
ArXiv
• 2021
Partitioned Nearest Neighbors Local Depth (PaNNLD) is introduced, a computationally tractable variant of PaLD that leverages the K-nearest neighbors digraph on S, and the probability of randomization-induced error δ in PaNNLD is shown to be no more than 2e−δ K.
• Computer Science
ArXiv
• 2022
This paper proposes a new revenue function that allows one to measure the goodness of dendrograms using only comparisons and shows that this function is closely related to Dasgupta’s cost for hierarchical clustering that uses pairwise similarities.
New facets of fast algorithm design for large-scale data analysis are covered, emphasizing the role of approximation algorithms in achieving better polynomial time and query complexity.
• Computer Science
SIGMOD Conference
• 2022
HierER is developed, a querying strategy that uses record pair similarities to minimize the number of oracle queries while maximizing the identified hierarchical structure; it is shown theoretically and empirically that HierER is effective under different similarity noise models and can scale up to million-size datasets.
• Computer Science
Algorithms
• 2021
This work studies clustering algorithms that operate with ordinal or comparison-based queries (operations) and provides several variants of these algorithms using ordinal operations, in particular with non-trivial trade-offs between the numbers of high-cost and low-cost operations used.
• Computer Science, Mathematics
2021 IEEE International Symposium on Information Theory (ISIT)
• 2021
Active algorithms are proposed, based on ideas such as UCB and Thompson sampling developed in the closely related Multi-Armed Bandit problem, which adaptively decide which queries to send to the oracle and are able to solve the canonical $k$-center problem within an approximation ratio of two with high probability.
