How to Design Robust Algorithms using Noisy Comparison Oracle
@article{Addanki2021HowTD, title={How to Design Robust Algorithms using Noisy Comparison Oracle}, author={Raghavendra Addanki and Sainyam Galhotra and Barna Saha}, journal={Proc. VLDB Endow.}, year={2021}, volume={14}, pages={1703-1716} }
Metric based comparison operations such as finding maximum, nearest and farthest neighbor are fundamental to studying various clustering techniques such as k-center clustering and agglomerative hierarchical clustering. These techniques crucially rely on accurate estimation of pairwise distance between records. However, computing exact features of the records, and their pairwise distances is often challenging, and sometimes not possible. We circumvent this challenge by leveraging weak…
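To make the idea of a maximum-finding operation over a noisy comparison oracle concrete, here is a minimal Python sketch. It is not the paper's algorithm: it assumes, purely for illustration, that each oracle answer is flipped independently with a fixed probability, so repeating a query and taking a majority vote helps. The names `make_noisy_oracle`, `majority_greater`, and `noisy_max` are hypothetical.

```python
import random


def make_noisy_oracle(values, error_prob, rng):
    """Build a hypothetical oracle: compare(i, j) reports whether
    values[i] > values[j], but each answer is flipped independently
    with probability error_prob (an assumption of this sketch)."""
    def compare(i, j):
        truth = values[i] > values[j]
        return truth if rng.random() >= error_prob else not truth
    return compare


def majority_greater(compare, i, j, repeats):
    """Ask the same question `repeats` times and return the majority answer."""
    votes = sum(compare(i, j) for _ in range(repeats))
    return 2 * votes > repeats


def noisy_max(n, compare, repeats=15):
    """Linear scan for the index of the maximum among n items, using
    majority-voted comparisons instead of exact ones."""
    best = 0
    for cand in range(1, n):
        if majority_greater(compare, cand, best, repeats):
            best = cand
    return best


if __name__ == "__main__":
    rng = random.Random(0)
    values = [3.0, 9.5, 1.2, 7.7]
    oracle = make_noisy_oracle(values, error_prob=0.2, rng=rng)
    print(noisy_max(len(values), oracle, repeats=15))
```

An odd `repeats` avoids ties in the vote. If the oracle's errors are persistent, meaning the same query always returns the same wrong answer, repetition gains nothing and a different strategy is needed.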
6 Citations
Partitioned K-nearest neighbor local depth for scalable comparison-based learning
- Computer Science
- ArXiv
- 2021
Partitioned Nearest Neighbors Local Depth is introduced, a computationally tractable variant of PaLD leveraging the K-nearest neighbors digraph on S and shows that the probability of randomization-induced error δ in PaNNLD is no more than 2e−δ K.
A Revenue Function for Comparison-Based Hierarchical Clustering
- Computer Science
- ArXiv
- 2022
This paper proposes a new revenue function that allows one to measure the goodness of dendrograms using only comparisons and shows that this function is closely related to Dasgupta's cost for hierarchical clustering that uses pairwise similarities.
Approximation Algorithms for Large Scale Data Analysis
- Computer Science
- PODS
- 2021
New facets of fast algorithm design for large scale data analysis are covered, emphasizing the role of approximation algorithms in achieving better polynomial time and query complexity.
Hierarchical Entity Resolution using an Oracle
- Computer Science
- SIGMOD Conference
- 2022
HierER is developed, a querying strategy that uses record pair similarities to minimize the number of oracle queries while maximizing the identified hierarchical structure, and it is shown theoretically and empirically that HierER is effective under different similarity noise models and can scale up to million-size datasets.
Optimal Clustering in Stable Instances Using Combinations of Exact and Noisy Ordinal Queries
- Computer Science
- Algorithms
- 2021
This work studies clustering algorithms which operate with ordinal or comparison-based queries (operations) and provides several variants of these algorithms using ordinal operations and, in particular, non-trivial trade-offs between the number of high-cost and low-cost operations that are used.
Greedy $k$-Center from Noisy Distance Samples
- Computer Science, Mathematics
- 2021 IEEE International Symposium on Information Theory (ISIT)
- 2021
Active algorithms are proposed, based on ideas such as UCB and Thompson sampling developed in the closely related Multi-Armed Bandit problem, which adaptively decide which queries to send to the oracle and are able to solve the canonical $k$-center problem within an approximation ratio of two with high probability.
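For context on the factor-two guarantee mentioned above, the following is a minimal Python sketch of the classical greedy (farthest-first) $k$-center heuristic, which achieves an approximation ratio of two when exact metric distances are available. It is not the noisy, bandit-based algorithm of the cited work; the function name `greedy_k_center` and the use of Euclidean distance are assumptions made for illustration.

```python
import math


def greedy_k_center(points, k):
    """Farthest-first traversal with exact Euclidean distances.

    Returns (center_indices, radius), where radius is the largest
    distance from any point to its nearest chosen center.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and len(points)")

    centers = [0]  # start from an arbitrary point (here: the first one)
    # dist[i] = distance from point i to its nearest center so far
    dist = [math.dist(p, points[0]) for p in points]

    while len(centers) < k:
        # the next center is the point farthest from all current centers
        far = max(range(len(points)), key=dist.__getitem__)
        centers.append(far)
        for i, p in enumerate(points):
            dist[i] = min(dist[i], math.dist(p, points[far]))

    return centers, max(dist)


if __name__ == "__main__":
    pts = [(0, 0), (1, 0), (10, 0), (11, 0)]
    print(greedy_k_center(pts, 2))
```

Each round needs only a farthest-point query and nearest-center updates, which is why this heuristic is a natural target for comparison-based or noisy-distance variants.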
References
SHOWING 1-10 OF 83 REFERENCES
Learning Nearest Neighbor Graphs from Noisy Distance Samples
- Computer Science
- NeurIPS
- 2019
This paper proposes an active algorithm to find the nearest neighbor graph of a dataset of n items and demonstrates efficiency of the method empirically and theoretically, needing only $O(n \log(n) \Delta^{-2})$ queries in favorable settings, where $\Delta^{-2}$ accounts for the effect of noise.
Comparison Based Learning from Weak Oracles
- Computer Science
- AISTATS
- 2018
This paper introduces a new weak oracle model, where a non-malicious user responds to a pairwise comparison query only when she is quite sure about the answer, and proposes two algorithms which provably locate the target object in a number of comparisons close to the entropy of the target distribution.
Clustering with a faulty oracle
- Computer Science, Mathematics
- WWW
- 2020
This work provides a polynomial time algorithm that recovers all signs correctly with high probability in the presence of noise with queries, improving on the current state-of-the-art due to Mazumdar and Saha.
Clustering with Noisy Queries
- Computer Science
- NIPS
- 2017
This paper provides the first information theoretic lower bound on the number of queries for clustering with noisy oracle in both situations, and designs novel algorithms that closely match this query complexity lower bound, even when the number of clusters is unknown.
Semi-Supervised Active Clustering with Weak Oracles
- Computer Science
- ArXiv
- 2017
The influence of allowing "not-sure" answers from a weak oracle and proposed algorithms to efficiently handle uncertainties are studied and effective performance of the approach in overcoming uncertainties is shown.
Top-k and Clustering with Noisy Comparisons
- Computer Science
- ACM Trans. Database Syst.
- 2014
Efficient algorithms that are guaranteed to achieve correct results with high probability are given, the cost of these algorithms is analyzed in terms of the total number of comparisons, and it is shown that they are essentially the best possible.
Clustering with Same-Cluster Queries
- Computer Science
- NIPS
- 2016
A probabilistic polynomial-time (BPP) algorithm is provided for clustering in a setting where the expert conforms to a center-based clustering with a notion of margin, and a lower bound on the number of queries needed to have a computationally efficient clustering algorithm in this setting is proved.
Approximate Clustering with Same-Cluster Queries
- Computer Science
- ITCS
- 2018
This paper extends the work of Ashtiani et al. to the approximation setting by showing that a few such same-cluster queries enable one to get a polynomial-time (1+eps)-approximation algorithm for the k-means problem without any margin assumption on the input dataset.
Query Complexity of Clustering with Side Information
- Computer Science
- NIPS
- 2017
The dramatic power of side information (a similarity matrix) in reducing the query complexity of clustering is shown; the work has an intriguing connection to popular community detection models such as the stochastic block model, significantly generalizes them, and opens up many avenues for interesting future research.
Relaxed Oracles for Semi-Supervised Clustering
- Computer Science
- ArXiv
- 2017
It is shown that a small query complexity is adequate for effective clustering with high probability by providing better pairs to the weak oracle and an effective algorithm to handle such uncertainties in query responses is proposed.