• Corpus ID: 25928202

# Semi-Supervised Active Clustering with Weak Oracles

@article{Kim2017SemiSupervisedAC,
  title={Semi-Supervised Active Clustering with Weak Oracles},
  author={Taewan Kim and Joydeep Ghosh},
  journal={ArXiv},
  year={2017},
  volume={abs/1709.03202}
}
• Published 11 September 2017
• Computer Science
• ArXiv
Semi-supervised active clustering (SSAC) utilizes the knowledge of a domain expert to cluster data points by interactively making pairwise "same-cluster" queries. However, it is impractical to ask human oracles to answer every pairwise query. In this paper, we study the influence of allowing "not-sure" answers from a weak oracle and propose algorithms to efficiently handle uncertainties. Different types of model assumptions are analyzed to cover realistic scenarios of oracle abstention. In the…
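To make the query model concrete, here is a minimal sketch of a same-cluster query loop against a weak oracle that may abstain. All names are illustrative, and the fallback of treating "not-sure" as no information is just one possible strategy — not the paper's algorithm.

```python
import random

def weak_oracle(a, b, true_label, p_not_sure=0.2):
    """Illustrative weak oracle: answers a pairwise same-cluster
    query, but abstains with "not-sure" with probability p_not_sure."""
    if random.random() < p_not_sure:
        return "not-sure"
    return "same" if true_label[a] == true_label[b] else "different"

def cluster_with_weak_oracle(points, true_label, p_not_sure=0.2):
    """Greedy clustering by pairwise queries.

    Each point is queried against one representative per existing
    cluster; a "not-sure" answer is treated as no information, so the
    loop simply moves on to the next cluster (one possible fallback).
    """
    clusters = []  # list of lists of point indices
    for x in points:
        placed = False
        for c in clusters:
            rep = c[0]  # query against the cluster's representative
            if weak_oracle(x, rep, true_label, p_not_sure) == "same":
                c.append(x)
                placed = True
                break
        if not placed:
            clusters.append([x])
    return clusters
```

With `p_not_sure=0` and a perfect oracle this recovers the true partition; with abstentions, clusters can fragment, which is exactly the kind of uncertainty the paper's algorithms are designed to handle with low query complexity.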
## 6 Citations

Relaxed Oracles for Semi-Supervised Clustering
• Computer Science
• ArXiv
• 2017
It is shown that a small query complexity suffices for effective clustering with high probability when better pairs are provided to the weak oracle, and an effective algorithm is proposed to handle such uncertainties in query responses.
Same-Cluster Querying for Overlapping Clusters
• Computer Science
• NeurIPS
• 2019
This paper provides upper bounds (with algorithms) on the sufficient number of queries on the more practical scenario of overlapping clusters, and provides algorithmic results under both arbitrary (worst-case) and statistical modeling assumptions.
A PAC-Theory of Clustering with Advice
The trade-offs between computational and advice complexities of learning are investigated, showing that using a little bit of advice can turn an otherwise computationally hard clustering problem into a tractable one.
How to Design Robust Algorithms using Noisy Comparison Oracle
• Computer Science
• Proc. VLDB Endow.
• 2021
This paper studies various problems, including finding the maximum and nearest/farthest-neighbor search, under two different noise models called adversarial and probabilistic noise, and gives robust algorithms for k-center clustering and agglomerative hierarchical clustering.
Entropy-based active sparse subspace clustering
• Computer Science
• Multimedia Tools and Applications
• 2018
A novel extension for SSC with active learning framework is proposed, in which the most informative pairwise constraints are selected to guide the SSC for accurate clustering results.
Query K-means Clustering and the Double Dixie Cup Problem
• Computer Science
• NeurIPS
• 2018
We consider the problem of approximate $K$-means clustering with outliers and side information provided by same-cluster queries and possibly noisy answers. Our solution shows that, under some mild…

## References

Showing 1–10 of 15 references
A probabilistic framework for semi-supervised clustering
• Computer Science
• KDD
• 2004
A probabilistic model for semi-supervised clustering based on Hidden Markov Random Fields (HMRFs) that provides a principled framework for incorporating supervision into prototype-based clustering and experimental results demonstrate the advantages of the proposed framework.
Active Semi-Supervision for Pairwise Constrained Clustering
• Computer Science
• SDM
• 2004
Experimental and theoretical results confirm that this active querying of pairwise constraints significantly improves the accuracy of clustering when given a relatively small amount of supervision.
Clustering under Perturbation Resilience
• Computer Science
• SIAM J. Comput.
• 2016
This paper presents an algorithm that can optimally cluster instances resilient to $(1 + \sqrt{2})$-factor perturbations, solving an open problem of Awasthi et al.
Clustering with Constraints: Feasibility Issues and the k-Means Algorithm
• Computer Science
• SDM
• 2005
A key finding is that determining whether there is a feasible solution satisfying all constraints is, in general, NP-complete, and this motivates the derivation of a new version of the k-Means algorithm that minimizes the constrained vector quantization error but at each iteration does not attempt to satisfy all constraints.
Representation Learning for Clustering: A Statistical Framework
• Computer Science
• UAI
• 2015
A formal statistical model for analyzing the sample complexity of learning a clustering representation with this paradigm is provided and a notion of capacity of a class of possible representations is introduced, in the spirit of the VC-dimension, showing that classes of representations that have finite such dimension can be successfully learned with sample size error bounds.
Clustering Via Crowdsourcing
• Computer Science
• ArXiv
• 2016
A major contribution of this paper is to reduce the query complexity to linear or even sublinear in $n$ when mild side information is provided by a machine, and even in presence of crowd errors which are not correctable via resampling.
Clustering with Bregman Divergences
• Computer Science
• J. Mach. Learn. Res.
• 2005
This paper proposes and analyzes parametric hard and soft clustering algorithms based on a large class of distortion functions known as Bregman divergences, and shows that there is a bijection between regular exponential families and a large class of Bregman divergences, called regular Bregman divergences.
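For context, the distortion functions this reference is built on have a standard definition: for a strictly convex, differentiable function $\phi$, the Bregman divergence is

```latex
d_\phi(x, y) \;=\; \phi(x) - \phi(y) - \langle \nabla \phi(y),\; x - y \rangle .
```

Choosing $\phi(x) = \lVert x \rVert^2$ recovers the squared Euclidean distance, and choosing the negative entropy $\phi(x) = \sum_i x_i \log x_i$ yields the KL divergence, which is why this single family covers both k-means-style and information-theoretic clustering.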
Semi-Supervised Clustering with User Feedback
• Computer Science
• 2003
This work presents an approach to clustering based on the observation that "it is easier to criticize than to construct" and demonstrates semi-supervised clustering with a system that learns to cluster news stories from a Reuters data set.
A Dimension-Independent Generalization Bound for Kernel Supervised Principal Component Analysis
• Computer Science
• FE@NIPS
• 2015
This work provides a guarantee indicating that KSPCA generalizes well even when the number of parameters is large, as long as they have small norms, which justifies the good performance of KSPCA on high-dimensional data.
User-Friendly Tail Bounds for Sums of Random Matrices
• J. Tropp
• Mathematics
• Found. Comput. Math.
• 2012
This paper presents new probability inequalities for sums of independent, random, self-adjoint matrices and provides noncommutative generalizations of the classical bounds associated with the names Azuma, Bennett, Bernstein, Chernoff, Hoeffding, and McDiarmid.
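As one representative example of these noncommutative bounds, the matrix Bernstein inequality states, roughly: for independent, centered, self-adjoint random matrices $X_1, \dots, X_n$ of dimension $d$ with $\lVert X_k \rVert \le R$ almost surely and variance proxy $\sigma^2 = \big\lVert \sum_k \mathbb{E}[X_k^2] \big\rVert$,

```latex
\Pr\!\left[\lambda_{\max}\!\Big(\sum_{k} X_k\Big) \ge t\right]
\;\le\; d \cdot \exp\!\left(\frac{-t^2/2}{\sigma^2 + Rt/3}\right),
```

which mirrors the scalar Bernstein bound except for the dimensional factor $d$ in front.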