Information-theoretic metric learning

An information-theoretic approach to learning a Mahalanobis distance function that can handle a wide variety of constraints and can optionally incorporate a prior on the distance function. Regret bounds are also derived for the resulting algorithm.

Co-clustering documents and words using bipartite spectral graph partitioning

- I. Dhillon
- Computer Science, Mathematics
- KDD '01
- 26 August 2001

A new spectral co-clustering algorithm is proposed that uses the second left and right singular vectors of an appropriately scaled word-document matrix to yield good bipartitionings; the singular vectors can be shown to solve a real relaxation of the NP-complete graph bipartitioning problem.

Clustering with Bregman Divergences

This paper proposes and analyzes parametric hard and soft clustering algorithms based on a large class of distortion functions known as Bregman divergences, and shows that there is a bijection between regular exponential families and a large class of Bregman divergences, called regular Bregman divergences.

Information-theoretic co-clustering

This work presents an innovative co-clustering algorithm that monotonically increases the preserved mutual information by intertwining the row and column clusterings at all stages, and demonstrates that the algorithm works well in practice, especially in the presence of sparsity and high dimensionality.

Clustering on the Unit Hypersphere using von Mises-Fisher Distributions

This paper proposes a generative mixture-model approach to clustering directional data based on the von Mises-Fisher distribution, which arises naturally for data distributed on the unit hypersphere, and derives and analyzes two variants of the Expectation Maximization framework for estimating the mean and concentration parameters of this mixture.

Concept Decompositions for Large Sparse Text Data Using Clustering

The concept vectors produced by the spherical k-means algorithm constitute a powerful “basis” for text data sets: they are localized in the word space, are sparse, and tend towards orthonormality.

Learning with Noisy Labels

This paper theoretically studies binary classification in the presence of random classification noise, where the learner sees labels that have independently been flipped with some small probability, and shows that methods used in practice, such as biased SVM and weighted logistic regression, are provably noise-tolerant.

Weighted Graph Cuts without Eigenvectors: A Multilevel Approach

This paper develops a fast, high-quality multilevel algorithm that directly optimizes various weighted graph clustering objectives, such as the popular ratio cut, normalized cut, and ratio association criteria, and demonstrates that the algorithm is applicable to large-scale clustering tasks such as image segmentation, social network analysis, and gene network analysis.

Kernel k-means: spectral clustering and normalized cuts

The generality of the weighted kernel k-means objective function is shown, and the spectral clustering objective of normalized cut is derived as a special case, leading to a novel weighted kernel k-means algorithm that monotonically decreases the normalized cut.

Large-scale Multi-label Learning with Missing Labels

This paper studies the multi-label problem in a generic empirical risk minimization (ERM) framework and develops techniques that exploit the structure of specific loss functions, such as the squared loss function, to obtain efficient algorithms.
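
As a rough illustration of the idea in the last entry above, the sketch below shows one way the squared loss can make ERM with missing labels cheap: each label's regularized least-squares problem has a closed-form solution that uses only the rows where that label is observed. This is not the paper's algorithm. The function name `fit_masked_ridge`, the boolean mask `observed`, the ridge parameter `lam`, and the use of real-valued targets in the toy data are all assumptions made for illustration.

```python
import numpy as np

def fit_masked_ridge(X, Y, observed, lam=1.0):
    """Per-label ridge regression that ignores unobserved label entries.

    X: (n, d) features; Y: (n, L) targets; observed: (n, L) boolean mask
    (True where the label is known). All names here are hypothetical.
    """
    n, d = X.shape
    num_labels = Y.shape[1]
    W = np.zeros((d, num_labels))
    for j in range(num_labels):
        rows = observed[:, j]              # samples with label j observed
        Xj, yj = X[rows], Y[rows, j]
        # Squared loss gives the normal equations (Xj^T Xj + lam I) w = Xj^T yj.
        A = Xj.T @ Xj + lam * np.eye(d)
        W[:, j] = np.linalg.solve(A, Xj.T @ yj)
    return W

# Toy data: noiseless linear targets with roughly 30% of entries hidden.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
W_true = rng.normal(size=(5, 3))
Y = X @ W_true
observed = rng.random(Y.shape) < 0.7
W_hat = fit_masked_ridge(X, Y, observed, lam=1e-6)
```

Because hidden entries never enter the normal equations, overwriting them with arbitrary values leaves the fitted weights unchanged.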