Agglomerative Information Bottleneck

@inproceedings{Slonim1999AgglomerativeIB,
  title={Agglomerative Information Bottleneck},
  author={Noam Slonim and Naftali Tishby},
  booktitle={NIPS},
  year={1999}
}
We introduce a novel distributional clustering algorithm that maximizes the mutual information per cluster between the data and given categories. The algorithm is compared with the top-down soft version of the information bottleneck method, and a relationship between the hard and soft results is established. We demonstrate the algorithm on the 20 Newsgroups data set: for a subset of two newsgroups we achieve compression by three orders of magnitude while losing only 10% of the original mutual information.
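A minimal sketch of the greedy procedure (assuming a joint-distribution matrix pxy for p(x, y) and a target number of clusters; the function and variable names are illustrative, not the authors' code). The cost of merging two clusters is their combined weight times the Jensen-Shannon divergence of their conditional distributions, so each step performs the cheapest merge:

import numpy as np

def js_divergence(p, q, w1, w2):
    # Weighted Jensen-Shannon divergence; finite even for disjoint supports.
    m = w1 * p + w2 * q
    def kl(a, b):
        mask = a > 0
        return np.sum(a[mask] * np.log(a[mask] / b[mask]))
    return w1 * kl(p, m) + w2 * kl(q, m)

def agglomerative_ib(pxy, n_clusters):
    # Start with one cluster per value of x (each row of p(x, y) is
    # assumed to have positive mass), then greedily merge the pair of
    # clusters whose merger loses the least mutual information I(C; Y).
    weights = list(pxy.sum(axis=1))                    # cluster priors p(c)
    conds = [row / w for row, w in zip(pxy, weights)]  # conditionals p(y | c)
    clusters = [[i] for i in range(len(weights))]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                w = weights[i] + weights[j]
                # delta_I = (p_i + p_j) * JS(p(y|c_i), p(y|c_j))
                loss = w * js_divergence(conds[i], conds[j],
                                         weights[i] / w, weights[j] / w)
                if best is None or loss < best[0]:
                    best = (loss, i, j)
        _, i, j = best
        w = weights[i] + weights[j]
        conds[i] = (weights[i] * conds[i] + weights[j] * conds[j]) / w
        weights[i] = w
        clusters[i] += clusters[j]
        del clusters[j], weights[j], conds[j]
    return clusters

Each merge is only locally optimal, which is why later work (including the density-based variants cited below) targets aIB's sub-optimality.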

Agglomerative Multivariate Information Bottleneck

This paper presents a new family of simple agglomerative algorithms for constructing such systems of inter-related clusters, analyzes the behavior of these algorithms, and applies them to several real-life datasets.

The Density-Based Agglomerative Information Bottleneck

The concept of density-based chains is adopted to evaluate the information loss among the neighbors of an element, rather than between pairs of elements, alleviating the sub-optimality of aIB while preserving its useful hierarchical clustering tree structure.

The Density Connectivity Information Bottleneck

The DCIB algorithm, a density-connectivity information bottleneck method, applies the information bottleneck principle to quantify relative information during the clustering procedure; it preserves more relative information and achieves higher precision than the aIB algorithm.

Information Bottleneck Co-clustering

An agglomerative Information Bottleneck co-clustering approach is proposed that automatically captures the relation between the numbers of clusters and leverages an annealing-style strategy to bypass local optima.

Multi-way distributional clustering via pairwise interactions

An extensive empirical study of two-way, three-way and four-way applications of the MDC scheme using six real-world datasets, including the 20 Newsgroups and the Enron email collection, shows that the algorithms consistently and significantly outperform previous state-of-the-art information-theoretic clustering algorithms.

Information Theoretic Clustering Using Minimum Spanning Trees

In this work we propose a new information-theoretic clustering algorithm that infers cluster memberships by direct optimization of a non-parametric mutual information estimate between data distribution and cluster assignment.

Finding the Optimal Cardinality Value for Information Bottleneck Method

Empirical results in the document clustering scenario indicate that the proposed method works well for determining the optimal parameter value for the information bottleneck method.

Determine the Optimal Parameter for Information Bottleneck Method

Empirical results in the document clustering scenario indicate that the proposed method works well for determining the optimal parameter value for the information bottleneck method.

Data Clustering by Markovian Relaxation and the Information Bottleneck Method

This method combines a pairwise-based approach with a vector-quantization method that provides a meaningful interpretation of the resulting clusters; it can cluster data with no geometric or other bias and makes no assumptions about the underlying distribution.

Multivariate Information Bottleneck

A general, principled framework for multivariate extensions of the information bottleneck method is introduced that provides insight into bottleneck variations and enables us to characterize their solutions.
...

References

Pairwise Data Clustering by Deterministic Annealing

A deterministic annealing approach to pairwise clustering is described that shares the robustness properties of maximum entropy inference; the resulting Gibbs probability distributions are estimated by mean-field approximation.
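For flavor, a minimal sketch of deterministic annealing in its simpler centroid-based form (the paper itself treats the pairwise case; the names, parameters, and squared-Euclidean distortion here are assumptions for illustration): assignments are Gibbs distributions at temperature T, and lowering T gradually hardens an initially soft clustering.

import numpy as np

def da_cluster(X, k, T0=10.0, cooling=0.9, T_min=0.01, inner=20):
    # Deterministic annealing with squared-Euclidean distortion.
    # At temperature T the soft assignments are Gibbs distributions
    # p(c | x) proportional to exp(-||x - mu_c||^2 / T); cooling T
    # gradually hardens them into ordinary clusters.
    rng = np.random.default_rng(0)
    mu = X[rng.choice(len(X), size=k, replace=False)]
    mu = mu + rng.normal(0.0, 1e-3, mu.shape)    # break symmetry
    T = T0
    while T > T_min:
        for _ in range(inner):
            d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=-1)
            logits = -d2 / T
            logits -= logits.max(axis=1, keepdims=True)
            p = np.exp(logits)
            p /= p.sum(axis=1, keepdims=True)    # p(c | x), shape (n, k)
            mu = (p.T @ X) / p.sum(axis=0)[:, None]
        T *= cooling
    return p.argmax(axis=1), mu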

Distributional Clustering of English Words

Deterministic annealing is used to find lowest-distortion sets of clusters: as the annealing parameter increases, existing clusters become unstable and subdivide, yielding a hierarchical "soft" clustering of the data.

Agnostic Classification of Markovian Sequences

A method for the classification of discrete sequences, applicable whenever they can be compressed, is introduced, and its use for hierarchical clustering of languages and for estimating similarities of protein sequences is illustrated.

Divergence measures based on the Shannon entropy

A novel class of information-theoretic divergence measures based on the Shannon entropy is introduced; these measures do not require the probability distributions involved to satisfy the condition of absolute continuity, and bounds on them are established.
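A concrete illustration of why absolute continuity is not needed (toy distributions, base-2 logarithms assumed): the Jensen-Shannon divergence compares each distribution to their mixture, so a zero in one distribution never lands in a denominator.

import numpy as np

def jsd(p, q):
    # Equal-weight Jensen-Shannon divergence, in bits.
    m = 0.5 * (p + q)
    def kl(a, b):
        mask = a > 0
        return np.sum(a[mask] * np.log2(a[mask] / b[mask]))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

p = np.array([1.0, 0.0])
q = np.array([0.0, 1.0])   # disjoint supports: KL(p || q) would be infinite
print(jsd(p, q))           # 1.0 bit, the maximum value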

Class-Based n-gram Models of Natural Language

This work addresses the problem of predicting a word from previous words in a sample of text and discusses n-gram models based on classes of words, finding that these models extract classes that have the flavor of either syntactically based or semantically based groupings, depending on the nature of the underlying statistics.
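The class-based bigram factorization underlying these models can be shown in a few lines: p(w_i | w_{i-1}) is approximated as p(w_i | c(w_i)) * p(c(w_i) | c(w_{i-1})). The classes and probabilities below are hypothetical, purely for illustration:

# Class-based bigram: p(w2 | w1) ~= p(w2 | class(w2)) * p(class(w2) | class(w1))
word_class = {"monday": "DAY", "tuesday": "DAY", "ran": "VERB"}

# Hypothetical estimates, not from the paper.
p_word_given_class = {("monday", "DAY"): 0.5, ("tuesday", "DAY"): 0.5}
p_class_given_class = {("DAY", "VERB"): 0.3}   # keyed as (c2, c1)

def class_bigram(w2, w1):
    c1, c2 = word_class[w1], word_class[w2]
    return p_word_given_class[(w2, c2)] * p_class_given_class[(c2, c1)]

print(class_bigram("monday", "ran"))   # 0.5 * 0.3 = 0.15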

Learning from Dyadic Data

This paper proposes an annealed version of the standard EM algorithm for model fitting, which is empirically evaluated on a variety of data sets from different domains.

Elements of Information Theory

The authors examine the role of entropy, inequality, and randomness in the design and construction of codes in a rapidly changing environment.

Deterministic annealing for clustering, compression, classification, regression, and related optimization problems

  • K. Rose
  • Proc. IEEE, 1998
The deterministic annealing approach to clustering and its extensions have demonstrated substantial performance improvement over standard supervised and unsupervised learning methods in a variety of applications.

The information bottleneck method: Extracting relevant information from concurrent data. Unpublished manuscript

  • NEC Research Institute TR, 1998