Publications
The information bottleneck method
We define the relevant information in a signal $x \in X$ as the information that this signal provides about another signal $y \in Y$. Examples include the information that face images provide …
  • 1,853
  • 206
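The notion of relevant information above is the mutual information $I(X;Y)$; the method compresses $X$ into a representation $T$ that preserves as much of it as possible. The trade-off is governed by the IB Lagrangian (notation $T$ for the compressed variable, as commonly used for this method):

```latex
% Information bottleneck: compress X into T (minimize I(X;T))
% while preserving information about Y (maximize I(T;Y));
% \beta controls the compression-relevance trade-off.
\min_{p(t \mid x)} \; \mathcal{L} \;=\; I(X;T) \;-\; \beta\, I(T;Y)
```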
The Hierarchical Hidden Markov Model: Analysis and Applications
We introduce, analyze and demonstrate a recursive hierarchical generalization of the widely used hidden Markov models, which we name Hierarchical Hidden Markov Models (HHMM). Our model is motivated …
  • 878
  • 109
The power of amnesia: Learning probabilistic automata with variable memory length
We propose and analyze a distribution learning algorithm for variable memory length Markov processes. These processes can be described by a subclass of probabilistic finite automata which we name …
  • 477
  • 66
Opening the Black Box of Deep Neural Networks via Information
Despite their great success, there is still no comprehensive theoretical understanding of learning with Deep Neural Networks (DNNs) or their inner organization. Previous work proposed to analyze DNNs …
  • 568
  • 60
Selective Sampling Using the Query by Committee Algorithm
We analyze the “query by committee” algorithm, a method for filtering informative queries from a random stream of inputs. We show that if the two-member committee algorithm achieves information gain …
  • 1,072
  • 57
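The filtering idea above can be illustrated on the classic 1-D threshold-function case: a two-member committee is sampled from the current version space, and a label is requested only when the members disagree. This is a minimal sketch, not the paper's analysis; the function name `qbc_thresholds` and its interface are mine.

```python
import random

def qbc_thresholds(stream, oracle, lo=0.0, hi=1.0, rng=None):
    """Query-by-committee sketch for 1-D threshold classifiers.

    Version space: the interval (lo, hi) of thresholds consistent with
    all labels seen so far. Committee: two thresholds drawn uniformly
    from the version space. A label is queried only when they disagree,
    i.e. only for informative inputs.
    """
    rng = rng or random.Random(0)
    n_queries = 0
    for x in stream:
        t1 = rng.uniform(lo, hi)
        t2 = rng.uniform(lo, hi)
        if (x >= t1) != (x >= t2):      # committee disagreement
            y = oracle(x)               # informative query to the teacher
            n_queries += 1
            if y:                       # true threshold lies at or below x
                hi = min(hi, x)
            else:                       # true threshold lies above x
                lo = max(lo, x)
    return lo, hi, n_queries
```

Each query roughly halves the version space in expectation, which is the source of the exponential savings in labels that the paper's information-gain condition formalizes.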
Agglomerative Information Bottleneck
We introduce a novel distributional clustering algorithm that maximizes the mutual information per cluster between data and given categories. This algorithm can be considered as a bottom-up hard …
  • 398
  • 56
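The bottom-up scheme can be sketched as a greedy loop: start with every element in its own cluster and repeatedly merge the pair whose merger loses the least mutual information with the category variable, where the loss of merging two clusters is their total weight times the Jensen-Shannon divergence of their category distributions. The code below is an illustrative pure-Python sketch under that formulation; the function names are mine.

```python
import math

def js_divergence(p, q, pi1, pi2):
    """Jensen-Shannon divergence of distributions p, q with weights pi1, pi2."""
    m = [pi1 * a + pi2 * b for a, b in zip(p, q)]
    def kl(a, b):
        return sum(x * math.log(x / y) for x, y in zip(a, b) if x > 0)
    return pi1 * kl(p, m) + pi2 * kl(q, m)

def agglomerative_ib(p_t, p_y_given_t, n_clusters):
    """Greedy bottom-up merging: at each step, merge the pair of clusters
    that loses the least mutual information I(T;Y).

    p_t: prior weight of each initial cluster (sums to 1).
    p_y_given_t: category distribution p(y|t) of each cluster.
    """
    clusters = [[i] for i in range(len(p_t))]
    p_t = list(p_t)
    p_y = [list(row) for row in p_y_given_t]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                w = p_t[i] + p_t[j]
                # information lost by merging clusters i and j
                loss = w * js_divergence(p_y[i], p_y[j], p_t[i] / w, p_t[j] / w)
                if best is None or loss < best[0]:
                    best = (loss, i, j)
        _, i, j = best
        w = p_t[i] + p_t[j]
        merged = [(p_t[i] * a + p_t[j] * b) / w for a, b in zip(p_y[i], p_y[j])]
        clusters[i] = clusters[i] + clusters[j]
        p_t[i], p_y[i] = w, merged
        del clusters[j], p_t[j], p_y[j]
    return clusters
```

Because the merge cost is exactly the drop in $I(T;Y)$, the greedy loop produces a full hierarchy of hard clusterings, each locally optimal for its number of clusters.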
Margin based feature selection - theory and algorithms
Feature selection is the task of choosing a small set out of a given set of features that capture the relevant properties of the data. In the context of supervised classification problems the …
  • 390
  • 47
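A central quantity in margin-based feature selection is the hypothesis margin of a sample: its distance to the nearest point of a different class (the "nearmiss") minus its distance to the nearest point of the same class (the "nearhit"), computed under a per-feature weighting. The sketch below illustrates that quantity only, not the paper's selection algorithms; the function name and interface are hypothetical.

```python
import math

def hypothesis_margins(points, labels, weights):
    """Hypothesis margin of each point under per-feature weights:
    weighted distance to the nearest differently-labeled point (nearmiss)
    minus weighted distance to the nearest same-labeled point (nearhit).
    Larger margins indicate the weighting separates the classes better.
    """
    def dist(a, b):
        return math.sqrt(sum(w * (ai - bi) ** 2
                             for w, ai, bi in zip(weights, a, b)))
    margins = []
    for i, (x, y) in enumerate(zip(points, labels)):
        hit = min(dist(x, points[j]) for j in range(len(points))
                  if j != i and labels[j] == y)
        miss = min(dist(x, points[j]) for j in range(len(points))
                   if labels[j] != y)
        margins.append(miss - hit)
    return margins
```

Feature weights can then be scored by how large they make these margins, which turns feature selection into margin maximization.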
Information Bottleneck for Gaussian Variables
The problem of extracting the relevant aspects of data was addressed through the information bottleneck (IB) method, by (soft) clustering one variable while preserving information about another …
  • 206
  • 37
Distributional Clustering of English Words
We describe and evaluate experimentally a method for clustering words according to their distribution in particular syntactic contexts. Words are represented by the relative frequency distributions …
  • 1,118
  • 36
Deep learning and the information bottleneck principle
Deep Neural Networks (DNNs) are analyzed via the theoretical framework of the information bottleneck (IB) principle. We first show that any DNN can be quantified by the mutual information between the …
  • 463
  • 32
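The basic quantity behind this line of work is the mutual information between two discrete variables, $I(X;Y) = \sum_{x,y} p(x,y)\log\frac{p(x,y)}{p(x)p(y)}$; in the DNN setting $X$ and $Y$ are typically discretized layer activations and labels. A minimal plug-in estimator for the discrete case (function name mine):

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in nats from paired discrete samples:
    I(X;Y) = sum_{x,y} p(x,y) * log( p(x,y) / (p(x) * p(y)) ).
    """
    n = len(xs)
    pxy = Counter(zip(xs, ys))   # joint counts
    px = Counter(xs)             # marginal counts of X
    py = Counter(ys)             # marginal counts of Y
    return sum((c / n) * math.log((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())
```

For independent variables the estimate is 0; for identical variables it equals the entropy, e.g. $\log 2$ nats for a fair binary variable. Continuous activations are usually binned before applying such an estimator.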