Publications
The information bottleneck method
TLDR
The variational principle provides a surprisingly rich framework for discussing a variety of problems in signal processing and learning, as will be described in detail elsewhere.
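For quick reference, the variational objective at the heart of the method can be written as the information bottleneck Lagrangian. This is a compact restatement in standard notation, not the paper's full derivation: T is the compressed representation of the input X, Y is the relevant variable, and beta trades off compression against preserved relevance.

```latex
% Information bottleneck Lagrangian: compress X into T while keeping
% as much information about Y as possible; beta sets the trade-off.
\min_{p(t \mid x)} \; \mathcal{L} \;=\; I(X;T) \;-\; \beta\, I(T;Y)
```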
The Hierarchical Hidden Markov Model: Analysis and Applications
TLDR
This work introduces, analyzes, and demonstrates a recursive hierarchical generalization of the widely used hidden Markov models, motivated by the complex multi-scale structure that appears in many natural sequences, particularly language, handwriting, and speech.
Opening the Black Box of Deep Neural Networks via Information
TLDR
This work demonstrates the effectiveness of the Information-Plane visualization of DNNs and shows that training time is dramatically reduced when more hidden layers are added, suggesting that the main advantage of the hidden layers is computational.
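A minimal sketch of how the two coordinates of an information-plane point are often estimated, assuming hidden activations are discretized into bins and mutual information is read off joint histograms; the binning scheme, function names, and parameters here are illustrative, not the paper's exact procedure.

```python
import numpy as np

def mutual_information(joint_counts):
    """Mutual information (in bits) from a 2-D array of joint counts."""
    p_xy = joint_counts / joint_counts.sum()
    p_x = p_xy.sum(axis=1, keepdims=True)
    p_y = p_xy.sum(axis=0, keepdims=True)
    nz = p_xy > 0
    return float(np.sum(p_xy[nz] * np.log2(p_xy[nz] / (p_x @ p_y)[nz])))

def information_plane_point(activations, labels, input_ids, n_bins=30):
    """Estimate (I(X;T), I(T;Y)) for one layer by binning its activations.

    activations : (n_samples, n_units) hidden-layer outputs
    labels      : (n_samples,) integer class labels
    input_ids   : (n_samples,) integer ids of distinct input patterns
    """
    # Discretize activations, then give each distinct binned pattern an id.
    edges = np.linspace(activations.min(), activations.max(), n_bins + 1)
    digitized = np.digitize(activations, edges)
    _, t_ids = np.unique(digitized, axis=0, return_inverse=True)

    def joint(a_ids, b_ids):
        counts = np.zeros((a_ids.max() + 1, b_ids.max() + 1))
        np.add.at(counts, (a_ids, b_ids), 1)
        return counts

    return (mutual_information(joint(input_ids, t_ids)),
            mutual_information(joint(t_ids, labels)))
```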
The power of amnesia: Learning probabilistic automata with variable memory length
TLDR
It is proved that the algorithm presented can efficiently learn distributions generated by PSAs: for any target PSA, the KL-divergence between the distribution generated by the target and the distribution generated by the hypothesis the learning algorithm outputs can be made small with high confidence, with polynomial time and sample complexity.
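A minimal sketch of the variable-memory idea in code, assuming a simple count-based predictor over contexts of bounded length that backs off to the longest observed suffix; the class and method names are illustrative, and the statistical pruning rules of the actual learning algorithm are omitted.

```python
from collections import defaultdict

class VariableMemoryPredictor:
    """Count-based next-symbol predictor over contexts of bounded length.

    Prediction uses the longest suffix of the history (up to max_order
    symbols) that was observed during training.
    """

    def __init__(self, max_order=3):
        self.max_order = max_order
        self.counts = defaultdict(lambda: defaultdict(int))  # context -> symbol -> count

    def train(self, sequence):
        for i, symbol in enumerate(sequence):
            for k in range(self.max_order + 1):
                if i - k < 0:
                    break
                context = tuple(sequence[i - k:i])
                self.counts[context][symbol] += 1

    def predict(self, history):
        # Back off from the longest matching suffix to shorter ones.
        for k in range(min(self.max_order, len(history)), -1, -1):
            context = tuple(history[len(history) - k:])
            if context in self.counts:
                table = self.counts[context]
                total = sum(table.values())
                return {s: c / total for s, c in table.items()}
        return {}

# Usage sketch:
# model = VariableMemoryPredictor(max_order=2)
# model.train("abracadabra")
# model.predict("br")   # next-symbol distribution given the context "br"
```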
Agglomerative Information Bottleneck
TLDR
A novel distributional clustering algorithm that maximizes the mutual information per cluster between the data and given categories, achieving compression by three orders of magnitude while losing only 10% of the original mutual information.
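A minimal sketch of the greedy agglomerative step, assuming the standard merge cost used in agglomerative information bottleneck presentations: the information lost by merging two clusters is their combined weight times the Jensen-Shannon divergence of their category conditionals. Variable names are illustrative.

```python
import numpy as np

def kl(p, q):
    """KL divergence D(p || q) in bits, assuming q > 0 wherever p > 0."""
    nz = p > 0
    return float(np.sum(p[nz] * np.log2(p[nz] / q[nz])))

def merge_cost(p_i, p_j, py_given_i, py_given_j):
    """Information loss delta I(T;Y) incurred by merging clusters i and j.

    p_i, p_j          : cluster priors p(t_i), p(t_j)
    py_given_i / _j   : conditional distributions p(y | t_i), p(y | t_j)
    """
    pi_i, pi_j = p_i / (p_i + p_j), p_j / (p_i + p_j)
    py_merged = pi_i * py_given_i + pi_j * py_given_j
    js = pi_i * kl(py_given_i, py_merged) + pi_j * kl(py_given_j, py_merged)
    return (p_i + p_j) * js

def greedy_agglomerative_step(priors, conditionals):
    """Return the pair of cluster indices whose merger loses the least information."""
    pairs = [(merge_cost(priors[i], priors[j], conditionals[i], conditionals[j]), i, j)
             for i in range(len(priors)) for j in range(i + 1, len(priors))]
    return min(pairs)[1:]
```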
Selective Sampling Using the Query by Committee Algorithm
TLDR
It is shown that if the two-member committee algorithm achieves information gain with a positive lower bound, then the prediction error decreases exponentially with the number of queries, and this exponential decrease holds for query learning of perceptrons.
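A minimal sketch of two-member query-by-committee filtering: an unlabeled example is queried only when the two committee members disagree. In the algorithm the committee is drawn from the version space (a Gibbs-style sample); here the members are approximated by perceptrons trained with different random update orders, which is only a stand-in, and all names are illustrative.

```python
import numpy as np

def train_perceptron(X, y, epochs=20, seed=0):
    """Plain perceptron on labels in {-1, +1}; update order is randomized."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for i in rng.permutation(len(y)):
            if y[i] * (X[i] @ w) <= 0:
                w += y[i] * X[i]
    return w

def qbc_filter(stream, labeled_X, labeled_y):
    """Yield only the unlabeled points on which the two-member committee disagrees."""
    w1 = train_perceptron(labeled_X, labeled_y, seed=1)
    w2 = train_perceptron(labeled_X, labeled_y, seed=2)
    for x in stream:
        if np.sign(x @ w1) != np.sign(x @ w2):
            yield x  # disagreement: worth querying the teacher for a label
```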
Deep learning and the information bottleneck principle
TLDR
It is argued that the optimal architecture, that is, the number of layers and the features/connections at each layer, is related to the bifurcation points of the information bottleneck tradeoff, namely the relevant compression of the input layer with respect to the output layer.
Margin based feature selection - theory and algorithms
TLDR
This paper introduces a margin-based feature selection criterion, applies it to measure the quality of feature sets, devises novel selection algorithms for multi-class classification problems, and provides a theoretical generalization bound.
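A minimal sketch of one margin notion commonly used in this line of work, the sample (hypothesis) margin: half the gap between a point's distance to its nearest neighbor from a different class and its distance to its nearest neighbor from the same class. This is a generic illustration, not any specific algorithm from the paper; names are illustrative.

```python
import numpy as np

def hypothesis_margin(x, y, X, Y):
    """Half the gap between the nearest miss and the nearest hit of sample x.

    x, y : the query point and its class label
    X, Y : the reference sample (may contain x itself, which is excluded)
    """
    dists = np.linalg.norm(X - x, axis=1)
    dists[dists == 0] = np.inf            # exclude x itself (and exact duplicates)
    near_hit = dists[Y == y].min()        # closest same-class sample
    near_miss = dists[Y != y].min()       # closest other-class sample
    return 0.5 * (near_miss - near_hit)
```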
Distributional Clustering of English Words
TLDR
Deterministic annealing is used to find lowest-distortion sets of clusters: as the annealing parameter increases, existing clusters become unstable and subdivide, yielding a hierarchical "soft" clustering of the data.
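One common form of the soft cluster membership at a given value of the annealing parameter can be written as follows; this is a compact restatement under the usual assumptions, with beta the annealing (inverse temperature) parameter and D the KL divergence between a word's context distribution and a cluster centroid's, not necessarily the paper's exact normalization.

```latex
% Soft assignment of word w to cluster c at inverse temperature beta.
p(c \mid w) \;=\;
\frac{\exp\!\big(-\beta\, D\big(p(\cdot \mid w)\,\|\,p(\cdot \mid c)\big)\big)}
     {\sum_{c'} \exp\!\big(-\beta\, D\big(p(\cdot \mid w)\,\|\,p(\cdot \mid c')\big)\big)}
```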
Taming the Noise in Reinforcement Learning via Soft Updates
TLDR
G-learning is proposed, a new off-policy learning algorithm that regularizes the noise in the space of optimal actions by penalizing deterministic policies early in learning, which makes it natural to incorporate prior distributions over optimal actions when available.
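A minimal sketch of a KL-regularized (soft, log-sum-exp) value backup with respect to a prior policy, in the spirit of the soft updates described above; this is a generic tabular sketch, not the paper's exact G-learning update, and all names and parameters are illustrative.

```python
import numpy as np

def soft_backup(Q, prior, rewards, transitions, gamma=0.99, beta=5.0):
    """One sweep of a KL-regularized Bellman backup over a tabular MDP.

    Q           : (S, A) current action-value estimates
    prior       : (S, A) prior policy rho(a|s), rows sum to 1
    rewards     : (S, A) expected immediate rewards
    transitions : (S, A, S) transition probabilities
    beta        : inverse temperature; beta -> infinity recovers the hard max
    """
    # Soft state value: (1/beta) * log sum_a rho(a|s) exp(beta * Q(s,a)),
    # computed stably by subtracting the per-state maximum.
    m = Q.max(axis=1, keepdims=True)
    V = (m + np.log((prior * np.exp(beta * (Q - m))).sum(axis=1, keepdims=True)) / beta).ravel()
    return rewards + gamma * transitions @ V
```

Penalizing deviation from the prior policy keeps early value estimates from locking onto noisy "optimal" actions; as beta grows the backup approaches the ordinary hard-max Bellman update.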