The information bottleneck method
The variational principle provides a surprisingly rich framework for discussing a variety of problems in signal processing and learning, as will be described in detail elsewhere.
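For reference, the trade-off principle this paper introduces can be written as a single variational objective: a compressed representation T of a source X is obtained by minimizing, over stochastic encoders p(t|x), the mutual information with the input while preserving information about a relevance variable Y, with β controlling the trade-off:

```latex
\min_{p(t \mid x)} \; \mathcal{L}\big[p(t \mid x)\big] \;=\; I(X;T) \;-\; \beta\, I(T;Y)
```

Small β favors maximal compression of X; large β favors retaining the information about Y.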
The Hierarchical Hidden Markov Model: Analysis and Applications
This work introduces, analyzes, and demonstrates a recursive hierarchical generalization of the widely used hidden Markov models, motivated by the complex multi-scale structure that appears in many natural sequences, particularly in language, handwriting, and speech.
Opening the Black Box of Deep Neural Networks via Information
This work demonstrates the effectiveness of the Information-Plane visualization of DNNs and shows that adding more hidden layers dramatically reduces training time, suggesting that the main advantage of the hidden layers is computational.
The power of amnesia: Learning probabilistic automata with variable memory length
It is proved that the presented algorithm can efficiently learn distributions generated by PSAs: for any target PSA, the KL-divergence between the distribution generated by the target and the distribution generated by the hypothesis the learning algorithm outputs can be made small with high confidence, in polynomial time and sample complexity.
Deep learning and the information bottleneck principle
It is argued that the optimal architecture, that is, the number of layers and the features/connections at each layer, is related to the bifurcation points of the information bottleneck tradeoff, namely the relevant compression of the input layer with respect to the output layer.
Agglomerative Information Bottleneck
A novel distributional clustering algorithm that maximizes the mutual information per cluster between data and given categories, achieving compression by three orders of magnitude while losing only 10% of the original mutual information.
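The greedy step behind this agglomerative scheme merges the pair of clusters whose merger loses the least mutual information with the category variable; that loss is the clusters' weighted Jensen-Shannon divergence. A minimal sketch, with illustrative toy probabilities (the function names and numbers below are assumptions, not from the paper):

```python
import numpy as np

def kl(p, q):
    """KL divergence in nats; terms with p == 0 contribute zero."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def merge_cost(p_c1, p_c2, py_c1, py_c2):
    """Mutual information lost by merging clusters c1 and c2:
    (p(c1) + p(c2)) times their weighted Jensen-Shannon divergence."""
    w = p_c1 + p_c2
    pi1, pi2 = p_c1 / w, p_c2 / w
    py_merged = pi1 * py_c1 + pi2 * py_c2
    js = pi1 * kl(py_c1, py_merged) + pi2 * kl(py_c2, py_merged)
    return w * js

# Clusters with similar conditionals p(y|c) merge almost for free...
cheap = merge_cost(0.3, 0.2, np.array([0.5, 0.5]), np.array([0.6, 0.4]))
# ...while dissimilar ones cost much more information:
dear = merge_cost(0.3, 0.2, np.array([0.9, 0.1]), np.array([0.1, 0.9]))
```

At each iteration the algorithm evaluates this cost for all candidate pairs and merges the cheapest, yielding the hierarchical "hard" clustering tree.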
Selective Sampling Using the Query by Committee Algorithm
- Y. Freund, H. Seung, E. Shamir, Naftali Tishby
- Computer Science · Machine-mediated learning
- 1 September 1997
It is shown that if the two-member committee algorithm achieves information gain with positive lower bound, then the prediction error decreases exponentially with the number of queries, and this exponential decrease holds for query learning of perceptrons.
Distributional Clustering of English Words
- Fernando C Pereira, Naftali Tishby, Lillian Lee
- Computer Science · Annual Meeting of the Association for…
- 22 June 1993
Deterministic annealing is used to find lowest-distortion sets of clusters: as the annealing parameter increases, existing clusters become unstable and subdivide, yielding a hierarchical "soft" clustering of the data.
Margin based feature selection - theory and algorithms
- Ran Gilad-Bachrach, A. Navot, Naftali Tishby
- Computer Science · International Conference on Machine Learning
- 4 July 2004
This paper introduces a margin-based feature selection criterion, applies it to measure the quality of sets of features, devises novel selection algorithms for multi-class classification problems, and provides a theoretical generalization bound.
Taming the Noise in Reinforcement Learning via Soft Updates
- Roy Fox, Ari Pakman, Naftali Tishby
- Computer Science · Conference on Uncertainty in Artificial…
- 28 December 2015
G-learning is proposed, a new off-policy learning algorithm that regularizes the noise in the space of optimal actions by penalizing deterministic policies at the beginning of the learning, which enables naturally incorporating prior distributions over optimal actions when available.
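The core idea of such soft updates is to replace the hard Bellman minimization over actions with a soft-min weighted by a prior policy, so early estimates are not over-committed to noisy action values. A minimal sketch of a soft value-iteration backup in this spirit (the toy MDP, costs, and temperature below are illustrative assumptions, not the paper's experiments):

```python
import numpy as np

# Toy deterministic 2-state, 2-action MDP with per-step costs.
n_states, n_actions = 2, 2
transition = np.array([[0, 1], [1, 0]])        # transition[s, a] -> next state
cost = np.array([[1.0, 0.5], [0.2, 1.0]])      # cost[s, a]
gamma, beta = 0.9, 5.0                          # discount, inverse temperature
rho = np.full(n_actions, 1.0 / n_actions)       # uniform prior policy

# Soft backup: V(s) = -(1/beta) * log sum_a rho(a) exp(-beta * G(s, a)),
# which recovers the hard min over actions as beta -> infinity.
V = np.zeros(n_states)
for _ in range(500):
    G = cost + gamma * V[transition]            # state-action cost-to-go
    V = -np.log(rho @ np.exp(-beta * G.T)) / beta

# Hard (greedy) value iteration, for comparison.
V_hard = np.zeros(n_states)
for _ in range(500):
    V_hard = (cost + gamma * V_hard[transition]).min(axis=1)
```

Because the soft-min upper-bounds the hard min, the soft values are pessimistic (higher cost-to-go) at finite beta; scheduling beta upward during learning recovers the greedy solution while damping early noise.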