
- Naftali Tishby, Fernando Pereira, William Bialek
- ArXiv
- 1999

We define the relevant information in a signal x ∈ X as being the information that this signal provides about another signal y ∈ Y. Examples include the information that face images provide about the names of the people portrayed, or the information that speech sounds provide about the words spoken. Understanding the signal x requires more than just…
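The truncated abstract above introduces the Information Bottleneck idea: compress the signal X into a representation T that keeps as much information as possible about the relevant variable Y. A sketch of the resulting variational problem, in the paper's own notation:

```latex
\min_{p(t \mid x)} \; I(X; T) \;-\; \beta \, I(T; Y)
```

where β ≥ 0 sets the trade-off between compressing X and preserving information about Y.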

- Shai Fine, Yoram Singer, Naftali Tishby
- Machine Learning
- 1998

We introduce, analyze and demonstrate a recursive hierarchical generalization of the widely used hidden Markov models, which we name Hierarchical Hidden Markov Models (HHMM). Our model is motivated by the complex multi-scale structure which appears in many natural sequences, particularly in language, handwriting and speech. We seek a systematic unsupervised…

- Yoav Freund, H. Sebastian Seung, Eli Shamir, Naftali Tishby
- Machine Learning
- 1997

We analyze the “query by committee” algorithm, a method for filtering informative queries from a random stream of inputs. We show that if the two-member committee algorithm achieves information gain with positive lower bound, then the prediction error decreases exponentially with the number of queries. We show that, in particular, this exponential decrease…

- Fernando Pereira, Naftali Tishby, Lillian Lee
- ACL
- 1993

We describe and experimentally evaluate a method for automatically clustering words according to their distribution in particular syntactic contexts. Deterministic annealing is used to find lowest distortion sets of clusters. As the annealing parameter increases, existing clusters become unstable and subdivide, yielding a hierarchical “soft” clustering of…

- Dana Ron, Yoram Singer, Naftali Tishby
- Machine Learning
- 1996

We propose and analyze a distribution learning algorithm for variable memory length Markov processes. These processes can be described by a subclass of probabilistic finite automata which we name Probabilistic Suffix Automata (PSA). Though hardness results are known for learning distributions generated by general probabilistic automata, we prove that the…

- Noam Slonim, Naftali Tishby
- NIPS
- 1999

We introduce a novel distributional clustering algorithm that explicitly maximizes the mutual information per cluster between the data and given categories. This algorithm can be considered as a bottom up hard version of the recently introduced “Information Bottleneck Method”. We relate the mutual information between clusters and categories to the Bayesian…
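The score this agglomerative algorithm tries to preserve is the mutual information between cluster assignments and categories. A minimal sketch of that score, computed from a joint distribution (the example data and names here are illustrative, not from the paper):

```python
import math

def mutual_information(joint):
    """I(C; Y) in bits, for a joint distribution given as {(cluster, category): prob}."""
    pc, py = {}, {}
    for (c, y), p in joint.items():
        pc[c] = pc.get(c, 0.0) + p   # marginal over clusters
        py[y] = py.get(y, 0.0) + p   # marginal over categories
    return sum(p * math.log2(p / (pc[c] * py[y]))
               for (c, y), p in joint.items() if p > 0)

# Perfectly informative clustering: each cluster maps to exactly one category.
joint = {("c0", "sports"): 0.5, ("c1", "politics"): 0.5}
print(mutual_information(joint))  # 1.0 bit
```

A hard agglomerative scheme would repeatedly merge the pair of clusters whose merge reduces this quantity the least.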

- Ran Gilad-Bachrach, Amir Navot, Naftali Tishby
- ICML
- 2004

Feature selection is the task of choosing a small set out of a given set of features that capture the relevant properties of the data. In the context of supervised classification problems the relevance is determined by the given labels on the training data. A good choice of features is a key for building compact and accurate classifiers. In this paper we…

- Noam Slonim, Naftali Tishby
- SIGIR
- 2000

We present a novel implementation of the recently introduced *information bottleneck method* for unsupervised document clustering. Given a joint empirical distribution of words and documents, *p*(*x*, *y*), we first cluster the words, *Y*, so that the obtained word clusters, *Ỹ*, maximally preserve the information on the…

- Noam Slonim, Nir Friedman, Naftali Tishby
- SIGIR
- 2002

We present a novel sequential clustering algorithm which is motivated by the *Information Bottleneck (IB)* method. In contrast to the agglomerative *IB* algorithm, the new sequential (*sIB*) approach is guaranteed to converge to a local maximum of the information with time and space complexity typically linear in the data size…

- William Bialek, Ilya Nemenman, Naftali Tishby
- Neural Computation
- 2001

We define predictive information I(pred)(T) as the mutual information between the past and the future of a time series. Three qualitatively different behaviors are found in the limit of large observation times T: I(pred)(T) can remain finite, grow logarithmically, or grow as a fractional power law. If the time series allows us to learn a model with a finite…
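The predictive information defined in this abstract is the mutual information between a past observation window and the entire future of the series; a sketch in standard notation:

```latex
I_{\mathrm{pred}}(T) \equiv I\big(x_{\mathrm{past}};\, x_{\mathrm{future}}\big)
  = \left\langle \log_2 \frac{P(x_{\mathrm{past}},\, x_{\mathrm{future}})}{P(x_{\mathrm{past}})\, P(x_{\mathrm{future}})} \right\rangle
```

where the past window has duration T; the three regimes above (finite, logarithmic, power-law) describe how this quantity scales as T grows.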