We introduce a novel distributional clustering algorithm that explicitly maximizes the mutual information per cluster between the data and given categories. This algorithm can be viewed as a bottom-up, hard version of the recently introduced “Information Bottleneck Method”. We relate the mutual information between clusters and categories to the Bayesian…
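The bottom-up, hard clustering idea can be sketched as a greedy merge loop: start with one cluster per data value and repeatedly merge the pair whose merger loses the least mutual information with the categories. This is a brute-force illustration under stated assumptions, not the authors' implementation; the function names `mutual_information` and `agglomerate` and the list-of-rows layout of the joint distribution are hypothetical choices made for the example.

```python
import math
from itertools import combinations


def mutual_information(joint):
    """I(T;Y) in bits, where joint[t][y] holds p(t, y)."""
    row_tot = [sum(row) for row in joint]
    col_tot = [sum(col) for col in zip(*joint)]
    mi = 0.0
    for row, p_t in zip(joint, row_tot):
        for p_ty, p_y in zip(row, col_tot):
            if p_ty > 0:
                mi += p_ty * math.log2(p_ty / (p_t * p_y))
    return mi


def agglomerate(joint, n_clusters):
    """Greedily merge rows of the joint until n_clusters remain.

    Returns (members, rows): the original row indices in each cluster
    and the merged joint distribution p(cluster, y).
    """
    rows = [list(row) for row in joint]
    members = [[i] for i in range(len(rows))]
    while len(rows) > n_clusters:
        current = mutual_information(rows)
        best = None
        # Try every pair and keep the one that loses the least information.
        for i, j in combinations(range(len(rows)), 2):
            merged = [a + b for a, b in zip(rows[i], rows[j])]
            trial = [r for k, r in enumerate(rows) if k not in (i, j)]
            trial.append(merged)
            loss = current - mutual_information(trial)
            if best is None or loss < best[0]:
                best = (loss, i, j)
        _, i, j = best
        rows[i] = [a + b for a, b in zip(rows[i], rows[j])]
        members[i] = members[i] + members[j]
        del rows[j]
        del members[j]
    return members, rows
```

Recomputing the full mutual information for every candidate pair keeps the sketch short but is far slower than a practical implementation, which would compute each merge cost directly from the two clusters involved.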
We present a novel implementation of the recently introduced information bottleneck method for unsupervised document clustering. Given a joint empirical distribution of words and documents, p(x, y), we first cluster the words, Y, so that the obtained word clusters, Ỹ, maximally preserve the information on the…
We present a novel sequential clustering algorithm which is motivated by the Information Bottleneck (IB) method. In contrast to the agglomerative IB algorithm, the new sequential (sIB) approach is guaranteed to converge to a local maximum of the information, with time and space complexity typically linear in the data size…
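A minimal sketch of a sequential procedure in this spirit: start from a random partition into k clusters, then repeatedly draw one element out of its cluster and reassign it to the cluster with the lowest merge cost, stopping when a full sweep changes nothing. The merge cost used here, (p(x) + p(t)) times the weighted Jensen-Shannon divergence between p(y|x) and p(y|t), is the one commonly used in the IB literature; it is an assumption that it matches the truncated abstract exactly. The names `merge_cost` and `sequential_ib` are hypothetical, every row is assumed to have positive mass, and singleton clusters are left untouched as a simplification so that k clusters always remain.

```python
import math
import random


def merge_cost(p_x, cond_x, p_t, cond_t):
    """(p(x) + p(t)) times the weighted Jensen-Shannon divergence, in bits."""
    total = p_x + p_t
    w_x, w_t = p_x / total, p_t / total
    mix = [w_x * a + w_t * b for a, b in zip(cond_x, cond_t)]

    def kl(p, q):
        return sum(a * math.log2(a / b) for a, b in zip(p, q) if a > 0)

    return total * (w_x * kl(cond_x, mix) + w_t * kl(cond_t, mix))


def sequential_ib(joint, k, max_sweeps=20, seed=0):
    """Hard-cluster the rows of joint[x][y] = p(x, y) into k clusters."""
    rng = random.Random(seed)
    n, n_y = len(joint), len(joint[0])
    p_x = [sum(row) for row in joint]
    cond = [[v / p for v in row] for row, p in zip(joint, p_x)]
    # Random initial partition in which every cluster is non-empty (needs n >= k).
    labels = list(range(k)) + [rng.randrange(k) for _ in range(n - k)]
    rng.shuffle(labels)

    for _ in range(max_sweeps):
        changed = False
        order = list(range(n))
        rng.shuffle(order)
        for x in order:
            old = labels[x]
            if labels.count(old) == 1:
                continue  # simplification: never empty a cluster
            costs = []
            for t in range(k):
                # Cluster statistics with x drawn out of its current cluster.
                members = [i for i in range(n) if labels[i] == t and i != x]
                p_t = sum(p_x[i] for i in members)
                cond_t = [sum(joint[i][y] for i in members) / p_t
                          for y in range(n_y)]
                costs.append(merge_cost(p_x[x], cond[x], p_t, cond_t))
            new = min(range(k), key=costs.__getitem__)
            if costs[new] < costs[old]:
                labels[x] = new
                changed = True
        if not changed:
            break
    return labels
```

This sketch recomputes cluster statistics from scratch for each draw, so it does not reach the linear complexity mentioned above; an efficient version would update running cluster sums incrementally.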
Deciphering the noncoding regulatory genome has proved a formidable challenge. Despite the wealth of available gene expression data, there currently exists no broadly applicable method for characterizing the regulatory elements that shape the rich underlying dynamics. We present a general framework for detecting such regulatory DNA and RNA motifs that…
The information bottleneck (IB) method is an unsupervised, model-independent data organization technique. Given a joint distribution, p(X, Y), this method constructs a new variable, T, that extracts partitions, or clusters, over the values of X that are informative about Y. Algorithms that are motivated by the IB method have already been applied to text…
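For reference, the trade-off behind the construction of T is usually written as the variational problem below. This is the standard formulation from the IB literature rather than text recovered from the truncated abstract; the multiplier β, which balances compression of X against information preserved about Y, is notation assumed here, and T is taken to depend on Y only through X.

```latex
\min_{p(t \mid x)} \; \mathcal{L}\bigl[p(t \mid x)\bigr] = I(T;X) - \beta\, I(T;Y),
\qquad
I(T;Y) = \sum_{t,y} p(t,y) \log \frac{p(t,y)}{p(t)\,p(y)}
```

Small β favors heavy compression (few effective clusters), while large β favors partitions of X that retain more information about Y.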
Addition of glucose to yeast cells increases their growth rate and results in a massive restructuring of their transcriptional output. We have used microarray analysis in conjunction with conditional mutations to obtain a systems view of the signaling network responsible for glucose-induced transcriptional changes. We found that several well-studied…
While discussing a concrete controversial topic, most humans will find it challenging to swiftly raise a diverse set of convincing and relevant claims that should set the basis of their arguments. Here, we formally define the challenging task of automatic claim detection in a given context and discuss its associated unique difficulties. Further, we outline…
We address the practical problems of estimating the information relations that characterize large networks. Building on methods developed for analysis of the neural code, we show that reliable estimates of mutual information can be obtained with manageable computational effort. The same methods allow estimation of higher-order, multi-information terms…