# The Information Sieve

```bibtex
@article{Steeg2016TheIS,
  title   = {The Information Sieve},
  author  = {Greg Ver Steeg and A. G. Galstyan},
  journal = {ArXiv},
  year    = {2016},
  volume  = {abs/1507.02284}
}
```

We introduce a new framework for unsupervised learning of representations based on a novel hierarchical decomposition of information. Intuitively, data is passed through a series of progressively fine-grained sieves. Each layer of the sieve recovers a single latent factor that is maximally informative about multivariate dependence in the data. The data is transformed after each pass so that the remaining unexplained information trickles down to the next layer. Ultimately, we are left with a set…
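The "multivariate dependence" each sieve layer targets is the total correlation, $TC(X) = \sum_i H(X_i) - H(X_1, \ldots, X_n)$. As a minimal illustration (not the authors' implementation), here is how this quantity can be estimated empirically for discrete data:

```python
import numpy as np
from collections import Counter

def entropy(samples):
    """Empirical Shannon entropy, in bits, of a sequence of hashable symbols."""
    counts = np.array(list(Counter(samples).values()), dtype=float)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def total_correlation(columns):
    """TC(X) = sum_i H(X_i) - H(X_1, ..., X_n), in bits.
    `columns` is a list of equal-length sequences, one per variable."""
    joint = list(zip(*columns))  # joint samples as tuples
    return sum(entropy(c) for c in columns) - entropy(joint)

# Two perfectly dependent fair bits share one bit of total correlation:
x = [0, 1, 0, 1, 0, 1, 0, 1]
tc = total_correlation([x, x])  # H(X) + H(X) - H(X, X) = 1 + 1 - 1 = 1 bit
```

TC is zero exactly when the variables are independent, which is why a latent factor that "explains" TC, layer by layer, drives the remainder toward independence.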


## 17 Citations

### Sifting Common Information from Many Variables

- Computer Science, IJCAI
- 2017

This work uses the recently introduced information sieve decomposition to formulate an incremental version of the common information problem that admits a simple fixed point solution, fast convergence, and complexity that is linear in the number of variables.

### Learning Structured Latent Factors from Dependent Data: A Generative Model Framework from Information-Theoretic Perspective

- Computer Science, ICML
- 2020

This paper presents a novel framework for learning generative models with various underlying structures in the latent space that represents the inductive bias in the form of mask variables to model the dependency structure in the graphical model and extends the theory of multivariate information bottleneck to enforce it.

### On the Estimation of Mutual Information

- Computer Science, Mathematics, Proceedings
- 2020

This paper explores the robustness of this family of estimators like the one developed by Kraskov et al., and its improved versions in the context of the design criteria.
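The estimator family referenced here replaces density estimation with k-nearest-neighbor statistics. A minimal sketch of the Kraskov et al. (KSG) estimator, algorithm 1, using brute-force O(N²) Chebyshev distances (production implementations use k-d trees):

```python
import numpy as np
from scipy.special import digamma

def ksg_mi(x, y, k=3):
    """KSG mutual information estimate (algorithm 1), in nats.
    x, y: 1-D arrays of paired continuous samples."""
    n = len(x)
    x = x.reshape(-1, 1)
    y = y.reshape(-1, 1)
    # Pairwise Chebyshev (max-norm) distances in the joint (x, y) space.
    dx = np.abs(x - x.T)
    dy = np.abs(y - y.T)
    dj = np.maximum(dx, dy)
    np.fill_diagonal(dj, np.inf)           # exclude each point from its own neighbors
    eps = np.sort(dj, axis=1)[:, k - 1]    # distance to the k-th nearest neighbor
    # Count marginal neighbors strictly inside eps_i (self is counted, then removed).
    nx = (dx < eps[:, None]).sum(axis=1) - 1
    ny = (dy < eps[:, None]).sum(axis=1) - 1
    return digamma(k) + digamma(n) - np.mean(digamma(nx + 1) + digamma(ny + 1))
```

For independent samples the estimate concentrates near zero; for strongly dependent pairs it approaches the true MI, with a bias that shrinks as the sample size grows.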

### The Design of Mutual Information as a Global Correlation Quantifier

- Computer Science
- 2019

The design derivation allows us to improve the notion and efficacy of statistical sufficiency by expressing it in terms of a normalized MI that represents the degree to which a statistic or transformation is sufficient.

### The Design of Global Correlation Quantifiers and Continuous Notions of Statistical Sufficiency

- Computer Science, Entropy
- 2020

Using first principles from inference, a set of functionals are found to be uniquely capable of determining whether a certain class of inferential transformations, ρ →∗ ρ′, preserve, destroy, or create correlations.

### Multivariate Extension of Matrix-Based Rényi's α-Order Entropy Functional

- Computer Science, IEEE Trans. Pattern Anal. Mach. Intell.
- 2020

This paper defines the matrix-based Rényi's α-order joint entropy among multiple variables and shows how this definition can ease the estimation of various information quantities that measure the interactions among several variables, such as interactive information and total correlation.

### Multivariate Extension of Matrix-based Rényi's α-order Entropy Functional

- Computer Science, IEEE Transactions on Pattern Analysis and Machine Intelligence
- 2019

This paper defines the matrix-based Rényi's α-order joint entropy among multiple variables and shows how this definition can ease the estimation of various information quantities that measure the interactions among multiple variables, such as interactive information and total correlation.

### Nonparanormal Information Estimation

- Computer Science, Mathematics, ICML
- 2017

This work proposes estimators for mutual information when p is assumed to be a nonparanormal (a.k.a., Gaussian copula) model, a semiparametric compromise between Gaussian and nonparametric extremes, and shows these estimators strike a practical balance between robustness and scaling with dimensionality.

### Breaking the Bandwidth Barrier: Geometrical Adaptive Entropy Estimation

- Computer Science, IEEE Transactions on Information Theory
- 2018

A unified way of obtaining both the kernel and nearest-neighbor (NN) estimators is presented, and it is shown that the asymptotic bias of the proposed estimator is universal: it can be precomputed and subtracted from the estimate.

## References

Showing 1–10 of 35 references

### Maximally Informative Hierarchical Representations of High-Dimensional Data

- Computer Science, AISTATS
- 2015

A new approach to unsupervised learning of deep representations that is both principled and practical is established and demonstrates the usefulness of the approach on both synthetic and real-world data.

### Sifting Common Information from Many Variables

- Computer Science, IJCAI
- 2017

This work uses the recently introduced information sieve decomposition to formulate an incremental version of the common information problem that admits a simple fixed point solution, fast convergence, and complexity that is linear in the number of variables.

### Discovering Structure in High-Dimensional Data Through Correlation Explanation

- Computer Science, NIPS
- 2014

It is demonstrated that Correlation Explanation automatically discovers meaningful structure for data from diverse sources including personality tests, DNA, and human language.

### Representation Learning: A Review and New Perspectives

- Computer Science, IEEE Transactions on Pattern Analysis and Machine Intelligence
- 2013

Recent work in the area of unsupervised feature learning and deep learning is reviewed, covering advances in probabilistic models, autoencoders, manifold learning, and deep networks.

### An Information-Maximization Approach to Blind Separation and Blind Deconvolution

- Computer Science, Neural Computation
- 1995

It is suggested that information maximization provides a unifying framework for problems in "blind" signal processing and dependencies of information transfer on time delays are derived.

### Nonnegative Decomposition of Multivariate Information

- Computer Science, ArXiv
- 2010

This work reconsider from first principles the general structure of the information that a set of sources provides about a given variable and proposes a definition of partial information atoms that exhaustively decompose the Shannon information in a multivariate system in terms of the redundancy between synergies of subsets of the sources.

### Deep learning and the information bottleneck principle

- Computer Science, 2015 IEEE Information Theory Workshop (ITW)
- 2015

It is argued that the optimal architecture (the number of layers and the features/connections at each layer) is related to the bifurcation points of the information bottleneck tradeoff, namely the relevant compression of the input layer with respect to the output layer.

### Learning Representations by Maximizing Compression

- Computer Science, ArXiv
- 2011

We give an algorithm that learns a representation of data through compression. The algorithm 1) predicts bits sequentially from those previously seen and 2) has a structure and a number of…

### Reducing the Dimensionality of Data with Neural Networks

- Computer Science, Science
- 2006

This work describes an effective way of initializing the weights that allows deep autoencoder networks to learn low-dimensional codes that work much better than principal components analysis as a tool to reduce the dimensionality of data.

### Demystifying Information-Theoretic Clustering

- Computer Science, ICML
- 2014

This work proposes a novel method for clustering data which is grounded in information-theoretic principles and requires no parametric assumptions, and returns to the axiomatic foundations of information theory to define a meaningful clustering measure based on the notion of consistency under coarse-grained data.