• Corpus ID: 8829881

# The Information Sieve

@article{Steeg2016TheIS,
title={The Information Sieve},
author={Greg Ver Steeg and A. G. Galstyan},
journal={ArXiv},
year={2016},
volume={abs/1507.02284}
}
• Published 8 July 2015
• Computer Science
• ArXiv
We introduce a new framework for unsupervised learning of representations based on a novel hierarchical decomposition of information. Intuitively, data is passed through a series of progressively fine-grained sieves. Each layer of the sieve recovers a single latent factor that is maximally informative about multivariate dependence in the data. The data is transformed after each pass so that the remaining unexplained information trickles down to the next layer. Ultimately, we are left with a set…
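The layered decomposition described in the abstract can be caricatured in a few lines. The sketch below is purely illustrative: it extracts the top principal component at each layer as a stand-in for the paper's information-optimal latent factor, then subtracts it so a "remainder" trickles down to the next layer. The `toy_sieve` name and the variance-based factor are assumptions for illustration, not the authors' actual optimization.

```python
import numpy as np

def toy_sieve(X, n_layers=3):
    """Illustrative sieve-style decomposition: each layer extracts one
    factor (here, the leading principal component, standing in for the
    paper's information-optimal factor) and passes the residual on."""
    X = X - X.mean(axis=0)
    factors = []
    for _ in range(n_layers):
        # Leading right-singular vector gives the direction of this layer's factor.
        _, _, Vt = np.linalg.svd(X, full_matrices=False)
        w = Vt[0]
        y = X @ w                  # latent factor recovered at this layer
        X = X - np.outer(y, w)     # "remainder": data with that factor removed
        factors.append(y)
    return factors, X

rng = np.random.default_rng(0)
data = rng.normal(size=(200, 5))
factors, remainder = toy_sieve(data)
```

After three layers the remainder of 5-dimensional data lies in a 2-dimensional subspace, mirroring how each pass of the sieve leaves strictly less unexplained structure.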
## 17 Citations

### Sifting Common Information from Many Variables

• Computer Science
IJCAI
• 2017
This work uses the recently introduced information sieve decomposition to formulate an incremental version of the common information problem that admits a simple fixed point solution, fast convergence, and complexity that is linear in the number of variables.

### Learning Structured Latent Factors from Dependent Data: A Generative Model Framework from Information-Theoretic Perspective

• Computer Science
ICML
• 2020
This paper presents a novel framework for learning generative models with various underlying structures in the latent space that represents the inductive bias in the form of mask variables to model the dependency structure in the graphical model and extends the theory of multivariate information bottleneck to enforce it.

### On the Estimation of Mutual Information

• Computer Science, Mathematics
Proceedings
• 2020
This paper explores the robustness of this family of estimators like the one developed by Kraskov et al., and its improved versions in the context of the design criteria.
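For context, the estimator "developed by Kraskov et al." that this abstract refers to is the KSG k-nearest-neighbor estimator of mutual information. A minimal sketch of their first algorithm, using Chebyshev distances, is below; the `ksg_mi` name is mine.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma

def ksg_mi(x, y, k=3):
    """Kraskov-Stoegbauer-Grassberger (KSG) estimate of I(X;Y) in nats:
    psi(k) + psi(N) - <psi(n_x + 1) + psi(n_y + 1)>."""
    x = x.reshape(-1, 1)
    y = y.reshape(-1, 1)
    n = len(x)
    xy = np.hstack([x, y])
    # Distance to the k-th neighbor in the joint space (max-norm);
    # k + 1 because the query point itself is returned at distance 0.
    d, _ = cKDTree(xy).query(xy, k + 1, p=np.inf)
    eps = d[:, -1]
    # Count marginal neighbors strictly within eps of each point (minus self).
    nx = cKDTree(x).query_ball_point(x, eps - 1e-12, p=np.inf, return_length=True) - 1
    ny = cKDTree(y).query_ball_point(y, eps - 1e-12, p=np.inf, return_length=True) - 1
    return digamma(k) + digamma(n) - np.mean(digamma(nx + 1) + digamma(ny + 1))
```

For correlated Gaussians the estimate can be checked against the closed form I = -0.5 log(1 - rho^2).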

### The Design of Mutual Information as a Global Correlation Quantifier

• Computer Science
• 2019
The design derivation allows us to improve the notion and efficacy of statistical sufficiency by expressing it in terms of a normalized MI that quantifies the degree to which a statistic or transformation is sufficient.

### The Design of Global Correlation Quantifiers and Continuous Notions of Statistical Sufficiency

• Computer Science
Entropy
• 2020
Using first principles from inference, a set of functionals is found to be uniquely capable of determining whether a certain class of inferential transformations, ρ →∗ ρ′, preserves, destroys, or creates correlations.

### Multivariate Extension of Matrix-Based Rényi's α-Order Entropy Functional

• Computer Science
IEEE Trans. Pattern Anal. Mach. Intell.
• 2020
This paper defines the matrix-based Rényi's α-order joint entropy among multiple variables and shows how this definition can ease the estimation of various information quantities that measure the interactions among several variables, such as interactive information and total correlation.
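Total correlation, one of the quantities this abstract mentions, has a simple plug-in form for discrete data: the sum of marginal entropies minus the joint entropy. The sketch below uses naive counting, not the paper's matrix-based Rényi functional; it is only meant to make the quantity concrete.

```python
import numpy as np
from collections import Counter

def entropy(samples):
    """Plug-in Shannon entropy (nats) of a sequence of hashable symbols."""
    counts = np.array(list(Counter(samples).values()), dtype=float)
    p = counts / counts.sum()
    return -np.sum(p * np.log(p))

def total_correlation(columns):
    """TC(X1..Xn) = sum_i H(Xi) - H(X1,...,Xn), plug-in estimate."""
    joint = list(zip(*columns))
    return sum(entropy(c) for c in columns) - entropy(joint)

# Two perfectly copied fair bits share ln 2 nats of total correlation.
a = [0, 1] * 500
print(total_correlation([a, a]))   # ~0.6931 = ln 2
```

An independent (here, constant) second variable contributes nothing: `total_correlation([a, [0] * 1000])` is 0.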

### Nonparanormal Information Estimation

• Computer Science, Mathematics
ICML
• 2017
This work proposes estimators for mutual information when the joint distribution p is assumed to follow a nonparanormal (a.k.a. Gaussian copula) model, a semiparametric compromise between Gaussian and nonparametric extremes, and shows these estimators strike a practical balance between robustness and scaling with dimensionality.
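The Gaussian-copula idea can be sketched as: map each marginal to normal scores via ranks, then apply the closed-form Gaussian MI to the transformed pair. This is a simplified illustration of the nonparanormal approach, not the paper's exact estimator, and the `nonparanormal_mi` name is mine.

```python
import numpy as np
from scipy.stats import norm, rankdata

def nonparanormal_mi(x, y):
    """Gaussian-copula MI sketch: rank-transform each marginal to normal
    scores, then use the bivariate Gaussian formula -0.5*log(1 - rho^2).
    Invariant to monotone transformations of each marginal."""
    n = len(x)
    u = norm.ppf(rankdata(x) / (n + 1))   # normal scores for x
    v = norm.ppf(rankdata(y) / (n + 1))   # normal scores for y
    rho = np.corrcoef(u, v)[0, 1]
    return -0.5 * np.log(1.0 - rho**2)
```

Because it works through ranks, the estimate is unchanged if, say, x is exponentiated and y is cubed, which is exactly the robustness the nonparanormal family buys over a plain Gaussian fit.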

### Breaking the Bandwidth Barrier: Geometrical Adaptive Entropy Estimation

• Computer Science
IEEE Transactions on Information Theory
• 2018
A unified way of obtaining the kernel and nearest-neighbor (NN) estimators is presented, and it is shown that the asymptotic bias of the proposed estimator is universal: it can be precomputed and subtracted from the estimate.

## References

Showing 1-10 of 35 references

### Maximally Informative Hierarchical Representations of High-Dimensional Data

• Computer Science
AISTATS
• 2015
A new approach to unsupervised learning of deep representations that is both principled and practical is established, and its usefulness is demonstrated on both synthetic and real-world data.

### Sifting Common Information from Many Variables

• Computer Science
IJCAI
• 2017
This work uses the recently introduced information sieve decomposition to formulate an incremental version of the common information problem that admits a simple fixed point solution, fast convergence, and complexity that is linear in the number of variables.

### Discovering Structure in High-Dimensional Data Through Correlation Explanation

• Computer Science
NIPS
• 2014
It is demonstrated that Correlation Explanation automatically discovers meaningful structure for data from diverse sources including personality tests, DNA, and human language.

### Representation Learning: A Review and New Perspectives

• Computer Science
IEEE Transactions on Pattern Analysis and Machine Intelligence
• 2013
Recent work in the area of unsupervised feature learning and deep learning is reviewed, covering advances in probabilistic models, autoencoders, manifold learning, and deep networks.

### An Information-Maximization Approach to Blind Separation and Blind Deconvolution

• Computer Science
Neural Computation
• 1995
It is suggested that information maximization provides a unifying framework for problems in "blind" signal processing and dependencies of information transfer on time delays are derived.

### Nonnegative Decomposition of Multivariate Information

• Computer Science
ArXiv
• 2010
This work reconsider from first principles the general structure of the information that a set of sources provides about a given variable and proposes a definition of partial information atoms that exhaustively decompose the Shannon information in a multivariate system in terms of the redundancy between synergies of subsets of the sources.

### Deep learning and the information bottleneck principle

• Computer Science
2015 IEEE Information Theory Workshop (ITW)
• 2015
It is argued that the optimal architecture (the number of layers and the features/connections at each layer) is related to the bifurcation points of the information bottleneck tradeoff, namely the relevant compression of the input layer with respect to the output layer.

### Learning Representations by Maximizing Compression

• Computer Science
ArXiv
• 2011
We give an algorithm that learns a representation of data through compression. The algorithm 1) predicts bits sequentially from those previously seen and 2) has a structure and a number of…

### Reducing the Dimensionality of Data with Neural Networks

• Computer Science
Science
• 2006
This work describes an effective way of initializing the weights that allows deep autoencoder networks to learn low-dimensional codes that work much better than principal components analysis as a tool to reduce the dimensionality of data.

### Demystifying Information-Theoretic Clustering

• Computer Science
ICML
• 2014
This work proposes a novel method for clustering data which is grounded in information-theoretic principles and requires no parametric assumptions, and returns to the axiomatic foundations of information theory to define a meaningful clustering measure based on the notion of consistency under coarse-grained data.