
The Information Sieve

@article{Steeg2016TheIS,
  title={The Information Sieve},
  author={Greg Ver Steeg and A. G. Galstyan},
  journal={ArXiv},
  year={2016},
  volume={abs/1507.02284}
}
We introduce a new framework for unsupervised learning of representations based on a novel hierarchical decomposition of information. Intuitively, data is passed through a series of progressively fine-grained sieves. Each layer of the sieve recovers a single latent factor that is maximally informative about multivariate dependence in the data. The data is transformed after each pass so that the remaining unexplained information trickles down to the next layer. Ultimately, we are left with a set… 
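The abstract sketches an iterative extract-and-remainder scheme. Below is a minimal sketch of that layered structure, with a loud caveat: the paper's layers extract discrete factors by maximizing total correlation, whereas here the first principal component stands in as the factor and linear regression supplies the remainder. The name sieve_layer is illustrative only.

import numpy as np

def sieve_layer(X):
    """One sieve-style pass: extract a latent factor and the remainder of X."""
    Xc = X - X.mean(axis=0)
    # Stand-in factor: leading principal component. The actual sieve instead
    # optimizes an information-theoretic (total correlation) objective.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    y = Xc @ Vt[0]                      # extracted latent factor, shape (n,)
    coef = Xc.T @ y / (y @ y)           # least-squares loading per column
    remainder = Xc - np.outer(y, coef)  # X with the dependence on y removed
    return y, remainder

X = np.random.randn(500, 10)
factors = []
for _ in range(3):                      # three progressively finer layers
    y, X = sieve_layer(X)
    factors.append(y)

Each extracted factor explains some of the multivariate dependence; the remainder "trickles down" to the next layer, mirroring the decomposition described above.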

Sifting Common Information from Many Variables

TLDR
This work uses the recently introduced information sieve decomposition to formulate an incremental version of the common information problem that admits a simple fixed point solution, fast convergence, and complexity that is linear in the number of variables.

Learning Structured Latent Factors from Dependent Data: A Generative Model Framework from Information-Theoretic Perspective

TLDR
This paper presents a novel framework for learning generative models with various underlying structures in the latent space; it represents the inductive bias in the form of mask variables that model the dependency structure of the graphical model, and it extends the theory of the multivariate information bottleneck to enforce that structure.

On the Estimation of Mutual Information

TLDR
This paper explores the robustness of a family of mutual information estimators, such as the one developed by Kraskov et al. and its improved versions, in the context of the design criteria.
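Since the Kraskov et al. estimator is the reference point here, a minimal sketch of its first variant may help; ksg_mi is an illustrative name, and handling the strict inequality via a small epsilon is an implementation choice, not part of the cited paper.

import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma

def ksg_mi(x, y, k=3):
    """KSG estimate of I(X;Y) in nats for paired 1-D samples x, y."""
    x, y = x.reshape(-1, 1), y.reshape(-1, 1)
    n = len(x)
    joint = np.hstack([x, y])
    # Distance to the k-th neighbour in the joint space (max-norm).
    eps = cKDTree(joint).query(joint, k=k + 1, p=np.inf)[0][:, -1]
    # Count strictly closer points in each marginal space (excluding self).
    nx = cKDTree(x).query_ball_point(x, eps - 1e-12, p=np.inf, return_length=True) - 1
    ny = cKDTree(y).query_ball_point(y, eps - 1e-12, p=np.inf, return_length=True) - 1
    return digamma(k) + digamma(n) - np.mean(digamma(nx + 1) + digamma(ny + 1))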

The Design of Mutual Information as a Global Correlation Quantifier

TLDR
The design derivation allows us to improve the notion and efficacy of statistical sufficiency by expressing it in terms of a normalized MI that represents the degree to which a statistic or transformation is sufficient.
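One natural reading of such a normalization, offered here as an assumption rather than the paper's exact definition, follows from the data-processing inequality:

% For any statistic T = f(X), I(\theta; T) \le I(\theta; X), with equality
% iff T is sufficient, so the ratio below lies in [0,1] and reads as the
% fraction of the available information that the statistic retains.
\eta(T) \;=\; \frac{I(\theta;\,T)}{I(\theta;\,X)} \;\in\; [0,1].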

The Design of Global Correlation Quantifiers and Continuous Notions of Statistical Sufficiency

TLDR
Using first principles from inference, a set of functionals is found to be uniquely capable of determining whether a certain class of inferential transformations, ρ →* ρ′, preserve, destroy, or create correlations.

PCANet: An energy perspective

Multivariate Extension of Matrix-Based Rényi's α-Order Entropy Functional

TLDR
This paper defines the matrix-based Rényi's α-order joint entropy among multiple variables and shows how this definition can ease the estimation of various information quantities that measure the interactions among several variables, such as interactive information and total correlation.
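A brief sketch of this matrix-based construction under the standard choices (Gaussian kernel, unit-trace normalization, joint entropy via Hadamard products); the kernel width sigma and the name total_correlation are assumptions made here for illustration.

import numpy as np

def gram(x, sigma=1.0):
    """Unit-trace Gaussian Gram matrix for a 1-D sample x."""
    d2 = (x[:, None] - x[None, :]) ** 2
    K = np.exp(-d2 / (2 * sigma ** 2))
    return K / np.trace(K)

def renyi_entropy(A, alpha=1.01):
    """Matrix-based Renyi alpha-order entropy from the eigenvalues of A."""
    lam = np.linalg.eigvalsh(A)
    lam = lam[lam > 1e-12]
    return np.log2(np.sum(lam ** alpha)) / (1 - alpha)

def total_correlation(xs, alpha=1.01, sigma=1.0):
    """Sum of marginal entropies minus the joint (Hadamard-product) entropy."""
    As = [gram(x, sigma) for x in xs]
    J = np.ones_like(As[0])
    for A in As:
        J = J * A                       # Hadamard product across variables
    J = J / np.trace(J)
    return sum(renyi_entropy(A, alpha) for A in As) - renyi_entropy(J, alpha)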

Nonparanormal Information Estimation

TLDR
This work proposes estimators for mutual information when the distribution is assumed to be nonparanormal (a.k.a. a Gaussian copula model), a semiparametric compromise between the Gaussian and nonparametric extremes, and shows that these estimators strike a practical balance between robustness and scaling with dimensionality.
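For the bivariate case, the Gaussian-copula assumption reduces MI estimation to a rank-correlation computation; this is a sketch of that special case only (the paper addresses the general multivariate setting), using the standard Spearman-to-Gaussian conversion 2 sin(πρ/6).

import numpy as np
from scipy.stats import spearmanr

def nonparanormal_mi(x, y):
    """MI (nats) between two scalar samples under a Gaussian-copula model."""
    rho_s, _ = spearmanr(x, y)
    r = 2 * np.sin(np.pi * rho_s / 6)   # latent Gaussian correlation
    return -0.5 * np.log(1 - r ** 2)    # Gaussian MI formula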

Breaking the Bandwidth Barrier: Geometrical Adaptive Entropy Estimation

TLDR
A unified way of obtaining the kernel and NN estimators is presented, and it is shown that the asymptotic bias of the proposed estimator is universal; it can be precomputed and subtracted from the estimate.
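The NN estimator being unified here is, in its classic form, the Kozachenko-Leonenko estimator; a baseline sketch follows (the cited paper's adaptive, bias-corrected estimator goes beyond this).

import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma, gammaln

def kl_entropy(X, k=3):
    """Kozachenko-Leonenko kNN entropy estimate (nats) for X of shape (n, d)."""
    n, d = X.shape
    eps = cKDTree(X).query(X, k=k + 1)[0][:, -1]           # k-th neighbour distance
    log_cd = (d / 2) * np.log(np.pi) - gammaln(d / 2 + 1)  # log unit-ball volume
    return digamma(n) - digamma(k) + log_cd + d * np.mean(np.log(eps))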

References

SHOWING 1-10 OF 35 REFERENCES

Maximally Informative Hierarchical Representations of High-Dimensional Data

TLDR
A new approach to unsupervised learning of deep representations that is both principled and practical is established, and its usefulness is demonstrated on both synthetic and real-world data.
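The layer objective behind this approach (and the sieve itself) can be stated in standard notation:

% Total correlation of X, and the amount of it explained by a factor Y;
% each layer chooses Y to maximize TC(X;Y).
TC(X) \;=\; \sum_{i=1}^{n} H(X_i) \;-\; H(X_1, \dots, X_n),
\qquad
TC(X; Y) \;=\; TC(X) \;-\; TC(X \mid Y).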

Sifting Common Information from Many Variables

TLDR
This work uses the recently introduced information sieve decomposition to formulate an incremental version of the common information problem that admits a simple fixed point solution, fast convergence, and complexity that is linear in the number of variables.

Discovering Structure in High-Dimensional Data Through Correlation Explanation

TLDR
It is demonstrated that Correlation Explanation automatically discovers meaningful structure for data from diverse sources including personality tests, DNA, and human language.

Representation Learning: A Review and New Perspectives

TLDR
Recent work in the area of unsupervised feature learning and deep learning is reviewed, covering advances in probabilistic models, autoencoders, manifold learning, and deep networks.

An Information-Maximization Approach to Blind Separation and Blind Deconvolution

TLDR
It is suggested that information maximization provides a unifying framework for problems in "blind" signal processing, and dependencies of information transfer on time delays are derived.
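The learning rule from this reference, for a logistic nonlinearity, maximizes output entropy; a single-sample sketch (the learning rate and shapes are illustrative choices):

import numpy as np

def infomax_step(W, x, lr=0.01):
    """One Bell-Sejnowski infomax update: dW is proportional to W^{-T} + (1 - 2y) x^T."""
    y = 1.0 / (1.0 + np.exp(-(W @ x)))  # logistic outputs
    grad = np.linalg.inv(W).T + np.outer(1.0 - 2.0 * y, x)
    return W + lr * grad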

Nonnegative Decomposition of Multivariate Information

TLDR
This work reconsiders from first principles the general structure of the information that a set of sources provides about a given variable and proposes a definition of partial information atoms that exhaustively decompose the Shannon information in a multivariate system in terms of the redundancy between synergies of subsets of the sources.
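For two sources, the decomposition referenced here takes the familiar form (Williams & Beer):

% Redundancy R, unique informations U_1 and U_2, and synergy S_syn
% exhaustively account for the Shannon information about S:
I(S; X_1, X_2) \;=\; R + U_1 + U_2 + S_{\mathrm{syn}},
\qquad
I(S; X_i) \;=\; R + U_i.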

Deep learning and the information bottleneck principle

TLDR
It is argued that the optimal architecture (the number of layers and the features/connections at each layer) is related to the bifurcation points of the information bottleneck tradeoff, namely the relevant compression of the input layer with respect to the output layer.
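The tradeoff in question is the information bottleneck Lagrangian:

% Compress X into a representation T while preserving information about Y;
% the bifurcation points mentioned above arise as \beta varies:
\min_{p(t \mid x)} \; I(X; T) \;-\; \beta\, I(T; Y).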

Learning Representations by Maximizing Compression

We give an algorithm that learns a representation of data through compression. The algorithm 1) predicts bits sequentially from those previously seen and 2) has a structure and a number of…

Reducing the Dimensionality of Data with Neural Networks

TLDR
This work describes an effective way of initializing the weights that allows deep autoencoder networks to learn low-dimensional codes that work much better than principal components analysis as a tool to reduce the dimensionality of data.

Demystifying Information-Theoretic Clustering

TLDR
This work proposes a novel method for clustering data that is grounded in information-theoretic principles and requires no parametric assumptions; it returns to the axiomatic foundations of information theory to define a meaningful clustering measure based on the notion of consistency under coarse-graining of the data.