• Corpus ID: 31182916

Predictability, Complexity, and Learning

  • A. U.S.
  • Published 2002
  • Computer Science
We define predictive information Ipred(T) as the mutual information between the past and the future of a time series. Three qualitatively different behaviors are found in the limit of large observation times T: Ipred(T) can remain finite, grow logarithmically, or grow as a fractional power law. If the time series allows us to learn a model with a finite number of parameters, then Ipred(T) grows logarithmically with a coefficient that counts the dimensionality of the model space. In contrast, power…
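As a rough illustration of the first regime in the abstract (predictive information that stays finite), the sketch below computes the mutual information between past and future for a symmetric two-state Markov chain. This example is not taken from the paper: the chain, the function names `entropy` and `markov_predictive_information`, and the parameter `p_stay` are all assumptions chosen for illustration. It relies on the standard fact that, for a first-order Markov chain, the future depends on the past only through the current state, so the result does not grow with the observation time T.

```python
import math


def entropy(probs):
    """Shannon entropy, in bits, of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def markov_predictive_information(p_stay):
    """I(past; future) in bits for a symmetric two-state Markov chain.

    Hypothetical helper: p_stay is the probability of remaining in the
    current state. By the Markov property,
        I(past; future) = I(X_t; X_{t+1}) = H(X_{t+1}) - H(X_{t+1} | X_t),
    which is independent of the length T of the observed past.
    """
    stationary = [0.5, 0.5]  # a symmetric chain has a uniform stationary law
    h_marginal = entropy(stationary)
    h_conditional = entropy([p_stay, 1.0 - p_stay])
    return h_marginal - h_conditional


if __name__ == "__main__":
    for p in (0.5, 0.9, 1.0):
        print(p, markov_predictive_information(p))
```

With `p_stay = 0.5` the sequence is independent coin flips and the past carries no information about the future; as `p_stay` approaches 1 the value approaches one bit, the entropy of the current state. The logarithmic and power-law regimes described in the abstract require richer model classes and are not covered by this sketch.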

Quantifying Emergence in Terms of Persistent Mutual Information
We define Persistent Mutual Information (PMI) as the Mutual (Shannon) Information between the past history of a system and its evolution significantly later in the future. This quantifies how much
Comparing Information-Theoretic Measures of Complexity in Boltzmann Machines
This work numerically measures how complexity changes as a function of network dynamics and network parameters and applies an extension of one such information-theoretic measure of complexity to understand incremental Hebbian learning in Hopfield networks, a fully recurrent architecture model of autoassociation memory.
Spectral Simplicity of Apparent Complexity, Part I: The Nondiagonalizable Metadynamics of Prediction
The first closed-form expressions for complexity measures, couched either in terms of the Drazin inverse or the eigenvalues and projection operators of the appropriate transition dynamic of the recently introduced meromorphic functional calculus, are established.
Signatures of Infinity: Nonergodicity and Resource Scaling in Prediction, Complexity, and Learning
The result is an alternative view of the relationship between predictability, complexity, and learning that highlights the distinct ways in which informational and correlational divergences arise in complex ergodic and nonergodic processes.
Structure or Noise?
It is shown how rate-distortion theory provides a mechanism for automated theory building by naturally distinguishing between regularity and randomness by constructing an objective function for model making whose extrema embody the trade-off between a model's structural complexity and its predictive power.
Optimal Causal Inference
This work establishes that the optimal causal filtering method leads to a graded model-complexity hierarchy of approximations to the causal architecture, and shows for nonideal cases with finite data that the correct number of states can be found by adjusting for statistical fluctuations in probability estimates.
Information dynamics: patterns of expectation and surprise in the perception of music
It is proposed that the use of several time-varying information measures, computed in the context of a probabilistic model that evolves as a sample of the process unfolds, as a way to characterise temporal structure in music could form the basis of a theoretically coherent yet computationally plausible model of human perception of formal structure.
Trimming the Independent Fat: Sufficient Statistics, Mutual Information, and Predictability from Effective Channel States
It is demonstrated that this is in fact possible: the information X's minimal sufficient statistic preserves about Y is exactly the information that Y's minimal necessary statistic preserves about X, equivalent to the mutual information between the forward- and reverse-time causal states of computational mechanics.
Predictive Information in a Nonequilibrium Critical Model
We propose predictive information, that is, information between a long past of duration T and the entire infinitely long future of a time series, as a general order parameter to study phase
Predictive PAC Learning and Process Decompositions
It is argued that it is natural in predictive PAC to condition not on the past observations but on the mixture component of the sample path, and a novel PAC generalization bound is given for mixtures of learnable processes, with a generalization error that is not worse than that of each mixture component.


Information theory and learning: a physical approach
It is proved that predictive information provides the unique measure for the complexity of dynamics underlying the time series and there are classes of models characterized by power-law growth of the predictive information that are qualitatively more complex than any of the systems that have been investigated before.
Bounds for predictive errors in the statistical mechanics of supervised learning.
Within a Bayesian framework, by generalizing inequalities known from statistical mechanics, general upper and lower bounds are calculated for a cumulative entropic error that measures the success in the supervised learning of an unknown rule from examples, and the information gain from observing a new example is found to decrease universally like d/m.
Information-theoretic asymptotics of Bayes methods
The authors examine the relative entropy distance D_n between the true density and the Bayesian density and show that the asymptotic distance is (d/2) log n + c, where d is the dimension of the parameter vector.
Assume {P_θ : θ ∈ Θ} is a set of probability distributions with a common dominating measure on a complete separable metric space Y. A state θ* ∈ Θ is chosen by Nature. A statistician obtains n
Statistical Inference, Occam's Razor, and Statistical Mechanics on the Space of Probability Distributions
A precise understanding of how Occam's razor, the principle that simpler models should be preferred until the data justify more complex models, is automatically embodied by probability theory is arrived at.
Mutual Information, Fisher Information, and Population Coding
It is shown that in the context of population coding, the mutual information between the activity of a large array of neurons and a stimulus to which the neurons are tuned is naturally related to the Fisher information.
Unsupervised and supervised learning: Mutual information between parameters and observations
The exact bounds and asymptotic behaviors for the mutual information as a function of the data size and of some properties of the probability of the data given the parameter are derived.
General bounds on the mutual information between a parameter and n conditionally independent observations
Bounds are given in terms of the metric and information dimensions of the parameter space with respect to the Hellinger distance, and the supremum of the mutual information over choices of the prior distribution is bounded.
Universal coding, information, prediction, and estimation
A connection between universal codes and the problems of prediction and statistical estimation is established. A known lower bound for the mean length of universal codes is sharpened and generalized,