Predictability, Complexity, and Learning
@inproceedings{Bialek2002Predictability, title={Predictability, Complexity, and Learning}, author={William Bialek and Ilya Nemenman and Naftali Tishby}, year={2002} }
We define predictive information I_pred(T) as the mutual information between the past and the future of a time series. Three qualitatively different behaviors are found in the limit of large observation times T: I_pred(T) can remain finite, grow logarithmically, or grow as a fractional power law. If the time series allows us to learn a model with a finite number of parameters, then I_pred(T) grows logarithmically with a coefficient that counts the dimensionality of the model space. In contrast, power…
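The "remain finite" regime is easy to check numerically. Below is a minimal sketch (our code, not from the paper, with an assumed symmetric two-state Markov chain; it uses the stationary-process identity I_pred(T) = 2S(T) − S(2T), where S(T) is the block entropy):

```python
import numpy as np
from itertools import product

# Sketch (assumed setup, not from the paper): exact predictive information
# I_pred(T) = 2*S(T) - S(2T) for a symmetric two-state Markov chain, where
# S(T) is the Shannon entropy of a block of T consecutive symbols.  A chain
# with finitely many states should show I_pred(T) saturating to a constant.

P_FLIP = 0.2  # assumed probability that the chain changes state per step

def block_entropy(T: int, p: float) -> float:
    """Exact Shannon entropy (bits) of a length-T block of the chain."""
    S = 0.0
    for word in product((0, 1), repeat=T):
        prob = 0.5  # the symmetric chain's stationary distribution is uniform
        for a, b in zip(word, word[1:]):
            prob *= p if a != b else 1.0 - p
        S -= prob * np.log2(prob)
    return S

for T in range(1, 6):
    i_pred = 2 * block_entropy(T, P_FLIP) - block_entropy(2 * T, P_FLIP)
    print(f"T={T}: I_pred(T) = {i_pred:.6f} bits")  # constant: ~0.2781 bits
```

For this order-1 chain the value saturates immediately at 1 − h(p) bits, with h the binary entropy; logarithmic or power-law growth requires richer model classes, as the abstract notes.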
321 Citations
Quantifying Emergence in Terms of Persistent Mutual Information
- Computer Science · Adv. Complex Syst.
- 2010
We define Persistent Mutual Information (PMI) as the Mutual (Shannon) Information between the past history of a system and its evolution significantly later in the future. This quantifies how much…
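A minimal plug-in estimator makes the definition concrete (our sketch: a single symbol on each side stands in as a proxy for the full past/future blocks, and the noisy-periodic series, lag values, and noise level are invented for illustration):

```python
import numpy as np

# Sketch (our illustration; series, lags, and noise level are assumptions):
# plug-in estimate of I(x_t ; x_{t+tau}) for a symbolic time series, with a
# single symbol on each side as a proxy for the full past/future blocks.

def plugin_mi(x, y) -> float:
    """Plug-in Shannon mutual information (bits) between two symbol arrays."""
    n = len(x)
    joint, px, py = {}, {}, {}
    for a, b in zip(x, y):
        joint[(a, b)] = joint.get((a, b), 0) + 1
        px[a] = px.get(a, 0) + 1
        py[b] = py.get(b, 0) + 1
    return sum((c / n) * np.log2(c * n / (px[a] * py[b]))
               for (a, b), c in joint.items())

rng = np.random.default_rng(0)
hidden = np.tile([0, 1, 1], 33_000)        # invented periodic "structure"
noise = rng.random(hidden.size) < 0.1      # 10% symbol-flip noise
series = np.where(noise, 1 - hidden, hidden)

for tau in (1, 10, 100):                   # the MI persists at long lags
    print(f"tau={tau}: PMI ≈ {plugin_mi(series[:-tau], series[tau:]):.4f} bits")
```

Because the underlying pattern is periodic, the estimate stays well above zero even at tau=100, which is the "persistence" the measure is built to capture.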
Comparing Information-Theoretic Measures of Complexity in Boltzmann Machines
- Computer Science · Entropy
- 2017
This work numerically measures how complexity changes as a function of network dynamics and network parameters, and applies an extension of one such information-theoretic complexity measure to understand incremental Hebbian learning in Hopfield networks, a fully recurrent architecture for autoassociative memory.
Spectral Simplicity of Apparent Complexity, Part I: The Nondiagonalizable Metadynamics of Prediction
- Mathematics · Chaos
- 2018
The first closed-form expressions for complexity measures are established, couched either in terms of the Drazin inverse or in terms of the eigenvalues and projection operators of the appropriate transition dynamic within the recently introduced meromorphic functional calculus.
Signatures of Infinity: Nonergodicity and Resource Scaling in Prediction, Complexity, and Learning
- Computer Science · Physical Review E: Statistical, Nonlinear, and Soft Matter Physics
- 2015
The result is an alternative view of the relationship between predictability, complexity, and learning that highlights the distinct ways in which informational and correlational divergences arise in complex ergodic and nonergodic processes.
Structure or Noise?
- Computer Science · ArXiv
- 2007
It is shown how rate-distortion theory provides a mechanism for automated theory building that naturally distinguishes between regularity and randomness, by constructing an objective function for model making whose extrema embody the trade-off between a model's structural complexity and its predictive power.
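To make the shape of such an objective concrete (an assumed information-bottleneck-style form on our part, not quoted from the paper): a model state R compresses the past, and a multiplier β prices predictive power against structural complexity.

```latex
% Hedged sketch of a structure-vs-noise objective (assumed form):
% R = model state compressing the past; beta >= 0 sets the exchange rate
% between structural complexity kept and predictive power gained.
\[
  \mathcal{F}\big[p(r \mid \mathrm{past})\big]
  \;=\; I(R;\,\mathrm{Past}) \;-\; \beta\, I(R;\,\mathrm{Future}).
\]
```

Sweeping β then traces out progressively more detailed models, the graded hierarchy that the next entry ("Optimal Causal Inference") makes precise.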
Optimal Causal Inference
- Computer Science, Mathematics · ArXiv
- 2007
This work establishes that the optimal causal filtering method leads to a graded model-complexity hierarchy of approximations to the causal architecture, and shows for nonideal cases with finite data that the correct number of states can be found by adjusting for statistical fluctuations in probability estimates.
Information dynamics: patterns of expectation and surprise in the perception of music
- Computer Science · Connect. Sci.
- 2009
It is proposed that several time-varying information measures, computed in the context of a probabilistic model that evolves as a sample of the process unfolds, can characterise temporal structure in music, and that these measures could form the basis of a theoretically coherent yet computationally plausible model of human perception of formal structure.
Trimming the Independent Fat: Sufficient Statistics, Mutual Information, and Predictability from Effective Channel States
- Computer Science · Physical Review E
- 2017
It is demonstrated that this is in fact possible: the information X's minimal sufficient statistic preserves about Y is exactly the information that Y's minimal sufficient statistic preserves about X, equivalent to the mutual information between the forward- and reverse-time causal states of computational mechanics.
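For the last clause, the standard computational-mechanics identity (our gloss, not quoted from this abstract) writes S^+ and S^- for the forward- and reverse-time causal states, whose shared information is the excess entropy:

```latex
% Hedged background identity: excess entropy E as past-future mutual
% information, carried by forward (S^+) and reverse (S^-) causal states.
\[
  E \;\equiv\; I(\mathrm{Past};\,\mathrm{Future}) \;=\; I(S^{+};\,S^{-}).
\]
```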
Predictive Information in a Nonequilibrium Critical Model
- Physics
- 2013
We propose predictive information, that is, information between a long past of duration T and the entire infinitely long future of a time series, as a general order parameter to study phase…
Predictive PAC Learning and Process Decompositions
- Computer Science · NIPS
- 2013
It is argued that it is natural in predictive PAC to condition not on the past observations but on the mixture component of the sample path, and a novel PAC generalization bound is derived for mixtures of learnable processes, with a generalization error no worse than that of each mixture component.
References
SHOWING 1-10 OF 69 REFERENCES
Information theory and learning: a physical approach
- Computer Science · ArXiv
- 2000
It is proved that predictive information provides the unique measure for the complexity of dynamics underlying the time series, and that there are classes of models characterized by power-law growth of the predictive information that are qualitatively more complex than any of the systems that have been investigated before.
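The three regimes echo the abstract at the top of this page; written out in one display (with d/2 as our hedged reading of the "coefficient that counts the dimensionality of the model space"):

```latex
% Hedged summary of the three large-T behaviors of I_pred(T):
\[
  I_{\mathrm{pred}}(T) \;\sim\;
  \begin{cases}
    \text{const}, & \text{finitely characterizable processes},\\[2pt]
    \tfrac{d}{2}\log T, & \text{models with } d \text{ learnable parameters},\\[2pt]
    A\,T^{\alpha},\; 0<\alpha<1, & \text{qualitatively more complex classes}.
  \end{cases}
\]
```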
Bounds for predictive errors in the statistical mechanics of supervised learning.
- Computer Science · Physical Review Letters
- 1995
Within a Bayesian framework, general upper and lower bounds are calculated, by generalizing inequalities known from statistical mechanics, for a cumulative entropic error that measures the success of supervised learning of an unknown rule from examples; the information gain from observing a new example is found to decrease universally like d/m.
Information-theoretic asymptotics of Bayes methods
- Computer Science · IEEE Trans. Inf. Theory
- 1990
The authors examine the relative entropy distance D_n between the true density and the Bayesian density and show that the asymptotic distance is (d/2)(log n) + c, where d is the dimension of the parameter vector.
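These two asymptotics fit together (a standard observation added for context, not text from either abstract): if the cumulative relative entropy grows as D_n ≈ (d/2) log n + c, the per-example information gain is its increment,

```latex
% Hedged link between the cumulative and per-example pictures:
\[
  D_n - D_{n-1} \;\approx\; \frac{d}{2}\,\log\frac{n}{n-1} \;\approx\; \frac{d}{2n},
\]
```

recovering the universal d/m decrease quoted in the preceding entry.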
Mutual Information, Metric Entropy and Cumulative Relative Entropy Risk
- Mathematics
- 1997
Assume {P_θ : θ ∈ Θ} is a set of probability distributions with a common dominating measure on a complete separable metric space Y. A state θ* ∈ Θ is chosen by Nature. A statistician obtains n…
Statistical Inference, Occam's Razor, and Statistical Mechanics on the Space of Probability Distributions
- Mathematics · Neural Computation
- 1997
This work arrives at a precise understanding of how Occam's razor, the principle that simpler models should be preferred until the data justify more complex models, is automatically embodied by probability theory.
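A minimal way to see the mechanism (our sketch via a Laplace approximation, not the paper's own geometric derivation): the log evidence for a d-parameter model M on n data points behaves as

```latex
% Hedged Laplace-approximation sketch of the automatic Occam penalty:
\[
  \ln P(D \mid M) \;\approx\; \ln P(D \mid \hat{\theta}, M)
  \;-\; \frac{d}{2}\,\ln n \;+\; O(1),
\]
```

so added parameters must buy enough extra fit to outpace the (d/2) ln n penalty, the same coefficient that recurs throughout this page.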
Mutual Information, Fisher Information, and Population Coding
- Computer Science · Neural Computation
- 1998
It is shown that in the context of population coding, the mutual information between the activity of a large array of neurons and a stimulus to which the neurons are tuned is naturally related to the Fisher information.
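For reference, the large-population form of that relation is usually written as follows (our hedged paraphrase in nats; p(θ) is the stimulus prior and J(θ) the Fisher information of the population code):

```latex
% Hedged large-N asymptotic relating mutual and Fisher information (nats):
\[
  I(\theta;\,\mathbf{r}) \;\simeq\; H(\theta)
  \;+\; \frac{1}{2}\int d\theta\; p(\theta)\, \ln\frac{J(\theta)}{2\pi e}.
\]
```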
Unsupervised and supervised learning: Mutual information between parameters and observations
- Computer Science
- 1999
Exact bounds and asymptotic behaviors are derived for the mutual information as a function of the data size and of some properties of the probability of the data given the parameter.
General bounds on the mutual information between a parameter and n conditionally independent observations
- Mathematics, Computer Science · COLT '95
- 1995
Bounds are given in terms of the metric and information dimensions of the parameter space with respect to the Hellinger distance, and the supremum of the mutual information over choices of the prior distribution is bounded.
Universal coding, information, prediction, and estimation
- Computer Science · IEEE Trans. Inf. Theory
- 1984
A connection between universal codes and the problems of prediction and statistical estimation is established. A known lower bound for the mean length of universal codes is sharpened and generalized,…