Minimum description length induction, Bayesianism, and Kolmogorov complexity

  Paul M. B. Vitányi, Ming Li
The relationship between the Bayesian approach and the minimum description length approach is established. We sharpen and clarify the general modeling principles minimum description length (MDL) and minimum message length (MML), abstracted as the ideal MDL principle and defined from Bayes's rule by means of Kolmogorov complexity. The basic condition under which the ideal principle should be applied is encapsulated as the fundamental inequality, which in broad terms states that the principle is… 
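As a rough sketch of how an ideal principle of this kind can be read off Bayes's rule: the symbols $H$ (a hypothesis), $D$ (the data), and $K$ (prefix Kolmogorov complexity) are notation assumed here for illustration and do not come from the truncated abstract above.

```latex
\begin{align*}
P(H \mid D) &= \frac{P(D \mid H)\,P(H)}{P(D)}, \\
\arg\max_{H} P(H \mid D) &= \arg\min_{H} \bigl[-\log P(D \mid H) - \log P(H)\bigr], \\
H_{\mathrm{MDL}} &= \arg\min_{H} \bigl[K(D \mid H) + K(H)\bigr].
\end{align*}
```

The second line holds because $P(D)$ does not depend on $H$. The third line is the idealized step: it assumes a universal prior with $-\log P(H) = K(H)$ up to an additive constant, and it replaces $-\log P(D \mid H)$ by $K(D \mid H)$, which is only reasonable when the data are typical for the hypothesis. That typicality requirement is the kind of condition the fundamental inequality is described as capturing.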

MDL induction, Bayesianism, and Kolmogorov complexity

  • P. Vitányi, Ming Li
  • Computer Science
    Proceedings. 1998 IEEE International Symposium on Information Theory (Cat. No.98CH36252)
  • 1998
The relationship between the Bayesian approach and the minimum description length approach is established and shows that data compression is almost always the best strategy, both in hypothesis identification and prediction.

Advances in Minimum Description Length: Theory and Applications

Advances in Minimum Description Length is a sourcebook that introduces the scientific community to the foundations of MDL, recent theoretical advances, practical applications, and examples of how to apply MDL in research settings that range from bioinformatics and machine learning to psychology.

Simplicity, information, Kolmogorov complexity and prediction

The relation between data compression and learning is treated, and it is shown that compression is almost always the best strategy, both in hypothesis identification using the minimum description length (MDL) principle and in prediction methods in the style of R. Solomonoff.

Kolmogorov's structure functions and model selection

The goodness-of-fit of an individual model with respect to individual data is precisely quantified, and it is shown that, within the obvious constraints, every graph is realized by the structure function of some data.

Meaningful Information

  • P. Vitányi
  • Computer Science, Mathematics
    IEEE Transactions on Information Theory
  • 2006
The theory of recursive functions statistic, the maximum and minimum value, the existence of absolutely nonstochastic objects (that have maximal sophistication: all the information in them is meaningful and there is no residual randomness), and the relation to the halting problem and further algorithmic properties are developed.

Kolmogorov's structure functions with an application to the foundations of model selection

  • N. Vereshchagin, P. Vitányi
  • Computer Science, Mathematics
    The 43rd Annual IEEE Symposium on Foundations of Computer Science, 2002. Proceedings.
  • 2002
Kolmogorov (1974) proposed a non-probabilistic approach to statistics, an individual combinatorial relation between the data and its model. We vindicate, for the first time, the rightness of the

Algorithmic statistics

The algorithmic theory of statistic, sufficient statistic, and minimal sufficient statistic is developed and it is shown that a function is a probabilistic sufficient statistic iff it is with high probability (in an appropriate sense) an algorithmic sufficient statistic.

Minimum Description Length Revisited

This is an up-to-date introduction to and overview of the Minimum Description Length (MDL) Principle, a theory of inductive inference that can be applied to general problems in statistics, machine

Applying MDL to Learning Best Model Granularity

This work tests how the theory of the Minimum Description Length behaves in practice on a general problem in model selection: that of learning the best model granularity, which depends critically on the granularity of the parameters.
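To make the granularity trade-off concrete, here is a minimal sketch of a two-part code for a Bernoulli model. It is not the method of the paper summarized above; the function names, the midpoint quantization of the parameter, and the flat cost of `precision` bits for the model are all assumptions chosen for illustration.

```python
import math

def two_part_code_length(bits, precision):
    """Total description length (in bits) of `bits` under a Bernoulli model
    whose parameter is quantized to `precision` binary digits."""
    n = len(bits)
    ones = sum(bits)
    levels = 2 ** precision
    # Candidate parameters are cell midpoints, so none is exactly 0 or 1
    # and every logarithm below is finite.
    candidates = ((k + 0.5) / levels for k in range(levels))
    # Data cost: negative log-likelihood under the best quantized parameter.
    data_cost = min(
        -(ones * math.log2(theta) + (n - ones) * math.log2(1 - theta))
        for theta in candidates
    )
    # Model cost: `precision` bits to name the chosen cell (an assumed flat code).
    return precision + data_cost

def best_precision(bits, max_precision=12):
    """Granularity that minimizes the total two-part code length."""
    return min(range(max_precision + 1),
               key=lambda d: two_part_code_length(bits, d))

sample = [1] * 90 + [0] * 10
print(best_precision(sample), two_part_code_length(sample, best_precision(sample)))
```

A coarser parameter grid is cheaper to describe but fits the data worse, while a finer grid fits better but costs more model bits; the minimizer balances the two.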

On the Convergence Speed of MDL Predictions for Bernoulli Sequences

A new upper bound on the prediction error for countable Bernoulli classes is derived, which implies a small bound (comparable to the one for Bayes mixtures) for certain important model classes.




If we let $P_U(x) = \Pr\{U \text{ prints } x\}$ be the probability that a given computer $U$ prints $x$ when given a random program, it can be shown that $\log(1/P_U(x)) \approx K(x)$ for all $x$, thus establishing a vital link between the "universal" probability measure $P_U$ and the "universal" complexity $K$.
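The complexity K is not computable, so any hands-on illustration has to substitute a real compressor. The sketch below uses zlib purely as an assumed stand-in: its output length is only a crude upper bound on description length, and the helper name `compressed_bits` is hypothetical.

```python
import os
import zlib

def compressed_bits(data: bytes) -> int:
    """Length in bits of the zlib encoding of `data`: a crude, computable
    proxy for description length, not the true Kolmogorov complexity."""
    return 8 * len(zlib.compress(data, 9))

regular = b"01" * 5000            # highly regular 10,000-byte string
random_like = os.urandom(10000)   # incompressible with overwhelming probability

for name, x in (("regular", regular), ("random", random_like)):
    bits = compressed_bits(x)
    # A code of length L bits corresponds to probability 2**-L, which mirrors
    # the link between a short description and a high universal probability.
    print(f"{name}: {bits} bits, implied log2 probability {-bits}")
```

The regular string receives a far shorter code, and hence a far larger implied probability, than the random one, which is the qualitative behavior the link between probability and complexity predicts.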

The Definition of Random Sequences

The Minimum Description Length Principle in Coding and Modeling

The normalized maximized likelihood, mixture, and predictive codings are each shown to achieve the stochastic complexity to within asymptotically vanishing terms.

Complexity-based induction systems: Comparisons and convergence theorems

Levin has shown that if $\tilde{P}'_{M}(x)$ is an unnormalized form of this measure, and $P(x)$ is any computable probability measure on strings $x$, then $\tilde{P}'_{M}(x) \geq C P(x)$, where $C$ is a constant independent of $x$.

Inductive reasoning and Kolmogorov complexity

  • Ming Li, P. Vitányi
  • Computer Science
    [1989] Proceedings. Structure in Complexity Theory Fourth Annual Conference
  • 1989
The thesis is developed that Solomonoff's method is fundamental in the sense that many other induction principles can be viewed as particular ways to obtain computable approximations to it.

Minimum complexity density estimation

An index of resolvability is proved to bound the rate of convergence of minimum complexity density estimators as well as the information-theoretic redundancy of the corresponding total description length to demonstrate the statistical effectiveness of the minimum description-length principle as a method of inference.

An Introduction to Kolmogorov Complexity and Its Applications

The book presents a thorough treatment of the central ideas of Kolmogorov complexity, with a wide range of illustrative applications, and will be ideal for advanced undergraduate students, graduate students, and researchers in computer science, mathematics, cognitive sciences, philosophy, artificial intelligence, statistics, and physics.

Learning about the Parameter of the Bernoulli Model

  • V. Vovk
  • Computer Science, Mathematics
    J. Comput. Syst. Sci.
  • 1997
We consider the problem of learning as much information as possible about the parameter $\theta$ of the Bernoulli model $\{P_\theta \mid \theta \in [0, 1]\}$ from the statistical data $x \in \{0, 1\}^n$, $n \geq 1$ being the sample size. Explicating

Kolmogorov Complexity, Data Compression, and Inference

If a sequence of random variables has Shannon entropy H, it is well known that there exists an efficient description of this sequence which requires only H bits. But the entropy H of a sequence also