# Kolmogorov's structure functions and model selection

@article{Vereshchagin2002KolmogorovsSF, title={Kolmogorov's structure functions and model selection}, author={Nikolai K. Vereshchagin and Paul M. B. Vit{\'a}nyi}, journal={IEEE Transactions on Information Theory}, year={2002}, volume={50}, pages={3265-3290} }

In 1974, Kolmogorov proposed a nonprobabilistic approach to statistics and model selection. Let data be finite binary strings and models be finite sets of binary strings. Consider model classes consisting of models of given maximal (Kolmogorov) complexity. The "structure function" of the given data expresses the relation between the complexity level constraint on a model class and the least log-cardinality of a model in the class containing the data. We show that the structure function…
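The relation described above can be written compactly. The following is a sketch in standard notation rather than a quotation from the paper: the symbols $h_x$, $\alpha$, and $K(S)$ do not appear in the excerpt and are assumed here, with $K(S)$ denoting the Kolmogorov complexity of a finite set $S$ of binary strings.

$$
h_x(\alpha) = \min_{S} \{ \log |S| \;:\; x \in S,\ K(S) \le \alpha \}
$$

Two boundary cases illustrate the trade-off. The singleton model $\{x\}$ has log-cardinality $0$ and complexity about $K(x)$, so $h_x(\alpha) = 0$ once $\alpha \ge K(x) + O(1)$. The model consisting of all strings of length $n = |x|$ has log-cardinality $n$ and complexity about $K(n)$, so $h_x(\alpha) \le n$ once $\alpha \ge K(n) + O(1)$.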

## 158 Citations

### Few Paths, Fewer Words: Model Selection With Automatic Structure Functions

- Computer Science
- Exp. Math.
- 2019

This work replaces Turing machines with finite automata and Kolmogorov complexity with Shallit and Wang’s automatic complexity, and uses structure functions to find an optimal statistical model for a given binary string.

### A Monotone Modal Logic for Algorithmic Statistics

- Computer Science
- 2013

Algorithmic models, a broader notion of modeling in the context of Algorithmic Statistics, are introduced, and it is shown that algorithmic models are stable over all simple total recursive functions.

### Algorithmic Statistics: Normal Objects and Universal Models

- Mathematics, Computer Science
- CSR
- 2016

This paper shows that there are "many types" of normal strings and states that there is a normal object $x$ such that all models $S_{ij}$ are not strong for $x$.

### Meaningful Information

- Computer Science, Mathematics
- IEEE Transactions on Information Theory
- 2006

The theory of recursive functions statistic, the maximum and minimum value, the existence of absolutely nonstochastic objects (that have maximal sophistication: all the information in them is meaningful and there is no residual randomness), and the relation to the halting problem and further algorithmic properties are developed.

### Facticity as the amount of self-descriptive information in a data set

- Computer Science
- ArXiv
- 2012

This approach overcomes problems with earlier proposals to use two-part code to define the meaningfulness or usefulness of a data set by proving that facticity is definite.

### The cluster structure function

- Computer Science
- ArXiv
- 2022

The optimal clustering is the one selected by analyzing the cluster structure function, which maps the number of parts of a partition to values related to the deficiencies of being good models by the parts.

### A Similarity Measure Using Smallest Context-Free Grammars

- Computer Science
- 2010 Data Compression Conference
- 2010

A new complexity approximation for the Kolmogorov complexity of strings based on compression with smallest Context Free Grammars is presented, which takes into account the size of the string model, in a representation similar to the Minimum Description Length.

### Using MDL for Grammar Induction

- Computer Science
- ICGI
- 2006

It is proved that, in DFA induction, already as a result of a single deterministic merge of two nodes, divergence of randomness deficiency and MDL code can occur, which shows why the applications of MDL to grammar induction so far have been disappointing.

### Some Properties of Antistochastic Strings

- Computer Science, Mathematics
- Theory of Computing Systems
- 2016

This paper demonstrates that antistochastic strings have the following property (Theorem 6): if an antistochastic string x has complexity k, then any k bits of information about x are enough to reconstruct x (with logarithmic advice).

### A Computational Theory of Meaning

- Computer Science
- 2017

A Fregean theory of computational semantics is presented that unifies various approaches to the analysis of the concept of information: the meaning of an object is a routine that computes it. The tension between Shannon information and Kolmogorov complexity is also addressed.

## References

### Minimum description length induction, Bayesianism, and Kolmogorov complexity

- Computer Science
- IEEE Trans. Inf. Theory
- 2000

In general, it is shown that data compression is almost always the best strategy, both in model selection and prediction.

### Meaningful Information

- Computer Science, Mathematics
- IEEE Transactions on Information Theory
- 2006

The theory of recursive functions statistic, the maximum and minimum value, the existence of absolutely nonstochastic objects (that have maximal sophistication: all the information in them is meaningful and there is no residual randomness), and the relation to the halting problem and further algorithmic properties are developed.

### Algorithmic statistics

- Computer Science
- IEEE Trans. Inf. Theory
- 2001

The algorithmic theory of statistic, sufficient statistic, and minimal sufficient statistic is developed and it is shown that a function is a probabilistic sufficient statistic iff it is with high probability (in an appropriate sense) an algorithmic sufficient statistic.

### Kolmogorov's structure function for probability models

- Computer Science, Mathematics
- Proceedings of the IEEE Information Theory Workshop
- 2002

The extension of Kolmogorov's great ideas to probability model classes turns out to add a new chapter to the MDL theory, which also provides an alternative approach to Shannon's rate-distortion theory.

### A UNIVERSAL PRIOR FOR INTEGERS AND ESTIMATION BY MINIMUM DESCRIPTION LENGTH

- Mathematics
- 1983

of the number of bits required to write down the observed data, has been reformulated to extend the classical maximum likelihood principle. The principle permits estimation of the number of the…

### KOLMOGOROV'S CONTRIBUTIONS TO INFORMATION THEORY AND ALGORITHMIC COMPLEXITY

- Mathematics, Computer Science
- 1989

If the authors let $P_U(x) = \Pr\{U \text{ prints } x\}$ be the probability that a given computer $U$ prints $x$ when given a random program, it can be shown that $\log(1/P_U(x)) \approx K(x)$ for all $x$, thus establishing a vital link between the "universal" probability measure $P_U$ and the "universal" complexity $K$.

### Algorithmic Complexity and Stochastic Properties of Finite Binary Sequences

- Computer Science, Mathematics
- Comput. J.
- 1999

This paper is a survey of concepts and results related to simple Kolmogorov complexity, prefix complexity and resource bounded complexity. We consider also a new type of complexity statistical…

### Learning about the Parameter of the Bernoulli Model

- Computer Science, Mathematics
- J. Comput. Syst. Sci.
- 1997

We consider the problem of learning as much information as possible about the parameter $\theta$ of the Bernoulli model $\{P_\theta \mid \theta \in [0, 1]\}$ from the statistical data $x \in \{0, 1\}^n$, $n \ge 1$ being the sample size. Explicating…