# Turing-Universal Learners with Optimal Scaling Laws

@article{Nakkiran2021TuringUniversalLW, title={Turing-Universal Learners with Optimal Scaling Laws}, author={Preetum Nakkiran}, journal={ArXiv}, year={2021}, volume={abs/2111.05321} }

For a given distribution, learning algorithm, and performance metric, the rate of convergence (or data-scaling law) is the asymptotic behavior of the algorithm's test performance as a function of the number of training samples. Many learning methods in both theory and practice have power-law rates, i.e. performance scales as $n^{-\alpha}$ for some $\alpha > 0$. Moreover, both theoreticians and practitioners are concerned with improving the rates of their learning algorithms under settings of interest. We observe the…
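A power-law rate can be checked on a measured learning curve by fitting a straight line in log-log space, since $\text{err}(n) \approx c\, n^{-\alpha}$ implies $\log \text{err} = \log c - \alpha \log n$. The sketch below is only an illustration under stated assumptions: `fit_power_law` is a hypothetical helper (not code from the paper), the error values must be strictly positive, and the synthetic curve uses made-up values $\alpha = 0.5$, $c = 2$.

```python
import math
from typing import Sequence, Tuple


def fit_power_law(ns: Sequence[float], errors: Sequence[float]) -> Tuple[float, float]:
    """Hypothetical helper: fit error ~= c * n**(-alpha) by least squares in log-log space.

    Returns (alpha, c). Assumes every n and every error is strictly positive.
    """
    if len(ns) != len(errors) or len(ns) < 2:
        raise ValueError("need at least two (n, error) pairs of equal length")
    xs = [math.log(n) for n in ns]
    ys = [math.log(e) for e in errors]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct sample sizes")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    # log(error) = log(c) - alpha * log(n), so alpha is the negated slope.
    return -slope, math.exp(intercept)


if __name__ == "__main__":
    # Synthetic, noise-free curve with illustrative alpha = 0.5 and c = 2.0.
    ns = [100, 200, 400, 800, 1600]
    errs = [2.0 * n ** -0.5 for n in ns]
    alpha, c = fit_power_law(ns, errs)
    print(f"alpha={alpha:.3f}, c={c:.3f}")  # prints: alpha=0.500, c=2.000
```

Because the rate is an asymptotic statement, with noisy measurements one would typically fit only the large-$n$ tail of the curve.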

## References


### A theory of universal learning

- Computer Science, STOC
- 2021

There are only three possible rates of universal learning (which aims to understand the performance of learning algorithms on every data distribution, but without requiring uniformity over the distribution): exponential, linear, or arbitrarily slow.

### Learning Curve Theory

- Computer Science, ArXiv
- 2021

This work develops and theoretically analyses the simplest possible (toy) model that can exhibit $n^{-\beta}$ learning curves for arbitrary power $\beta > 0$, and determines whether power laws are universal or depend on the data distribution.
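To make "a toy model with an $n^{-\beta}$ learning curve" concrete, here is a minimal sketch of a memorization-style model in the spirit of that work; it is an assumed simplification, not necessarily the paper's exact setup. Assume discrete features $i = 1, \dots, K$ drawn with Zipf-like probabilities $\theta_i \propto i^{-(1+\alpha)}$, and a learner that is correct only on features it saw during training. Its expected error after $n$ samples is $\sum_i \theta_i (1-\theta_i)^n$. Heuristically, features with $\theta_i \gtrsim 1/n$ are almost surely seen, so the error is roughly the tail mass beyond $i \approx n^{1/(1+\alpha)}$, giving $\beta \approx \alpha/(1+\alpha)$. This simplified variant therefore only yields exponents in $(0, 1)$. All function names are hypothetical.

```python
import math
from typing import List


def zipf_probs(num_features: int, alpha: float) -> List[float]:
    """Hypothetical helper: probabilities theta_i proportional to i**-(1 + alpha), i = 1..num_features."""
    weights = [i ** -(1.0 + alpha) for i in range(1, num_features + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def expected_error(thetas: List[float], n: int) -> float:
    """Expected error of a learner that is correct only on features seen in training.

    A test feature i is still unseen after n i.i.d. draws with probability
    (1 - theta_i)**n, so the expected error is sum_i theta_i * (1 - theta_i)**n.
    """
    return sum(t * (1.0 - t) ** n for t in thetas)


def local_exponent(thetas: List[float], n1: int, n2: int) -> float:
    """Slope of the learning curve between n1 and n2 on a log-log scale (an estimate of beta)."""
    e1, e2 = expected_error(thetas, n1), expected_error(thetas, n2)
    return -(math.log(e2) - math.log(e1)) / (math.log(n2) - math.log(n1))


if __name__ == "__main__":
    alpha = 1.0
    # K must be much larger than n**(1 / (1 + alpha)) so truncation does not distort the tail.
    thetas = zipf_probs(100_000, alpha)
    beta_hat = local_exponent(thetas, 1_000, 10_000)
    print(f"measured beta ~ {beta_hat:.3f}; heuristic alpha/(1+alpha) = {alpha / (1 + alpha):.3f}")
```

The truncation at $K$ is a practical choice: once $n$ is large enough that every feature is likely seen, the error of the truncated model decays much faster than a power law.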

### Complexity-based induction systems: Comparisons and convergence theorems

- Computer Science, IEEE Trans. Inf. Theory
- 1978

Levin has shown that if $\tilde{P}'_M(x)$ is an unnormalized form of this measure, and $P(x)$ is any computable probability measure on strings $x$, then $\tilde{P}'_M(x) \geq C\,P(x)$, where $C$ is a constant independent of $x$.
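Dominance inequalities of this kind are usually explained with a mixture argument: if a measure $M$ is a weighted sum of candidate measures and $P$ appears with positive weight $w_P$, then $M(x) \geq w_P\, P(x)$ for every $x$, so $C = w_P$ does not depend on $x$. The sketch below is a finite toy analogue only (three Bernoulli models over binary strings with assumed prior weights), not Solomonoff's or Levin's construction over all computable measures. The helpers `bernoulli_prob`, `mixture_prob`, and `check_dominance` are hypothetical; exact rational arithmetic avoids rounding issues.

```python
from fractions import Fraction
from itertools import product
from typing import Dict, Tuple


def bernoulli_prob(x: Tuple[int, ...], p: Fraction) -> Fraction:
    """Probability of binary string x under an i.i.d. Bernoulli(p) model."""
    ones = sum(x)
    return p ** ones * (1 - p) ** (len(x) - ones)


def mixture_prob(x: Tuple[int, ...], models: Dict[Fraction, Fraction]) -> Fraction:
    """Mixture probability sum_k w_k * P_k(x); models maps parameter p -> prior weight w."""
    return sum((w * bernoulli_prob(x, p) for p, w in models.items()), Fraction(0))


def check_dominance(models: Dict[Fraction, Fraction], max_len: int) -> bool:
    """Check M(x) >= w_p * P_p(x) for every model and every string of length <= max_len."""
    for length in range(max_len + 1):
        for x in product((0, 1), repeat=length):
            m = mixture_prob(x, models)
            for p, w in models.items():
                if m < w * bernoulli_prob(x, p):
                    return False
    return True


if __name__ == "__main__":
    # Assumed prior weights (they sum to 1); each weight is the dominance constant for its model.
    models = {Fraction(1, 4): Fraction(1, 2), Fraction(1, 2): Fraction(1, 4), Fraction(3, 4): Fraction(1, 4)}
    print(check_dominance(models, max_len=8))  # prints: True
```

The check can never fail, because every term of the mixture is non-negative; the point of the example is that the constant is fixed by the prior weight alone, whatever string is observed.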

### The Shape of Learning Curves: a Review

- Computer Science, ArXiv
- 2021

This review recounts the origins of the term, gives a formal definition of the learning curve, and provides a comprehensive overview of the literature on the shape of learning curves.

### Asymptotic learning curves of kernel methods: empirical data versus teacher–student paradigm

- Computer Science, Journal of Statistical Mechanics: Theory and Experiment
- 2020

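This work measures the learning-curve exponent β (test error decaying as $n^{-\beta}$ in the number of training samples $n$) when applying kernel methods to real datasets, and argues that these rather large exponents are possible because of the small effective dimension of the data.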

### Any Discrimination Rule Can Have an Arbitrarily Bad Probability of Error for Finite Sample Size

- Mathematics, IEEE Transactions on Pattern Analysis and Machine Intelligence
- 1982

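Any attempt to find a nontrivial distribution-free upper bound for $R_n$ (the probability of error of a discrimination rule based on $n$ samples) will fail, and any results on the rate of convergence of $R_n$ to the Bayes error $R^*$ must use assumptions about the distribution of $(X, Y)$.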

### Learning from examples in large neural networks.

- Computer Science, Physical Review Letters
- 1990

Numerical results on training in layered neural networks indicate that the generalization error improves gradually in some cases, and sharply in others, and statistical mechanics is used to study generalization curves in large layered networks.

### Understanding Machine Learning - From Theory to Algorithms

- Computer Science
- 2014

The aim of this textbook is to introduce machine learning, and the algorithmic paradigms it offers, in a principled way in an advanced undergraduate or beginning graduate course.

### Deep Learning Scaling is Predictable, Empirically

- Computer Science, ArXiv
- 2017

A large scale empirical characterization of generalization error and model size growth as training sets grow is presented and it is shown that model size scales sublinearly with data size.

### A Theory of Universal Artificial Intelligence based on Algorithmic Complexity

- Computer Science, ArXiv
- 2000

This work constructs a modified algorithm AI$\xi^{tl}$, which is still effectively more intelligent than any other time-$t$ and space-$l$ bounded agent, and gives strong arguments that the resulting AI$\xi$ model is the most intelligent unbiased agent possible.