Field Theoretical Analysis of On-line Learning of Probability Distributions

@article{Aida1999FieldTA,
  title={Field Theoretical Analysis of On-line Learning of Probability Distributions},
  author={Toshiaki Aida},
  journal={Physical Review Letters},
  year={1999},
  volume={83},
  pages={3554-3557}
}
  • T. Aida
  • Published 30 November 1999
  • Computer Science
  • Physical Review Letters
On-line learning of probability distributions is analyzed from the field-theoretical point of view. An optimal on-line learning algorithm can be obtained, since the renormalization group enables us to control the number of degrees of freedom of a system according to the number of examples. We learn not the parameters of a model but the probability distributions themselves; therefore, the algorithm requires no a priori knowledge of a model.
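
To make the idea concrete, here is a minimal sketch (not Aida's algorithm, whose update rules come from the renormalization group) of an on-line histogram density estimator whose resolution grows with the number of examples; the growth rule bins_for and the t**(1/3) schedule are hypothetical illustrative choices.

import numpy as np

def online_histogram_density(stream, bins_for):
    # Illustrative only: tie the number of degrees of freedom (bins)
    # to the number of examples seen so far, loosely in the spirit of
    # controlling model resolution with sample size.
    data = []
    for t, x in enumerate(stream, start=1):
        data.append(x)
        k = bins_for(t)  # number of bins after t examples
        counts, edges = np.histogram(data, bins=k, density=True)
        yield counts, edges  # current density estimate

# Usage: resolution grows like t**(1/3) (a hypothetical schedule).
rng = np.random.default_rng(0)
samples = rng.normal(size=1000)
for density, edges in online_histogram_density(samples, lambda t: max(1, round(t ** (1 / 3)))):
    pass  # `density` is the histogram estimate after each new example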

Citations

Adaptive on-line learning of probability distributions from field theories

  • T. Aida
  • Computer Science
    Proceedings 1999 International Conference on Information Intelligence and Systems (Cat. No.PR00446)
  • 1999
An adaptive algorithm for on-line learning of probability distributions is considered, which infers the distribution underlying observed data and requires no a priori knowledge of a model.

Recognition and geometrical on-line learning algorithm of probability distributions

  • T. Aida
  • Computer Science
    Proceedings of the IEEE-INNS-ENNS International Joint Conference on Neural Networks. IJCNN 2000. Neural Computing: New Challenges and Perspectives for the New Millennium
  • 2000
An on-line learning algorithm for probability distributions is constructed in a reparameterization-invariant form and can be made optimal, since the conformal gauge reduces the problem to a noncovariant case.

Information theory and learning: a physical approach

It is proved that predictive information provides the unique measure for the complexity of dynamics underlying the time series, and that there are classes of models characterized by power-law growth of the predictive information that are qualitatively more complex than any of the systems that have been investigated before.
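
For reference, the predictive information invoked here (and in "Predictability, Complexity, and Learning" below) is the mutual information between a window of the past and the entire future of the series; in the usual notation,

\[
  I_{\mathrm{pred}}(T) = \lim_{T' \to \infty} \left[ S(T) + S(T') - S(T + T') \right],
\]

where S(T) is the entropy of the variables observed in a window of duration T, so power-law growth of I_pred(T) with T is what signals the qualitatively more complex model classes.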

Drift estimation from a simple field theory

Given the outcome of a Wiener process, what can be said about the drift and diffusion coefficients? If the process is stationary, these coefficients are related to the mean and variance of the …
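
The snippet breaks off, but the relation it alludes to is standard: for dx = μ dt + σ dW sampled at spacing Δt, the increments are i.i.d. Gaussian with mean μΔt and variance σ²Δt, so both coefficients follow directly from increment statistics. A minimal sketch (the simulated path and parameter values are illustrative):

import numpy as np

def estimate_drift_diffusion(x, dt):
    # For dx = mu*dt + sigma*dW observed at spacing dt, increments are
    # i.i.d. N(mu*dt, sigma^2*dt); their sample mean and variance give
    # the drift mu and diffusion sigma^2.
    dx = np.diff(x)
    mu_hat = dx.mean() / dt
    sigma2_hat = dx.var(ddof=1) / dt
    return mu_hat, sigma2_hat

# Usage: simulate a path with mu = 0.5, sigma = 2.0 and recover (mu, sigma^2).
rng = np.random.default_rng(1)
dt, n = 0.01, 100_000
dx = 0.5 * dt + 2.0 * np.sqrt(dt) * rng.standard_normal(n)
x = np.concatenate([[0.0], np.cumsum(dx)])
print(estimate_drift_diffusion(x, dt))  # approx (0.5, 4.0)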

Scaling of a length scale for regression and prediction

  • T. Aida
  • Physics
    Proceedings of the 12th IEEE Workshop on Neural Networks for Signal Processing
  • 2002
A model with a length scale for smoothing the data is constructed; the uncertain region near a boundary shrinks as the speed of variation of the original signals increases, a property crucial for accurate prediction.
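
As a generic illustration of the role such a length scale plays (a standard Nadaraya-Watson smoother, not the model of the paper): a smaller length scale tracks rapidly varying signals more closely, at the cost of less noise suppression.

import numpy as np

def kernel_smooth(t, y, t_query, length_scale):
    # Gaussian weights between each query point and each data point;
    # `length_scale` sets how far information is pooled along the axis.
    w = np.exp(-0.5 * ((t_query[:, None] - t[None, :]) / length_scale) ** 2)
    return (w @ y) / w.sum(axis=1)

# Usage: a fast-varying signal calls for a small length scale.
t = np.linspace(0, 1, 200)
y = np.sin(20 * t) + 0.2 * np.random.default_rng(3).standard_normal(200)
smooth = kernel_smooth(t, y, t, length_scale=0.02)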

Predictability, Complexity, and Learning

It is argued that the divergent part of I_pred(T) provides the unique measure for the complexity of dynamics underlying a time series.

Bayesian field theory: nonparametric approaches to density estimation

  • J. C. Lemm
  • Computer Science
    Proceedings of the IEEE-INNS-ENNS International Joint Conference on Neural Networks. IJCNN 2000. Neural Computing: New Challenges and Perspectives for the New Millennium
  • 2000
Nonparametric approaches to density estimation are discussed from a Bayesian perspective and a numerical example shows that this can be computationally feasible for low-dimensional problems.

Can Gaussian Process Regression Be Made Robust Against Model Mismatch?

  • Peter Sollich
  • Computer Science
    Deterministic and Statistical Methods in Machine Learning
  • 2004
In lower-dimensional learning scenarios, the theory predicts—in excellent qualitative and good quantitative accord with simulations—that evidence maximization eliminates logarithmically slow learning and recovers the optimal scaling of the decrease of generalization error with training set size.
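
For the mechanics behind "evidence maximization" here, a minimal NumPy sketch of GP regression with an RBF kernel: hyperparameters are chosen by maximizing the log marginal likelihood. The data, the grid of length scales, and the fixed noise level are all hypothetical stand-ins for gradient-based optimization.

import numpy as np

def gp_log_evidence(X, y, length_scale, noise):
    # Log marginal likelihood ("evidence") of GP regression:
    # log p(y|X) = -1/2 y^T K^-1 y - 1/2 log|K| - n/2 log(2*pi),
    # with K = K_rbf + noise^2 * I, computed via a Cholesky factor.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-0.5 * d2 / length_scale ** 2) + noise ** 2 * np.eye(len(X))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # K^-1 y
    return (-0.5 * y @ alpha - np.log(np.diag(L)).sum()
            - 0.5 * len(X) * np.log(2 * np.pi))

# Usage: pick the length scale with the highest evidence on toy data.
rng = np.random.default_rng(2)
X = rng.uniform(-3, 3, size=(50, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(50)
best = max([0.1, 0.3, 1.0, 3.0], key=lambda ls: gp_log_evidence(X, y, ls, 0.1))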

How is sensory information processed?

This work analyzes how abstract Bayesian learners would perform on different data and discusses possible experiments that can determine which learning-theoretic computation is performed by a particular organism.

Detecting joint tendencies of multiple time series

The moving average smoother decomposes time-series data x(t) into a systematic part plus fluctuations, i.e., x(t) = x̄(t) + δx(t). In the language of Bayesian inference, smoothing can be understood as …
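
A minimal sketch of that decomposition (the window width is an illustrative choice; edge points use a shrinking window):

import numpy as np

def decompose(x, window=11):
    # Moving-average split x(t) = xbar(t) + dx(t): `xbar` is the
    # systematic part; the residual `dx` is the fluctuation.
    n, h = len(x), window // 2
    xbar = np.array([x[max(0, i - h):i + h + 1].mean() for i in range(n)])
    return xbar, x - xbar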
