# Asymptotic Equivalence of Bayes Cross Validation and Widely Applicable Information Criterion in Singular Learning Theory

@article{Watanabe2010AsymptoticEO, title={Asymptotic Equivalence of Bayes Cross Validation and Widely Applicable Information Criterion in Singular Learning Theory}, author={Sumio Watanabe}, journal={ArXiv}, year={2010}, volume={abs/1004.2316} }

In regular statistical models, leave-one-out cross-validation is asymptotically equivalent to the Akaike information criterion. However, since many learning machines are singular statistical models, the asymptotic behavior of cross-validation has remained unknown. In previous studies, we established singular learning theory and proposed a widely applicable information criterion (WAIC), whose expectation value is asymptotically equal to the average Bayes generalization loss. In the…
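The two quantities being compared can both be estimated from a matrix of pointwise log-likelihoods evaluated over posterior draws. Below is a minimal sketch, assuming such a matrix is available; the function name is illustrative, and the LOO estimate uses naive harmonic-mean importance weights over the full posterior rather than any particular scheme from the paper:

```python
import numpy as np

def waic_and_loo(log_lik):
    """Estimate WAIC and importance-sampling LOO generalization losses.

    log_lik: array of shape (S, n) with entries log p(x_i | w_s) for
    S posterior draws w_s and n training samples x_i.
    """
    S, n = log_lik.shape
    # log pointwise predictive density: log (1/S) sum_s p(x_i | w_s)
    lppd = np.logaddexp.reduce(log_lik, axis=0) - np.log(S)
    # functional variance: posterior variance of log p(x_i | w)
    v = log_lik.var(axis=0, ddof=1)
    waic = -(lppd - v).mean()
    # naive IS-LOO: log p(x_i | x_{-i}) ~= -log (1/S) sum_s 1 / p(x_i | w_s)
    loo = (np.logaddexp.reduce(-log_lik, axis=0) - np.log(S)).mean()
    return waic, loo
```

On a regular model the two estimates agree closely, reflecting the asymptotic equivalence discussed above.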

## 1,723 Citations

### A widely applicable Bayesian information criterion

- Mathematics, Computer Science
- J. Mach. Learn. Res.
- 2013

A widely applicable Bayesian information criterion (WBIC) is defined as the average log likelihood function over the posterior distribution with inverse temperature 1/log n, where n is the number of training samples. It is mathematically proved that WBIC has the same asymptotic expansion as the Bayes free energy, even if the statistical model is singular for, or unrealizable by, the true distribution.
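For a conjugate model the tempered posterior at inverse temperature 1/log n is available in closed form, so the definition above can be checked directly. A minimal sketch, assuming a normal-mean model with unit variance and a standard normal prior (the model choice and function name are illustrative, not from the paper):

```python
import numpy as np

def wbic_normal_mean(x, n_draws=20000, seed=0):
    """WBIC for x_i ~ N(mu, 1) with prior mu ~ N(0, 1).

    WBIC is minus the average log likelihood of the full sample under
    the posterior tempered to inverse temperature beta = 1/log n; it
    approximates the Bayes free energy -log p(x_1, ..., x_n).
    """
    n = len(x)
    beta = 1.0 / np.log(n)          # Watanabe's inverse temperature
    prec = beta * n + 1.0           # tempered-posterior precision of mu
    mean = beta * x.sum() / prec    # tempered-posterior mean of mu
    rng = np.random.default_rng(seed)
    mu = rng.normal(mean, 1.0 / np.sqrt(prec), n_draws)
    # full-sample log likelihood at each tempered-posterior draw
    ll = -0.5 * n * np.log(2 * np.pi) \
         - 0.5 * ((x[None, :] - mu[:, None]) ** 2).sum(axis=1)
    return -ll.mean()
```

In this conjugate case the exact free energy is also available in closed form, so the accuracy of WBIC can be verified numerically.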

### Bayesian Cross Validation and WAIC for Predictive Prior Design in Regular Asymptotic Theory

- Computer Science
- ArXiv
- 2015

A new formula is derived that clarifies the theoretical relation among CV, WAIC, and the generalization loss, and by which the optimal hyperparameter can be found directly; the variances of the hyperparameters optimized by CV and WAIC are also made smaller at small computational cost.

### Higher Order Equivalence of Bayes Cross Validation and WAIC

- Mathematics, Computer Science
- 2016

Bayes cross-validation and WAIC are shown to be equivalent to each other up to second-order asymptotics, provided the posterior distribution can be approximated by some normal distribution.

### Bayesian Leave-One-Out Cross-Validation Approximations for Gaussian Latent Variable Models

- Computer Science
- J. Mach. Learn. Res.
- 2016

This article considers Gaussian latent variable models in which the integration over the latent values is approximated using the Laplace method or expectation propagation, and finds that the approach based on a Gaussian approximation to the LOO marginal distribution gives the most accurate and reliable results among the fast methods.

### Asymptotic Analysis of the Bayesian Likelihood Ratio for Testing Homogeneity in Normal Mixture Models.

- Mathematics
- 2018

When the normal mixture model is used, the optimal number of components describing the data must be determined. Testing homogeneity is suited to this purpose; however, constructing its theory is…

### On the marginal likelihood and cross-validation

- Biology
- Biometrika
- 2020

It is shown that the marginal likelihood is formally equivalent to exhaustive leave-p-out cross-validation averaged over all values of p and all held-out test sets when the log posterior predictive probability is used as the scoring rule.
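The relation rests on the chain-rule decomposition of the marginal likelihood; a sketch of the identity (not the paper's full statement):

```latex
\log p(x_1,\dots,x_n) \;=\; \sum_{i=1}^{n} \log p\left(x_i \mid x_1,\dots,x_{i-1}\right)
```

Averaging the right-hand side over all orderings of the data turns the term that conditions on i − 1 points into an exhaustive leave-p-out predictive score with p = n − i + 1, so summing over i averages the cross-validation score over all values of p.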

### Information criterion for variational Bayes learning in regular and singular cases

- Computer Science, Mathematics
- The 6th International Conference on Soft Computing and Intelligent Systems, and The 13th International Symposium on Advanced Intelligence Systems
- 2012

This paper proposes a new information criterion for variational Bayes learning that is an unbiased estimator of the generalization loss both when the posterior distribution is regular and when it is singular.

### Prior Intensified Information Criterion

- Computer Science
- 2021

The prior intensified information criterion (PIIC) is proposed, which customizes the WAIC to incorporate sparse estimation and causal inference; it clearly outperforms the WAIC in prediction performance when the above concerns are manifested.

### A Practical Method Based on Bayes Boundary-Ness for Optimal Classifier Parameter Status Selection

- Computer Science
- J. Signal Process. Syst.
- 2020

A novel practical method is proposed for finding the optimal classifier parameter status corresponding to the Bayes error by evaluating estimated class boundaries from the perspective of Bayes boundary-ness with an entropy-based uncertainty measure.

## References

### Almost All Learning Machines are Singular

- Computer Science, Mathematics
- 2007 IEEE Symposium on Foundations of Computational Intelligence
- 2007

It is proposed that, by using resolution of singularities, the likelihood function can be represented in a standard form, by which the asymptotic behavior of the generalization errors of the maximum likelihood method and Bayes estimation can be proved.

### Bayesian Model Assessment and Comparison Using Cross-Validation Predictive Densities

- Computer Science
- Neural Computation
- 2002

This work proposes an approach using cross-validation predictive densities to obtain expected utility estimates and the Bayesian bootstrap to obtain samples from their distributions, and discusses the probabilistic assumptions made and the properties of two practical cross-validation methods, importance sampling and k-fold cross-validation.

### Asymptotic Learning Curve and Renormalizable Condition in Statistical Learning Theory

- Mathematics
- ArXiv
- 2010

This paper defines a renormalizable condition of the statistical estimation problem and shows that, under such a condition, the asymptotic learning curves are ensured to obey the universal law, even if the true distribution is unrealizable by and singular for the statistical model.

### A Limit Theorem in Singular Regression Problem

- Mathematics
- ArXiv
- 2009

A limit theorem is proved that shows the relation between the singular regression problem and two birational invariants, the real log canonical threshold and the singular fluctuation; it enables us to estimate the generalization error from the training error without any knowledge of the true probability distribution.

### Singularities in mixture models and upper bounds of stochastic complexity

- Computer Science, Mathematics
- Neural Networks
- 2003

### Algebraic Analysis for Nonidentifiable Learning Machines

- Computer Science, Mathematics
- Neural Computation
- 2001

It is rigorously proved that the Bayesian stochastic complexity, or free energy, is asymptotically equal to λ1 log n − (m1 − 1) log log n + constant, where n is the number of training samples and λ1 and m1 are a rational number and a natural number, respectively, determined as birational invariants of the singularities in the parameter space.

### Stochastic Complexity and Generalization Error of a Restricted Boltzmann Machine in Bayesian Estimation

- Computer Science, Mathematics
- J. Mach. Learn. Res.
- 2010

This paper uses a new eigenvalue analysis method and a recursive blowing-up process to obtain the maximum pole of the zeta functions, and shows that these methods are effective for obtaining the asymptotic form of the generalization error of hierarchical learning models.

### An asymptotic approximation of the marginal likelihood for general Markov models

- Computer Science, Mathematics
- 2010

The BIC score is derived for Bayesian networks in the case of binary data, when the underlying graph is a rooted tree and all the inner nodes represent hidden variables.

### Algebraic geometrical methods for hierarchical learning machines

- Computer Science
- Neural Networks
- 2001