Asymptotic Equivalence of Bayes Cross Validation and Widely Applicable Information Criterion in Singular Learning Theory

@article{Watanabe2010AsymptoticEO,
  title={Asymptotic Equivalence of Bayes Cross Validation and Widely Applicable Information Criterion in Singular Learning Theory},
  author={Sumio Watanabe},
  journal={ArXiv},
  year={2010},
  volume={abs/1004.2316}
}
  • Sumio Watanabe
  • Published 1 March 2010
  • Computer Science, Mathematics
  • ArXiv
In regular statistical models, the leave-one-out cross-validation is asymptotically equivalent to the Akaike information criterion. However, since many learning machines are singular statistical models, the asymptotic behavior of the cross-validation remains unknown. In previous studies, we established the singular learning theory and proposed a widely applicable information criterion, the expectation value of which is asymptotically equal to the average Bayes generalization loss. In the… 
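Both quantities in this comparison can be estimated from ordinary MCMC output. The following is a minimal sketch, not taken from the paper: the function name and the loglik matrix are assumptions, with loglik[s, i] = log p(x_i | w_s) over posterior draws w_s. It computes WAIC as the Bayes training loss plus the functional variance divided by n, and the Bayes cross-validation loss via the identity p(x_i | x^(-i)) = 1 / E_w[1/p(x_i | w)].

```python
import numpy as np
from scipy.special import logsumexp

def waic_and_cv(loglik):
    """loglik: (S, n) array, loglik[s, i] = log p(x_i | w_s) for S posterior draws."""
    S, n = loglik.shape
    # Bayes training loss: T_n = -(1/n) * sum_i log E_w[p(x_i | w)]
    log_pred = logsumexp(loglik, axis=0) - np.log(S)
    T_n = -log_pred.mean()
    # functional variance: V = sum_i Var_w[log p(x_i | w)]
    V = loglik.var(axis=0, ddof=1).sum()
    waic = T_n + V / n
    # importance-sampling cross validation:
    # log p(x_i | x^(-i)) = -log E_w[1 / p(x_i | w)]
    log_loo = -(logsumexp(-loglik, axis=0) - np.log(S))
    cv = -log_loo.mean()
    return waic, cv  # asymptotically equivalent in regular and singular models
```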

Citations

A widely applicable Bayesian information criterion
TLDR
A widely applicable Bayesian information criterion (WBIC) is defined by the average log likelihood function over the posterior distribution with the inverse temperature 1/log n, where n is the number of training samples, and it is mathematically proved that WBIC has the same asymptotic expansion as the Bayes free energy, even if a statistical model is singular for or unrealizable by the true distribution.
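As a rough sketch of the recipe this describes (names are mine; the sampler targeting the tempered posterior, proportional to prior(w) · likelihood(w)^β with β = 1/log n, is assumed to exist elsewhere):

```python
import numpy as np

def wbic(loglik_beta):
    """loglik_beta: (S, n) array of log p(x_i | w_s), where the draws w_s come
    from the tempered posterior at inverse temperature beta = 1 / log(n).
    WBIC is the tempered-posterior average of the negative total log likelihood;
    its asymptotic expansion matches the Bayes free energy."""
    return -loglik_beta.sum(axis=1).mean()
```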
Bayesian Cross Validation and WAIC for Predictive Prior Design in Regular Asymptotic Theory
TLDR
A new formula is derived by which the theoretical relation among CV, WAIC, and the generalization loss is clarified and the optimal hyperparameter can be directly found and the variances of the optimized hyperparameters by CV and WAIC are made smaller with small computational costs.
Higher Order Equivalence of Bayes Cross Validation and WAIC
TLDR
The Bayes cross validation and WAIC are shown to be equivalent to each other up to the second order of the asymptotic expansion, provided the posterior distribution can be approximated by a normal distribution.
Information criteria and cross validation for Bayesian inference in regular and singular cases
  • Sumio Watanabe
  • Computer Science
    Japanese Journal of Statistics and Data Science
  • 2021
TLDR
In this paper, in order to establish a mathematical foundation for developing a measure of a statistical model and a prior, the relation among the generalization loss, the information criteria, and the cross-validation loss is clarified, and their equivalences and differences are shown.
Bayesian Leave-One-Out Cross-Validation Approximations for Gaussian Latent Variable Models
TLDR
This article considers Gaussian latent variable models where the integration over the latent values is approximated using the Laplace method or expectation propagation, and finds that the approach based upon a Gaussian approximation to the LOO marginal distribution gives the most accurate and reliable results among the fast methods.
Asymptotic Analysis of the Bayesian Likelihood Ratio for Testing Homogeneity in Normal Mixture Models.
When we use the normal mixture model, the optimal number of components describing the data should be determined. Testing homogeneity is good for this purpose; however, constructing its theory is difficult because the normal mixture is a singular statistical model.
On the marginal likelihood and cross-validation
TLDR
It is shown that the marginal likelihood is formally equivalent to exhaustive leave-p-out cross-validation averaged over all values of p and all held-out test sets when using the log posterior predictive probability as the scoring rule.
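The result rests on the chain-rule decomposition of the marginal likelihood; a sketch of the underlying identity in my own notation, not the paper's:

```latex
% The chain rule holds exactly for any ordering of the data:
\log p(y_1,\dots,y_n) = \sum_{i=1}^{n} \log p\left(y_i \mid y_1,\dots,y_{i-1}\right).
% Averaging the i-th term over all orderings scores each point by its log
% posterior predictive given a random held-in subset of size i-1, i.e. a
% leave-p-out term with p = n-i+1; summing over i gives the exhaustive
% leave-p-out interpretation averaged over all p.
```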
Information criterion for variational Bayes learning in regular and singular cases
  • Koshi Yamada, Sumio Watanabe
  • Computer Science, Mathematics
    The 6th International Conference on Soft Computing and Intelligent Systems, and The 13th International Symposium on Advanced Intelligence Systems
  • 2012
TLDR
This paper proposes a new information criterion for variational Bayes learning, which is the unbiased estimator of the generalization loss for both cases when the posterior distribution is regular and singular.
A Practical Method Based on Bayes Boundary-Ness for Optimal Classifier Parameter Status Selection
TLDR
A novel practical method is proposed for finding the optimal classifier parameter status corresponding to the Bayes error, by evaluating estimated class boundaries from the perspective of Bayes boundary-ness with an entropy-based uncertainty measure.
Uncertainty in Bayesian Leave-One-Out Cross-Validation Based Model Comparison
TLDR
It is shown that the problematic skewness of the error distribution, which occurs when the models make similar predictions, may fail to fade away as the data size grows to infinity in certain situations.

References

Showing 1-10 of 70 references
Almost All Learning Machines are Singular
  • Sumio Watanabe
  • Computer Science, Mathematics
    2007 IEEE Symposium on Foundations of Computational Intelligence
  • 2007
TLDR
It is proposed that, by using resolution of singularities, the likelihood function can be represented in a standard form, from which the asymptotic behavior of the generalization errors of the maximum likelihood method and Bayes estimation can be proved.
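For reference, the standard form referred to here is, in each local coordinate chart after resolution of singularities (a sketch in the usual notation of singular learning theory, where K(w) is the Kullback-Leibler divergence to the true distribution):

```latex
% Hironaka's theorem supplies a birational map w = g(u) in each chart such that
K(g(u)) = u_1^{2k_1} u_2^{2k_2} \cdots u_d^{2k_d},
\qquad |g'(u)| = b(u)\,\bigl|u_1^{h_1} \cdots u_d^{h_d}\bigr|, \quad b(u) > 0,
% turning the likelihood integral into a tractable normal-crossing form.
```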
Asymptotic Learning Curve and Renormalizable Condition in Statistical Learning Theory
TLDR
This paper defines a renormalizable condition of the statistical estimation problem and shows that, under such a condition, the asymptotic learning curves obey a universal law, even if the true distribution is unrealizable and singular for the statistical model.
A Limit Theorem in Singular Regression Problem
TLDR
A limit theorem is proved that shows the relation between the singular regression problem and two birational invariants, a real log canonical threshold and a singular fluctuation, and enables us to estimate the generalization error from the training error without any knowledge of the true probability distribution.
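A hedged sketch of the relation this theorem enables, in the usual notation of singular learning theory (λ the real log canonical threshold, ν the singular fluctuation, L_0 the loss of the true distribution, G_n and T_n the generalization and training losses):

```latex
\mathbb{E}[G_n] = L_0 + \frac{\lambda}{n} + o\!\left(\tfrac{1}{n}\right), \qquad
\mathbb{E}[T_n] = L_0 + \frac{\lambda - 2\nu}{n} + o\!\left(\tfrac{1}{n}\right),
% hence the gap depends only on the singular fluctuation:
\mathbb{E}[G_n] - \mathbb{E}[T_n] = \frac{2\nu}{n} + o\!\left(\tfrac{1}{n}\right).
```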
Singularities in mixture models and upper bounds of stochastic complexity
Algebraic Analysis for Nonidentifiable Learning Machines
TLDR
It is rigorously proved that the Bayesian stochastic complexity, or the free energy, is asymptotically equal to λ1 log n − (m1 − 1) log log n + constant, where n is the number of training samples and λ1 and m1 are the rational number and the natural number that are determined as birational invariants of the singularities in the parameter space.
Stochastic Complexity and Generalization Error of a Restricted Boltzmann Machine in Bayesian Estimation
  • Miki Aoyagi
  • Computer Science, Mathematics
    J. Mach. Learn. Res.
  • 2010
TLDR
This paper uses a new eigenvalue analysis method and a recursive blowing-up process to obtain the maximum pole of the zeta function, and shows that these methods are effective for obtaining the asymptotic form of the generalization error of hierarchical learning models.
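For context, the zeta function whose maximum pole is computed here is the standard one of singular learning theory (my rendering; K(w) is the Kullback-Leibler divergence and φ(w) the prior):

```latex
\zeta(z) = \int K(w)^{z}\,\varphi(w)\,dw ,
% holomorphic for Re(z) > 0 and meromorphically continued to the complex plane;
% the largest pole z = -\lambda, of order m, yields the stochastic complexity
F_n = \lambda \log n - (m - 1) \log\log n + O(1).
```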
An asymptotic approximation of the marginal likelihood for general Markov models
TLDR
The BIC score is derived for Bayesian networks with binary data in the case where the underlying graph is a rooted tree and all the inner nodes represent hidden variables.
On the Problem in Model Selection of Neural Network Regression in Overrealizable Scenario
TLDR
The article analyzes the expected training error and the expected generalization error of neural networks and radial basis functions in overrealizable cases and clarifies the difference from regular models, for which identifiability holds.