Asymptotic Equivalence of Bayes Cross Validation and Widely Applicable Information Criterion in Singular Learning Theory

  • Sumio Watanabe
  • Published 1 March 2010
  • Computer Science, Mathematics
  • ArXiv
In regular statistical models, the leave-one-out cross-validation is asymptotically equivalent to the Akaike information criterion. However, since many learning machines are singular statistical models, the asymptotic behavior of the cross-validation remains unknown. In previous studies, we established the singular learning theory and proposed a widely applicable information criterion, the expectation value of which is asymptotically equal to the average Bayes generalization loss. In the… 
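
As a rough illustration of the equivalence the abstract describes, the sketch below computes WAIC and an importance-sampling estimate of the leave-one-out cross-validation loss from the same posterior draws. It assumes the usual definitions (Bayes training loss plus functional variance divided by n for WAIC, and the mean over data points of the log posterior average of 1/p for the cross-validation loss). The toy normal-mean model, the prior variance, the seed, and the helper names `log_mean_exp` and `waic_and_is_loo` are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def log_mean_exp(a, axis=0):
    """Numerically stable log(mean(exp(a))) along one axis."""
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.mean(np.exp(a - m), axis=axis))

def waic_and_is_loo(log_lik):
    """log_lik has shape (S, n): S posterior draws, n data points.

    Returns (waic, loo), both as average losses per data point.
    """
    n = log_lik.shape[1]
    # Bayes training loss: -(1/n) sum_i log E_w[p(x_i | w)]
    train_loss = -np.mean(log_mean_exp(log_lik, axis=0))
    # Functional variance: sum_i Var_w[log p(x_i | w)]
    func_var = np.sum(np.var(log_lik, axis=0))
    waic = train_loss + func_var / n
    # Importance-sampling LOO loss: (1/n) sum_i log E_w[1 / p(x_i | w)]
    loo = np.mean(log_mean_exp(-log_lik, axis=0))
    return waic, loo

# Toy regular model (assumption): x_i ~ N(mu, 1), prior mu ~ N(0, 100).
rng = np.random.default_rng(0)
n = 200
x = rng.normal(0.0, 1.0, size=n)
post_prec = n + 1.0 / 100.0                 # conjugate posterior precision
post_mean = np.sum(x) / post_prec
mu = rng.normal(post_mean, np.sqrt(1.0 / post_prec), size=4000)

# Pointwise log-likelihood matrix, shape (4000, n)
log_lik = -0.5 * np.log(2 * np.pi) - 0.5 * (x[None, :] - mu[:, None]) ** 2

waic, loo = waic_and_is_loo(log_lik)
print(f"WAIC = {waic:.5f}, IS-LOO = {loo:.5f}, diff = {abs(waic - loo):.2e}")
```

In this regular toy model the two values should agree closely. The sketch does not exercise the singular case, which is the setting the paper actually addresses.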

A widely applicable Bayesian information criterion

A widely applicable Bayesian information criterion (WBIC) is defined as the average of the log likelihood function over the posterior distribution with inverse temperature 1/log n, where n is the number of training samples. It is mathematically proved that WBIC has the same asymptotic expansion as the Bayes free energy, even if the true distribution is singular for or unrealizable by the statistical model.
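
The definition above can be sketched numerically. The example below assumes WBIC is the average of the negative log likelihood under the posterior tempered with inverse temperature 1/log n, and uses a conjugate normal-mean model so that the tempered posterior can be sampled exactly. The model, the prior variance `tau2`, the seed, and all variable names are illustrative assumptions rather than anything specified in the summary.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x = rng.normal(0.0, 1.0, size=n)   # toy data (assumption): x_i ~ N(0, 1)
tau2 = 100.0                       # prior variance for mu (assumption)
beta = 1.0 / np.log(n)             # inverse temperature 1 / log n

def neg_log_lik(mu):
    """Negative log likelihood of N(mu, 1) for each mu in an array of shape (S,)."""
    resid = x[None, :] - mu[:, None]
    return 0.5 * n * np.log(2 * np.pi) + 0.5 * np.sum(resid ** 2, axis=1)

# Tempered posterior: prior(mu) * likelihood(mu)^beta, Gaussian in this model.
prec = beta * n + 1.0 / tau2
mean = beta * np.sum(x) / prec
mu_draws = rng.normal(mean, np.sqrt(1.0 / prec), size=20000)

# WBIC: tempered-posterior average of the negative log likelihood.
wbic = np.mean(neg_log_lik(mu_draws))

# Regular-model reference: minimum negative log likelihood + (d/2) log n, d = 1.
nll_min = neg_log_lik(np.array([np.mean(x)]))[0]
bic_half = nll_min + 0.5 * np.log(n)
print(f"WBIC = {wbic:.3f}, BIC/2 = {bic_half:.3f}")
```

For this one-parameter regular model the tempered average should land near the minimum negative log likelihood plus (1/2) log n, the BIC-style penalty. In singular models the coefficient of log n would be expected to differ.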

Bayesian Cross Validation and WAIC for Predictive Prior Design in Regular Asymptotic Theory

A new formula is derived that clarifies the theoretical relation among CV, WAIC, and the generalization loss and allows the optimal hyperparameter to be found directly. The variances of the hyperparameters optimized by CV and WAIC are also reduced at small computational cost.

Higher Order Equivalence of Bayes Cross Validation and WAIC

The Bayes cross validation and WAIC are equivalent to each other according to the second order asymptotics, if the posterior distribution can be approximated by some normal distribution.

Bayesian Leave-One-Out Cross-Validation Approximations for Gaussian Latent Variable Models

This article considers Gaussian latent variable models where the integration over the latent values is approximated using the Laplace method or expectation propagation and finds the approach based upon a Gaussian approximation to the LOO marginal distribution gives the most accurate and reliable results among the fast methods.

Asymptotic Analysis of the Bayesian Likelihood Ratio for Testing Homogeneity in Normal Mixture Models.

When we use the normal mixture model, the optimal number of the components describing the data should be determined. Testing homogeneity is good for this purpose; however, to construct its theory is

On the marginal likelihood and cross-validation

It is shown that the marginal likelihood is formally equivalent to exhaustive leave-p-out cross-validation averaged over all values of $p$ and all held-out test sets when using the log posterior predictive probability as the scoring rule.

Information criterion for variational Bayes learning in regular and singular cases

  • Koshi Yamada, Sumio Watanabe
  • Computer Science, Mathematics
    The 6th International Conference on Soft Computing and Intelligent Systems, and The 13th International Symposium on Advanced Intelligence Systems
  • 2012
This paper proposes a new information criterion for variational Bayes learning, which is the unbiased estimator of the generalization loss for both cases when the posterior distribution is regular and singular.

Prior Intensified Information Criterion

The prior intensified information criterion (PIIC) is proposed. It customizes the WAIC to incorporate sparse estimation and causal inference, and clearly outperforms the WAIC in prediction performance when these concerns are present.

A Practical Method Based on Bayes Boundary-Ness for Optimal Classifier Parameter Status Selection

A novel practical method for finding the optimal classifier parameter status corresponding to the Bayes error through the evaluation of estimated class boundaries from the perspective of Bayes boundary-ness with an entropy-based uncertainty measure is proposed.



Almost All Learning Machines are Singular

  • Sumio Watanabe
  • Computer Science, Mathematics
    2007 IEEE Symposium on Foundations of Computational Intelligence
  • 2007
It is proposed that, by using resolution of singularities, the likelihood function can be represented in a standard form, from which the asymptotic behavior of the generalization errors of the maximum likelihood method and Bayes estimation can be proved.

Bayesian Model Assessment and Comparison Using Cross-Validation Predictive Densities

This work proposes an approach using cross-validation predictive densities to obtain expected utility estimates and the Bayesian bootstrap to obtain samples from their distributions, and discusses the probabilistic assumptions made and the properties of two practical cross-validation methods, importance sampling and k-fold cross-validation.

Asymptotic Learning Curve and Renormalizable Condition in Statistical Learning Theory

This paper defines a renormalizable condition of the statistical estimation problem, and shows that, under such a condition, the asymptotic learning curves are ensured to be subject to the universal law, even if the true distribution is unrealizable and singular for a statistical model.

A Limit Theorem in Singular Regression Problem

A limit theorem is proved that shows the relation between the singular regression problem and two birational invariants, a real log canonical threshold and a singular fluctuation, and that enables us to estimate the generalization error from the training error without any knowledge of the true probability distribution.

Singularities in mixture models and upper bounds of stochastic complexity

Algebraic Analysis for Nonidentifiable Learning Machines

It is rigorously proved that the Bayesian stochastic complexity or the free energy is asymptotically equal to $\lambda_1 \log n - (m_1 - 1) \log\log n + \text{constant}$, where $n$ is the number of training samples and $\lambda_1$ and $m_1$ are a rational number and a natural number, respectively, which are determined as birational invariants of the singularities in the parameter space.

Stochastic Complexity and Generalization Error of a Restricted Boltzmann Machine in Bayesian Estimation

  • Miki Aoyagi
  • Computer Science, Mathematics
    J. Mach. Learn. Res.
  • 2010
This paper uses a new eigenvalue analysis method and a recursive blowing up process to obtain the maximum pole of zeta functions and shows that these methods are effective for obtaining the asymptotic form of the generalization error of hierarchical learning models.

An asymptotic approximation of the marginal likelihood for general Markov models

The BIC score for Bayesian networks in the case of binary data and when the underlying graph is a rooted tree and all the inner nodes represent hidden variables is derived.