# A widely applicable Bayesian information criterion

@article{Watanabe2013AWA, title={A widely applicable Bayesian information criterion}, author={Sumio Watanabe}, journal={J. Mach. Learn. Res.}, year={2013}, volume={14}, pages={867-897} }

A statistical model or a learning machine is called regular if the map taking a parameter to a probability distribution is one-to-one and if its Fisher information matrix is always positive definite; otherwise, it is called singular. In regular statistical models, the Bayes free energy, defined as the negative logarithm of the Bayes marginal likelihood, can be asymptotically approximated by the Schwarz Bayes information criterion (BIC), whereas in singular models such approximation does…
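The WBIC proposed in the paper replaces the intractable free energy by the posterior average of the negative log-likelihood taken at the inverse temperature β = 1/log n. A minimal numerical sketch of that recipe, assuming a toy one-dimensional Gaussian-mean model with a grid-based tempered posterior (all variable names here are illustrative, not the paper's notation):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
x = rng.normal(loc=0.5, scale=1.0, size=n)  # data from N(0.5, 1)

# Toy model: x_i ~ N(mu, 1) with prior mu ~ N(0, 1), posterior on a grid.
mu = np.linspace(-3.0, 3.0, 4001)
log_lik = np.array(
    [np.sum(-0.5 * np.log(2 * np.pi) - 0.5 * (x - m) ** 2) for m in mu]
)
log_prior = -0.5 * np.log(2 * np.pi) - 0.5 * mu**2

# WBIC's key choice: inverse temperature beta = 1 / log(n)
beta = 1.0 / np.log(n)

# Tempered posterior weights proportional to prior * likelihood**beta
log_w = log_prior + beta * log_lik
w = np.exp(log_w - log_w.max())
w /= w.sum()

# WBIC = tempered-posterior expectation of the negative log-likelihood
wbic = np.sum(w * (-log_lik))
print(wbic)
```

In practice the tempered expectation is estimated by MCMC rather than a grid; the grid is used here only to keep the sketch self-contained and deterministic.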

#### 386 Citations

A case study of the widely applicable Bayesian information criterion and its optimality

- Mathematics, Computer Science
- Stat. Comput.
- 2015

This work applies a new approximation to the analytically solvable Gaussian process regression case to show that the optimal temperature may also depend on the data itself or on other variables, such as the noise level, and shows that the steepness of a thermodynamic curve at the optimal temperature indicates the magnitude of the error that WBIC makes.

Information criteria and cross validation for Bayesian inference in regular and singular cases

- Computer Science
- 2021

In this paper, in order to establish a mathematical foundation for developing a measure of a statistical model and a prior, the relation among the generalization loss, the information criteria, and the cross-validation loss is clarified, and their equivalences and differences are established.

Evidence bounds in singular models: probabilistic and variational perspectives

- Mathematics
- 2020

The marginal likelihood or evidence in Bayesian statistics contains an intrinsic penalty for larger model sizes and is a fundamental quantity in Bayesian model comparison. Over the past two decades,…

Learning a Flexible K-Dependence Bayesian Classifier from the Chain Rule of Joint Probability Distribution

- Mathematics, Computer Science
- Entropy
- 2015

By establishing the mapping relationship between the conditional probability distribution and mutual information, a new scoring function, Sum_MI, is derived and applied to evaluate Bayesian classifiers.

Investigation of the widely applicable Bayesian information criterion

- Mathematics, Computer Science
- Stat. Comput.
- 2017

Generally WBIC performs adequately when one uses informative priors, but it can systematically overestimate the evidence, particularly for small sample sizes.

A Bayesian information criterion for singular models

- Mathematics
- 2013

We consider approximate Bayesian model choice for model selection problems that involve models whose Fisher information matrices may fail to be invertible along other competing submodels.…

Empirical Survival Jensen-Shannon Divergence as a Goodness-of-Fit Measure for Maximum Likelihood Estimation and Curve Fitting

- Mathematics
- 2018

The coefficient of determination, known as $R^2$, is commonly used as a goodness-of-fit criterion for fitting linear models. $R^2$ is somewhat controversial when fitting nonlinear models, although it…

Bayesian noise model selection and system identification based on approximation of the evidence

- Mathematics, Computer Science
- 2014 IEEE Workshop on Statistical Signal Processing (SSP)
- 2014

The purpose of this work is to identify the parameters of a second-order system from noisy data in a context where the difficulty is twofold. First, the model is strongly nonlinear and possibly non…

Jensen-Shannon Divergence as a Goodness-of-Fit Measure for Maximum Likelihood Estimation and Curve Fitting

- Mathematics, Physics
- 2018

The coefficient of determination, known as $R^2$, is commonly used as a goodness-of-fit criterion for fitting linear models. $R^2$ is somewhat controversial when fitting nonlinear models, although it…

Bayesian information criterion approximations to Bayes factors for univariate and multivariate logistic regression models.

- Medicine, Mathematics
- The international journal of biostatistics
- 2020

An application in prostate cancer, the motivating setting for this work, illustrates the approximation for large data sets in a practical example; accuracies of the approximations for small sample sizes, as well as comparisons with conclusions from frequentist testing, are also presented.

#### References

Showing 1–10 of 38 references

Asymptotic Equivalence of Bayes Cross Validation and Widely Applicable Information Criterion in Singular Learning Theory

- Mathematics, Computer Science
- J. Mach. Learn. Res.
- 2010

The Bayes cross-validation loss is asymptotically equivalent to the widely applicable information criterion as a random variable, and model selection and hyperparameter optimization using these two values are asymptotically equivalent.
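The equivalence stated above can be checked numerically: WAIC (training loss plus functional variance divided by n) and the importance-sampling leave-one-out cross-validation loss should nearly coincide for moderate n. A minimal sketch, assuming a toy Gaussian-mean model with a grid-based posterior (illustrative names, not the paper's notation):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
x = rng.normal(loc=0.3, scale=1.0, size=n)  # data from N(0.3, 1)

# Toy model: x_i ~ N(mu, 1) with prior mu ~ N(0, 1), posterior on a grid.
mu = np.linspace(-3.0, 3.0, 2001)
loglik_i = -0.5 * np.log(2 * np.pi) - 0.5 * (x[:, None] - mu[None, :]) ** 2
log_post = -0.5 * mu**2 + loglik_i.sum(axis=0)
post = np.exp(log_post - log_post.max())
post /= post.sum()  # normalized posterior weights on the grid

# WAIC = training loss + functional variance / n
train_loss = -np.mean(np.log((np.exp(loglik_i) * post).sum(axis=1)))
V = np.sum(
    (post * loglik_i**2).sum(axis=1) - ((post * loglik_i).sum(axis=1)) ** 2
)
waic = train_loss + V / n

# Bayes leave-one-out CV loss via the identity
# p(x_i | x_{-i}) = 1 / E_posterior[ 1 / p(x_i | mu) ]
cv_loss = np.mean(np.log((post * np.exp(-loglik_i)).sum(axis=1)))
print(waic, cv_loss)
```

The two numbers agree to within the theorem's $O_p(n^{-2})$ error term, which is what makes WAIC usable as a cheap substitute for cross-validation.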

Singularities in mixture models and upper bounds of stochastic complexity

- Mathematics, Computer Science
- Neural Networks
- 2003

The result of this paper shows that the mixture model can attain more precise prediction than regular statistical models if Bayesian estimation is applied in statistical inference.

Algebraic Analysis for Singular Statistical Estimation

- Mathematics, Computer Science
- ALT
- 1999

This paper clarifies the learning efficiency of a non-regular parametric model such as a neural network whose true parameter set is an analytic variety with singular points. By using Sato's b-function we…

Algebraic methods for evaluating integrals in Bayesian statistics

- Mathematics
- 2011

The accurate evaluation of marginal likelihood integrals is a difficult fundamental problem in Bayesian inference that has important applications in machine learning and computational biology.…

Algebraic geometrical methods for hierarchical learning machines

- Mathematics, Computer Science
- Neural Networks
- 2001

An algorithm is established to calculate the Bayesian stochastic complexity based on blowing-up technology in algebraic geometry, and it is proved that the Bayesian generalization error of a hierarchical learning machine is smaller than that of a regular statistical model, even if the true distribution is not contained in the parametric model.

Learning Coefficient of Generalization Error in Bayesian Estimation and Vandermonde Matrix-Type Singularity

- Mathematics, Computer Science
- Neural Computation
- 2012

This letter gives tight new bound values of learning coefficients for Vandermonde matrix-type singularities, and the explicit values under certain conditions, which can show the learning coefficients of three-layered neural networks and normal mixture models.

An Asymptotic Behaviour of the Marginal Likelihood for General Markov Models

- Computer Science, Mathematics
- J. Mach. Learn. Res.
- 2011

The BIC is derived for binary graphical tree models in which all the inner nodes of a tree represent binary hidden variables, an extension of a similar formula given by Rusakov and Geiger for naive Bayes models.

Asymptotic Learning Curve and Renormalizable Condition in Statistical Learning Theory

- Mathematics, Computer Science
- ArXiv
- 2010

This paper defines a renormalizable condition of the statistical estimation problem, and shows that, under such a condition, the asymptotic learning curves are ensured to be subject to the universal law, even if the true distribution is unrealizable and singular for a statistical model.

Algebraic Analysis for Nonidentifiable Learning Machines

- Mathematics, Medicine
- Neural Computation
- 2001

It is rigorously proved that the Bayesian stochastic complexity or the free energy is asymptotically equal to $\lambda_1 \log n - (m_1 - 1) \log\log n + \text{constant}$, where $n$ is the number of training samples and $\lambda_1$ and $m_1$ are the rational number and the natural number determined as the birational invariant values of the singularities in the parameter space.

Asymptotic model selection and identifiability of directed tree models with hidden variables

- Mathematics
- 2010

The standard Bayesian Information Criterion (BIC) is derived under some regularity conditions which are not always satisfied by graphical models with hidden variables. In this paper we derive…