Corpus ID: 595637

A widely applicable Bayesian information criterion

@article{Watanabe2013AWA,
  title={A widely applicable Bayesian information criterion},
  author={Sumio Watanabe},
  journal={J. Mach. Learn. Res.},
  year={2013},
  volume={14},
  pages={867-897}
}
  • Sumio Watanabe
  • Published 2013
  • Computer Science, Mathematics
  • J. Mach. Learn. Res.
A statistical model or a learning machine is called regular if the map taking a parameter to a probability distribution is one-to-one and if its Fisher information matrix is always positive definite. If otherwise, it is called singular. In regular statistical models, the Bayes free energy, which is defined as the minus logarithm of the Bayes marginal likelihood, can be asymptotically approximated by the Schwarz Bayes information criterion (BIC), whereas in singular models such an approximation does not hold.
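For orientation, the two quantities contrasted above can be written out explicitly. The following is a sketch in standard notation (the symbols $X^n$, $p(x \mid w)$, $\varphi(w)$, $\hat{w}$, and $d$ are assumed here rather than quoted from the paper): given a sample $X^n = (X_1, \ldots, X_n)$, a model $p(x \mid w)$, and a prior $\varphi(w)$,

$$F_n = -\log \int \prod_{i=1}^{n} p(X_i \mid w) \, \varphi(w) \, dw, \qquad \mathrm{BIC} = -\sum_{i=1}^{n} \log p(X_i \mid \hat{w}) + \frac{d}{2} \log n,$$

where $\hat{w}$ is the maximum likelihood estimator and $d$ the parameter dimension. In regular models $F_n = \mathrm{BIC} + O_p(1)$, whereas in singular models the penalty $\frac{d}{2} \log n$ is replaced by $\lambda \log n$ for a birational invariant $\lambda$ (the real log canonical threshold); WBIC estimates $F_n$ in this setting without knowledge of the true distribution.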
A case study of the widely applicable Bayesian information criterion and its optimality
  • T. Mononen
  • Mathematics, Computer Science
  • Stat. Comput.
  • 2015
TLDR
This work applies a new approximation to the analytically solvable Gaussian process regression case to show that the optimal temperature may also depend on the data itself or on other variables, such as the noise level, and shows that the steepness of a thermodynamic curve at the optimal temperature indicates the magnitude of the error that WBIC makes.
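(Context for the temperature discussed above: Watanabe defines WBIC as the posterior mean of the negative log likelihood at inverse temperature $\beta = 1/\log n$. Below is a minimal numpy sketch of the estimator; the array name log_lik and its layout are hypothetical, assuming S posterior draws $w_s$ from the tempered posterior and entries $\log p(X_i \mid w_s)$.)

import numpy as np

def wbic_estimate(log_lik):
    # log_lik: shape (S, n); row s holds log p(X_i | w_s) for a draw w_s
    # from the posterior tempered at inverse temperature beta = 1/log(n).
    # WBIC is the tempered-posterior average of the negative total log likelihood.
    total_log_lik = log_lik.sum(axis=1)   # sum over the n data points
    return -total_log_lik.mean()          # average over the S posterior draws

Note that the draws must target the tempered posterior, proportional to $\varphi(w) \prod_i p(X_i \mid w)^{\beta}$ with $\beta = 1/\log n$; ordinary posterior draws ($\beta = 1$) do not give WBIC.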
Information criteria and cross validation for Bayesian inference in regular and singular cases
  • Sumio Watanabe
  • Computer Science
  • 2021
TLDR
In this paper, in order to establish a mathematical foundation for developing a measure of a statistical model and a prior, the relation among the generalization loss, the information criteria, and the cross-validation loss is shown, and then their equivalences and differences are clarified.
Evidence bounds in singular models: probabilistic and variational perspectives
The marginal likelihood or evidence in Bayesian statistics contains an intrinsic penalty for larger model sizes and is a fundamental quantity in Bayesian model comparison. Over the past two decades, …
Learning a Flexible K-Dependence Bayesian Classifier from the Chain Rule of Joint Probability Distribution
TLDR
By establishing the mapping relationship between the conditional probability distribution and mutual information, a new scoring function, Sum_MI, is derived and applied to evaluate the rationality of Bayesian classifiers.
Investigation of the widely applicable Bayesian information criterion
TLDR
Generally, WBIC performs adequately when one uses informative priors, but it can systematically overestimate the evidence, particularly for small sample sizes.
A Bayesian information criterion for singular models
Summary: We consider approximate Bayesian model choice for model selection problems that involve models whose Fisher information matrices may fail to be invertible along other competing submodels. …
Empirical Survival Jensen-Shannon Divergence as a Goodness-of-Fit Measure for Maximum Likelihood Estimation and Curve Fitting
The coefficient of determination, known as $R^2$, is commonly used as a goodness-of-fit criterion for fitting linear models. $R^2$ is somewhat controversial when fitting nonlinear models, although it …
Bayesian noise model selection and system identification based on approximation of the evidence
The purpose of this work is to identify the parameters of a second-order system from noisy data in a context where the difficulty is twofold. First, the model is strongly nonlinear and possibly non…
Jensen-Shannon Divergence as a Goodness-of-Fit Measure for Maximum Likelihood Estimation and Curve Fitting
The coefficient of determination, known as $R^2$, is commonly used as a goodness-of-fit criterion for fitting linear models. $R^2$ is somewhat controversial when fitting nonlinear models, although it …
Bayesian information criterion approximations to Bayes factors for univariate and multivariate logistic regression models.
TLDR
An application in prostate cancer, the motivating setting for this work, illustrates the approximation for large data sets in a practical example; accuracies of the approximations for small sample sizes, as well as comparisons to conclusions from frequentist testing, are also presented.

References

SHOWING 1-10 OF 38 REFERENCES
Asymptotic Equivalence of Bayes Cross Validation and Widely Applicable Information Criterion in Singular Learning Theory
TLDR
The Bayes cross-validation loss is asymptotically equivalent to the widely applicable information criterion as a random variable, and model selection and hyperparameter optimization using these two values are asymptotically equivalent.
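(For comparison with the WBIC sketch above: the WAIC in this equivalence is computed from ordinary, untempered posterior draws. A minimal sketch on the same hypothetical (S, n) log-likelihood array, on the negative-log-likelihood scale:)

import numpy as np
from scipy.special import logsumexp

def waic_estimate(log_lik):
    # log_lik: shape (S, n) of log p(X_i | w_s) for ordinary posterior draws.
    S = log_lik.shape[0]
    # Log pointwise predictive density: log of the posterior-averaged likelihood.
    lppd = logsumexp(log_lik, axis=0) - np.log(S)   # shape (n,)
    # Functional variance penalty: posterior variance of each pointwise log likelihood.
    p_waic = log_lik.var(axis=0, ddof=1)            # shape (n,)
    return float((-lppd + p_waic).sum())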
Singularities in mixture models and upper bounds of stochastic complexity
TLDR
The result of this paper shows that the mixture model can attain more precise prediction than regular statistical models if Bayesian estimation is applied in statistical inference.
Algebraic Analysis for Singular Statistical Estimation
This paper clarifies the learning efficiency of a non-regular parametric model such as a neural network whose true parameter set is an analytic variety with singular points. By using Sato's b-function we …
Algebraic methods for evaluating integrals in Bayesian statistics
The accurate evaluation of marginal likelihood integrals is a difficult fundamental problem in Bayesian inference that has important applications in machine learning and computational biology. …
Algebraic geometrical methods for hierarchical learning machines
TLDR
An algorithm is established to calculate the Bayesian stochastic complexity based on blowing-up technology in algebraic geometry, and it is proved that the Bayesian generalization error of a hierarchical learning machine is smaller than that of a regular statistical model, even if the true distribution is not contained in the parametric model.
Learning Coefficient of Generalization Error in Bayesian Estimation and Vandermonde Matrix-Type Singularity
TLDR
This letter gives tight new bounds on the learning coefficients for Vandermonde matrix-type singularities, and their explicit values under certain conditions, which yield the learning coefficients of three-layered neural networks and normal mixture models.
An Asymptotic Behaviour of the Marginal Likelihood for General Markov Models
TLDR
The BIC is derived for binary graphical tree models in which all the inner nodes of the tree represent binary hidden variables, extending a similar formula given by Rusakov and Geiger for naive Bayes models.
Asymptotic Learning Curve and Renormalizable Condition in Statistical Learning Theory
TLDR
This paper defines a renormalizable condition of the statistical estimation problem and shows that, under such a condition, the asymptotic learning curves are ensured to be subject to the universal law, even if the true distribution is unrealizable and singular for a statistical model.
Algebraic Analysis for Nonidentifiable Learning Machines
TLDR
It is rigorously proved that the Bayesian stochastic complexity or the free energy is asymptotically equal to $\lambda_1 \log n - (m_1 - 1) \log \log n$ + constant, where n is the number of training samples and $\lambda_1$ and $m_1$ are the rational number and the natural number which are determined as the birational invariant values of the singularities in the parameter space.
Asymptotic model selection and identifiability of directed tree models with hidden variables
The standard Bayesian Information Criterion (BIC) is derived under some regularity conditions which are not always satisfied by graphical models with hidden variables. In this paper we derive …