Bayesian Methods for Adaptive Models
@phdthesis{MacKay1992Bayesian, title={Bayesian Methods for Adaptive Models}, author={David J. C. MacKay}, school={California Institute of Technology}, year={1992} }
The Bayesian framework for model comparison and regularisation is demonstrated by studying interpolation and classification problems modelled with both linear and non-linear models. This framework quantitatively embodies ‘Occam’s razor’. Over-complex and under-regularised models are automatically inferred to be less probable, even though their flexibility allows them to fit the data better. When applied to ‘neural networks’, the Bayesian framework makes possible (1) objective comparison of…
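The ‘quantitative Occam’s razor’ in the abstract can be made explicit. In the evidence framework, each model $\mathcal{H}_i$ is ranked by its evidence, and a Laplace approximation factors that evidence into a best-fit likelihood times an ‘Occam factor’ (the notation below is a standard reconstruction, not a quotation from the thesis):

```latex
% Evidence for a model with k parameters w, marginalising the parameters out:
P(D \mid \mathcal{H}_i)
  = \int P(D \mid \mathbf{w}, \mathcal{H}_i)\, P(\mathbf{w} \mid \mathcal{H}_i)\, d\mathbf{w}
  \;\approx\;
  \underbrace{P(D \mid \mathbf{w}_{\mathrm{MP}}, \mathcal{H}_i)}_{\text{best-fit likelihood}}
  \times
  \underbrace{P(\mathbf{w}_{\mathrm{MP}} \mid \mathcal{H}_i)\,(2\pi)^{k/2}\det{}^{-1/2}\!\mathbf{A}}_{\text{Occam factor}},
\qquad
\mathbf{A} = -\nabla\nabla \log P(\mathbf{w} \mid D, \mathcal{H}_i)\big|_{\mathbf{w}_{\mathrm{MP}}}
```

The Occam factor is the ratio of posterior-accessible parameter volume to prior volume, so an over-complex or under-regularised model pays for its flexibility with a small Occam factor even when its best fit to the data is better.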
226 Citations
How Bayesian should Bayesian optimisation be?
- Computer Science, GECCO Companion
- 2021
This work investigates whether a fully Bayesian treatment of the Gaussian process hyperparameters in BO (FBBO) leads to improved optimisation performance, and recommends FBBO using EI with an ARD kernel as the default choice for BO.
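As a rough sketch of the recommended default (the EI acquisition with an ARD kernel), the loop below uses scikit-learn; the objective `f`, the candidate pool, and all settings are illustrative placeholders, and the hyperparameters are fitted by marginal-likelihood maximisation rather than the paper's fully Bayesian treatment.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)

def f(x):
    # Placeholder objective to minimise; stands in for an expensive black box.
    return np.sum((x - 0.3) ** 2, axis=-1)

d = 2                                   # input dimension
X = rng.uniform(0, 1, size=(5, d))      # initial design
y = f(X)

for _ in range(20):
    # One length-scale per input dimension = ARD kernel.
    kernel = RBF(length_scale=np.ones(d))
    gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X, y)

    cand = rng.uniform(0, 1, size=(2048, d))   # random candidate pool
    mu, sd = gp.predict(cand, return_std=True)
    best = y.min()

    # Expected improvement for minimisation.
    z = (best - mu) / np.maximum(sd, 1e-12)
    ei = (best - mu) * norm.cdf(z) + sd * norm.pdf(z)

    x_next = cand[np.argmax(ei)]
    X = np.vstack([X, x_next])
    y = np.append(y, f(x_next))

print("best value found:", y.min())
```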
Bayesian non-parametrics and the probabilistic approach to modelling
- Computer Science, Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences
- 2013
This article provides an overview of probabilistic modelling and an accessible survey of some of the main tools in Bayesian non-parametrics for modelling unknown functions, density estimation, clustering, time-series modelling, and representing sparsity, hierarchies and covariance structure.
The Case for Bayesian Deep Learning
- Computer Science, ArXiv
- 2020
It is argued that the key distinguishing property of a Bayesian approach is marginalization instead of optimization, rather than the prior or Bayes' rule, and that the prior over functions induced by a neural network reflects the inductive biases that help such networks generalize.
A Parsimonious Tour of Bayesian Model Uncertainty
- Computer Science
- 2019
This survey focuses on non-asymptotic out-of-sample performance of Bayesian model selection and averaging techniques, and describes recent extensions to wider classes of probabilistic frameworks including high-dimensional, unidentifiable, or likelihood-free models.
On Priors for Bayesian Neural Networks
- Computer Science
- 2018
This dissertation aims to help the reader navigate the landscape of neural-network priors: it surveys the existing work on priors for neural networks, isolates key themes such as the move towards heavy-tailed priors, and describes how to give Bayesian neural networks an adaptive width by placing stick-breaking priors on their latent representations.
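For intuition about the stick-breaking construction that yields an adaptive width, a minimal sketch follows; the Beta(1, alpha) breaks and the truncation level `k` are the textbook construction, while the mapping onto network widths is the dissertation's contribution and is not reproduced here.

```python
import numpy as np

def stick_breaking(alpha, k, rng):
    """Draw k stick-breaking weights: pi_j = v_j * prod_{i<j} (1 - v_i)."""
    v = rng.beta(1.0, alpha, size=k)                          # break points
    remaining = np.concatenate([[1.0], np.cumprod(1.0 - v[:-1])])
    return v * remaining                                      # sums to < 1

rng = np.random.default_rng(0)
pi = stick_breaking(alpha=2.0, k=10, rng=rng)
print(pi, pi.sum())  # larger alpha spreads mass over more units ("wider" net)
```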
Bayesian Model Selection, the Marginal Likelihood, and Generalization
- Computer Science, ICML
- 2022
It is shown how the marginal likelihood can be negatively correlated with generalization, with implications for neural architecture search, and how it can lead to both underfitting and overfitting in hyperparameter learning.
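The hyperparameter-learning pitfall is easiest to probe where the marginal likelihood is available in closed form. A minimal sketch for Bayesian linear regression follows; the data, the `alpha` grid, and `sigma2` are illustrative choices.

```python
import numpy as np
from scipy.stats import multivariate_normal

def log_marginal_likelihood(X, y, alpha, sigma2):
    """log p(y | X) for y = X w + eps, w ~ N(0, I/alpha), eps ~ N(0, sigma2 I).
    Marginalising w gives y ~ N(0, sigma2 * I + X X^T / alpha)."""
    n = X.shape[0]
    cov = sigma2 * np.eye(n) + X @ X.T / alpha
    return multivariate_normal(mean=np.zeros(n), cov=cov).logpdf(y)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=50)

# Scanning alpha: the maximiser of this curve is what hyperparameter
# learning selects, and it need not coincide with best generalisation.
for alpha in [0.01, 0.1, 1.0, 10.0]:
    print(alpha, log_marginal_likelihood(X, y, alpha, sigma2=0.01))
```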
Bayesian Deep Learning and a Probabilistic Perspective of Generalization
- Computer Science, NeurIPS
- 2020
It is shown that deep ensembles provide an effective mechanism for approximate Bayesian marginalization, and a related approach is proposed that further improves the predictive distribution by marginalizing within basins of attraction, without significant overhead.
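The mechanism is a plain Monte Carlo average of member predictive distributions. The schematic below assumes a hypothetical `predict_proba` interface (the dummy member merely makes the sketch runnable); the paper's further step of marginalising within basins of attraction is not shown.

```python
import numpy as np

class _DummyMember:
    """Stand-in for one trained network; real members would be trained nets."""
    def __init__(self, seed):
        self.rng = np.random.default_rng(seed)
    def predict_proba(self, X):
        logits = self.rng.normal(size=(len(X), 3))
        e = np.exp(logits - logits.max(axis=1, keepdims=True))
        return e / e.sum(axis=1, keepdims=True)

def ensemble_predictive(models, X):
    """Approximate Bayesian model average over ensemble members:
    p(y | x, D) ~= (1/M) * sum_m p(y | x, w_m)."""
    probs = np.stack([m.predict_proba(X) for m in models])  # (M, n, classes)
    return probs.mean(axis=0)

members = [_DummyMember(s) for s in range(5)]
print(ensemble_predictive(members, np.zeros((2, 4))))
```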
Parsimonious Inference
- Computer Science, ArXiv
- 2021
The approaches combine efficient encodings with prudent sampling strategies to construct predictive ensembles without cross-validation, thus addressing a fundamental challenge in how to efficiently obtain predictions from data.
Model Selection for Bayesian Autoencoders
- Computer Science, NeurIPS
- 2021
This work proposes to optimize the distributional sliced-Wasserstein distance (DSWD) between the output of the autoencoder and the empirical data distribution, obtaining a powerful alternative to variational autoencoders, which are the preferred choice in modern applications of autoencoders for representation learning with uncertainty.
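For flavour, the plain (non-distributional) sliced-Wasserstein distance underlying DSWD reduces to sorting one-dimensional projections. The Monte Carlo sketch below uses uniform random directions, whereas DSWD itself additionally learns a distribution over directions.

```python
import numpy as np

def sliced_wasserstein(x, y, n_proj=200, rng=None):
    """Monte Carlo sliced W_2 between equal-sized samples x, y of shape (n, d)."""
    rng = rng or np.random.default_rng(0)
    d = x.shape[1]
    theta = rng.normal(size=(n_proj, d))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)  # unit directions
    px, py = x @ theta.T, y @ theta.T                      # (n, n_proj)
    # 1-D W_2 between empirical measures = L2 distance of sorted projections.
    w2 = np.mean((np.sort(px, axis=0) - np.sort(py, axis=0)) ** 2, axis=0)
    return np.sqrt(w2.mean())
```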
Bayesian Computation of the Intrinsic Structure of Factor Analytic Models
- Computer Science, Journal of Data Science
- 2021
A Bayesian approach to factor analytic models is proposed that adapts ideas from stochastic geometry and Bayesian finite mixture modelling to construct an ergodic Markov chain whose equilibrium distribution is the posterior distribution of the complete collection of parameters, including the number of factors.
References
Showing 1-10 of 114 references
A Practical Bayesian Framework for Backprop Networks
- Computer Science
- 1991
A quantitative and practical Bayesian framework is described for learning mappings in feedforward networks, and a good correlation between generalisation ability and the Bayesian evidence is obtained.
A Bayesian comparison of different classes of dynamic models using empirical data
- Mathematics, Computer Science
- 1977
The optimum decision rule is asymptotically consistent and gives a quantitative explanation for the "principle of parsimony" often used in the construction of models from empirical data.
The Evidence Framework Applied to Classification Networks
- Computer Science, Neural Computation
- 1992
It is demonstrated that the Bayesian framework for model comparison described for regression models in MacKay (1992a,b) can also be applied to classification problems and an information-based data selection criterion is derived and demonstrated within this framework.
Bayes Factors and Choice Criteria for Linear Models
- Mathematics
- 1980
Global and local Bayes factors are defined and their respective roles examined as choice criteria among alternative linear models. The global Bayes factor is seen to function, in appropriate…
Note on generalization, regularization and architecture selection in nonlinear learning systems
- Computer Science, Neural Networks for Signal Processing: Proceedings of the 1991 IEEE Workshop
- 1991
The author proposes a new estimate of generalization performance for nonlinear learning systems called the generalized prediction error (GPE) which is based upon the notion of the effective number of…
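The truncated sentence refers to Moody's effective number of parameters. From memory, so treat the exact constants as an assumption rather than a quotation, the GPE takes the form

```latex
\mathrm{GPE} \;=\; \frac{1}{n}\sum_{i=1}^{n}\bigl(y_i - \hat{y}_i\bigr)^2
\;+\; \frac{2\,\hat{\sigma}^2_{\mathrm{eff}}\, p_{\mathrm{eff}}(\lambda)}{n}
```

where $p_{\mathrm{eff}}(\lambda)$ decreases as the regularisation strength $\lambda$ grows, trading training error against effective complexity.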
Consistent inference of probabilities in layered networks: predictions and generalizations
- Computer Science, International 1989 Joint Conference on Neural Networks
- 1989
The problem of learning a general input-output relation using a layered neural network is discussed in a statistical framework and the authors arrive at a Gibbs distribution on a canonical ensemble of networks with the same architecture.
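Concretely, the canonical-ensemble view places a Gibbs posterior over all networks of the given architecture (standard statistical-mechanics notation, reconstructed rather than quoted):

```latex
P(\mathbf{w}) \;=\; \frac{e^{-\beta E(\mathbf{w})}}{Z(\beta)},
\qquad
Z(\beta) \;=\; \int e^{-\beta E(\mathbf{w})}\, d\mathbf{w}
```

with $E(\mathbf{w})$ the training error of weights $\mathbf{w}$ and $\beta$ an inverse temperature; predictions follow by averaging over this ensemble.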
Bayesian modeling of uncertainty in low-level vision
- Computer Science, International Journal of Computer Vision
- 2004
The uncertainty modeling techniques that are developed, and the utility of these techniques in various applications, support the claim that Bayesian modeling is a powerful and practical framework for low-level vision.
The Role of Priors in Active Bayesian Learning in the Sequential Statistical Decision Framework
- Economics
- 1991
It is shown that, in sequential models used extensively in the economics literature, when there is sufficient variability in the law of motion that the agents are trying to learn, the rational expectations hypothesis may indeed be justified on the basis of optimizing and optimally updating agents.
Soft competitive adaptation: neural network learning algorithms based on fitting statistical mixtures
- Computer Science
- 1991
Two approaches are considered: an unsupervised algorithm that is an alternative to the classical winner-take-all competitive algorithms, and a supervised modular architecture in which a number of simple "expert" networks compete to solve distinct pieces of a large task.
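The supervised architecture in this last entry is an early mixture of experts. The bare-bones forward pass below uses the standard softmax gate; the sizes and the linear experts are chosen purely for illustration.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def mixture_of_experts(x, expert_ws, gate_w):
    """y(x) = sum_i g_i(x) * y_i(x): experts compete via a softmax gate."""
    expert_outs = np.stack([x @ w for w in expert_ws])  # (n_experts, out_dim)
    gates = softmax(x @ gate_w)                         # (n_experts,)
    return gates @ expert_outs                          # gate-blended prediction

rng = np.random.default_rng(0)
x = rng.normal(size=4)
experts = [rng.normal(size=(4, 2)) for _ in range(3)]   # 3 linear "experts"
gate = rng.normal(size=(4, 3))
print(mixture_of_experts(x, experts, gate))
```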