Using Stacking to Average Bayesian Predictive Distributions (with Discussion)

@article{Yao2018UsingST,
  title={Using Stacking to Average Bayesian Predictive Distributions (with Discussion)},
  author={Yuling Yao and Aki Vehtari and Daniel P. Simpson and Andrew Gelman},
  journal={Bayesian Analysis},
  year={2018}
}
The widely recommended procedure of Bayesian model averaging is flawed in the M-open setting in which the true data-generating process is not one of the candidate models being fit. We take the idea of stacking from the point-estimation literature and generalize it to the combination of predictive distributions, extending the utility function to any proper scoring rule and using Pareto smoothed importance sampling to efficiently compute the required leave-one-out posterior distributions and…
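A minimal sketch of the stacking optimization described in the abstract, under the log score: it assumes a precomputed matrix of leave-one-out log predictive densities (for example from PSIS-LOO), and the function name, softmax parameterization, and optimizer are illustrative choices rather than the paper's reference implementation.

import numpy as np
from scipy.optimize import minimize

def stacking_weights(loo_lpd):
    """Stacking weights under the log score.

    loo_lpd: (n_obs, n_models) array of leave-one-out log predictive
    densities log p(y_i | y_{-i}, M_k), e.g. computed with PSIS-LOO.
    Maximizes sum_i log(sum_k w_k * exp(loo_lpd[i, k])) over the simplex.
    """
    n_models = loo_lpd.shape[1]

    def neg_objective(z):
        # softmax keeps the weights on the probability simplex
        w = np.exp(z - z.max())
        w /= w.sum()
        # log of the weighted mixture of LOO predictive densities, summed over observations
        return -np.logaddexp.reduce(loo_lpd + np.log(w), axis=1).sum()

    res = minimize(neg_objective, np.zeros(n_models), method="Nelder-Mead")
    w = np.exp(res.x - res.x.max())
    return w / w.sum()

In practice the LOO densities and stacking weights can also come from existing tooling, for example arviz.loo and arviz.compare(..., method="stacking"), rather than being hand-rolled as above.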
Stacking for Non-mixing Bayesian Computations: The Curse and Blessing of Multimodal Posteriors
TLDR
Proposes using parallel runs of MCMC, variational, or mode-based inference to hit as many modes or separated regions as possible, and then combining these runs with importance-sampling-based Bayesian stacking, a scalable method for constructing a weighted average of distributions that maximizes cross-validated prediction utility (a toy sketch of weighting parallel runs this way appears just before the References section below).
Learning to Average Predictively Over Good and Bad: Comment on: Using Stacking to Average Bayesian Predictive Distributions
We suggest extending the stacking procedure for combining predictive densities, proposed by Yao et al. in Bayesian Analysis, to a setting where dynamic learning occurs about features
Bayesian Model Weighting: The Many Faces of Model Averaging
TLDR
It is shown that only Bayesian stacking has the goal of improving predictions by model combination; the other approaches pursue finding a single best model as their ultimate goal and use model averaging only as a preliminary stage to prevent a rash model choice.
Bayesian Inference for the Weights in Logarithmic Pooling
TLDR
It is shown that it is possible to learn the weights from data, although identifiability issues may arise for some configurations of priors and data, and that the hierarchical approach leads to posterior distributions that are able to accommodate prior-data conflict in complex models.
Model-averaged confidence distributions
Model averaging is commonly used to allow for model uncertainty in parameter estimation. As well as providing a point estimate that is a natural compromise between the estimates from different
Bayesian Hierarchical Stacking: Some Models Are (Somewhere) Useful
TLDR
It is shown that stacking is most effective when model predictive performance is heterogeneous in inputs, that the stacked mixture can be improved with a hierarchical model, and that stacking generalizes to Bayesian hierarchical stacking.
Practical Semiparametric Inference With Bayesian Nonparametric Ensembles
TLDR
This thesis proposes Bayesian Nonparametric Ensemble (BNE), a general modeling approach that combines the a priori information encoded in candidate models using ensemble methods and then addresses the systematic bias in the candidate models using Bayesian nonparametric machinery.
On the Quantification of Model Uncertainty: A Bayesian Perspective
TLDR
A detailed and up-to-date review of BMA is given, with a focus on its foundations in Bayesian decision theory and Bayesian predictive modeling, and important assumptions regarding BMA are considered.
Bayesian Aggregation
  • Yuling Yao
  • Computer Science
    Wiley StatsRef: Statistics Reference Online
  • 2021
TLDR
This article compares the predictive performance of two widely used methods, Bayesian model averaging (BMA) and Bayesian stacking, and reviews their theoretical optimality, probabilistic interpretation, practical implementation, and extensions in complex models.
Bayesian hierarchical stacking
TLDR
This work shows that stacking is most effective when model predictive performance is heterogeneous in inputs, so that the stacked mixture can be further improved with full-Bayesian hierarchical modeling.
...
...
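As a toy illustration of the non-mixing-computations idea above (a sketch, not the authors' implementation): each parallel run is treated as one candidate in the stacking objective, and the stacked predictive distribution is the corresponding weighted mixture of runs. The modes, weights, and sample sizes below are invented for illustration.

import numpy as np

rng = np.random.default_rng(0)

# Three hypothetical parallel runs, each stuck near a different mode.
draws = [rng.normal(mu, 1.0, size=2000) for mu in (-4.0, 0.0, 4.0)]

# Stacking weights for the runs, e.g. from stacking_weights() above applied
# to per-run leave-one-out log predictive densities (values made up here).
w = np.array([0.25, 0.45, 0.30])

# Sample from the stacked mixture: pick a run with probability w_k,
# then a posterior draw from that run.
run_idx = rng.choice(len(w), size=4000, p=w)
stacked_draws = np.array([draws[k][rng.integers(len(draws[k]))] for k in run_idx])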

References

SHOWING 1-10 OF 161 REFERENCES
A Bayes interpretation of stacking for M-complete and M-open settings
TLDR
It is shown that the stacking weights also asymptotically minimize a posterior expected loss, which formally provides a Bayesian justification for cross-validation.
Using Bayesian Model Averaging to Calibrate Forecast Ensembles
Ensembles used for probabilistic weather forecasting often exhibit a spread-error correlation, but they tend to be underdispersive. This paper proposes a statistical method for postprocessing
Comparing Bayes Model Averaging and Stacking When Model Approximation Error Cannot be Ignored
  • B. Clarke
  • Computer Science
    J. Mach. Learn. Res.
  • 2003
TLDR
Bayes model averaging is compared to a non-Bayes form of model averaging called stacking, and the results suggest that stacking has better robustness properties than BMA in the most important settings.
Bayesian Model Assessment and Comparison Using Cross-Validation Predictive Densities
TLDR
This work proposes an approach using cross-validation predictive densities to obtain expected utility estimates and the Bayesian bootstrap to obtain samples from their distributions, and discusses the probabilistic assumptions made and the properties of two practical cross-validation methods, importance sampling and k-fold cross-validation.
Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC
TLDR
An efficient computation of LOO is introduced using Pareto-smoothed importance sampling (PSIS), a new procedure for regularizing importance weights, and it is demonstrated that PSIS-LOO is more robust in the finite case with weak priors or influential observations.
Learning to Average Predictively Over Good and Bad: Comment on: Using Stacking to Average Bayesian Predictive Distributions
We suggest extending the stacking procedure for combining predictive densities, proposed by Yao et al. in Bayesian Analysis, to a setting where dynamic learning occurs about features
Comparing and Weighting Imperfect Models Using D-Probabilities
  • M. Li, D. Dunson
  • Computer Science
    Journal of the American Statistical Association
  • 2020
We propose a new approach for assigning weights to models using a divergence-based method (D-probabilities), relying on evaluating parametric models relative to a nonparametric Bayesian
Comparison of Bayesian predictive methods for model selection
TLDR
The study demonstrates that model selection can greatly benefit from using cross-validation outside the search process, both for guiding model size selection and for assessing the predictive performance of the finally selected model.
Bayesian Model Averaging for Linear Regression Models
We consider the problem of accounting for model uncertainty in linear regression models. Conditioning on a single selected model ignores model uncertainty, and thus leads to the
...
...