Model Criticism in Latent Space

Sohan Seth, Iain Murray, and Christopher K. I. Williams. Bayesian Analysis.
Model criticism is usually carried out by assessing whether replicated data generated under the fitted model look similar to the observed data; see, e.g., Gelman, Carlin, Stern, and Rubin [2004, p. 165]. This paper presents a method for latent variable models that pulls the data back into the space of latent variables and carries out model criticism in that space. Making use of a model's structure enables a more direct assessment of the assumptions made in the prior and likelihood. We demonstrate…
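As a rough illustration of the idea in the abstract, the sketch below pulls data back into latent space for a linear-Gaussian latent variable model and checks whether the pulled-back samples look like draws from the standard normal prior. The model (x = W z + noise), the function names `pull_back` and `excess_kurtosis`, and the choice of excess kurtosis as the check statistic are all assumptions made for this example; they are not taken from the paper.

```python
import numpy as np

def pull_back(X, W, noise_var, rng):
    """Draw one posterior sample of z for each row of X under the assumed model
    x = W z + eps, with z ~ N(0, I) and eps ~ N(0, noise_var * I)."""
    k = W.shape[1]
    # Standard linear-Gaussian posterior: covariance and per-row mean of z given x.
    post_cov = np.linalg.inv(np.eye(k) + W.T @ W / noise_var)
    post_mean = X @ W @ post_cov / noise_var
    # Add posterior noise using the Cholesky factor of the posterior covariance.
    chol = np.linalg.cholesky(post_cov)
    return post_mean + rng.standard_normal(post_mean.shape) @ chol.T

def excess_kurtosis(z):
    """Excess kurtosis of all entries of z pooled together (0 for a Gaussian)."""
    z = np.asarray(z, dtype=float).ravel()
    z = (z - z.mean()) / z.std()
    return float(np.mean(z ** 4) - 3.0)
```

If the data really come from the assumed model, the pooled latent samples should be close to N(0, I), so the statistic should be near zero. A heavy-tailed latent distribution would show up as clearly positive excess kurtosis, which points at the prior rather than the likelihood.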


Is My Model Flexible Enough? Information-Theoretic Model Check

A method is developed to evaluate a specified model class by assessing its capability of reproducing data similar to the observed data record. The method is based on the information-theoretic properties of models viewed as data generators and is applicable to, e.g., sequential data and nonlinear dynamical models.

Model Criticism for Long-Form Text Generation

This work proposes to apply a statistical tool, model criticism in latent space, to evaluate the high-level structure of the generated text and finds that transformer-based language models are able to capture topical structures but have a harder time maintaining structural coherence or modeling coreference.

Using Prior Expansions for Prior-Data Conflict Checking

This work considers checking for prior-data conflict in Bayesian models by expanding the prior used for the analysis into a larger family of priors, and considering a marginal likelihood score statistic for the expansion parameter.

Interpretable Stein Goodness-of-fit Tests on Riemannian Manifold

This study develops goodness-of-fit testing and interpretable model criticism methods for general distributions on Riemannian manifolds, including those with an intractable normalization constant, based on extensions of kernel Stein discrepancy.

Customizing Sequence Generation with Multi-Task Dynamical Systems

It is shown that hierarchical multi-task dynamical systems (MTDSs) provide direct user control over sequence generation via a latent code $\mathbf{z}$ that specifies the customization to the individual data sequence, enabling style transfer, interpolation and morphing within generated sequences.



Generalizing the probability matrix decomposition model: An example of Bayesian model checking and model expansion

A mixture prior density with two beta distributed components is used to expand the model in a meaningful way and it is concluded that a relatively flat prior distribution is inappropriate.

Statistical Model Criticism using Kernel Two Sample Tests

An exploratory approach to statistical model criticism using maximum mean discrepancy (MMD) two sample tests is proposed and it is demonstrated on synthetic data that the selected statistic can be used to identify where a statistical model most misrepresents the data it was trained on.
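A minimal sketch of an MMD two-sample test of the kind this summary refers to is given below. The RBF kernel, the fixed bandwidth, the biased MMD² estimator, and the permutation-based p value are assumptions chosen for illustration, and the function names are hypothetical; the cited work may use different choices.

```python
import numpy as np

def rbf_kernel(A, B, bandwidth):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq_dist = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * bandwidth ** 2))

def mmd2_biased(X, Y, bandwidth):
    """Biased estimate of the squared maximum mean discrepancy between X and Y."""
    return (rbf_kernel(X, X, bandwidth).mean()
            + rbf_kernel(Y, Y, bandwidth).mean()
            - 2.0 * rbf_kernel(X, Y, bandwidth).mean())

def mmd_permutation_test(X, Y, bandwidth=1.0, n_perm=200, seed=0):
    """Return (observed MMD^2, permutation p value) for H0: X and Y share a distribution."""
    rng = np.random.default_rng(seed)
    observed = mmd2_biased(X, Y, bandwidth)
    pooled = np.vstack([X, Y])
    n = len(X)
    exceed = 0
    for _ in range(n_perm):
        # Shuffle the pooled sample and split it into two groups of the original sizes.
        idx = rng.permutation(len(pooled))
        stat = mmd2_biased(pooled[idx[:n]], pooled[idx[n:]], bandwidth)
        exceed += int(stat >= observed)
    # The +1 terms keep the p value strictly positive.
    return float(observed), (exceed + 1) / (n_perm + 1)
```

For model criticism, X would hold observed data and Y data simulated from the fitted model; a small p value suggests the model's samples are distinguishable from the observations.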

Sampling and Bayes' inference in scientific modelling and robustness

Predictive checking functions for transformation, serial correlation, and bad values are considered, together with their relation to Bayesian options. Robustness is viewed from a Bayesian standpoint, and examples are given.

P Values for Composite Null Models

This paper proposes two alternatives for computing a p value, the conditional predictive p value and the partial posterior predictive p value, and indicates their advantages from both Bayesian and frequentist perspectives.

The analysis of repeated-measures data on schizophrenic reaction times using mixture models.

Four mixture models are fit within a Bayesian framework, with model monitoring using posterior predictive checks. The distinctions between models arise from assumptions about the variance of the shifted observations and the exchangeability of schizophrenic individuals.


This paper considers Bayesian counterparts of the classical tests for goodness of fit and their use in judging the fit of a single Bayesian model to the observed data. We focus on posterior

Bayesian data analysis.

J. Kruschke. Wiley Interdisciplinary Reviews: Cognitive Science, 2010.
A fatal flaw of null hypothesis significance testing (NHST) is reviewed, some benefits of Bayesian data analysis are introduced, and illustrative examples are presented of multiple comparisons in Bayesian analysis of variance and of Bayesian approaches to statistical power.

Model criticism based on likelihood-free inference, with an application to protein network evolution

This work provides a statistical interpretation to current developments in likelihood-free Bayesian inference that explicitly accounts for discrepancies between the model and the data, termed Approximate Bayesian Computation under model uncertainty (ABCμ).

Goodness‐of‐Fit Diagnostics for Bayesian Hierarchical Models

The proposed methodology is based on comparing values of pivotal discrepancy measures, computed using parameter values drawn from the posterior distribution, to known reference distributions, and suggests that diagnostics based on PDMs have higher statistical power than comparable posterior-predictive diagnostic checks in detecting model departures.

A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroskedasticity

This paper presents a parameter covariance matrix estimator which is consistent even when the disturbances of a linear regression model are heteroskedastic. This estimator does not depend on a formal