# A simple example of Dirichlet process mixture inconsistency for the number of components

@inproceedings{Miller2013ASE, title={A simple example of Dirichlet process mixture inconsistency for the number of components}, author={Jeffrey W. Miller and M. Harrison}, booktitle={NIPS}, year={2013} }

For data assumed to come from a finite mixture with an unknown number of components, it has become common to use Dirichlet process mixtures (DPMs) not only for density estimation, but also for inferences about the number of components. The typical approach is to use the posterior distribution on the number of clusters, that is, the posterior on the number of components represented in the observed data. However, it turns out that this posterior is not consistent: it does not concentrate at the…
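The distinction between components and clusters can be made concrete with the Chinese restaurant process (CRP) view of the Dirichlet process prior, which is standard background rather than something stated in the abstract above. The following is a minimal sketch, assuming a concentration parameter `alpha`; the function names are hypothetical. It computes and simulates the prior number of clusters among `n` observations. It only illustrates the prior on the cluster count and does not reproduce the paper's posterior inconsistency result.

```python
import random


def expected_num_clusters(alpha: float, n: int) -> float:
    """Prior mean of the number of clusters among n points under CRP(alpha)."""
    # Observation i+1 starts a new cluster with probability alpha / (alpha + i).
    return sum(alpha / (alpha + i) for i in range(n))


def sample_num_clusters(alpha: float, n: int, rng: random.Random) -> int:
    """Draw the number of occupied clusters after assigning n points."""
    clusters = 0
    for i in range(n):
        # The first point (i = 0) always opens a cluster, since the
        # probability alpha / (alpha + 0) equals 1.
        if rng.random() < alpha / (alpha + i):
            clusters += 1
    return clusters


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed, chosen only for reproducibility
    for n in (10, 100, 1000):
        draws = [sample_num_clusters(1.0, n, rng) for _ in range(2000)]
        print(n, expected_num_clusters(1.0, n), sum(draws) / len(draws))
```

Running the script prints, for each `n`, the exact prior mean next to a Monte Carlo average, so the two can be compared directly.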

#### 109 Citations

Inconsistency of Pitman-Yor process mixtures for the number of components

- Mathematics, Computer Science
- J. Mach. Learn. Res.
- 2014

It is shown that the posterior on data from a finite mixture does not concentrate at the true number of components, and this result applies to a large class of nonparametric mixtures, including DPMs and PYMs, over a wide variety of families of component distributions.

Finite mixture models are typically inconsistent for the number of components

- Mathematics
- 2020

Scientists and engineers are often interested in learning the number of subpopulations (or components) present in a data set. Practitioners commonly use a Dirichlet process mixture model (DPMM) for…

Mixture Models With a Prior on the Number of Components

- Mathematics, Medicine
- Journal of the American Statistical Association
- 2018

It turns out that many of the essential properties of DPMs are also exhibited by MFMs, and the MFM analogues are simple enough that they can be used much like the corresponding DPM properties; this simplifies the implementation of MFMs and can substantially improve mixing.

Posterior Distribution for the Number of Clusters in Dirichlet Process Mixture Models

- Mathematics, Computer Science
- ArXiv
- 2019

This work provides a rigorous study of the posterior distribution of the number of clusters in DPMMs under different prior distributions on the parameters and constraints on the distributions of the data.

Inference for the Number of Topics in the Latent Dirichlet Allocation Model via Bayesian Mixture Modeling

- Computer Science
- 2019

A variant of the Metropolis–Hastings algorithm is presented that can be used to estimate the posterior distribution of the number of topics; it is evaluated on synthetic data and compared with procedures that are currently used in the machine learning literature.

Infinite Gaussian Mixture Modeling with an Improved Estimation of the Number of Clusters

- Computer Science
- AAAI
- 2021

This paper shows that IGMM provides an inconsistent estimate of the number of clusters, proposes a modified training procedure that uses the inverse χ for this purpose, and demonstrates good results on real-world databases when compared to other methods used to evaluate model order.

From here to infinity: sparse finite versus Dirichlet process mixtures in model-based clustering

- Mathematics, Medicine
- Adv. Data Anal. Classif.
- 2019

It is illustrated that the concept of sparse finite mixtures is very generic and easily extended to cluster various types of non-Gaussian data, in particular discrete data and continuous multivariate data arising from non-Gaussian clusters.

Infinite mixtures of multivariate normal-inverse Gaussian distributions for clustering of skewed data

- Mathematics
- 2020

Mixtures of multivariate normal inverse Gaussian (MNIG) distributions can be used to cluster data that exhibit features such as skewness and heavy tails. However, for cluster analysis, using a…

Dirichlet process mixtures under affine transformations of the data

- Computer Science, Mathematics
- Comput. Stat.
- 2021

This work devises a coherent prior specification of the model which makes posterior inference invariant with respect to affine transformations of the data and shows that mild assumptions on the true data generating process are sufficient to ensure that DPM-G models feature such a property.

Dynamic mixtures of finite mixtures and telescoping sampling

- Computer Science
- 2020

A novel sampling scheme for MFMs, called the telescoping sampler, is proposed; it allows Bayesian inference for mixtures with arbitrary component distributions, and the ease of its application using different component distributions is demonstrated on real data sets.

#### References

SHOWING 1-10 OF 32 REFERENCES

Inconsistency of Pitman-Yor process mixtures for the number of components

- Mathematics, Computer Science
- J. Mach. Learn. Res.
- 2014

It is shown that the posterior on data from a finite mixture does not concentrate at the true number of components, and this result applies to a large class of nonparametric mixtures, including DPMs and PYMs, over a wide variety of families of component distributions.

The L1-consistency of Dirichlet mixtures in multivariate Bayesian density estimation

- Computer Science, Mathematics
- J. Multivar. Anal.
- 2010

The L1-consistency of Dirichlet mixtures is extended to the multivariate density estimation setting; the Kullback-Leibler property of the prior holds, and the size of the sieve in the parameter space, in terms of L1-metric entropy, is not larger than the order of n.

Bayesian finite mixtures with an unknown number of components: The allocation sampler

- Mathematics, Computer Science
- Stat. Comput.
- 2007

A new Markov chain Monte Carlo method for the Bayesian analysis of finite mixture distributions with an unknown number of components is presented; it can be used for mixtures of components from any parametric family, under the assumption that the component parameters can be integrated out of the model analytically.

A NONPARAMETRIC BAYESIAN APPROACH TO DETECT THE NUMBER OF REGIMES IN MARKOV SWITCHING MODELS

- Mathematics
- 2002

The literature on Markov switching models is increasing and producing interesting results both at theoretical and applied levels. Most often the number of regimes, i.e., of data generating processes,…

Bayesian analysis of finite mixture distributions using the allocation sampler

- Mathematics
- 2007

Finite mixture distributions are receiving more and more attention from statisticians in many different fields of research because they are a very flexible class of models. They are typically used…

Particle filters for mixture models with an unknown number of components

- Mathematics, Computer Science
- Stat. Comput.
- 2004

The performance of this particle filter, when analyzing both simulated and real data from a Gaussian mixture model, is uniformly better than the particle filter algorithm of Chen and Liu, and in many situations it outperforms a Gibbs sampler.

How many clusters

- Mathematics
- 2008

The title poses a deceptively simple question that must be addressed by any statistical model or computational algorithm for the clustering of points. Two distinct interpretations are possible, one…

Computing Nonparametric Hierarchical Models

- Computer Science
- 1998

The ease with which the strict parametric assumptions common to most standard Bayesian hierarchical models can be relaxed to incorporate uncertainties about functional forms using Dirichlet process components is illustrated, partly enabled by the approach to computation using MCMC methods.

Modelling Heterogeneity With and Without the Dirichlet Process

- Mathematics
- 2001

We investigate the relationships between Dirichlet process (DP) based models and allocation models for a variable number of components, based on exchangeable distributions. It is shown that the DP…

Dirichlet Process

- Computer Science
- Encyclopedia of Machine Learning
- 2010

The Dirichlet process is a stochastic process used in Bayesian nonparametric models of data, particularly in Dirichlet process mixture models (also known as infinite mixture models). It is a…