Corpus ID: 219558839

A generalized Bayes framework for probabilistic clustering

Tommaso Rigon, Amy H. Herring, David B. Dunson. arXiv: Methodology.
Loss-based clustering methods, such as k-means and its variants, are standard tools for finding groups in data, but their lack of uncertainty quantification for the estimated clusters is a disadvantage. Model-based clustering via mixture models provides an alternative, but such methods face computational problems and high sensitivity to the choice of kernel. This article proposes a generalized Bayes framework that bridges these two paradigms through the use of Gibbs…
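The bridge the abstract describes can be illustrated schematically. Below is a minimal brute-force sketch (not the paper's actual algorithm): for a tiny dataset, every label vector is scored by the k-means loss plus a penalty on the number of clusters, and a Gibbs-type posterior weights each by exp(−λ·loss). The data, the loss weight `lam`, and the cluster penalty `gamma` are all illustrative choices.

```python
from itertools import product
import math

def kmeans_loss(x, labels):
    """Sum of squared deviations from each cluster's mean (1-D data)."""
    loss = 0.0
    for k in set(labels):
        pts = [xi for xi, li in zip(x, labels) if li == k]
        mean = sum(pts) / len(pts)
        loss += sum((p - mean) ** 2 for p in pts)
    return loss

def gibbs_posterior(x, lam=2.0, gamma=1.0):
    """Gibbs-type posterior over label vectors:
    weight ∝ exp(-lam * loss - gamma * #clusters).
    The gamma term plays the role of a prior penalizing extra clusters."""
    n = len(x)
    weights = {}
    for labels in product(range(n), repeat=n):  # brute force: tiny n only
        w = math.exp(-lam * kmeans_loss(x, labels) - gamma * len(set(labels)))
        weights[labels] = w
    z = sum(weights.values())
    return {l: w / z for l, w in weights.items()}

x = [0.0, 0.1, 5.0, 5.2]
post = gibbs_posterior(x)
best = max(post, key=post.get)  # highest-posterior labelling
```

With these settings the highest-weight labelling separates {0.0, 0.1} from {5.0, 5.2}, and, unlike plain k-means, every candidate partition carries a posterior probability.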


Cohesion and Repulsion in Bayesian Distance Clustering

This work proposes a hybrid solution that defines a likelihood on pairwise distances between observations, allowing for cluster identifiability, and shows how this modelling strategy has interesting connections with existing proposals in the literature as well as a decision-theoretic interpretation.

Spectral Clustering, Bayesian Spanning Forest, and Forest Process

This work derives a simple Markov chain Monte Carlo algorithm for posterior estimation, demonstrates superior performance compared to existing algorithms, and illustrates several model-based extensions useful for data applications, including high-dimensional and multi-view clustering of images.

Bayesian clustering using random effects models and predictive projections

This work considers a Bayesian clustering method that combines linear mixed models and predictive projections, inspired by methods for Bayesian model checking, which uses simulated data replicates from a fitted model to define similarity between observations in ways relevant for clustering.

Finite mixture models do not reliably learn the number of components

It is proved that under even the slightest model misspecification, the finite mixture model (FMM) component-count posterior diverges: the posterior probability of any particular finite number of latent components converges to 0 in the limit of infinite data.

netANOVA: novel graph clustering technique with significance assessment via hierarchical ANOVA

An unsupervised workflow to identify groups of graphs from reliable network-based statistics, inspired by distance-wise ANOVA algorithms; it is flexible, since users can choose among multiple options to adapt to specific contexts and network types.

Clusterization of Different Vulnerable Countries for Immigrants Due to Covid-19 Using Mean Probabilistic Likelihood Score and Unsupervised Mining Algorithms

The results show that the combined application of probabilistic LHS and unsupervised CA can be a reliable method for identifying the vulnerability of different countries commonly chosen by migrants.

NED: Niche Detection in User Content Consumption Data

An end-to-end framework, NED, operates in two steps: discovering co-clusters of user behaviors based on interaction densities, and explaining them using attributes of the involved nodes; experimental results are shown on several public datasets, as well as a large-scale industrial dataset from Snapchat.

Bayesian functional registration of fMRI activation maps

Functional magnetic resonance imaging (fMRI) has provided invaluable insight into our understanding of human behavior. However, large interindividual differences in both brain anatomy and functional…

Improved fMRI-based Pain Prediction using Bayesian Group-wise Functional Registration

In recent years, the field of neuroimaging has undergone a paradigm shift, moving away from the traditional brain mapping approach towards the development of integrated, multivariate brain models…



Revisiting k-means: New Algorithms via Bayesian Nonparametrics

This paper shows that a Gibbs sampling algorithm for the Dirichlet process mixture approaches a hard clustering algorithm in the limit, and further that the resulting algorithm monotonically minimizes an elegant underlying k-means-like clustering objective that includes a penalty for the number of clusters.
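The small-variance limit described here yields the DP-means algorithm. A minimal sketch, following the general recipe from this line of work (the data and penalty `lam` are illustrative): assignment proceeds as in Lloyd's k-means, except that a point whose squared distance to every current centroid exceeds `lam` opens a new cluster, which is how the cluster-count penalty enters.

```python
def dp_means(points, lam, n_iter=20):
    """DP-means sketch: like Lloyd's k-means, but a point farther (in
    squared distance) than lam from every current centroid starts a
    new cluster with itself as the centroid."""
    centroids = [list(points[0])]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # assignment step
        for i, p in enumerate(points):
            d2 = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            j = min(range(len(centroids)), key=lambda k: d2[k])
            if d2[j] > lam:  # too far from all centroids: open a new cluster
                centroids.append(list(p))
                labels[i] = len(centroids) - 1
            else:
                labels[i] = j
        # update step: recompute each centroid as its cluster mean
        for k in range(len(centroids)):
            members = [p for p, l in zip(points, labels) if l == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

pts = [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9)]
labels, cents = dp_means(pts, lam=1.0)
```

On this toy input the algorithm recovers two clusters; larger `lam` values merge them into one, which is the k-means-like objective's penalty for the number of clusters at work.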

Bayesian Model-Based Clustering Procedures

This article establishes a general formulation for Bayesian model-based clustering, in which subset labels are exchangeable and items are also exchangeable, possibly up to covariate effects, and a new heuristic item-swapping algorithm is introduced.

Model-Based Clustering, Discriminant Analysis, and Density Estimation

This work reviews a general methodology for model-based clustering that provides a principled statistical approach to important practical questions that arise in cluster analysis, such as how many clusters are there, which clustering method should be used, and how should outliers be handled.

MAD-Bayes: MAP-based Asymptotic Derivations from Bayes

A novel objective function that goes beyond clustering to learn (and penalize new) groupings for which the mutual exclusivity and exhaustivity assumptions of clustering are relaxed, and several other algorithms are demonstrated, all of which are scalable and simple to implement.

Bayesian nonparametric clustering for large data sets

Two nonparametric Bayesian methods to cluster big data are proposed and applied to cluster genes by patterns of gene–gene interaction; they compare favorably with other clustering algorithms, including k-means, DP-means, DBSCAN, SUGS, streaming variational Bayes, and an EM algorithm.

Bayesian Cluster Analysis: Point Estimation and Credible Balls (with Discussion)

This paper applies decision- and information-theoretic techniques to develop appropriate point estimates and credible sets that summarize the posterior of the clustering structure.

Improved criteria for clustering based on the posterior similarity matrix

New criteria for estimating a clustering, based on the posterior expected adjusted Rand index, are proposed; they are shown to possess a shrinkage property and to outperform Binder's loss in a simulation study and in an application to gene expression data.
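Criteria of this kind, like Binder's loss before them, operate on the posterior similarity matrix. A minimal sketch (with made-up posterior draws standing in for MCMC output) of building that matrix and scoring candidate partitions with a Binder-style pairwise loss:

```python
def similarity_matrix(draws):
    """Posterior similarity: pi[i][j] = fraction of posterior draws in
    which items i and j share a cluster."""
    n = len(draws[0])
    pi = [[0.0] * n for _ in range(n)]
    for labels in draws:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    pi[i][j] += 1.0
    m = len(draws)
    return [[v / m for v in row] for row in pi]

def binder_loss(labels, pi):
    """Binder-style expected loss (equal misclassification costs):
    penalizes pairs whose co-clustering disagrees with pi[i][j]."""
    n = len(labels)
    loss = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            same = 1.0 if labels[i] == labels[j] else 0.0
            loss += abs(same - pi[i][j])
    return loss

# toy posterior draws over partitions of 4 items
draws = [(0, 0, 1, 1), (0, 0, 1, 1), (0, 0, 0, 1)]
pi = similarity_matrix(draws)
# a common point estimate: the visited partition minimizing the loss
best = min(draws, key=lambda d: binder_loss(d, pi))
```

Restricting the search to visited partitions is a simplification; the criteria discussed in this entry refine both the loss and the search.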

Bayesian Distance Clustering

This work proposes a class of Bayesian distance clustering methods, which rely on modeling the likelihood of the pairwise distances in place of the original data, and illustrates dramatic gains in the ability to infer clusters that are not well represented by the usual choices of kernel.

A general framework for updating belief distributions

It is argued that a valid update of a prior belief distribution to a posterior can be made for parameters connected to observations through a loss function rather than the traditional likelihood function, which is recovered as a special case.
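Schematically, the update described here can be written as (the loss $\ell$ and the weight $w$ are notation assumptions, not taken from the entry):

```latex
\pi(\theta \mid x) \;\propto\; \exp\{-w\,\ell(\theta, x)\}\,\pi(\theta)
```

Taking $\ell$ to be the negative log-likelihood with $w = 1$ recovers the standard Bayesian update as the special case mentioned above.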

Bayesian clustering and product partition models

A decision-theoretic formulation of product partition models (PPMs) is presented that allows a formal treatment of different decision problems, such as estimation or hypothesis testing, together with clustering; an algorithm is proposed that yields Bayes estimates of the quantities of interest and the groups of experimental units.