Corpus ID: 234338028

Search Algorithms and Loss Functions for Bayesian Clustering

@inproceedings{Dahl2021SearchAA,
  title={Search Algorithms and Loss Functions for Bayesian Clustering},
  author={David B. Dahl and Devin J. Johnson and Peter R. Mueller},
  year={2021}
}
We propose a randomized greedy search algorithm to find a point estimate for a random partition based on a loss function and posterior Monte Carlo samples. Given the large size and awkward discrete nature of the search space, the minimization of the posterior expected loss is challenging. Our approach is a stochastic search based on a series of greedy optimizations performed in a random order and is embarrassingly parallel. We consider several loss functions, including Binder loss and variation…
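The core idea in the abstract can be sketched briefly. This is not the authors' implementation; it is a minimal illustration, assuming hard label vectors, Binder loss as the loss function, and a single greedy pass in which each item (visited in random order) is reassigned to whichever cluster label, existing or new, minimizes the posterior expected loss estimated from MCMC samples. All function names here are hypothetical.

```python
import numpy as np

def binder_loss(labels_a, labels_b):
    """Binder loss: the number of item pairs on which two partitions
    disagree (together in one partition but apart in the other)."""
    a = np.asarray(labels_a)[:, None] == np.asarray(labels_a)[None, :]
    b = np.asarray(labels_b)[:, None] == np.asarray(labels_b)[None, :]
    # Count each disagreeing pair once (strict upper triangle).
    return int(np.triu(a != b, k=1).sum())

def expected_loss(candidate, mcmc_samples):
    """Monte Carlo estimate of the posterior expected loss of a
    candidate partition, averaging over posterior samples."""
    return np.mean([binder_loss(candidate, s) for s in mcmc_samples])

def greedy_sweep(candidate, mcmc_samples, rng):
    """One greedy optimization pass: visit items in a random order and
    move each to the label (existing or new) with the lowest estimated
    posterior expected loss."""
    candidate = np.array(candidate)
    for i in rng.permutation(len(candidate)):
        labels = list(set(candidate)) + [max(candidate) + 1]  # allow a new cluster
        losses = []
        for lab in labels:
            trial = candidate.copy()
            trial[i] = lab
            losses.append(expected_loss(trial, mcmc_samples))
        candidate[i] = labels[int(np.argmin(losses))]
    return candidate
```

Because each randomized sweep is independent of the others, many such searches can be launched in parallel from different random orders and the best result kept, which is the "embarrassingly parallel" aspect the abstract mentions.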

Tables from this paper

Cluster Analysis via Random Partition Distributions
Hierarchical and k-medoids clustering are deterministic clustering algorithms based on pairwise distances. Using these same pairwise distances, we propose a novel stochastic clustering method based…

References

SHOWING 1-10 OF 15 REFERENCES
Bayesian Model-Based Clustering Procedures
This article establishes a general formulation for Bayesian model-based clustering, in which subset labels are exchangeable, and items are also exchangeable, possibly up to covariate effects. The…
Optimal Bayesian estimators for latent variable cluster models
A Bayesian decision theoretical approach is adopted to define an optimality criterion for clusterings and a fast and context-independent greedy algorithm is proposed to find the best allocations, thereby solving the clustering and the model-choice problems at the same time.
Improved criteria for clustering based on the posterior similarity matrix
In this paper we address the problem of obtaining a single clustering estimate ĉ based on an MCMC sample of clusterings c(1), c(2), …, c(M) from the posterior distribution of a Bayesian cluster…
Bayesian Cluster Analysis: Point Estimation and Credible Balls
This paper applies Bayesian techniques to develop appropriate point estimates and credible sets to summarize the posterior of the clustering structure based on decision and information theoretic techniques.
Bayesian infinite mixture model based clustering of gene expression profiles
A clustering procedure based on the Bayesian infinite mixture model and applied to clustering gene expression profiles that allows for incorporation of uncertainties involved in the model selection in the final assessment of confidence in similarities of expression profiles.
Information Theoretic Measures for Clusterings Comparison: Variants, Properties, Normalization and Correction for Chance
An organized study of information theoretic measures for clustering comparison, including several existing popular measures in the literature, as well as some newly proposed ones, and advocates the normalized information distance (NID) as a general measure of choice.
Model-Based Clustering for Expression Data via a Dirichlet Process Mixture Model
This chapter describes a clustering procedure for microarray expression data based on a well-defined statistical model, specifically, a conjugate Dirichlet process mixture model. The clustering…
Comparing clusterings---an information based distance
This paper proposes an information theoretic criterion for comparing two partitions, or clusterings, of the same data set. The criterion, called variation of information (VI), measures the amount of…
Multiple Hypothesis Testing by Clustering Treatment Effects
Multiple hypothesis testing and clustering have been the subject of extensive research in high-dimensional inference, yet these problems usually have been treated separately. By defining true…
Objective Criteria for the Evaluation of Clustering Methods
This article proposes several criteria which isolate specific aspects of the performance of a method, such as its retrieval of inherent structure, its sensitivity to resampling and the stability of its results in the light of new data.
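The variation of information (VI) criterion referenced in the abstract and in Meilă's paper above can be computed directly from two label vectors over the same items. A minimal sketch, using the standard definition VI(A, B) = H(A|B) + H(B|A) with empirical cluster frequencies and natural logarithms:

```python
import math
from collections import Counter

def variation_of_information(labels_a, labels_b):
    """Variation of information VI(A, B) = H(A|B) + H(B|A) between two
    partitions of the same n items, from empirical cluster frequencies."""
    n = len(labels_a)
    joint = Counter(zip(labels_a, labels_b))  # joint cluster memberships
    pa = Counter(labels_a)
    pb = Counter(labels_b)
    vi = 0.0
    for (a, b), n_ab in joint.items():
        p = n_ab / n
        # Accumulate -p_ab * [log(p_ab / p_a) + log(p_ab / p_b)].
        vi -= p * (math.log(p / (pa[a] / n)) + math.log(p / (pb[b] / n)))
    return vi
```

VI is zero exactly when the two partitions agree (up to relabeling), and, unlike Binder loss, it is a true metric on the space of partitions, which is one reason it appears alongside Binder loss as a candidate loss function in the paper.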