# AutoGMM: Automatic and Hierarchical Gaussian Mixture Modeling in Python

```bibtex
@inproceedings{Athey2019AutoGMMAA,
  title={AutoGMM: Automatic and Hierarchical Gaussian Mixture Modeling in Python},
  author={Thomas L. Athey and Tingshan Liu and Benjamin D. Pedigo and Joshua T. Vogelstein},
  year={2019}
}
```

**Background:** Gaussian mixture modeling is a fundamental tool in clustering, as well as discriminant analysis and semiparametric density estimation. However, estimating the optimal model for any given number of components is an NP-hard problem, and estimating the number of components is in some respects an even harder problem. **Findings:** In R, a popular package called mclust addresses both of these problems. However, Python has lacked such a package. We therefore introduce AutoGMM, a Python…
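AutoGMM's own API is not shown on this page, but the model-selection problem the abstract describes can be illustrated with scikit-learn's `GaussianMixture` (an assumption for illustration, not the paper's implementation): fit a mixture for each candidate number of components and keep the one with the lowest BIC, the criterion mclust also uses.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic data: two well-separated 2-D Gaussian clusters.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=[0, 0], scale=0.5, size=(200, 2)),
    rng.normal(loc=[5, 5], scale=0.5, size=(200, 2)),
])

# Sweep the number of components; lower BIC is better in
# scikit-learn's sign convention.
bics = {}
for k in range(1, 6):
    gmm = GaussianMixture(n_components=k, covariance_type="full",
                          random_state=0).fit(X)
    bics[k] = gmm.bic(X)

best_k = min(bics, key=bics.get)
```

For this data the sweep selects two components; the hard part AutoGMM automates is doing this robustly across covariance constraints, initializations, and candidate component counts.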


## One Citation

### Superclass-Conditional Gaussian Mixture Model For Learning Fine-Grained Embeddings

- Computer Science · ICLR
- 2022

A training framework built on a novel superclass-conditional Gaussian mixture model (SCGM), which imitates the generative process of samples drawn from hierarchies of classes through latent-variable modeling of the fine-grained subclasses, and which is efficient and flexible across different domains.

## References

Showing 1–10 of 31 references.

### mclust 5: Clustering, Classification and Density Estimation Using Gaussian Finite Mixture Models

- Computer Science · R J.
- 2016

This updated version of mclust adds new covariance structures, dimension reduction capabilities for visualisation, model selection criteria, initialisation strategies for the EM algorithm, and bootstrap-based inference, making it a full-featured R package for data analysis via finite mixture modelling.

### Bayesian Regularization for Normal Mixture Estimation and Model-Based Clustering

- Mathematics, Computer Science · J. Classif.
- 2007

A modified version of BIC is proposed, where the likelihood is evaluated at the MAP instead of the MLE, and the resulting method avoids degeneracies and singularities, but when these are not present it gives similar results to the standard method using MLE.
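The modification can be stated compactly. In mclust's sign convention (larger is better), standard BIC plugs the maximum-likelihood estimate into the log-likelihood; the regularized variant instead evaluates it at the posterior mode. Here $d$ is the number of free parameters and $n$ the sample size (a sketch of the criterion, not the paper's exact notation):

```latex
% Standard BIC, evaluated at the MLE \hat\theta:
\mathrm{BIC} = 2\,\log p(x \mid \hat\theta_{\mathrm{MLE}}) - d \log n

% Regularized variant: evaluate the likelihood at the MAP instead,
% which avoids the degeneracies of unbounded likelihood:
\mathrm{BIC}^{*} = 2\,\log p(x \mid \tilde\theta_{\mathrm{MAP}}) - d \log n
```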

### Model-based Gaussian and non-Gaussian clustering

- Computer Science
- 1993

The classification maximum likelihood approach is sufficiently general to encompass many current clustering algorithms, including those based on the sum of squares criterion and on the criterion of Friedman and Rubin (1967), but it is restricted to Gaussian distributions and it does not allow for noise.

### On Spectral Learning of Mixtures of Distributions

- Mathematics, Computer Science · COLT
- 2005

It is proved that a very simple algorithm, namely spectral projection followed by single-linkage clustering, properly classifies every point in the sample, and there are many Gaussian mixtures such that each pair of means is separated, yet upon spectral projection the mixture collapses completely.

### Model-Based Clustering, Discriminant Analysis, and Density Estimation

- Computer Science
- 2002

This work reviews a general methodology for model-based clustering that provides a principled statistical approach to important practical questions that arise in cluster analysis, such as how many clusters are there, which clustering method should be used, and how should outliers be handled.

### CURE: an efficient clustering algorithm for large databases

- Computer Science · SIGMOD '98
- 1998

This work proposes a new clustering algorithm called CURE that is more robust to outliers, and identifies clusters having non-spherical shapes and wide variances in size, and demonstrates that random sampling and partitioning enable CURE to not only outperform existing algorithms but also to scale well for large databases without sacrificing clustering quality.

### Learning mixtures of Gaussians

- Computer Science · 40th Annual Symposium on Foundations of Computer Science (FOCS '99)
- 1999

This work presents the first provably correct algorithm for learning a mixture of Gaussians, which returns the true centers of the Gaussians to within the precision specified by the user with high probability.

### Some methods for classification and analysis of multivariate observations

- Mathematics
- 1967

The main purpose of this paper is to describe a process for partitioning an N-dimensional population into k sets on the basis of a sample. The process, which is called 'k-means,' appears to give…
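The k-means process MacQueen describes can be sketched in a few lines. This is a minimal Lloyd-style batch iteration (an illustrative sketch, not MacQueen's original online update), alternating nearest-center assignment with mean updates:

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Minimal Lloyd-style k-means on an (n, d) array X."""
    rng = np.random.default_rng(seed)
    # Initialize centers at k distinct sample points.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest center.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Update step: each center becomes the mean of its points.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers, labels

# Tiny demo: two well-separated clusters in 2-D.
pts = np.vstack([
    np.random.default_rng(1).normal(0.0, 0.1, size=(50, 2)),
    np.random.default_rng(2).normal(8.0, 0.1, size=(50, 2)),
])
centers, labels = kmeans(pts, 2)
```

Seen through the lens of this page's topic, k-means is the degenerate Gaussian mixture case with hard assignments and spherical, equal covariances, which is why it appears in a GMM paper's references.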

### Model-based clustering of high-dimensional data: A review

- Computer Science · Comput. Stat. Data Anal.
- 2014