• Corpus ID: 236987340

AutoGMM: Automatic and Hierarchical Gaussian Mixture Modeling in Python

Thomas L. Athey, Tingshan Liu, Benjamin D. Pedigo, Joshua T. Vogelstein
Background: Gaussian mixture modeling is a fundamental tool in clustering, as well as discriminant analysis and semiparametric density estimation. However, estimating the optimal model for any given number of components is an NP-hard problem, and estimating the number of components is in some respects an even harder problem. Findings: In R, a popular package called mclust addresses both of these problems. However, Python has lacked such a package. We therefore introduce AutoGMM, a Python… 
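The model-selection problem described in the abstract, choosing both the number of components and the covariance structure, can be illustrated with a minimal sketch. It uses scikit-learn's `GaussianMixture` and a plain BIC grid search. This is an assumption made for illustration only: it is not AutoGMM's API or algorithm, and the helper name `select_gmm_by_bic`, its parameters, and the synthetic data are hypothetical.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def select_gmm_by_bic(X, max_components=6,
                      covariance_types=("full", "tied", "diag", "spherical"),
                      seed=0):
    """Hypothetical helper: fit one GMM per (covariance type, k) pair
    and return the fitted model with the lowest BIC, plus that BIC."""
    best_model, best_bic = None, np.inf
    for cov in covariance_types:
        for k in range(1, max_components + 1):
            model = GaussianMixture(
                n_components=k,
                covariance_type=cov,
                n_init=3,            # several EM restarts per setting
                random_state=seed,
            ).fit(X)
            bic = model.bic(X)       # lower BIC is better in scikit-learn
            if bic < best_bic:
                best_model, best_bic = model, bic
    return best_model, best_bic


# Synthetic data (an assumption): two well-separated 2-D Gaussian blobs.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=-5.0, scale=1.0, size=(200, 2)),
    rng.normal(loc=5.0, scale=1.0, size=(200, 2)),
])

best_model, best_bic = select_gmm_by_bic(X)
labels = best_model.predict(X)
print(best_model.n_components, best_model.covariance_type)
```

An exhaustive grid like this is the simplest possible baseline. Because EM only finds local optima, the result depends on initialization, which is why the sketch uses several restarts per setting.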
1 Citation

Superclass-Conditional Gaussian Mixture Model For Learning Fine-Grained Embeddings

A training framework built on a novel superclass-conditional Gaussian mixture model (SCGM), which imitates the generative process of samples from class hierarchies through latent-variable modeling of the fine-grained subclasses. The framework is efficient and flexible across domains.



mclust 5: Clustering, Classification and Density Estimation Using Gaussian Finite Mixture Models

This updated version of mclust adds new covariance structures, dimension reduction capabilities for visualisation, model selection criteria, initialisation strategies for the EM algorithm, and bootstrap-based inference, making it a full-featured R package for data analysis via finite mixture modelling.

Bayesian Regularization for Normal Mixture Estimation and Model-Based Clustering

A modified version of BIC is proposed, where the likelihood is evaluated at the MAP instead of the MLE, and the resulting method avoids degeneracies and singularities, but when these are not present it gives similar results to the standard method using MLE.

Model-based Gaussian and non-Gaussian clustering

The classification maximum likelihood approach is sufficiently general to encompass many current clustering algorithms, including those based on the sum of squares criterion and on the criterion of Friedman and Rubin (1967), but it is restricted to Gaussian distributions and it does not allow for noise.

A spectral algorithm for learning mixture models

On Spectral Learning of Mixtures of Distributions

It is proved that a very simple algorithm, namely spectral projection followed by single-linkage clustering, properly classifies every point in the sample when the component means are sufficiently well separated. Conversely, there are many Gaussian mixtures in which each pair of means is separated, yet upon spectral projection the mixture collapses completely.

Model-Based Clustering, Discriminant Analysis, and Density Estimation

This work reviews a general methodology for model-based clustering that provides a principled statistical approach to important practical questions that arise in cluster analysis, such as how many clusters are there, which clustering method should be used, and how should outliers be handled.

CURE: an efficient clustering algorithm for large databases

This work proposes a new clustering algorithm called CURE that is more robust to outliers, and identifies clusters having non-spherical shapes and wide variances in size, and demonstrates that random sampling and partitioning enable CURE to not only outperform existing algorithms but also to scale well for large databases without sacrificing clustering quality.

Learning mixtures of Gaussians

  • S. Dasgupta
  • Computer Science
    40th Annual Symposium on Foundations of Computer Science (Cat. No.99CB37039)
  • 1999
This work presents the first provably correct algorithm for learning a mixture of Gaussians, which returns the true centers of the Gaussian to within the precision specified by the user with high probability.

Some methods for classification and analysis of multivariate observations

The main purpose of this paper is to describe a process for partitioning an N-dimensional population into k sets on the basis of a sample. The process, which is called 'k-means,' appears to give

Model-based clustering of high-dimensional data: A review