Model-based clustering using copulas with applications

Authors: Ioannis Kosmidis and Dimitris Karlis
Journal: Statistics and Computing
The majority of model-based clustering techniques is based on multivariate normal models and their variants. In this paper copulas are used for the construction of flexible families of models for clustering applications. The use of copulas in model-based clustering offers two direct advantages over current methods: (i) the appropriate choice of copulas provides the ability to obtain a range of exotic shapes for the clusters, and (ii) the explicit choice of marginal distributions for the… 
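As a rough illustration of the idea in the abstract above, the sketch below builds one cluster density from a copula plus explicitly chosen marginals, and then mixes several such clusters. It assumes a Gaussian copula purely for convenience; the abstract does not say which copula families the authors use. All function names are hypothetical and not taken from the paper or from any package.

```python
import numpy as np
from scipy import stats


def gaussian_copula_logpdf(u, corr):
    """Log-density of a Gaussian copula at rows of u, each in (0, 1)^p.

    Assumption: a Gaussian copula is used only as an example family.
    Values of u exactly equal to 0 or 1 give infinite normal scores.
    """
    z = stats.norm.ppf(u)  # normal scores, shape (n, p)
    _, logdet = np.linalg.slogdet(corr)
    a = np.linalg.inv(corr) - np.eye(corr.shape[0])
    quad = np.einsum("ij,jk,ik->i", z, a, z)  # z_i^T (R^{-1} - I) z_i
    return -0.5 * logdet - 0.5 * quad


def component_logpdf(x, marginals, corr):
    """Log-density of one cluster: copula term plus marginal log-densities.

    `marginals` is a list of frozen scipy.stats distributions, one per column.
    """
    u = np.column_stack([m.cdf(x[:, j]) for j, m in enumerate(marginals)])
    log_marg = sum(m.logpdf(x[:, j]) for j, m in enumerate(marginals))
    return gaussian_copula_logpdf(u, corr) + log_marg


def mixture_pdf(x, weights, components):
    """Mixture density: weighted sum of the component densities.

    `components` is a list of (marginals, corr) pairs, one per cluster.
    """
    dens = [w * np.exp(component_logpdf(x, m, r))
            for w, (m, r) in zip(weights, components)]
    return np.sum(dens, axis=0)


# Example: one cluster with normal marginals, one with gamma marginals.
corr = np.array([[1.0, 0.5], [0.5, 1.0]])
normal_cluster = ([stats.norm(0.0, 1.0), stats.norm(1.0, 2.0)], corr)
gamma_cluster = ([stats.gamma(a=2.0), stats.gamma(a=5.0)], corr)
points = np.array([[0.3, 1.2], [1.5, 4.0], [2.0, 0.5]])
density = mixture_pdf(points, [0.4, 0.6], [normal_cluster, gamma_cluster])
```

With normal marginals the component reduces to an ordinary multivariate normal density, which gives a convenient sanity check. Swapping in other marginals, such as the gamma ones above, changes the cluster shape while keeping the same dependence structure.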
Vine copula mixture models and clustering for non-Gaussian data
Dependency Clustering of Mixed Data with Gaussian Mixture Copulas
A new, efficient, semiparametric algorithm is designed to approximately estimate the parameters of a copula that can fit continuous, ordinal, and binary data. Performance improvements over state-of-the-art correlation clustering methods are demonstrated empirically on synthetic and benchmark datasets.
Copula–based clustering methods
In the dissimilarity-based clustering framework, methods based on the concepts of concordance or tail dependence are described, and the two approaches are compared through a case study on environmental data.
Some non-standard statistical dependence problems
A framework for the application of pair-mixtures of copulas to model asymmetric dependencies in bivariate data is developed. It is confirmed that, for small sample sizes, these tests fail to maintain their 5% significance level and that the Cramér-von Mises-type statistics are the most powerful.
CoClust: An R Package for Copula-Based Cluster Analysis
The aim of this chapter is to present and describe the R package CoClust, which enables implementing a clustering algorithm based on the copula function that overcomes the limitations of classic approaches that only deal with linear bivariate relationships.
Variable selection for mixed data clustering: a model-based approach
Two approaches for selecting variables in latent class analysis are proposed. They avoid computing the maximum likelihood estimates for each model comparison, and they avoid the standard variable selection algorithms, which are often suboptimal and computationally expensive.
A Semiparametric and Location-Shift Copula-Based Mixture Model
G. Mazo. Computer Science. J. Classif., 2017.
This paper aims at overcoming limitations by presenting a copula-based mixture model which is semiparametric, allowing for data adaptation without any modeling effort.
Vine copulas for mixed data: multi-view clustering for mixed data beyond meta-Gaussian dependencies
This work designs a new inference algorithm to fit vines on mixed data, thereby extending their use to several applications, and develops a dependency-seeking multi-view clustering model based on a Dirichlet Process mixture of vines that generalizes previous models to arbitrary dependencies as well as to mixed marginals.
Copula-based bivariate finite mixture regression models with an application for insurance claim count data
Copula-based bivariate finite mixtures of regression models are proposed: the new approach is defined, estimation through an EM algorithm is presented, and different models are then applied to a Spanish insurance claim count database.


Model-based clustering with non-elliptically contoured distributions
Finite mixtures of the normal inverse Gaussian distribution (and its multivariate extensions) are proposed, which start from a density that allows for skewness and fat tails, generalize the existing models, are tractable and have desirable properties.
Copula Functions in Model Based Clustering
This paper considers the other proposal, based on the general stochastic approach, in two versions: the classification likelihood approach, where each observation comes from one of several populations; and the mixture approach, where observations are distributed as a mixture of several distributions.
A Copula-Based Algorithm for Discovering Patterns of Dependent Observations
A new algorithm (CoClust in brief) is proposed that clusters dependent data according to the multivariate structure of the generating process without any assumption on the margins, and it is compared with a model-based clustering technique.
Modeling Dependence with C- and D-Vine Copulas: The R Package CDVine
The R package CDVine is presented, which provides functions and tools for statistical inference of canonical vine (C-vine) and D-vine copulas and contains tools for bivariate exploratory data analysis and for bivariate copula selection as well as for selection of pair-copula families in a vine.
Finite Mixture Models
The aim of this article is to provide an up-to-date account of the theory and methodological developments underlying the applications of finite mixture models.
Model-based Gaussian and non-Gaussian clustering
The classification maximum likelihood approach is sufficiently general to encompass many current clustering algorithms, including those based on the sum of squares criterion and on the criterion of Friedman and Rubin (1967), but it is restricted to Gaussian distributions and it does not allow for noise.
mclust Version 4 for R: Normal Mixture Modeling for Model-Based Clustering, Classification, and Density Estimation
This version of mclust provides functions for parameter estimation via the EM algorithm for normal mixture models with a variety of covariance structures, and functions for simulation from these models.
Flexible mixture modelling using the multivariate skew-t-normal distribution
This paper presents a robust probabilistic mixture model based on the multivariate skew-t-normal distribution, a skew extension of the multivariate Student’s t distribution with more powerful
Multivariate mixture modeling using skew-normal independent distributions