Corpus ID: 235694655

Dealing with overdispersion in multivariate count data

Noemi Corsini, Cinzia Viroli
Overdispersion in multivariate count data is a challenging problem. It now plays a central role, mainly because of the relevance of data from modern technologies, such as Next Generation Sequencing and textual data from the web or digital collections. This work presents a comprehensive analysis of the likelihood-based models for extra-variation data proposed in the scientific literature, with particular attention to models feasible for high-dimensional data. A new approach…


Estimating overdispersion in sparse multinomial data.
A new estimator is derived that has the lowest root mean squared error across a range of scenarios, especially when the data are very sparse, and that is more robust than fitting a Dirichlet-multinomial model or adding a random effect to the linear predictor.
Regression Models for Multivariate Count Data
  • Yiwen Zhang, Hua Zhou, Jin Zhou, Wei Sun
  • Mathematics, Medicine
  • Journal of Computational and Graphical Statistics
  • 2017
This article studies some generalized linear models that incorporate various correlation structures among multivariate counts, and treats estimation, testing, and variable selection for these models in a unifying framework.
Going deep in clustering high-dimensional data: deep mixtures of unigrams for uncovering topics in textual data
This work develops a deep version of mixtures of unigrams for the unsupervised classification of very short documents with a large number of terms, allowing for models with further, deeper latent layers; the proposal is derived in a Bayesian framework.
An improved method for the computation of maximum likelihood estimates for multinomial overdispersion models
This article considers maximum likelihood estimation for two commonly used overdispersion models: the Dirichlet-multinomial distribution (DM), due to Mosimann, and a finite mixture distribution (FM) proposed by Morel and Nagaraj, and Neerchal and Morel. An approximation theorem is used to obtain a two-stage procedure.
Mixture‐based clustering for count data using approximated Fisher Scoring and Minorization–Maximization approaches
Two alternative representations of the DCM distribution are used to perform clustering based on finite mixture models, where the mixture parameters are estimated using the minorization–maximization framework and an approximation to the Fisher scoring algorithm.
New improved estimators for overdispersion in models with clustered multinomial data and unequal cluster sizes
This paper aims to help fill the gap by proposing new estimators for the intracluster correlation coefficient, based on a new set of quasi-likelihood methods for this coefficient.
A New Multinomial Model and a Zero Variance Estimation
A new Bayesian estimation approach is proposed in which the prior distribution is constructed through the transformation of the multivariate beta of Olkin and Liu (2003), which allows moments to be estimated in Monte Carlo simulations with a dramatic reduction of their variances.
Miscellanea. An extension of Morel-Nagaraj's finite mixture distribution for modelling multinomial clustered data
Morel & Nagaraj (1993) proposed a finite mixture distribution for modelling multinomial extra variation when the extra variation is believed to be caused by clumped multinomial sampling with one…
A finite mixture distribution for modelling multinomial extra variation
We propose a new distribution to model categorical data exhibiting overdispersion when the overdispersion is believed to be caused by clumped sampling. The proposed distribution is a finite…
A Conway-Maxwell-multinomial distribution for flexible modeling of clustered categorical data
This work considers a Conway-Maxwell-multinomial (CMM) distribution for modeling clustered categorical data exhibiting positively or negatively associated trials. It features a dispersion parameter that allows it to adapt to a range of association levels, and it includes several recognizable distributions as special cases.