• Corpus ID: 65153410

Efficient Bayesian Nonparametric Inference for Categorical Data with General High Missingness

@article{Wang2017EfficientBN,
  title={Efficient Bayesian Nonparametric Inference for Categorical Data with General High Missingness},
  author={Chaojie Wang and Linghao Shen and Han Li and Xiaodan Fan},
  journal={arXiv: Methodology},
  year={2017}
}
Missingness in categorical data is a common problem in various real applications. Traditional approaches either utilize only the complete observations or impute the missing data by some ad hoc methods rather than the true conditional distribution of the missing data, thus losing or distorting the rich information in the partial observations. In this paper, we develop a Bayesian nonparametric approach, the Dirichlet Process Mixture of Collapsed Product-Multinomials (DPMCPM), to model the full… 
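As a rough illustration of the information loss the abstract attributes to complete-case analysis, the sketch below counts how many observed cells survive when only fully observed rows are kept. The toy data and the helper name `complete_case_summary` are assumptions made for this example; it is not the DPMCPM method and is not taken from the paper.

```python
# Toy categorical data set; None marks a missing entry (hypothetical values).
rows = [
    ["a", "x", None],
    ["b", None, "p"],
    ["a", "y", "q"],
    [None, "x", "p"],
    ["b", "y", None],
]


def is_complete(row):
    """Return True when a row has no missing entries."""
    return all(value is not None for value in row)


def complete_case_summary(data):
    """Compare what was observed with what complete-case analysis retains."""
    complete_rows = [row for row in data if is_complete(row)]
    observed_cells = sum(value is not None for row in data for value in row)
    # Only cells that sit in fully observed rows are used by complete-case analysis.
    kept_cells = sum(len(row) for row in complete_rows)
    return {
        "total_rows": len(data),
        "complete_rows": len(complete_rows),
        "observed_cells": observed_cells,
        "kept_cells": kept_cells,
    }


summary = complete_case_summary(rows)
print(summary)
```

In this toy case a single row is complete, so most of the observed cells are discarded. That is the kind of partial observation a model-based approach would try to use instead.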

References

Nonparametric Bayesian Multiple Imputation for Incomplete Categorical Variables in Large-Scale Assessment Surveys

TLDR
This work presents a fully Bayesian, joint modeling approach to multiple imputation for categorical data based on Dirichlet process mixtures of multinomial distributions, which automatically models complex dependencies while being computationally expedient.

Multiple Imputation of Missing Categorical and Continuous Values via Bayesian Mixture Models With Local Dependence

TLDR
A nonparametric Bayesian joint model for multivariate continuous and categorical variables is presented; imputations based on the proposed model tend to have better repeated sampling properties than the default application of chained equations in this realistic setting.

Bayesian Multilevel Latent Class Models for the Multiple Imputation of Nested Categorical Data

  • D. Vidotto, J. Vermunt, Katrijn Van Deun
  • Computer Science
    Journal of educational and behavioral statistics : a quarterly publication sponsored by the American Educational Research Association and the American Statistical Association
  • 2018
TLDR
Results indicate that the BMLC model is able to recover unbiased parameter estimates of the analysis models considered in the authors' studies, as well as to correctly reflect the uncertainty due to missing data, outperforming the competing methods.

Bayesian Simultaneous Edit and Imputation for Multivariate Categorical Data

TLDR
A Bayesian hierarchical model is used that couples a stochastic model for the measurement error process with a Dirichlet process mixture of multinomial distributions for the underlying, error-free values and is restricted to have support only on the set of theoretically possible combinations.

MIMCA: multiple imputation for categorical variables with multiple correspondence analysis

TLDR
The proposed method provides a good point estimate of the parameters of the analysis model considered, such as the coefficients of a main effects logistic regression model, and a reliable estimate of the variability of the estimators.

MULTIPLE IMPUTATION OF INCOMPLETE CATEGORICAL DATA USING LATENT CLASS ANALYSIS

TLDR
The proposed multiple imputation method, which is implemented in Latent GOLD software for latent class analysis, is illustrated with two examples and compared to well-established methods such as maximum likelihood.

Mixture analysis of multivariate categorical data with covariates and missing entries

Dirichlet Process Mixture Models for Modeling and Generating Synthetic Versions of Nested Categorical Data

We present a Bayesian model for estimating the joint distribution of multivariate categorical data when units are nested within groups. Such data arise frequently in social science settings, for

Review: a gentle introduction to imputation of missing values.

Multiple imputation of missing categorical data using latent class models: State of art

Social and behavioral science researchers often collect data using tests or questionnaires consisting of items which are supposed to measure one