# Multiple Imputation of Missing Categorical and Continuous Values via Bayesian Mixture Models With Local Dependence

@article{Murray2014MultipleIO, title={Multiple Imputation of Missing Categorical and Continuous Values via Bayesian Mixture Models With Local Dependence}, author={Jared Murray and Jerome P. Reiter}, journal={Journal of the American Statistical Association}, year={2014}, volume={111}, pages={1466 - 1479} }

ABSTRACT We present a nonparametric Bayesian joint model for multivariate continuous and categorical variables, with the intention of developing a flexible engine for multiple imputation of missing values. The model fuses Dirichlet process mixtures of multinomial distributions for categorical variables with Dirichlet process mixtures of multivariate normal distributions for continuous variables. We incorporate dependence between the continuous and categorical variables by (1) modeling the means…
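As a rough illustration of the building blocks the abstract names, the following sketch simulates mixed records from a truncated stick-breaking approximation to a Dirichlet process mixture. It is a simplification under stated assumptions, not the authors' model: a single shared component label drives both the categorical and the continuous variables, the covariance is fixed to the identity, and the dependence structure that the truncated abstract begins to describe is not reproduced. All function names, hyperparameters, and the truncation level are hypothetical.

```python
import numpy as np


def stick_breaking_weights(alpha, num_components, rng):
    """Truncated stick-breaking weights for a DP with concentration alpha."""
    betas = rng.beta(1.0, alpha, size=num_components)
    betas[-1] = 1.0  # truncation: the last stick takes all remaining mass
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - betas[:-1])))
    weights = betas * remaining
    return weights / weights.sum()  # guard against floating-point drift


def sample_mixed_records(n, levels, dim, alpha=1.0, num_components=20, seed=0):
    """Simulate n records with len(levels) categorical and dim continuous variables.

    Assumption (not from the paper): one shared component label per record.
    """
    rng = np.random.default_rng(seed)
    weights = stick_breaking_weights(alpha, num_components, rng)

    # Per-component category probabilities: a product of independent
    # multinomials, one Dirichlet draw per categorical variable.
    cat_probs = [rng.dirichlet(np.ones(d), size=num_components) for d in levels]

    # Per-component means for the continuous block (identity covariance assumed).
    means = rng.normal(0.0, 3.0, size=(num_components, dim))

    z = rng.choice(num_components, size=n, p=weights)  # component labels
    cats = np.column_stack([
        [rng.choice(d, p=cat_probs[j][k]) for k in z]
        for j, d in enumerate(levels)
    ])
    conts = means[z] + rng.normal(size=(n, dim))
    return weights, z, cats, conts


weights, z, cats, conts = sample_mixed_records(n=50, levels=[2, 3], dim=2)
```

In an imputation setting, a model of this general kind would be fit to the observed data and missing entries drawn from their conditional distribution given each record's component; the sketch above only shows the generative direction.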

## 78 Citations

### Bayesian Mixture Modeling for Multivariate Conditional Distributions

- Mathematics
- Journal of Statistical Theory and Practice
- 2020

We present a Bayesian mixture model for estimating the joint distribution of mixed ordinal, nominal, and continuous data conditional on a set of fixed variables. The modeling strategy is motivated by…

### Improving Bayesian Mixture Models for Multiple Imputation of Missing Data Using Focused Clustering

- Computer Science
- 2018

A procedure for specifying which variables with low rates of missingness to include in the focus set is presented, and the performance of the imputation procedure is examined using simulation studies based on artificial data and on data from the American Community Survey.

### Nonparametric Bayesian Models With Focused Clustering for Mixed Ordinal and Nominal Data

- Mathematics
- 2015

Dirichlet process mixtures can be useful models of multivariate categorical data and effective tools for multiple imputation of missing categorical values. In some contexts, however, these models can…

### An Empirical Comparison of Multiple Imputation Methods for Categorical Data

- Computer Science
- 2015

The results suggest that default chained equations approaches based on generalized linear models are dominated by the default regression tree and Bayesian mixture model approaches, making both reasonable default engines for multiple imputation of categorical data.

### Efficient Bayesian Nonparametric Inference for Categorical Data with General High Missingness

- Computer Science
- 2017

A Bayesian nonparametric approach, the Dirichlet Process Mixture of Collapsed Product-Multinomials (DPMCPM), is developed. It can model general missingness mechanisms by creating an extra category to denote missingness, which implicitly integrates out the missing part with respect to its true conditional distribution.

### Nonparametric statistical inference and imputation for incomplete categorical data

- Computer Science
- 2017

Under the framework of latent class analysis, DPMCPM can model general missing mechanisms by creating an extra category to denote missingness, which implicitly integrates out the missing part with regard to their true conditional distribution.

### Dirichlet Process Mixture Models for Modeling and Generating Synthetic Versions of Nested Categorical Data

- Mathematics
- 2014

We present a Bayesian model for estimating the joint distribution of multivariate categorical data when units are nested within groups. Such data arise frequently in social science settings, for…

### Dirichlet Process Mixture Models for Nested Categorical Data

- Mathematics
- 2015

We present a Bayesian model for estimating the joint distribution of multivariate categorical data when units are nested within groups. Such data arise frequently in social science settings, for…

### An Imputation model by Dirichlet Process Mixture of Elliptical Copulas for Data of Mixed Type

- Computer Science
- 2019

A Bayesian nonparametric approach is considered that uses an infinite mixture of elliptical copulas, induced by a Dirichlet process mixture, to build a flexible copula function. The resulting model provides a better overall fit than single-component counterparts and performs better at capturing tail dependence features of the data.

### Regression models involving nonlinear effects with missing data: A sequential modeling approach using Bayesian estimation.

- Mathematics
- Psychological Methods
- 2019

It is demonstrated how the sequential modeling approach can be used to implement a multiple imputation strategy based on Bayesian estimation techniques that can accommodate rather complex substantive regression models with nonlinear effects and also allows a flexible treatment of auxiliary variables.

## References


### Nonparametric Bayesian Multiple Imputation for Incomplete Categorical Variables in Large-Scale Assessment Surveys

- Computer Science, Mathematics
- 2013

This work presents a fully Bayesian, joint modeling approach to multiple imputation for categorical data based on Dirichlet process mixtures of multinomial distributions, which automatically models complex dependencies while being computationally expedient.

### Missing covariates in generalized linear models when the missing data mechanism is non‐ignorable

- Mathematics, Computer Science
- 1999

A method is presented for estimating parameters in generalized linear models with missing covariates and a non-ignorable missing data mechanism. Sensitivity analyses, which play an important role in this problem, are discussed in detail.

### A multivariate technique for multiply imputing missing values using a sequence of regression models

- Mathematics
- 2001

This article describes and evaluates a procedure for imputing missing values for a relatively complex data structure when the data are missing at random. The imputations are obtained by fitting a…

### Nonparametric Bayes Modeling of Multivariate Categorical Data

- Computer Science, Mathematics
- Journal of the American Statistical Association
- 2012

This article develops a nonparametric Bayes approach, which defines a prior with full support on the space of distributions for multiple unordered categorical variables, and shows this can be accomplished through a Dirichlet process mixture of product multinomial distributions, which is also a convenient form for posterior computation.

### Bayesian Estimation of Discrete Multivariate Latent Structure Models With Structural Zeros

- Mathematics, Computer Science
- 2014

An approach for estimating posterior distributions in Bayesian latent structure models with potentially many structural zeros is presented, along with an algorithm that speeds up computation by collapsing a large set of structural zero combinations into a much smaller set of disjoint marginal conditions.

### Maximum likelihood estimation for mixed continuous and categorical data with missing values

- Mathematics
- 1985

SUMMARY Maximum likelihood procedures for analysing mixed continuous and categorical data with missing values are presented. The general location model of Olkin & Tate (1961) and extensions…

### Latent class based multiple imputation approach for missing categorical data.

- Computer Science, Mathematics
- Journal of Statistical Planning and Inference
- 2010

### 9. Multiple Imputation of Incomplete Categorical Data Using Latent Class Analysis

- Mathematics, Computer Science
- 2008

The proposed multiple imputation method, which is implemented in Latent GOLD software for latent class analysis, is illustrated with two examples and compared to well-established methods such as maximum likelihood estimation with incomplete data and multiple imputations using a saturated log-linear model.

### Bayesian multiple imputation for large-scale categorical data with structural zeros

- Mathematics
- 2013

We propose an approach for multiple imputation of items missing at random in large-scale surveys with exclusively categorical variables that have structural zeros. Our approach is to use mixtures of…

### Multiple Imputation of Missing or Faulty Values Under Linear Constraints

- Mathematics
- 2014

Many statistical agencies, survey organizations, and research centers collect data that suffer from item nonresponse and erroneous or inconsistent values. These data may be required to satisfy linear…