Stop or Continue Data Collection: A Nonignorable Missing Data Approach for Continuous Variables

Thaís Paiva and Jerome P. Reiter. Journal of Official Statistics, pp. 579–599.
Abstract
We present an approach to inform decisions about nonresponse follow-up sampling. The basic idea is (i) to create completed samples by imputing nonrespondents’ data under various assumptions about the nonresponse mechanisms, (ii) to take hypothetical samples of varying sizes from the completed samples, and (iii) to compute and compare measures of accuracy and cost for the different proposed sample sizes. As part of the methodology, we present a new approach for generating imputations for… 
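The three steps in the abstract can be sketched as a small simulation. This is a minimal sketch under stated assumptions, not the authors' method: the paper presents its own imputation approach, which is not reproduced here. As a stand-in, it assumes nonrespondents' values are normal with the respondents' mean shifted by a sensitivity parameter `delta` (a common pattern-mixture style adjustment for nonignorable nonresponse), that follow-up draws a simple random subsample of nonrespondents, and that cost is linear in the follow-up size. All function names and parameter values are hypothetical.

```python
"""Hypothetical sketch of sizing a nonresponse follow-up sample.

Assumptions (not from the paper): a normal imputation model shifted by
delta for nonrespondents, simple random follow-up of nonrespondents,
a linear cost model, and made-up example values.
"""
import random
import statistics


def impute_nonrespondents(respondents, n_missing, delta, rng):
    # Step (i): complete the sample by drawing nonrespondent values from a
    # normal fitted to respondents, shifted by delta (assumed mechanism).
    mu = statistics.fmean(respondents)
    sd = statistics.stdev(respondents)
    return [rng.gauss(mu + delta, sd) for _ in range(n_missing)]


def follow_up_estimate(respondents, completed_nonresp, m, rng):
    # Step (ii): take a hypothetical follow-up subsample of size m from the
    # completed nonrespondent values and combine it with the respondents.
    r_mean = statistics.fmean(respondents)
    if m == 0:
        return r_mean  # no follow-up: respondent mean only
    n_r, n_m = len(respondents), len(completed_nonresp)
    follow = rng.sample(completed_nonresp, m)
    w = n_r / (n_r + n_m)  # respondent share of the full sample
    return w * r_mean + (1 - w) * statistics.fmean(follow)


def evaluate(respondents, n_missing, deltas, sizes, cost_per_case=1.0,
             reps=200, seed=0):
    # Step (iii): for each assumed delta and follow-up size m, compute the
    # root mean squared error against the completed-sample mean, and the cost.
    rng = random.Random(seed)
    results = {}
    for delta in deltas:
        for m in sizes:
            sq_errs = []
            for _ in range(reps):
                completed = impute_nonrespondents(respondents, n_missing, delta, rng)
                truth = statistics.fmean(respondents + completed)
                est = follow_up_estimate(respondents, completed, m, rng)
                sq_errs.append((est - truth) ** 2)
            rmse = statistics.fmean(sq_errs) ** 0.5
            results[(delta, m)] = (rmse, cost_per_case * m)
    return results


if __name__ == "__main__":
    gen = random.Random(1)
    resp = [gen.gauss(50.0, 10.0) for _ in range(300)]  # hypothetical respondents
    table = evaluate(resp, n_missing=200, deltas=[0.0, -5.0], sizes=[0, 20, 50, 100])
    for (delta, m), (rmse, cost) in sorted(table.items()):
        print(f"delta={delta:+.1f} m={m:3d} rmse={rmse:.3f} cost={cost:.0f}")
```

Under these assumptions, a follow-up size of 0 exposes the bias from ignoring nonresponse whenever `delta` is nonzero, while larger follow-up sizes buy lower error at higher cost. The accuracy and cost measures used in the paper itself may differ.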


Statistical Modeling of Longitudinal Data with Non-ignorable Non-monotone Missingness with Semiparametric Bayesian and Machine Learning Components.
The methodology is applied to data from a phase II clinical trial that studies the quality of life of patients with prostate cancer receiving radiation therapy, and it improves on existing methods by accommodating small sample sizes.
Imputation of Missing Data Using Linear Gaussian Cluster-Weighted Modeling
This work introduces a new imputation methodology for databases with univariate missingness patterns. It uses the properties of Gaussian Cluster-Weighted modeling to build a predictive model from fully observed auxiliary variables (covariates), which is then used to impute the missing values.
Comparing the Ability of Regression Modeling and Bayesian Additive Regression Trees to Predict Costs in a Responsive Survey Design Context
This work evaluates alternative modeling strategies for predicting survey costs (specifically, interviewer hours) using data from the National Survey of Family Growth; the strategies compared include multilevel regression and Bayesian Additive Regression Trees (BART).
BAGEL: A non-ignorable missing value estimation method for mixed attribute datasets
This paper proposes a methodology called Bayesian Genetic Algorithm (BAGEL), which hybridizes Bayesian and Genetic Algorithm principles. It is applied to real datasets to impute both discrete and continuous missing values, and its imputation accuracy is evaluated.
Bayesian Mixture Modeling for Multivariate Conditional Distributions
We present a Bayesian mixture model for estimating the joint distribution of mixed ordinal, nominal, and continuous data conditional on a set of fixed variables. The modeling strategy is motivated by…
Effects of a Government-Academic Partnership: Has the NSF-CENSUS Bureau Research Network Helped Improve the US Statistical System?
The National Science Foundation-Census Bureau Research Network (NCRN) was established in 2011 to create interdisciplinary research nodes on methodological questions of interest and significance to…
Responsive and Adaptive Design for Survey Optimization
This paper presents the four pillars of responsive and adaptive design (RAD), a scientific framework driven by cost-quality tradeoff analysis and optimization that enables the most efficient production of high-quality data.

