Multiple system estimation using covariates having missing values and measurement error: Estimating the size of the Māori population in New Zealand

Peter G. M. van der Heijden, Maarten J.L.F. Cruyff, Paul A. Smith, Christine Bycroft, Patrick J. Graham, Nathaniel Matheson-Dunning
Journal of the Royal Statistical Society: Series A (Statistics in Society)
We investigate the use of two or more linked lists, for both population size estimation and the relationship between variables appearing on all or only some lists. This relationship is usually not fully known because some individuals appear in only some lists, and some are not in any list. These two problems have been solved simultaneously using the EM algorithm. We extend this approach to estimate the size of the indigenous Māori population in New Zealand, leading to several innovations: (1… 
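For orientation, the simplest special case of population size estimation from linked lists is the two-list (dual-system) estimator, which assumes the lists are independent and perfectly linked. The sketch below is not the authors' EM-based method and ignores covariates, missing values, and measurement error; the function name and the counts are hypothetical.

```python
def dual_system_estimate(n_both: int, n_only_a: int, n_only_b: int) -> float:
    """Two-list population size estimate, assuming independent lists.

    n_both   -- individuals found on both lists
    n_only_a -- individuals found on list A only
    n_only_b -- individuals found on list B only
    """
    if n_both <= 0:
        raise ValueError("need at least one individual on both lists")
    n_a = n_both + n_only_a  # total on list A
    n_b = n_both + n_only_b  # total on list B
    # Under independence, the share of list B that also appears on list A
    # estimates list A's coverage of the whole population.
    return n_a * n_b / n_both


# Hypothetical counts, for illustration only.
observed = 60 + 40 + 20
n_hat = dual_system_estimate(n_both=60, n_only_a=40, n_only_b=20)
print(f"observed: {observed}, estimated total: {n_hat:.1f}")
```

The difference between the estimate and the observed count is the estimated number of individuals missed by both lists.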


Population Size Estimation Using Multiple Incomplete Lists with Overcoverage
This article proposes an approach to this problem that employs a class of capture-recapture methods based on latent class models, applied to five sources of empirical data to estimate the number of active local units of Italian enterprises in 2011.
An Overview of Population Size Estimation where Linking Registers Results in Incomplete Covariates, with an Application to Mode of Transport of Serious Road Casualties
The properties of the linkage of two or more registers are elucidated, both where the model is appropriate, in situations corresponding with real applications in official statistics, and where the model conditions are violated.
People born in the Middle East but residing in the Netherlands: Invariant population size estimates and the role of active and passive covariates
Including covariates in loglinear models of population registers improves population size estimates for two reasons. First, it is possible to take heterogeneity of inclusion probabilities over the
Sensitivity of Population Size Estimation for Violating Parametric Assumptions in Log-linear Models
An important quality aspect of censuses is the degree of coverage of the population. When administrative registers are available, undercoverage can be estimated via capture-recapture
A Multiple‐Record Systems Estimation Method that Takes Observed and Unobserved Heterogeneity into Account
We present a model to estimate the size of an unknown population from a number of lists that applies when the assumptions of (a) homogeneity of capture probabilities of individuals and (b)
Estimating error rates in an administrative register and survey questions using a latent class model
The chapter describes the data on neighborhood of residence obtained from a survey and an important Dutch official administrative register, and then details the latent class model built to estimate classification error rates in these measures.
Multilist Population Estimation with Incomplete and Partial Stratification
A general method based on an expectation-maximization (EM) algorithm is developed for cases in which not all lists are active in all strata, and a flexible log-linear modeling framework is used that allows for list dependencies and differential probabilities of ascertainment in each list.
Estimating Classification Errors Under Edit Restrictions in Composite Survey-Register Data Using Multiple Imputation Latent Class Modelling (MILC)
A new method is proposed to estimate the number of classification errors across several sources while taking into account impossible combinations with scores on other variables, which enhances the quality of statistics based on the composite data set.
The framework for estimating coverage in the 2011 Census of England and Wales: Combining dual-system estimation with ratio estimation
Dual-system estimation is a well-established approach for estimating an unknown population size from two independent but imperfect counts of the population. In this paper we develop the estimation
Estimating the number of serious road injuries per vehicle type in the Netherlands by using multiple imputation of latent classes
The MILC method is extended to handle the large number of missing values in the stratification variable ‘region of accident’ and to include more stratification covariates, and a multiply imputed data set is generated that can be used to create statistical figures in a straightforward manner.