A note on dichotomization of continuous response variable in the presence of contamination and model misspecification.

@article{Shentu2010ANO,
  title={A note on dichotomization of continuous response variable in the presence of contamination and model misspecification},
  author={Yue Shentu and Min-ge Xie},
  journal={Statistics in Medicine},
  year={2010},
  volume={29},
  number={21},
  pages={2200--2214}
}
  • Y. Shentu, M. Xie
  • Published 20 September 2010
  • Mathematics
  • Statistics in medicine
The purpose of this note is to raise awareness of the complexity of the practice involving dichotomization. It is well known that the regular regression models are effective tools for analyzing Gaussian-type response variables, and researchers are often told that it is a 'bad idea' to practice dichotomization if continuous measurements are available. We demonstrate through special cases, however, that there is another side of the story if the response variable is contaminated. Although… 
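
The contrast described in the abstract can be illustrated with a small Monte Carlo sketch. This is not the authors' design: the two-group setup, the sample size, the shift of 0.6, the contamination scheme (a fraction of responses drawn with standard deviation 10 instead of 1), the pooled-median cutoff, and all function names are assumptions chosen for illustration. It compares how often a large-sample z-test on the continuous response and a two-proportion z-test on the dichotomized response reject at the 5% level.

```python
import random
import statistics
from math import sqrt

# Two-sided 5% critical value of the standard normal (large-sample tests).
Z_CRIT = statistics.NormalDist().inv_cdf(0.975)


def draw(n, shift, contam_prob, contam_sd, rng):
    """Draw n responses centred at `shift`; each one has noise sd 1, or
    sd `contam_sd` with probability `contam_prob` (assumed contamination)."""
    return [
        shift + rng.gauss(0.0, contam_sd if rng.random() < contam_prob else 1.0)
        for _ in range(n)
    ]


def rejects_continuous(a, b):
    """Large-sample z-test for a difference in means on the raw response."""
    se = sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return abs(statistics.fmean(b) - statistics.fmean(a)) / se > Z_CRIT


def rejects_dichotomized(a, b):
    """Two-proportion z-test after cutting the response at the pooled median."""
    cut = statistics.median(a + b)
    pa = sum(y > cut for y in a) / len(a)
    pb = sum(y > cut for y in b) / len(b)
    pooled = (pa * len(a) + pb * len(b)) / (len(a) + len(b))
    se = sqrt(pooled * (1.0 - pooled) * (1.0 / len(a) + 1.0 / len(b)))
    return se > 0 and abs(pb - pa) / se > Z_CRIT


def power(contam_prob, reps=1000, n=50, shift=0.6, contam_sd=10.0, seed=1):
    """Estimate rejection rates (continuous, dichotomized) by simulation."""
    rng = random.Random(seed)
    cont = dich = 0
    for _ in range(reps):
        a = draw(n, 0.0, contam_prob, contam_sd, rng)
        b = draw(n, shift, contam_prob, contam_sd, rng)
        cont += rejects_continuous(a, b)
        dich += rejects_dichotomized(a, b)
    return cont / reps, dich / reps


if __name__ == "__main__":
    for p in (0.0, 0.1):
        cont, dich = power(p)
        print(f"contamination={p:.1f}  continuous={cont:.3f}  dichotomized={dich:.3f}")
```

Under these assumed settings, the continuous analysis should reject more often when the data are clean, while the dichotomized analysis should hold up better once a tenth of the responses carry heavy-tailed noise. The sketch only shows the direction of the effect in one special case and says nothing about how general it is.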

Dichotomizing Continuous Data Which Retains Statistical Precision Using a Bayesian Distributional Approach That Reflects the True Uncertainty
Although dichotomization is widely criticized by statisticians, it is sometimes useful and necessary in medical research for decision-making or communication purposes. To address the issue of
A distributional approach to obtain adjusted comparisons of proportions of a population at risk
TLDR
When an outcome follows the required condition of distribution shift between exposure groups, the results of a linear regression model can be followed by the corresponding comparison of proportions at risk thus avoiding the drawback of the usual dichotomisation of continuous outcomes.
Getting less of what you want: reductions in statistical power and increased bias when categorizing medication adherence data
TLDR
It is recommended that adherence be measured continuously and analyzed without categorization when using it as a predictor in regression models, and it is shown how parameter estimates and standard errors can be severely biased when categorizing adherence.
Ranking scientific journals via latent class models for polytomous item response data
We propose a strategy for ranking scientific journals starting from a set of available quantitative indicators that represent imperfect measures of the unobservable "value" of the journals of
Interpreting and Testing Interactions in Conditional Mixture Models
Mixture modeling applications in psychology often include covariates to explain class membership and aid in construct validation of the latent classification variable. These applications tend to use
On the Effectiveness of Discretizing Quantitative Attributes in Linear Classifiers
TLDR
It is demonstrated that, contrary to prevalent belief, discretization of quantitative attributes, for discriminative linear models, is a beneficial pre-processing step, as it leads to far superior classification performance, especially on bigger datasets, and surprisingly, much better convergence, which leads to better training time.
Raising Placebo Efficacy in Antidepressant Trials Across Decades Explained by Small-Study Effects: A Meta-Reanalysis
  • L. Holper
  • Psychology, Medicine
    Frontiers in Psychiatry
  • 2020
TLDR
The present findings contribute to the ongoing debate on antidepressant placebo outcomes and highlight the need to adjust for bias introduced by SSE, a well-known but not yet formally assessed bias in antidepressant trials.
Statistical Modeling of Occupational Exposure to Polycyclic Aromatic Hydrocarbons Using OSHA Data
TLDR
Mixed-effects logistic models were used to predict the exceedance fraction (EF), i.e., the probability of exceeding OSHA's Permissible Exposure Limit (PEL) for PAHs based on industry and occupation, and will be used to create a job-exposure matrix for use in a population-based case-control study exploring PAH exposure and breast cancer risk.
Integrative Data Analysis for Research in Developmental Psychopathology
Researchers from many disciplines have increasingly called for changes in research practices to be more transparent and rigorous. One of the changes prominently discussed is the utilization of
Kurdi, Amanj I and Chen, Li-Chia and Elliott, Rachel A (2017) Exploring factors associated with patients’ adherence to antihypertensive drugs
TLDR
This retrospective cohort study included adults with primary hypertension identified in the UK Clinical Practice Research Datalink from April/2006 to March/2013 to explore factors associated with adherence to antihypertensive drugs overall and to particular classes in hypertensive patients.

References

SHOWING 1-10 OF 25 REFERENCES
Measurement error in the response in the general linear model
Problems where there is measurement error in the response variable in a general linear model are considered. With Y denoting the true value and U the observed/surrogate value, a rich class
On the practice of dichotomization of quantitative variables.
TLDR
The authors present the case that dichotomization is rarely defensible and often will yield misleading results.
On the Existence of Maximum Likelihood Estimators for the Binomial Response Models
Necessary and sufficient conditions are given for the existence of maximum likelihood estimators of the linear regression parameter in binomial response (this includes Logit and Probit)
Dichotomizing continuous predictors in multiple regression: a bad idea.
TLDR
It is argued that the simplicity achieved is gained at a cost; dichotomization may create rather than avoid problems, notably a considerable loss of power and residual confounding.
Loss of Power in Logistic, Ordinal Logistic, and Probit Regression When an Outcome Variable Is Coarsely Categorized
Variables that have been coarsely categorized into a small number of ordered categories are often modeled as outcome variables in psychological research. The authors employ a Monte Carlo study to
The Cost of Dichotomization
to discarding 38% and 60% of the cases under representative conditions. As dichotomization departs from the mean, the costs in variance accounted for and in power are even larger. Consequences of
Measuring overeducation with earnings frontiers and multiply imputed censored income data
"In this paper, we remove one serious drawback of the IAB employment sample impeding its applicability to the estimation of earnings frontiers: the censoring of the income data, by multiple
Measurement Error Models
Measurement error in nonlinear models: a modern perspective
TLDR
The Regression Calibration Algorithm and Examples of the Approximations Theoretical Examples Bibliographic Notes and Software Simulation Extrapolation Overview Simulation Extrapolation Heuristics The SIMEX Algorithm Applications SIMEX in Some Important Special Cases Extensions and Related Methods.
The Statistical Analysis of Discrete Data
TLDR
This paper presents a meta-analysis of large sample theory of univariate Discrete Responses and some results from Linear Algebra suggest that the model chosen may be biased towards linear models.