A note on dichotomization of continuous response variable in the presence of contamination and model misspecification.
@article{Shentu2010ANO,
title={A note on dichotomization of continuous response variable in the presence of contamination and model misspecification},
author={Yue Shentu and Min-ge Xie},
journal={Statistics in Medicine},
year={2010},
volume={29},
number={21},
pages={2200--2214}
}
The purpose of this note is to raise awareness of the complexity of the practice involving dichotomization. It is well known that regular regression models are effective tools for analyzing Gaussian-type response variables, and researchers are often told that it is a 'bad idea' to practice dichotomization if continuous measurements are available. We demonstrate through special cases, however, that there is another side of the story if the response variable is contaminated. Although…
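The abstract's point can be explored with a small Monte Carlo experiment. The sketch below is not the authors' setup. It assumes a two-group location-shift model in which a fraction `eps` of responses comes from a much wider normal distribution (a contaminated normal). It then compares the empirical power of two tests: a large-sample z-test on the group means, and a z-test on the proportions above a fixed cut-off `c`. All function names and parameter values are hypothetical.

```python
import math
import random
from statistics import NormalDist

# Illustrative sketch: the contaminated-normal model and all defaults are
# hypothetical and are not taken from Shentu and Xie (2010).

def contaminated_sample(n, mu, eps, wide_sd, rng):
    # With probability eps, draw from a wide normal centred at mu (contamination);
    # otherwise draw from the clean N(mu, 1) model.
    return [rng.gauss(mu, wide_sd) if rng.random() < eps else rng.gauss(mu, 1.0)
            for _ in range(n)]

def mean_var(xs):
    # Sample mean and unbiased sample variance.
    m = sum(xs) / len(xs)
    return m, sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def z_means(a, b):
    # Large-sample z statistic for the difference in group means.
    ma, va = mean_var(a)
    mb, vb = mean_var(b)
    return (mb - ma) / math.sqrt(va / len(a) + vb / len(b))

def z_props(a, b, c):
    # Dichotomize at the fixed cut-off c, then compare the proportions above it
    # using the pooled proportion in the standard error.
    pa = sum(x > c for x in a) / len(a)
    pb = sum(x > c for x in b) / len(b)
    p = (pa * len(a) + pb * len(b)) / (len(a) + len(b))
    se = math.sqrt(p * (1 - p) * (1 / len(a) + 1 / len(b)))
    return (pb - pa) / se if se > 0 else 0.0

def power(n=100, delta=0.5, eps=0.1, wide_sd=10.0, c=0.25,
          reps=400, alpha=0.05, seed=1):
    # Empirical rejection rates of both two-sided tests under a true shift delta.
    rng = random.Random(seed)
    crit = NormalDist().inv_cdf(1 - alpha / 2)
    hits_mean = hits_prop = 0
    for _ in range(reps):
        a = contaminated_sample(n, 0.0, eps, wide_sd, rng)
        b = contaminated_sample(n, delta, eps, wide_sd, rng)
        hits_mean += abs(z_means(a, b)) > crit
        hits_prop += abs(z_props(a, b, c)) > crit
    return hits_mean / reps, hits_prop / reps

for eps in (0.0, 0.1):
    m, p = power(eps=eps)
    print(f"eps={eps}: power continuous={m:.2f}, dichotomized={p:.2f}")
```

With these illustrative settings, the test on means should be more powerful when `eps = 0`, and the dichotomized test should come out ahead at `eps = 0.1`. The balance depends on the contamination fraction, its spread and the placement of the cut-off.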
29 Citations
Dichotomizing Continuous Data Which Retains Statistical Precision Using a Bayesian Distributional Approach That Reflects the True Uncertainty
- Mathematics
- 2016
Although dichotomization is widely criticized by statisticians, it is sometimes useful and necessary in medical research for decision-making or communication purposes. To address the issue of…
A distributional approach to obtain adjusted comparisons of proportions of a population at risk
- Psychology, Medicine · Emerging Themes in Epidemiology
- 2016
When an outcome follows the required condition of distribution shift between exposure groups, the results of a linear regression model can be followed by the corresponding comparison of proportions at risk thus avoiding the drawback of the usual dichotomisation of continuous outcomes.
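Under the distribution-shift condition described in this entry, a proportion at risk can be read off the fitted distribution instead of being estimated from dichotomized data. The sketch below assumes three things:

- a normal outcome with the same residual SD in both groups;
- the linear-regression exposure coefficient serves as the shift;
- standard errors can be left out.

The function names and numbers are hypothetical, not the paper's.

```python
from statistics import NormalDist

# Sketch of a distribution-shift calculation, assuming a normal outcome with a
# common residual SD in both groups; names and numbers are hypothetical.

def proportion_below(mean, sd, cutoff):
    # P(Y < cutoff) for Y ~ N(mean, sd^2).
    return NormalDist(mean, sd).cdf(cutoff)

def risk_difference_from_shift(ref_mean, shift, residual_sd, cutoff):
    # `shift` plays the role of the exposure coefficient from a linear
    # regression; the exposed distribution is the reference one moved by `shift`.
    p_ref = proportion_below(ref_mean, residual_sd, cutoff)
    p_exp = proportion_below(ref_mean + shift, residual_sd, cutoff)
    return p_exp - p_ref

# Hypothetical example: reference mean 3.4, exposure lowers the mean by 0.2,
# residual SD 0.5, and "at risk" means an outcome below 2.5.
print(risk_difference_from_shift(3.4, -0.2, 0.5, 2.5))
```

Here the difference in proportions at risk is Φ(−1.4) − Φ(−1.8), about 0.045. It is derived entirely from the mean shift and the residual SD.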
Getting less of what you want: reductions in statistical power and increased bias when categorizing medication adherence data
- Psychology · Journal of Behavioral Medicine
- 2016
It is recommended that adherence be measured continuously and analyzed without categorization when using it as a predictor in regression models, and it is shown how parameter estimates and standard errors can be severely biased when categorizing adherence.
Ranking scientific journals via latent class models for polytomous item response data
- Economics
- 2015
We propose a strategy for ranking scientific journals starting from a set of available quantitative indicators that represent imperfect measures of the unobservable "value" of the journals of…
Interpreting and Testing Interactions in Conditional Mixture Models
- Psychology
- 2016
Mixture modeling applications in psychology often include covariates to explain class membership and aid in construct validation of the latent classification variable. These applications tend to use…
On the Effectiveness of Discretizing Quantitative Attributes in Linear Classifiers
- Computer Science · IEEE Access
- 2020
It is demonstrated that, contrary to prevalent belief, discretization of quantitative attributes is a beneficial pre-processing step for discriminative linear models. It leads to far superior classification performance, especially on larger datasets, and, surprisingly, to much better convergence and hence shorter training time.
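Below is a minimal sketch of one common way to discretize a quantitative attribute before fitting a linear classifier. It learns equal-frequency (quantile) cut points from the training values and then one-hot encodes the bin index. This choice is an assumption for illustration, since the method evaluated in the paper may differ. The helper names are hypothetical.

```python
from bisect import bisect_right

def quantile_cut_points(train_values, n_bins):
    # Equal-frequency cut points from the sorted training values;
    # duplicates caused by ties are dropped, so fewer bins may result.
    s = sorted(train_values)
    return sorted({s[(len(s) * k) // n_bins] for k in range(1, n_bins)})

def bin_index(x, cuts):
    # Number of cut points <= x, so the bins are [cut_{k-1}, cut_k).
    return bisect_right(cuts, x)

def one_hot(index, n_bins):
    return [1 if j == index else 0 for j in range(n_bins)]

def discretize_column(values, cuts):
    # Each value becomes a 0/1 indicator vector of length len(cuts) + 1.
    return [one_hot(bin_index(v, cuts), len(cuts) + 1) for v in values]

cuts = quantile_cut_points(list(range(1, 11)), n_bins=5)
print(cuts)                                # [3, 5, 7, 9]
print(discretize_column([1, 4, 10], cuts))
```

The cut points are learned from training data only and then reused unchanged on new data. A linear model on the indicators can therefore fit a separate weight per bin.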
Raising Placebo Efficacy in Antidepressant Trials Across Decades Explained by Small-Study Effects: A Meta-Reanalysis
- Psychology, Medicine · Frontiers in Psychiatry
- 2020
The present findings contribute to the ongoing debate on antidepressant placebo outcomes and highlight the need to adjust for bias introduced by small-study effects (SSE), a well-known but not yet formally assessed bias in antidepressant trials.
Statistical Modeling of Occupational Exposure to Polycyclic Aromatic Hydrocarbons Using OSHA Data
- Environmental Science · Journal of Occupational and Environmental Hygiene
- 2015
Mixed-effects logistic models were used to predict the exceedance fraction (EF), i.e., the probability of exceeding OSHA's Permissible Exposure Limit (PEL) for PAHs based on industry and occupation, and will be used to create a job-exposure matrix for use in a population-based case-control study exploring PAH exposure and breast cancer risk.
Integrative Data Analysis for Research in Developmental Psychopathology
- Psychology
- 2016
Researchers from many disciplines have increasingly called for changes in research practices to be more transparent and rigorous. One of the changes prominently discussed is the utilization of…
Exploring factors associated with patients’ adherence to antihypertensive drugs (Amanj I. Kurdi, Li-Chia Chen, and Rachel A. Elliott, 2017)
- Medicine
- 2018
This retrospective cohort study included adults with primary hypertension identified in the UK Clinical Practice Research Datalink from April 2006 to March 2013 to explore factors associated with adherence to antihypertensive drugs overall and to particular classes in hypertensive patients.
References
Showing the first 10 of 25 references
Measurement error in the response in the general linear model
- Mathematics
- 1996
Problems where there is measurement error in the response variable in a general linear model are considered. With Y denoting the true value and U the observed/surrogate value, a rich class…
On the practice of dichotomization of quantitative variables.
- Psychology · Psychological Methods
- 2002
The authors present the case that dichotomization is rarely defensible and often will yield misleading results.
On the Existence of Maximum Likelihood Estimators for the Binomial Response Models
- Psychology
- 1981
Necessary and sufficient conditions are given for the existence of maximum likelihood estimators of the linear regression parameter in binomial response (this includes Logit and Probit)…
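This existence question matters once a continuous response has been dichotomized, because small or sparse samples can then become separated. The sketch covers only the simplest special case: logistic regression with an intercept and one non-constant covariate. In that case a finite, unique MLE exists exactly when no threshold on x separates the two outcome groups, either completely or quasi-completely. The general multivariate conditions are not implemented, and the function name is hypothetical.

```python
def mle_exists_1d(x, y):
    """Return True if the logistic model logit P(y=1) = b0 + b1*x has a finite,
    unique MLE, i.e. the data are neither completely nor quasi-completely
    separated by a threshold on x. Assumes y is coded 0/1."""
    x1 = [xi for xi, yi in zip(x, y) if yi == 1]
    x0 = [xi for xi, yi in zip(x, y) if yi == 0]
    if not x0 or not x1:
        return False  # all responses identical: the intercept diverges
    # Overlap: each outcome group must extend strictly past the other's extreme.
    return max(x0) > min(x1) and max(x1) > min(x0)

print(mle_exists_1d([1, 2, 3, 4], [0, 0, 1, 1]))  # completely separated
print(mle_exists_1d([1, 2, 2, 3], [0, 0, 1, 1]))  # quasi-completely separated
print(mle_exists_1d([1, 2, 3, 4], [0, 1, 0, 1]))  # overlapping groups
```

A tie at the boundary (the middle call) still rules out a finite MLE. In that case the slope can grow without bound while the likelihood keeps increasing.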
Dichotomizing continuous predictors in multiple regression: a bad idea.
- Mathematics · Statistics in Medicine
- 2006
It is argued that the simplicity achieved is gained at a cost; dichotomization may create rather than avoid problems, notably a considerable loss of power and residual confounding.
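The residual-confounding problem in this entry can be reproduced in a short simulation. The data-generating model is hypothetical: X has no effect on Y, and both depend on a confounder Z. Adjusting for Z by least squares should recover an X coefficient near 0. Adjusting only for Z split at 0 (its population median) should leave a clearly positive coefficient. The adjusted coefficient is computed through the Frisch-Waugh-Lovell residualization identity, and all names are illustrative.

```python
import random

def ols_slope(x, y):
    # Least-squares slope of y on x, with an intercept.
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def residualize(v, z):
    # Residuals of v after a simple regression on z (with intercept).
    b = ols_slope(z, v)
    mz, mv = sum(z) / len(z), sum(v) / len(v)
    return [vi - mv - b * (zi - mz) for vi, zi in zip(v, z)]

def adjusted_slope(x, y, z):
    # Frisch-Waugh-Lovell: the coefficient of x in y ~ 1 + x + z equals the
    # slope of residualized y on residualized x.
    return ols_slope(residualize(x, z), residualize(y, z))

def simulate(n=20000, seed=2):
    # Hypothetical model: Z confounds X and Y; X has no effect on Y.
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]
    x = [zi + rng.gauss(0, 1) for zi in z]
    y = [zi + rng.gauss(0, 1) for zi in z]
    z_split = [1.0 if zi > 0 else 0.0 for zi in z]  # dichotomized confounder
    return adjusted_slope(x, y, z), adjusted_slope(x, y, z_split)

b_cont, b_split = simulate()
print(f"adjusted for Z: {b_cont:.3f}; adjusted for split Z: {b_split:.3f}")
```

Under this model, the split-adjusted coefficient has a theoretical limit of (1 − 2/π)/(2 − 2/π) ≈ 0.27. The within-group variation in Z that the split leaves behind still links X and Y.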
Loss of Power in Logistic, Ordinal Logistic, and Probit Regression When an Outcome Variable Is Coarsely Categorized
- Psychology
- 2006
Variables that have been coarsely categorized into a small number of ordered categories are often modeled as outcome variables in psychological research. The authors employ a Monte Carlo study to…
The Cost of Dichotomization
- Economics
- 1983
…to discarding 38% and 60% of the cases under representative conditions. As dichotomization departs from the mean, the costs in variance accounted for and in power are even larger. Consequences of…
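Part of this cost can be computed directly when two variables are jointly normal. Dichotomizing one standardized variable at cut point c multiplies its correlation with the other by φ(c)/√(p(1 − p)), where p = P(Z > c). At the mean the factor is √(2/π) ≈ 0.80, so r² falls to about 0.64 of its original value, and the factor shrinks as c moves away from the mean. The sketch assumes bivariate normality and does not reproduce the article's own calculations.

```python
import math
from statistics import NormalDist

Z = NormalDist()

def dichotomization_attenuation(c):
    """Factor by which dichotomizing a standard-normal variable at cut point c
    multiplies its correlation with a jointly normal partner (assumes
    bivariate normality)."""
    p = 1.0 - Z.cdf(c)  # proportion above the cut
    return Z.pdf(c) / math.sqrt(p * (1.0 - p))

for c in (0.0, 0.5, 1.0, 1.5):
    a = dichotomization_attenuation(c)
    print(f"cut at {c:+.1f} SD: r multiplied by {a:.3f}, r^2 by {a * a:.3f}")
```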
Measuring overeducation with earnings frontiers and multiply imputed censored income data
- Economics
- 2006
"In this paper, we remove one serious drawback of the IAB employment sample impeding its applicability to the estimation of earnings frontiers: the censoring of the income data, by multiple…
Measurement error in nonlinear models: a modern perspective
- Mathematics
- 2006
Contents include: The Regression Calibration Algorithm; Examples of the Approximations; Theoretical Examples; Bibliographic Notes and Software; Simulation Extrapolation: Overview; Simulation-Extrapolation Heuristics; The SIMEX Algorithm; Applications; SIMEX in Some Important Special Cases; Extensions and Related Methods.
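SIMEX can be illustrated on its simplest target: the slope of a simple linear regression with classical additive error in the covariate. The sketch rests on these assumptions:

- the error variance `sigma_u2` is known;
- a few grid values of λ are used;
- the fit is extrapolated to λ = −1 with a quadratic, which is one common extrapolant.

The book treats the method far more generally. The function names and the simulated data are hypothetical.

```python
import random

def slope(x, y):
    # Least-squares slope of y on x, with an intercept.
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sum((a - mx) ** 2 for a in x)

def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def fit_quadratic(ts, ys):
    # Least-squares fit y = a + b*t + c*t^2 via the 3x3 normal equations,
    # solved with Cramer's rule.
    s = [sum(t ** k for t in ts) for k in range(5)]
    A = [[s[0], s[1], s[2]], [s[1], s[2], s[3]], [s[2], s[3], s[4]]]
    r = [sum(y * t ** k for t, y in zip(ts, ys)) for k in range(3)]
    d = det3(A)
    coef = []
    for j in range(3):
        Aj = [row[:] for row in A]
        for i in range(3):
            Aj[i][j] = r[i]
        coef.append(det3(Aj) / d)
    return coef

def simex_slope(w, y, sigma_u2, lambdas=(0.5, 1.0, 1.5, 2.0), B=20, seed=0):
    rng = random.Random(seed)
    ts, ys = [0.0], [slope(w, y)]  # lambda = 0: the naive estimate
    for lam in lambdas:
        sd = (lam * sigma_u2) ** 0.5
        # Simulation step: add extra error with variance lam * sigma_u2,
        # re-estimate, and average over B pseudo-datasets.
        est = [slope([wi + rng.gauss(0, sd) for wi in w], y) for _ in range(B)]
        ts.append(lam)
        ys.append(sum(est) / B)
    a, b, c = fit_quadratic(ts, ys)
    return a - b + c  # extrapolation step: evaluate the quadratic at lambda = -1

def demo(n=10000, seed=3):
    # Simulated data: true slope 1, measurement-error variance 1 (assumed known).
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    w = [xi + rng.gauss(0, 1) for xi in x]
    y = [xi + rng.gauss(0, 0.5) for xi in x]
    return slope(w, y), simex_slope(w, y, sigma_u2=1.0)

naive, corrected = demo()
print(f"naive slope {naive:.3f}, SIMEX slope {corrected:.3f}")
```

In this setup the naive slope is attenuated to about 0.5. The SIMEX value should land well above it but, with a quadratic extrapolant, still short of the true slope of 1.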
The Statistical Analysis of Discrete Data
- Computer Science, Mathematics
- 1989
A book covering univariate discrete responses and their large-sample theory, with supporting results from linear algebra.


