Simpson's Paradox in COVID-19 Case Fatality Rates: A Mediation Analysis of Age-Related Causal Effects

@article{vonKgelgen2021SimpsonsPI,
  title={Simpson's Paradox in COVID-19 Case Fatality Rates: A Mediation Analysis of Age-Related Causal Effects},
  author={Julius von K{\"u}gelgen and Luigi Gresele and Bernhard Sch{\"o}lkopf},
  journal={IEEE Transactions on Artificial Intelligence},
  year={2021},
  volume={2},
  pages={18--27}
}
We point out an instantiation of Simpson's paradox in COVID-19 case fatality rates (CFRs): comparing a large-scale study from China (February 17) with early reports from Italy (March 9), we find that CFRs are lower in Italy for every age group, but higher overall. This phenomenon is explained by a stark difference in case demographics between the two countries. Using this as a motivating example, we introduce basic concepts from mediation analysis and show how these can be used to quantify…
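The reversal described in the abstract can be reproduced with a small numeric sketch. The counts below are hypothetical and chosen only to exhibit the effect; they are not the China or Italy data from the paper, and the names `country_a`, `country_b`, `cfr`, and `overall_cfr` are illustrative assumptions. Country B has a lower CFR in every age group, yet a higher overall CFR because its cases are concentrated in the oldest group.

```python
# Hypothetical (cases, deaths) per age group. NOT the paper's data.
country_a = {"0-49": (8000, 40), "50-69": (1500, 60), "70+": (500, 75)}
country_b = {"0-49": (1000, 3), "50-69": (3000, 90), "70+": (6000, 780)}


def cfr(cases, deaths):
    """Case fatality rate: deaths divided by confirmed cases."""
    return deaths / cases


def overall_cfr(groups):
    """Aggregate CFR: total deaths over total cases across all age groups."""
    total_cases = sum(cases for cases, _ in groups.values())
    total_deaths = sum(deaths for _, deaths in groups.values())
    return total_deaths / total_cases


def lower_in_every_group(first, second):
    """True if `first` has a strictly lower CFR than `second` in each age group."""
    return all(cfr(*first[g]) < cfr(*second[g]) for g in first)


if __name__ == "__main__":
    for group in country_a:
        print(group, cfr(*country_a[group]), cfr(*country_b[group]))
    print("overall", overall_cfr(country_a), overall_cfr(country_b))
    # Simpson's paradox: B is lower within every group but higher in aggregate.
    print(lower_in_every_group(country_b, country_a)
          and overall_cfr(country_b) > overall_cfr(country_a))
```

The aggregate CFR is a case-weighted average of the age-specific CFRs, so a shift in the age distribution of cases can flip the overall comparison even when every per-group comparison points the other way.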

Towards Understanding the COVID-19 Case Fatality Rate
TLDR
This work analyzes the case fatality rate of the COVID-19 pandemic at two different time snapshots, July 6 and Dec 28, 2020, and considers two important population covariates, age and GDP as a proxy for the quality and abundance of public health.
Questioning causality on sex, gender and COVID-19, and identifying bias in large-scale data-driven analyses: the Bias Priority Recommendations and Bias Catalog for Pandemics
TLDR
An encyclopedia-like reference guide, the Bias Catalog for Pandemics (BCP), is compiled, to provide definitions and emphasize realistic examples of bias in general, and within the COVID-19 pandemic context, to raise awareness on the dimensionality of such foreseen impacts.
Estimating the case fatality ratio for COVID-19 using a time-shifted distribution analysis
TLDR
This work presents a simple method for calculating the CFR using only public case and death data over time by exploiting the correspondence between the time distributions of cases and deaths, and discusses corrections to CFR values using excess-death and seroprevalence data to estimate the infection fatality ratio (IFR).
Toward Causal Representation Learning
TLDR
Fundamental concepts of causal inference are reviewed and related to crucial open problems of machine learning, including transfer and generalization, thereby assaying how causality can contribute to modern machine learning research.
Simpson's Paradox: A Singularity of Statistical and Inductive Inference
The occurrence of Simpson’s paradox (SP) in 2 × 2 contingency tables has been well studied in the literature. The first contribution of the present work is to comprehensively revisit this problem. We
Why Not to Trust Big Data: Discussing Statistical Paradoxes
TLDR
This work shows that statistical paradoxes are common in a variety of data and can lead to wrong conclusions with potentially harmful consequences; experiments on two real-world datasets and a case study indicate that statistical contradictions are severely harmful to big data and automatic data analysis techniques.
Causal Reasoning with Spatial-temporal Representation Learning: A Prospective Study
TLDR
This paper conducts a comprehensive review of existing causal reasoning methods for spatial-temporal representation learning, covering fundamental theories, models, and datasets, and proposes some primary challenges, opportunities, and future research directions for benchmarking causal reasoning algorithms in spatial-temporal representation learning.
Causal Reasoning Meets Visual Representation Learning: A Prospective Study
TLDR
This paper conducts a comprehensive review of existing causal reasoning methods for visual representation learning, covering fundamental theories, models, and datasets, and proposes some prospective challenges, opportunities, and future research directions for benchmarking causal reasoning algorithms in visual representation learning.
From Statistical to Causal Learning
We describe basic ideas underlying research to build and understand artificially intelligent systems: from symbolic approaches via statistical learning to interventional models relying on concepts of

References

Causal impact of masks, policies, behavior on early covid-19 pandemic in the U.S.
TLDR
This paper evaluates the dynamic impact of various policies, such as school, business, and restaurant closures, adopted by the US states on the growth rates of confirmed Covid-19 cases and social distancing behavior measured by Google Mobility Reports, and finds that both policies and information on transmission risks are important determinants of people's social distancing behavior.
Assaying Large-scale Testing Models to Interpret Covid-19 Case Numbers. A Cross-country Study
TLDR
Competing hypotheses regarding the underlying testing mechanisms are modeled, thereby providing different prevalence estimates based on case numbers, and used to predict SARS-CoV-2-attributed death rate trajectories, which supports the view that non-trivial testing mechanisms can be inferred from data and should be scrutinized.
Detecting Simpson's Paradox
TLDR
A method to discover Simpson’s paradox in the trend of a pair of continuous variables, which uses categorical variables to partition the whole data set into groups and finds sign reversals between the correlation coefficients measured within the groups and the one measured on the entire data set.
Intergenerational Ties and Case Fatality Rates: A Cross-Country Analysis
COVID-19 is spreading and has reached the state of a worldwide pandemic and health systems are or will be tested in how they can deal with it. So far, during this early phase of the pandemic,
Collider bias undermines our understanding of COVID-19 disease risk and severity
TLDR
The challenge of interpreting observational evidence from samples of the population, which may be affected by collider bias, is highlighted using data from the UK Biobank in which individuals tested for COVID-19 are highly selected for a wide range of genetic, behavioural, cardiovascular, demographic, and anthropometric traits.
An empirical estimate of the infection fatality rate of COVID-19 from the first Italian outbreak
TLDR
Empirical estimates based on population level data show a sharp difference in fatality rates between young and old people and firmly rule out overall fatality ratios below 0.5% in populations with more than 30% over 60 years old.
External Validity: From Do-Calculus to Transportability Across Populations
TLDR
A formal representation called "selection diagrams" for expressing knowledge about differences and commonalities between populations of interest is introduced and questions of transportability are reduced to symbolic derivations in the do-calculus.
Adjustment Criteria for Generalizing Experimental Findings
TLDR
The assumptions and machinery necessary for using covariate adjustment to correct for the biases generated by both transportability and sampling selection bias are investigated, and experimental data is generalized to infer causal effects in a new domain.