A generalization of moderated statistics to data adaptive semiparametric estimation in high-dimensional biology.

The widespread availability of high-dimensional biological sequencing data has made the simultaneous screening of numerous biological characteristics a central statistical problem in computational biology. While the dimensionality of such data sets continues to increase, the problem of isolating the effects of biomarkers in studies that measure baseline confounders, while avoiding model misspecification, remains only partially addressed. Efficient estimators constructed from data adaptive…

