Corpus ID: 248157369

Practical considerations for specifying a super learner

@inproceedings{Phillips2022PracticalCF,
  title={Practical considerations for specifying a super learner},
  author={Rachael V. Phillips and Mark J. van der Laan and Hana Lee and Susan Gruber},
  year={2022}
}
… parametric … to know in advance which is the most suitable for a dataset and prediction task at hand. The super learner (SL) is an algorithm that alleviates concerns over selecting the one “right” strategy while providing the freedom to consider many of them, such as those recommended by collaborators, used in related research, or specified by subject-matter experts. It is an entirely pre-specified and data-adaptive strategy for predictive modeling. To ensure the SL is well-specified for learning the prediction function, the…

Dispensing with unnecessary assumptions in population genetics analysis

The approach is founded on Targeted Learning, a framework for estimation that integrates mathematical statistics, machine learning and causal inference to provide mathematical guarantees and realistic p-values and extends the reach of current genome-wide association studies by simultaneously allowing for the classification of the types of SNPs and phenotypes for which such non-linearities occur.

Targeted learning in observational studies with multi-level treatments: An evaluation of antipsychotic drug treatment safety

We investigate estimation of causal effects of multiple competing (multi-valued) treatments in the absence of randomization. Our work is motivated by an intention-to-treat study of the relative…

A Causal Research Pipeline and Tutorial for Psychologists and Social Scientists

This work reformulates the typical approach to research in psychology to harmonize inevitably causal theories with the rest of the research pipeline, and presents a new process which begins with the incorporation of techniques from the confluence of causal discovery and machine learning for the development, validation, and transparent formal specification of theories.

References

Principled Machine Learning Using the Super Learner: An Application to Predicting Prison Violence

A powerful new approach to statistical learning is presented that leverages a variety of data-adaptive methods, such as random forests and spline regression, and systematically chooses the one, or a weighted combination of many, that produces the best forecasts.

Combining Possibly Related Estimation Problems

We have two sets of parameters we wish to estimate, and wonder whether the James-Stein estimator should be applied separately to the two sets or once to the combined problem. We show that…

Oracle inequalities for multi-fold cross validation

The results are extended to penalized cross validation in order to control unbounded loss functions and applications include regression with squared and absolute deviation loss and classification under Tsybakov’s condition.

R: A language and environment for statistical computing.

Copyright (©) 1999–2012 R Foundation for Statistical Computing. Permission is granted to make and distribute verbatim copies of this manual provided the copyright notice and this permission notice…

The Cross-Validated Adaptive Epsilon-Net Estimator

A cross-validated ε-net estimation method that uses a collection of submodels and a collection of ε-nets over each submodel to derive a finite sample inequality that shows that the resulting estimator is as good as an oracle estimator that uses the best submodel and resolution level for the unknown true parameter.

The Balance Super Learner: A robust adaptation of the Super Learner to improve estimation of the average treatment effect in the treated based on propensity score matching

The results suggest that the use of this adapted Super Learner to estimate the propensity score can further improve the robustness of propensity score matching estimators.

Variable Importance and Prediction Methods for Longitudinal Problems with Missing Variables

Predictive and variable importance methods for longitudinal data sets containing continuous and binary exposures subject to missingness are presented and used for prognosis of medical outcomes of severe trauma patients.