Corpus ID: 235731734

Uncertainty in Lung Cancer Stage for Outcome Estimation via Set-Valued Classification

Authors: Savannah L. Bergquist, Gabriel A. Brooks, Mary Beth Landrum, Nancy L. Keating, Sherri Rose
Difficulty in identifying cancer stage in health care claims data has limited oncology quality of care and health outcomes research. We fit prediction algorithms for classifying lung cancer stage into three classes (stages I/II, stage III, and stage IV) using claims data, and then demonstrate a method for incorporating the classification uncertainty in outcomes estimation. Leveraging set-valued classification and split conformal inference, we show how a fixed algorithm developed in one cohort… 
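The abstract is cut off before it describes the procedure, so the following is only a minimal sketch of generic split conformal set-valued classification, not the authors' code. It assumes a classifier that has already been fitted and outputs probabilities for the three stage classes, plus a held-out calibration set with known stage labels. The nonconformity score (1 minus the predicted probability of the true class), the function names `conformal_threshold` and `prediction_sets`, the miscoverage level `alpha`, and the toy probabilities are all hypothetical choices made for illustration.

```python
import math
import numpy as np

STAGES = ["I/II", "III", "IV"]  # the three stage classes named in the abstract


def conformal_threshold(cal_probs, cal_labels, alpha):
    """Hypothetical helper: split conformal threshold from a held-out calibration set.

    cal_probs:  (n, K) predicted class probabilities from an already-fitted classifier.
    cal_labels: (n,) integer true classes in {0, ..., K-1}.
    alpha:      target miscoverage; sets contain the true class with probability
                >= 1 - alpha (marginally, assuming calibration and test data are
                exchangeable).
    """
    n = len(cal_labels)
    # Assumed nonconformity score: 1 minus the probability given to the true class.
    scores = np.sort(1.0 - cal_probs[np.arange(n), cal_labels])
    # Finite-sample rank: take the ceil((n + 1)(1 - alpha))-th smallest score.
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return np.inf  # too few calibration points: every class is included
    return scores[k - 1]


def prediction_sets(test_probs, qhat, class_names=STAGES):
    """Hypothetical helper: keep every class whose score is within the threshold."""
    return [
        [class_names[j] for j in np.flatnonzero(1.0 - row <= qhat)]
        for row in test_probs
    ]


# Toy, synthetic probabilities for illustration only (not study data).
cal_probs = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.6, 0.3],
    [0.2, 0.2, 0.6],
    [0.5, 0.4, 0.1],
    [0.8, 0.1, 0.1],
])
cal_labels = np.array([0, 1, 2, 1, 0])
qhat = conformal_threshold(cal_probs, cal_labels, alpha=0.2)
test_probs = np.array([[0.9, 0.05, 0.05], [0.45, 0.45, 0.10]])
print(prediction_sets(test_probs, qhat))  # [['I/II'], ['I/II', 'III']]
```

In this sketch the classifier stays fixed and only the threshold is estimated from the calibration cohort, which is what lets a set-valued output be recalibrated without refitting. A set with more than one stage flags a patient whose stage is ambiguous from claims alone; how the authors carry these sets into outcome estimation is not shown in this excerpt.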
