Automating Construction of Machine Learning Models with Clinical Big Data: Rationale and Methods


Background: To improve health outcomes and cut healthcare costs, we often need to perform prediction or classification on large clinical data sets (a.k.a. “clinical big data”), for example to identify high-risk patients for preventive interventions. Machine learning has been proposed as a key technology for this purpose. It has won most data science competitions and could support many clinical activities, yet only 15% of hospitals use it for even limited purposes. Although familiar with their data, healthcare researchers often lack the machine learning expertise needed to use clinical big data directly, which creates a hurdle in realizing value from those data. Healthcare researchers can work with data scientists who have deep machine learning knowledge, but communicating effectively takes time and effort from both parties. Facing a U.S. shortage of data scientists and hiring competition from companies with deep pockets, healthcare systems have difficulty recruiting them. Building and generalizing a machine learning model often requires hundreds to thousands of manual iterations by data scientists to select: a) complex algorithms and hyper-parameter values, both of which greatly affect model accuracy, and b) operators and periods for temporally aggregating clinical attributes (e.g., whether a patient’s weight kept rising in the past year). This process becomes infeasible with limited budgets.
Objective: This study’s goal is to enable healthcare researchers to use clinical big data directly, make machine learning feasible with limited budgets and data scientist resources, and realize value from data.
Methods: This study will: 1) finish developing the new software Auto-ML (Automated Machine Learning), which automates model selection for machine learning with clinical big data, and validate Auto-ML on seven benchmark modeling problems of clinical importance; 2) apply Auto-ML and novel methodology to two new modeling problems crucial for care management allocation, and pilot one model with care managers; and 3) perform simulations to estimate the impact of adopting Auto-ML on U.S. patient outcomes.
Results: We are currently writing Auto-ML’s design document. We intend to finish our study in around five years.
Conclusions: Auto-ML will generalize to various clinical prediction and classification problems. With minimal help from data scientists, healthcare researchers can use Auto-ML to quickly build high-quality models. This will promote wider use of machine learning in healthcare and improve patient outcomes.
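To make the notion of "operators and periods for temporally aggregating clinical attributes" concrete, the following is a minimal Python sketch, not Auto-ML's actual design. All names (`aggregate`, `kept_rising`, `OPERATORS`) and the sample weights are hypothetical and chosen only for illustration. It shows how each (operator, period) pair turns the same series of timestamped measurements into a different candidate feature, which is why the space of choices grows quickly.

```python
from datetime import date, timedelta
from statistics import mean


def in_period(measurements, as_of, days):
    """Keep (date, value) pairs from the `days` days up to `as_of`, oldest first."""
    start = as_of - timedelta(days=days)
    return [(d, v) for d, v in sorted(measurements) if start <= d <= as_of]


def kept_rising(values):
    """True if there are at least two values and each exceeds the previous one."""
    return len(values) >= 2 and all(b > a for a, b in zip(values, values[1:]))


# Hypothetical operator set; a real system would offer many more.
OPERATORS = {"mean": mean, "max": max, "kept_rising": kept_rising}


def aggregate(measurements, as_of, operator, period_days):
    """Apply one operator over one look-back period; None if no data in the period."""
    values = [v for _, v in in_period(measurements, as_of, period_days)]
    if not values:
        return None
    return OPERATORS[operator](values)


# Made-up weight measurements (kg) for one illustrative patient.
weights_kg = [
    (date(2016, 1, 10), 80.0),
    (date(2016, 6, 5), 82.5),
    (date(2016, 11, 20), 85.0),
]
as_of = date(2016, 12, 31)

# Every (operator, period) combination yields one candidate feature.
features = {
    (op, days): aggregate(weights_kg, as_of, op, days)
    for op in OPERATORS
    for days in (90, 365)
}
```

With only three operators and two periods, a single attribute already yields six candidate features; selecting among such combinations by hand for many attributes is the manual iteration the Background describes.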

Cite this paper

@inproceedings{Luo2017AutomatingCO,
  title={Automating Construction of Machine Learning Models with Clinical Big Data: Rationale and Methods},
  author={Gang Luo and Bryan L. Stone and Michael D. Johnson and Peter Tarczy-Hornoch and Adam B. Wilcox and Sean D. Mooney and Xiaoming Sheng and Peter J. Haug and Mario R. Capecchi},
  year={2017}
}