Ensemble-based regression analysis of multimodal medical data for osteopenia diagnosis

Abstract

Areal bone mineral density (aBMD) is used in clinical practice to diagnose osteoporosis. In previous studies, aBMD was estimated from diagnostic computed tomography (dCT) images, but a battery of medical tests was also taken that can be used to improve the regression performance. However, it is difficult to exploit the multimodal data as the additional features have poor informativeness and may lead to overfitting. An ensemble-based framework is proposed to improve the regression accuracy and robustness on multimodal medical data with a high relative dimensionality. Instead of case-wise bootstrap aggregating, a filtering-based metalearner scheme was employed to build feature-wise ensembles. The proposed approach was evaluated on clinical data and was found to be superior to bagging and other ensemble methods. The feature-wise ensembling approach can also be used to automatically determine if any multimodal features are related to bone mineral density. Several blood measurements were identified to be linked with bone mineral density, and a literature search supported the automatic identification results.
In clinical studies, besides the main modalities being studied, other medical measurements are often taken. For a radiological study, it is common to also take blood and hormone measurements for control purposes. These multimodal data are often left unstudied as they are not the focus of the investigation. However, there may be hidden relationships between the disease symptoms and these multimodal data. Although it is likely that any hidden relationships are weaker than the primary modality, there is potential for the primary relationship to be improved by exploiting the hidden information contained in multimodal data. In this study, we are interested in using blood, hormone, and physical measurements to improve the areal bone mineral density (aBMD) estimated from diagnostic computed tomography (dCT).
It is not feasible to solve the problem by directly applying multivariate regression, as the additional multimodal features are less informative. The increased ratio of features to training cases also introduces the problem of high relative dimensionality, which may lead to overfitting. Ensemble methods have favorable properties that make them suitable for datasets with high dimensionality (Moon et al., 2007), high class imbalance (Lo et al., 2008), or missing features (Nanni, Lumini, & Braham, 2012). Data from medical studies typically suffer from one or more of the above conditions, due to the difficulty and cost of acquiring clinical data. Ensemble methods are therefore well suited to medical datasets. By modifying the ensemble method (Wall, …
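
The contrast between case-wise bagging and feature-wise ensembling can be illustrated with a minimal sketch. It is not the authors' method: the excerpt does not describe the filtering criterion or the metalearner, so the example assumes a simple correlation filter, ordinary least squares base regressors, and plain averaging. All function names, the threshold `min_abs_corr`, and the synthetic data are hypothetical.

```python
import numpy as np

def fit_linear(X, y):
    """Ordinary least squares with an intercept; returns the coefficients."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_linear(coef, X):
    """Apply coefficients from fit_linear to a 2-D feature matrix."""
    return np.column_stack([np.ones(len(X)), X]) @ coef

def fit_feature_wise_ensemble(primary, aux, y, min_abs_corr=0.2):
    """Hypothetical feature-wise ensemble (assumed scheme, not the paper's).

    A primary-only model is fitted first. Each auxiliary feature is then
    filtered by its correlation with the primary model's residual, and
    every feature that passes gets its own small regressor on
    (primary, that feature). Cases are never resampled, unlike bagging.
    """
    base = fit_linear(primary[:, None], y)
    residual = y - predict_linear(base, primary[:, None])
    members = []
    for j in range(aux.shape[1]):
        r = np.corrcoef(aux[:, j], residual)[0, 1]
        if abs(r) >= min_abs_corr:  # assumed filter; NaN never passes
            X = np.column_stack([primary, aux[:, j]])
            members.append((j, fit_linear(X, y)))
    return base, members

def predict_feature_wise_ensemble(model, primary, aux):
    """Average the member predictions; fall back to the primary-only model."""
    base, members = model
    if not members:
        return predict_linear(base, primary[:, None])
    preds = [
        predict_linear(coef, np.column_stack([primary, aux[:, j]]))
        for j, coef in members
    ]
    return np.mean(preds, axis=0)

# Synthetic stand-in for the clinical data: one strong primary feature
# (playing the role of the dCT estimate) and 20 auxiliary features, of
# which only column 3 carries signal. These numbers are invented.
rng = np.random.default_rng(0)
n_cases, n_aux = 60, 20
primary = rng.normal(size=n_cases)
aux = rng.normal(size=(n_cases, n_aux))
target = primary + 0.8 * aux[:, 3] + 0.3 * rng.normal(size=n_cases)

model = fit_feature_wise_ensemble(primary, aux, target)
selected = [j for j, _ in model[1]]  # auxiliary features that passed the filter
pred = predict_feature_wise_ensemble(model, primary, aux)
```

Because each member sees only two inputs, no single regressor is exposed to the full feature-to-case ratio, and the list of selected indices doubles as a rough indication of which auxiliary measurements appear related to the target. With few cases, some uninformative features may pass the filter by chance, so the threshold would need validation on held-out data.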

DOI: 10.1016/j.eswa.2012.08.031
