Corpus ID: 211480380

A Benchmark of Medical Out of Distribution Detection

@article{Cohen2020ABO,
  title={A Benchmark of Medical Out of Distribution Detection},
  author={Joseph Paul Cohen and Tianshi Cao and Chin-Wei Huang and D. Y. Hui},
  journal={ArXiv},
  year={2020},
  volume={abs/2007.04250}
}
Motivation: Deep learning models deployed for use on medical tasks can be equipped with Out-of-Distribution Detection (OoDD) methods in order to avoid erroneous predictions. However, it is unclear which OoDD method should be used in practice. Specific Problem: Systems trained for one particular domain of images cannot be expected to perform accurately on images of a different domain. These images should be flagged by an OoDD method prior to diagnosis. Our approach: This paper defines 3…
Margin-Aware Intra-Class Novelty Identification for Medical Images
TLDR
The anomaly detection model TEND can effectively identify the challenging intra-class out-of-distribution medical images in an unsupervised fashion and can be applied to discover unseen medical image classes and serve as the abnormal data screening for downstream medical tasks.
CheXseen: Unseen Disease Detection for Deep Learning Interpretation of Chest X-rays
TLDR
It is found that the penultimate layer of the deep neural network provides useful features for unseen disease detection, and this can inform the safe clinical deployment of deep learning models trained on a non-exhaustive set of disease classes.
Learn what you can't learn: Regularized Ensembles for Transductive Out-of-distribution Detection
TLDR
This paper proposes a novel method that uses an artificial labeling scheme for the test data and regularization to obtain ensembles of models that produce contradictory predictions only on the OOD samples in a test batch, and is able to significantly outperform both inductive and transductive baselines on difficult OOD detection scenarios.
Novelty detection using ensembles with regularized disagreement
Despite their excellent performance on in-distribution (ID) data, machine learning-based prediction systems often predict out-of-distribution (OOD) samples incorrectly while indicating high…
Problems in the deployment of machine-learned models in health care
  • Joseph Paul Cohen, Tianshi Cao, +6 authors Y. Bengio
  • Medicine
  • Canadian Medical Association Journal
  • 2021
CMAJ | SEPTEMBER 7, 2021 | VOLUME 193 | ISSUE 35 | E1391. In a companion article, Verma and colleagues discuss how machine-learned solutions can be developed and implemented to support medical…
Loss Estimators Improve Model Generalization
TLDR
This paper proposes to train a loss estimator alongside the predictive model, using a contrastive training objective, to directly estimate the prediction uncertainties, and finds that, in addition to producing well-calibrated uncertainties, this approach improves the generalization behavior of the predictor.
Energy-Based Out-of-Distribution Detection for Multi-Label Classification
  • 2020
Out-of-distribution (OOD) detection is essential to prevent anomalous inputs from causing a model to fail during deployment. Improved methods for OOD detection in multi-class classification have…
Machine Learning Applications on Neuroimaging for Diagnosis and Prognosis of Epilepsy: A Review
TLDR
This review highlights the interactions between neuroimaging and machine learning in the context of epilepsy diagnosis and prognosis, and discusses current achievements, challenges, and potential future directions in the field, with the hope of paving the way to computer-aided diagnosis and prediction of epilepsy.
Thinkback: Task-Specific Out-of-Distribution Detection
TLDR
This paper proposes a novel way to formulate the out-of-distribution detection problem, tailored for DL models, that does not require a fine-tuning process on training data, yet is significantly more accurate than the state of the art for out-of-distribution detection.
How is BERT surprised? Layerwise detection of linguistic anomalies
TLDR
The best performing model RoBERTa exhibits surprisal in earlier layers when the anomaly is morphosyntactic than when it is semantic, while commonsense anomalies do not exhibit surprisal at any intermediate layer, suggesting that language models employ separate mechanisms to detect different types of linguistic anomalies.

References

SHOWING 1-10 OF 34 REFERENCES
Likelihood Ratios for Out-of-Distribution Detection
TLDR
This work investigates deep generative model based approaches for OOD detection and observes that the likelihood score is heavily affected by population-level background statistics, and proposes a likelihood ratio method for deep generative models which effectively corrects for these confounding background statistics.
Deep learning for digital pathology image analysis: A comprehensive tutorial with selected use cases
TLDR
This paper investigates concepts through seven unique DP tasks as use cases to elucidate techniques needed to produce results comparable, and in many cases superior, to those from state-of-the-art hand-crafted feature-based classification approaches.
Enhancing The Reliability of Out-of-distribution Image Detection in Neural Networks
TLDR
The proposed ODIN method, based on the observation that using temperature scaling and adding small perturbations to the input can separate the softmax score distributions between in- and out-of-distribution images, allowing for more effective detection, consistently outperforms the baseline approach by a large margin.
Deep Learning for Medical Image Processing: Overview, Challenges and Future
The health care sector is totally different from any other industry. It is a high priority sector and consumers expect the highest level of care and services regardless of cost. The health care…
Does Your Model Know the Digit 6 Is Not a Cat? A Less Biased Evaluation of "Outlier" Detectors
TLDR
The OD-test benchmark provides a straightforward means of comparison for methods that address the out-of-distribution sample detection problem and an exhaustive evaluation of a broad set of methods from related areas on image classification tasks shows that for realistic applications of high-dimensional images, the existing methods have low accuracy.
PadChest: A large chest x-ray image dataset with multi-label annotated reports
TLDR
This dataset includes more than 160,000 images obtained from 67,000 patients that were interpreted and reported by radiologists at San Juan Hospital (Spain) from 2009 to 2017, covering six different position views and additional information on image acquisition and patient demography.
A Simple Unified Framework for Detecting Out-of-Distribution Samples and Adversarial Attacks
TLDR
This paper proposes a simple yet effective method for detecting any abnormal samples, which is applicable to any pre-trained softmax neural classifier, and obtains the class conditional Gaussian distributions with respect to (low- and upper-level) features of the deep models under Gaussian discriminant analysis.
Diagnostic Assessment of Deep Learning Algorithms for Detection of Lymph Node Metastases in Women With Breast Cancer
TLDR
In the setting of a challenge competition, some deep learning algorithms achieved better diagnostic performance than a panel of 11 pathologists participating in a simulation exercise designed to mimic routine pathology workflow; algorithm performance was comparable with an expert pathologist interpreting whole-slide images without time constraints.
Do Deep Generative Models Know What They Don't Know?
TLDR
The density learned by flow-based models, VAEs, and PixelCNNs cannot distinguish images of common objects such as dogs, trucks, and horses from those of house numbers, and such behavior persists even when the flows are restricted to constant-volume transformations.
ChestX-Ray8: Hospital-Scale Chest X-Ray Database and Benchmarks on Weakly-Supervised Classification and Localization of Common Thorax Diseases
TLDR
A new chest X-ray database, namely ChestX-ray8, is presented, which comprises 108,948 frontal-view X-ray images of 32,717 unique patients with the text-mined eight disease image labels from the associated radiological reports using natural language processing, which is validated using the proposed dataset.