Can non-specialists provide high quality gold standard labels in challenging modalities?

Samuel Budd, Thomas G. Day, John Simpson, Karen Lloyd, Jacqueline Matthew, Emily Skelton, Reza Razavi, Bernhard Kainz
Probably yes. — Supervised Deep Learning dominates performance scores for many computer vision tasks and defines the state-of-the-art. However, medical image analysis lags behind natural image applications. One of the many reasons is the lack of well-annotated medical image data available to researchers. One of the first things researchers are told is that significant expertise is required to reliably and accurately interpret and label such data. We see significant inter- and intra-observer…



Embracing Imperfect Datasets: A Review of Deep Learning Solutions for Medical Image Segmentation
This article provides a detailed review of these solutions, summarizing both the technical novelties and empirical results, comparing the benefits and requirements of the surveyed methodologies, and recommending solutions.
How Many Annotators Do We Need? - A Study on the Influence of Inter-Observer Variability on the Reliability of Automatic Mitotic Figure Assessment
It is concluded that databases annotated by a few pathologists with high label precision may be the best compromise between high algorithmic performance and time investment.
Large-scale medical image annotation with crowd-powered algorithms
A multistage segmentation pipeline incorporating a hybrid crowd-algorithm 3-D segmentation method, integrated into a medical imaging platform, is developed, and it is shown that the crowd is able to detect and refine inaccurate organ contours with a quality similar to that of experts.
Robustness study of noisy annotation in deep learning based medical image segmentation.
This study suggests that the network is robust to noisy annotations to some extent in mandible segmentation from CT images, and highlights the importance of labelling quality in deep learning.
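The style of robustness experiment summarised above can be illustrated with a toy sketch (an illustration only, not the paper's protocol; the function names and the pixel-flip noise model are assumptions): perturb a clean binary mask with random label flips and observe how the Dice overlap with the clean mask degrades.

```python
import random

def dice(a, b):
    """Dice overlap between two binary masks given as flat lists of 0/1."""
    inter = sum(x & y for x, y in zip(a, b))
    return 2 * inter / (sum(a) + sum(b))

def perturb(mask, flip_rate, rng):
    """Simulate a noisy annotation: flip each pixel with probability flip_rate."""
    return [x ^ 1 if rng.random() < flip_rate else x for x in mask]

# Track degradation as annotation noise grows.
clean = [1] * 50 + [0] * 50
rng = random.Random(0)
scores = {rate: dice(clean, perturb(clean, rate, rng))
          for rate in (0.0, 0.1, 0.3)}
```

With no noise the Dice score stays at 1.0; as the flip rate rises, the score typically falls, giving a simple curve of segmentation-label quality versus annotation noise.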
Cheap and Fast – But is it Good? Evaluating Non-Expert Annotations for Natural Language Tasks
This work explores the use of Amazon's Mechanical Turk system, a significantly cheaper and faster method for collecting annotations from a broad base of paid non-expert contributors over the Web, and proposes a technique for bias correction that significantly improves annotation quality on two tasks.
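The bias-correction idea can be sketched minimally as follows (assumed names and data shapes, not the authors' actual estimator): weight each non-expert's votes by their smoothed accuracy on a small gold-labelled subset, so unreliable annotators are down-weighted.

```python
from collections import defaultdict

def annotator_accuracy(labels, gold):
    """Estimate each annotator's accuracy on gold-labelled items (Laplace-smoothed)."""
    correct, total = defaultdict(int), defaultdict(int)
    for (annotator, item), label in labels.items():
        if item in gold:
            total[annotator] += 1
            correct[annotator] += int(label == gold[item])
    return {a: (correct[a] + 1) / (total[a] + 2) for a in total}

def weighted_vote(labels, gold):
    """Aggregate noisy labels per item, weighting votes by estimated accuracy."""
    acc = annotator_accuracy(labels, gold)
    votes = defaultdict(lambda: defaultdict(float))
    for (annotator, item), label in labels.items():
        # Annotators never seen on the gold subset get a neutral weight of 0.5.
        votes[item][label] += acc.get(annotator, 0.5)
    return {item: max(tally, key=tally.get) for item, tally in votes.items()}
```

Here `labels` maps `(annotator, item)` pairs to labels; annotators who agree with the gold subset carry more weight in the final vote than those the gold set has never measured.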
U-Net: Convolutional Networks for Biomedical Image Segmentation
It is shown that such a network can be trained end-to-end from very few images and outperforms the prior best method (a sliding-window convolutional network) on the ISBI challenge for segmentation of neuronal structures in electron microscopic stacks.
Early Experiences with Crowdsourcing Airway Annotations in Chest CT
This work investigates whether crowdsourcing can be used to gather airway annotations that can serve directly for measuring the airways, or as training data for the algorithms, and describes a number of further research directions.
Data Quality from Crowdsourcing: A Study of Annotation Selection Criteria
An empirical study is conducted to examine the effect of noisy annotations on the performance of sentiment classification models, and to evaluate the utility of annotation selection on classification accuracy and efficiency.
Detecting Hypo-plastic Left Heart Syndrome in Fetal Ultrasound via Disease-specific Atlas Maps
It is proposed to extend the recently introduced Image-and-Spatial Transformer Networks (Atlas-ISTN) into a framework that sensitises atlas generation to disease, and that can jointly learn image segmentation, registration, atlas construction and disease prediction while providing a maximum level of clinical interpretability.
Needle in a Haystack: Reducing the Costs of Annotating Rare-Class Instances in Imbalanced Datasets
This paper shows that annotation redundancy for noise reduction is very expensive on a class-imbalanced dataset and should be discarded for instances receiving a single common-class label; this approach produces annotations at approximately the same cost as a metadata-trained, supervised cascading machine classifier.
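The cost argument can be made concrete with a back-of-the-envelope sketch (illustrative numbers and function name assumed, not the paper's figures): pay for redundant labels only on items whose single first label is the rare class, and skip redundancy for common-class items.

```python
def annotation_budget(n_items, n_rare, redundancy):
    """Compare total label counts: full redundancy on every item versus
    redundancy only for items whose single first label is the rare class."""
    full = n_items * redundancy                       # every item labelled `redundancy` times
    selective = n_items + n_rare * (redundancy - 1)   # one label each, extras only for rare items
    return full, selective
```

For example, with 1,000 items, 50 rare instances, and 5 labels per redundant item, full redundancy costs 5,000 labels while selective redundancy costs 1,200.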