Generalized radiograph representation learning via cross-supervision between images and free-text radiology reports

  H.-Y. Zhou, X. Chen, Y. Zhang, R. Luo, L. Wang and Y. Yu
  Nature Machine Intelligence, published 4 November 2021
Pre-training lays the foundation for recent successes in radiograph analysis supported by deep learning. It learns transferable image representations by conducting large-scale fully- or self-supervised learning on a source domain; however, supervised pre-training requires a complex and labour-intensive two-stage human-assisted annotation process, whereas self-supervised learning cannot compete with the supervised paradigm. To tackle these issues, we propose a cross-supervised methodology called… 

Multi-Granularity Cross-modal Alignment for Generalized Medical Visual Representation Learning

A novel Multi-Granularity Cross-modal Alignment (MGCA) framework for generalized medical visual representation learning that harnesses the naturally exhibited semantic correspondences between medical images and radiology reports at three different levels, i.e., pathological region-level, instance-level, and disease-level.
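
As a rough illustration of the instance-level alignment idea (not MGCA's actual objective, whose details are in the paper), cross-modal contrastive learning is commonly implemented as a symmetric InfoNCE loss over paired image/report embeddings. A minimal NumPy sketch, assuming embeddings are already extracted by the two encoders:

```python
import numpy as np

def info_nce(image_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss over paired image/report embeddings.

    image_emb, text_emb: (N, D) arrays; row i of each is a matched pair.
    """
    # L2-normalize so dot products are cosine similarities
    img = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    txt = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature      # (N, N) similarity matrix
    labels = np.arange(len(img))            # matched pairs sit on the diagonal

    def xent(l):
        # softmax cross-entropy with the diagonal as the target class
        l = l - l.max(axis=1, keepdims=True)
        p = np.exp(l) / np.exp(l).sum(axis=1, keepdims=True)
        return -np.log(p[labels, labels]).mean()

    # average the image-to-text and text-to-image directions
    return 0.5 * (xent(logits) + xent(logits.T))
```

Matched pairs should score a lower loss than deliberately misaligned ones, which is a quick sanity check for any implementation along these lines.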

Evaluating Progress in Automatic Chest X-Ray Radiology Report Generation

This study quantitatively examines the correlation between automated metrics and the scoring of reports by radiologists, and proposes a composite metric, called RadCliQ, that is able to rank the quality of reports similarly to radiologists and better than existing metrics.

UNet-2022: Exploring Dynamics in Non-isomorphic Architecture

A parallel non-isomorphic block is proposed that takes advantage of self-attention and convolution with simple parallelization; the resulting model, named UNet-2022, clearly outperforms its counterparts in a range of segmentation tasks and has the potential to become the model of choice for medical image segmentation.

A Survey on Attention Mechanisms for Medical Applications: are we Moving Toward Better Algorithms?

This paper extensively reviews the use of attention mechanisms in machine learning methods (including Transformers) for several medical applications, organized by the types of tasks that arise in medical-domain pipelines, and proposes future research directions for medical applications that may benefit from these frameworks.

Centrality and Consistency: Two-Stage Clean Samples Identification for Learning with Instance-Dependent Noisy Labels

This work proposes a two-stage clean-sample identification method that employs a class-level feature clustering procedure for the early identification of clean samples near the class-wise prediction centers, and addresses the class imbalance problem by aggregating rare classes according to their prediction entropy.

Neighborhood Collective Estimation for Noisy Label Identification and Correction

This work proposes Neighborhood Collective Estimation, in which the predictive reliability of a candidate sample is re-estimated by contrasting it against its feature-space nearest neighbors, and shows that it considerably outperforms state-of-the-art methods.
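
The core intuition can be illustrated with a toy reliability score (a hedged sketch, not the paper's estimator): a sample's label is deemed reliable in proportion to how many of its k nearest feature-space neighbors share it.

```python
import numpy as np

def neighborhood_reliability(features, labels, k=3):
    """Reliability of each sample's label = fraction of its k nearest
    feature-space neighbors (excluding itself) sharing that label."""
    feats = features / np.linalg.norm(features, axis=1, keepdims=True)
    sims = feats @ feats.T              # pairwise cosine similarities
    np.fill_diagonal(sims, -np.inf)     # exclude each sample from its own neighborhood
    rel = np.empty(len(feats))
    for i in range(len(feats)):
        nn = np.argsort(-sims[i])[:k]   # indices of the k most similar samples
        rel[i] = np.mean(labels[nn] == labels[i])
    return rel
```

A mislabeled sample embedded inside the wrong class's cluster scores near zero, flagging it for correction.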

A Survey on Graph Neural Networks and Graph Transformers in Computer Vision: A Task-Oriented Perspective

A comprehensive review of GNNs and graph Transformers in computer vision from a task-oriented perspective and divides their applications into categories according to the modality of input data, i.e., 2D natural images, videos, 3D data, vision + language, and medical images.

Multimodal biomedical AI

This Review outlines the most promising uses of multimodal artificial intelligence in health and the technical pitfalls to avoid, and explores opportunities in personalized medicine, digital clinical trials, remote monitoring and care, pandemic surveillance, digital twin technology and virtual health assistants.

Self-Supervised Pretraining Enables High-Performance Chest X-Ray Interpretation Across Clinical Distributions

This work investigated whether self-supervised pretraining methods could outperform traditional ImageNet pretraining for chest X-ray interpretation and found that SSL-pretrained models outperformed ImageNet-pretrained models on thirteen different datasets representing high diversity in geographies, clinical settings, and prediction tasks.

TieNet: Text-Image Embedding Network for Common Thorax Disease Classification and Reporting in Chest X-Rays

A novel Text-Image Embedding network (TieNet) is proposed for extracting the distinctive image and text representations of chest X-rays and multi-level attention models are integrated into an end-to-end trainable CNN-RNN architecture for highlighting the meaningful text words and image regions.

CheXpert: A Large Chest Radiograph Dataset with Uncertainty Labels and Expert Comparison

A labeler is designed to automatically detect the presence of 14 observations in radiology reports, capturing uncertainties inherent in radiograph interpretation, in CheXpert, a large dataset that contains 224,316 chest radiographs of 65,240 patients.

Interleaved text/image Deep Mining on a large-scale radiology database

The large-scale datasets of extracted key images and their categorization, embedded vector labels and sentence descriptions can be harnessed to alleviate the deep learning “data-hungry” obstacle in the medical domain.

Models Genesis

MIMIC-CXR: A large publicly available database of labeled chest radiographs

MIMIC-CXR-JPG is derived entirely from the MIMIC-CXR database, and aims to provide a convenient processed version of MIMIC-CXR, as well as to provide a standard reference for data splits and image labels.

Scaling and Benchmarking Self-Supervised Visual Representation Learning

It is shown that by scaling on various axes (including data size and problem 'hardness'), one can largely match or even exceed the performance of supervised pre-training on a variety of tasks such as object detection, surface normal estimation and visual navigation using reinforcement learning.

MIMIC-CXR, a de-identified publicly available database of chest radiographs with free-text reports

A large dataset of 227,835 imaging studies for 65,379 patients presenting to the Beth Israel Deaconess Medical Center Emergency Department between 2011 and 2016 is described and made freely available to facilitate and encourage a wide range of research in computer vision, natural language processing, and clinical data mining.

VinDr-CXR: An open dataset of chest X-rays with radiologist’s annotations

A dataset of more than 100,000 chest X-ray scans that were retrospectively collected from two major hospitals in Vietnam is described and a labeling platform for DICOM images is designed and built to facilitate these annotation procedures.

ChestX-Ray8: Hospital-Scale Chest X-Ray Database and Benchmarks on Weakly-Supervised Classification and Localization of Common Thorax Diseases

A new chest X-ray database, ChestX-ray8, is presented, comprising 108,948 frontal-view X-ray images of 32,717 unique patients with eight disease labels text-mined from the associated radiological reports using natural language processing, and weakly-supervised classification and localization of common thorax diseases are benchmarked on the proposed dataset.