Corpus ID: 222125307

Contrastive Learning of Medical Visual Representations from Paired Images and Text

@article{Zhang2020ContrastiveLO,
  title={Contrastive Learning of Medical Visual Representations from Paired Images and Text},
  author={Yuhao Zhang and Hang Jiang and Y. Miura and Christopher D. Manning and C. Langlotz},
  journal={ArXiv},
  year={2020},
  volume={abs/2010.00747}
}
Learning visual representations of medical images is core to medical image understanding, but its progress has been held back by the small size of hand-labeled datasets. Existing work commonly relies on transferring weights from ImageNet pretraining, which is suboptimal due to drastically different image characteristics, or on rule-based label extraction from the textual report data paired with medical images, which is inaccurate and hard to generalize. We propose an alternative unsupervised…
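The paired image-text objective the abstract alludes to can be illustrated with a symmetric contrastive (InfoNCE-style) loss over a batch of image and report embeddings. This is a minimal numpy sketch, not the paper's implementation; the function name and temperature value are illustrative assumptions.

```python
import numpy as np

def paired_contrastive_loss(img_emb, txt_emb, temperature=0.1):
    """Symmetric InfoNCE loss over a batch of paired (image, text) embeddings.

    Each image is pulled toward its own report text and pushed away from
    the other texts in the batch, and vice versa.
    """
    # L2-normalize so dot products are cosine similarities.
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)

    logits = img @ txt.T / temperature  # (N, N); matching pairs on the diagonal
    idx = np.arange(len(img))

    def cross_entropy(lg):
        lg = lg - lg.max(axis=1, keepdims=True)               # numerical stability
        log_prob = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -log_prob[idx, idx].mean()                     # diagonal = true pairs

    # Average the image-to-text and text-to-image directions.
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))
```

With perfectly matched embeddings the loss approaches zero; shuffling the text side makes it large, which is the signal the encoders are trained on.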


Semi-weakly Supervised Contrastive Representation Learning for Retinal Fundus Images
This work considers weak labels in the form of pseudo-labels and proposes a semi-weakly supervised contrastive learning (SWCL) framework for representation learning from semi-weakly annotated images, which surpasses all prior self-supervised methods and standard cross-entropy training while closing the gap with ImageNet pretraining.
Unsupervised Local Discrimination for Medical Images
Contrastive representation learning is an effective unsupervised method for alleviating the demand for expensive annotated data in medical image processing. Recent work is mainly based on instance-wise…
MedAug: Contrastive learning leveraging patient metadata improves representations for chest X-ray interpretation
This work develops a method for selecting positive pairs drawn from views of possibly different images by using patient metadata; the approach is broadly applicable to medical image interpretation and allows flexibility for incorporating medical insights when choosing pairs for contrastive learning.
Self-supervised Image-text Pre-training With Mixed Data In Chest X-rays
An image-text pretraining framework is introduced that learns from raw data with mixed inputs (a mixture of paired and unpaired data), with multi-scale masked vision modeling serving as a self-supervised training task for image-patch regeneration.
Data Efficient Language-Supervised Zero-Shot Recognition with Optimal Transport Distillation
Traditional computer vision models are trained to predict a fixed set of predefined categories. Recently, natural language has been shown to be a broader and richer source of supervision that…
Data-Efficient Language-Supervised Zero-Shot Learning with Self-Distillation
This work proposes a data-efficient contrastive distillation method that uses soft labels to learn from noisy image-text pairs, exceeding the previous SoTA for general zero-shot learning on ImageNet 21k+1k by 73% relative with a ResNet-50 image encoder and a DeCLUTR text encoder.
Learning Transferable Visual Models From Natural Language Supervision
It is demonstrated that the simple pretraining task of predicting which caption goes with which image is an efficient and scalable way to learn SOTA image representations from scratch on a dataset of 400 million (image, text) pairs collected from the internet.
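The caption-matching pretraining task described above also yields a simple zero-shot classifier: embed one caption per class and pick the class whose caption is most similar to the image embedding. A minimal sketch, assuming both embeddings come from a jointly pretrained encoder pair:

```python
import numpy as np

def zero_shot_classify(image_emb, caption_embs):
    """Return the index of the class caption most similar to the image.

    caption_embs holds one text embedding per class (e.g. "a photo of a dog");
    both inputs are assumed to come from jointly pretrained encoders.
    """
    img = image_emb / np.linalg.norm(image_emb)
    caps = caption_embs / np.linalg.norm(caption_embs, axis=1, keepdims=True)
    return int(np.argmax(caps @ img))  # highest cosine similarity wins
```

No task-specific training is needed; swapping the caption set changes the label space.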
Recent advances and clinical applications of deep learning in medical image analysis
The latest progress and contributions of state-of-the-art unsupervised and semi-supervised deep learning for medical images are summarized across application scenarios, including lesion classification, segmentation, detection, and image registration.
LocTex: Learning Data-Efficient Visual Representations from Localized Textual Supervision
Computer vision tasks such as object detection and semantic/instance segmentation rely on the painstaking annotation of large training datasets. In this paper, we propose LocTex, which takes advantage…
Exploiting the relationship between visual and textual features in social networks for image classification with zero-shot deep learning
This work proposes a classifier ensemble based on the transfer-learning capabilities of the CLIP architecture in multimodal (image and text) social-media settings; it uses the InstaNY100K dataset and proposes a validation approach based on sampling techniques.

References

Showing 1-10 of 38 references
Transfusion: Understanding Transfer Learning for Medical Imaging
Investigation of the learned representations and features finds that some of the differences attributed to transfer learning are due to the over-parametrization of standard models rather than sophisticated feature reuse; the work isolates where useful feature reuse occurs and outlines the implications for more efficient model exploration.
TieNet: Text-Image Embedding Network for Common Thorax Disease Classification and Reporting in Chest X-Rays
A novel Text-Image Embedding network (TieNet) is proposed for extracting distinctive image and text representations of chest X-rays; multi-level attention models are integrated into an end-to-end trainable CNN-RNN architecture to highlight meaningful text words and image regions.
Data-Efficient Image Recognition with Contrastive Predictive Coding
This work revisits and improves Contrastive Predictive Coding, an unsupervised objective for learning representations that make the variability in natural signals more predictable, and produces features that support state-of-the-art linear classification accuracy on ImageNet.
Clinically applicable deep learning for diagnosis and referral in retinal disease
A novel deep learning architecture performs device-independent tissue segmentation of clinical 3D retinal images, followed by a separate diagnostic classification that meets or exceeds expert clinical diagnoses of retinal disease.
CheXpert: A Large Chest Radiograph Dataset with Uncertainty Labels and Expert Comparison
A labeler is designed to automatically detect the presence of 14 observations in radiology reports while capturing the uncertainties inherent in radiograph interpretation, yielding CheXpert, a large dataset containing 224,316 chest radiographs of 65,240 patients.
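For contrast with the learned approaches above, a rule-based report labeler of the kind CheXpert relies on can be sketched with a few regular expressions. The observation list and patterns below are toy stand-ins, not CheXpert's actual rules:

```python
import re

OBSERVATIONS = ["pneumonia", "edema", "pleural effusion"]  # small subset for illustration

def rule_based_label(report):
    """Toy mention/negation/uncertainty labeler in the spirit of CheXpert's.

    Returns 'positive', 'negative', 'uncertain', or 'unmentioned' per
    observation. Real labelers use curated phrase lists and NLP pipelines;
    the patterns here are illustrative only.
    """
    text = report.lower()
    labels = {}
    for obs in OBSERVATIONS:
        if obs not in text:
            labels[obs] = "unmentioned"
        elif re.search(r"(no|without|negative for)\s+(evidence of\s+)?" + obs, text):
            labels[obs] = "negative"
        elif re.search(r"(possible|cannot exclude|may represent)\s+" + obs, text):
            labels[obs] = "uncertain"
        else:
            labels[obs] = "positive"
    return labels
```

The brittleness of such patterns (missed negations, unseen phrasings) is exactly the inaccuracy the main paper's abstract cites as motivation for learning from the raw text instead.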
On the Automatic Generation of Medical Imaging Reports
This work builds a multi-task learning framework that jointly predicts tags and generates paragraphs, proposes a co-attention mechanism to localize regions containing abnormalities and generate narrations for them, and develops a hierarchical LSTM model to generate long paragraphs.
LXMERT: Learning Cross-Modality Encoder Representations from Transformers
The LXMERT (Learning Cross-Modality Encoder Representations from Transformers) framework, a large-scale Transformer model consisting of three encoders, achieves state-of-the-art results on two visual question answering datasets and demonstrates the generalizability of the pretrained cross-modality model.
Deep Residual Learning for Image Recognition
This work presents a residual learning framework that eases the training of networks substantially deeper than those used previously, with comprehensive empirical evidence showing that these residual networks are easier to optimize and gain accuracy from considerably increased depth.
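The residual idea is compact enough to sketch: each block computes y = F(x) + x, so the identity shortcut guarantees the block can fall back to passing its input through. A minimal numpy illustration; the two-matrix form of F is an assumption for brevity:

```python
import numpy as np

def residual_block(x, w1, w2):
    """Minimal residual block: y = F(x) + x, with F = ReLU(x W1) W2.

    The identity shortcut means that even if F contributes nothing
    (e.g. zero weights), the block still passes its input through,
    which is what makes very deep stacks trainable.
    """
    h = np.maximum(0.0, x @ w1)  # ReLU nonlinearity
    return h @ w2 + x            # residual F(x) plus identity shortcut
```

With zero weights the block is exactly the identity, so stacking many such blocks cannot degrade the signal the way plain deep stacks can.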
Deep learning for chest radiograph diagnosis: A retrospective comparison of the CheXNeXt algorithm to practicing radiologists
CheXNeXt, a convolutional neural network that concurrently detects the presence of 14 pathologies in frontal-view chest radiographs, including pneumonia, pleural effusion, pulmonary masses, and nodules, achieved radiologist-level performance on 11 pathologies but did not reach radiologist-level performance on the remaining 3.
Momentum Contrast for Unsupervised Visual Representation Learning
We present Momentum Contrast (MoCo) for unsupervised visual representation learning. Viewing contrastive learning as dictionary look-up, we build a dynamic dictionary with a queue and a moving-averaged encoder…
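MoCo's dictionary-as-queue design can be sketched as a FIFO buffer of key embeddings plus a key encoder updated as a moving average of the query encoder. The single-matrix "encoders" below are a toy assumption; real MoCo uses deep networks:

```python
import numpy as np
from collections import deque

class MomentumQueue:
    """Toy sketch of MoCo's dynamic dictionary.

    Keys are encoded by a slowly moving copy of the query encoder and
    stored in a fixed-size FIFO queue of negatives.
    """

    def __init__(self, dim, max_size=4096, momentum=0.999, seed=0):
        rng = np.random.default_rng(seed)
        self.w_query = rng.normal(size=(dim, dim))  # toy one-matrix "encoder"
        self.w_key = self.w_query.copy()            # key encoder starts as a copy
        self.momentum = momentum
        self.keys = deque(maxlen=max_size)          # oldest keys fall off the end

    def momentum_update(self):
        # w_k <- m * w_k + (1 - m) * w_q : the key encoder trails the query encoder.
        self.w_key = self.momentum * self.w_key + (1 - self.momentum) * self.w_query

    def enqueue(self, batch):
        for key in batch @ self.w_key:  # encode the batch, then push FIFO
            self.keys.append(key)
```

The queue decouples dictionary size from batch size, while the momentum update keeps queued keys roughly consistent with the current encoder.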