Unifying Relational Sentence Generation and Retrieval for Medical Image Report Composition

Fuyu Wang, Xiaodan Liang, Lin Xu, and Liang Lin. IEEE Transactions on Cybernetics.
Beyond generating long, topic-coherent paragraphs as in traditional captioning tasks, medical image report composition poses more task-oriented challenges: it requires both highly accurate diagnosis in medical terms and multiple heterogeneous forms of information, including impressions and findings. Because of dataset bias, current methods often generate the most common sentences for an individual case, regardless of whether those sentences properly capture key entities and relationships… 
Attention-based CNN-GRU Model For Automatic Medical Images Captioning: ImageCLEF 2021
This work addresses medical image captioning by combining a CNN encoder with an attention-based GRU language generator, while a multi-label CNN classifier is used for the concept detection task.
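The attention step that lets a GRU decoder look at different image regions can be sketched as follows. This is a generic dot-product soft attention under stated assumptions, not the exact formulation of the ImageCLEF 2021 system: the function names are hypothetical, the region features stand in for CNN encoder outputs, and many captioning models use an additive (MLP) score instead of a dot product.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_attention(hidden: Vector, regions: List[Vector]) -> Tuple[Vector, Vector]:
    """One decoding step of (hypothetical) dot-product soft attention.

    hidden:  decoder (e.g. GRU) hidden state, length d
    regions: CNN features for k image regions, each of length d
    Returns (weights over the k regions, context vector of length d).
    """
    # Score each region by its similarity to the current decoder state.
    scores = [sum(h * r for h, r in zip(hidden, region)) for region in regions]
    weights = softmax(scores)
    # Context vector = attention-weighted sum of region features.
    dim = len(regions[0])
    context = [sum(w * region[j] for w, region in zip(weights, regions)) for j in range(dim)]
    return weights, context


weights, context = soft_attention([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
print([round(w, 3) for w in weights])  # → [0.731, 0.269]
```

In a typical attention decoder, the context vector is combined with the previous word embedding as the GRU input at each step, so the region weights can change from word to word.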
Contrastive Attention for Automatic Chest X-ray Report Generation
The Contrastive Attention (CA) model is proposed, which can help existing models better attend to the abnormal regions and provide more accurate descriptions which are crucial for an interpretable diagnosis.
Egocentric Image Captioning for Privacy-Preserved Passive Dietary Intake Monitoring
This paper proposes a privacy-preserved, secure solution for dietary assessment with passive monitoring that unifies food recognition, volume estimation, and scene understanding; a novel transformer-based architecture is designed to caption egocentric dietary images.


Knowledge-driven Encode, Retrieve, Paraphrase for Medical Image Report Generation
Experiments show that the proposed KERP approach generates structured and robust reports supported by accurate abnormality descriptions and explainable attentive regions, achieving state-of-the-art results on two medical report benchmarks, with the best medical abnormality and disease classification accuracy and improved human evaluation performance.
Hybrid Retrieval-Generation Reinforced Agent for Medical Image Report Generation
A novel Hybrid Retrieval-Generation Reinforced Agent (HRGR-Agent) is proposed that reconciles traditional retrieval-based approaches, populated with human prior knowledge, with modern learning-based approaches to achieve structured, robust, and diverse report generation.
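The retrieve-or-generate idea can be illustrated with a small sketch. It assumes, for illustration only, that for each sentence some policy has already produced one probability per template plus one for "generate a new sentence", and it decodes those greedily. HRGR-Agent learns its policy with reinforcement learning; here the probabilities are given as input, and `compose_report` and the `generate` callback are hypothetical stand-ins.

```python
from typing import Callable, List, Sequence


def compose_report(
    action_probs: Sequence[Sequence[float]],
    templates: List[str],
    generate: Callable[[int], str],
) -> List[str]:
    """Greedy retrieve-or-generate decision for each sentence of a report (hypothetical).

    action_probs[t] has len(templates) + 1 entries for sentence t: one per
    template, plus a final entry meaning "generate a new sentence".
    """
    report = []
    for t, probs in enumerate(action_probs):
        if len(probs) != len(templates) + 1:
            raise ValueError("need one probability per template plus one for generation")
        # Pick the most probable action for this sentence.
        best = max(range(len(probs)), key=lambda k: probs[k])
        if best == len(templates):
            report.append(generate(t))      # generation branch
        else:
            report.append(templates[best])  # retrieval branch
    return report


templates = ["No acute cardiopulmonary abnormality.", "The heart size is normal."]
probs = [[0.7, 0.2, 0.1], [0.1, 0.1, 0.8]]
print(compose_report(probs, templates, lambda t: f"<generated sentence {t}>"))
# → ['No acute cardiopulmonary abnormality.', '<generated sentence 1>']
```

Keeping generation as just one more action alongside the templates lets common, normal findings come from retrieval while unusual findings fall through to the generator.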
On the Automatic Generation of Medical Imaging Reports
This work builds a multi-task learning framework which jointly performs the prediction of tags and the generation of paragraphs, proposes a co-attention mechanism to localize regions containing abnormalities and generate narrations for them, and develops a hierarchical LSTM model to generate long paragraphs.
Aligning where to see and what to tell: image caption with region-based attention and scene factorization
This paper proposes an image caption system that exploits the parallel structures between images and sentences and makes another novel modeling contribution by introducing scene-specific contexts that capture higher-level semantic information encoded in an image.
Fast Abstractive Summarization with Reinforce-Selected Sentence Rewriting
An accurate and fast summarization model is proposed that first selects salient sentences and then rewrites them abstractively to produce a concise overall summary; it achieves new state-of-the-art results on all metrics on the CNN/Daily Mail dataset, as well as significantly higher abstractiveness scores.
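The two-stage "select, then rewrite" structure can be sketched as below. Both stages are placeholders under stated assumptions: the salience score (mean document frequency of a sentence's words) is a hypothetical heuristic standing in for the paper's reinforcement-learned extractor, and `rewrite` is a caller-supplied function standing in for its abstractive rewriter.

```python
from collections import Counter
from typing import Callable, List


def extract_then_rewrite(
    sentences: List[str],
    k: int,
    rewrite: Callable[[str], str] = lambda s: s,
) -> List[str]:
    """Select k salient sentences, then rewrite each one (hypothetical sketch)."""
    doc_counts = Counter(w.lower() for s in sentences for w in s.split())

    def salience(s: str) -> float:
        # Hypothetical heuristic: sentences made of frequent document words score higher.
        words = s.split()
        return sum(doc_counts[w.lower()] for w in words) / max(len(words), 1)

    ranked = sorted(range(len(sentences)), key=lambda i: salience(sentences[i]), reverse=True)
    chosen = sorted(ranked[:k])  # restore original document order
    return [rewrite(sentences[i]) for i in chosen]


print(extract_then_rewrite(["the cat ran", "dogs bark", "the cat sat on the mat"], k=2, rewrite=str.upper))
# → ['THE CAT RAN', 'THE CAT SAT ON THE MAT']
```

Rewriting each selected sentence separately, rather than the whole document at once, is what makes the second stage cheap; the rewriter only ever sees short inputs.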
A Hierarchical Approach for Generating Descriptive Image Paragraphs
A model that decomposes both images and paragraphs into their constituent parts is developed, detecting semantic regions in images and using a hierarchical recurrent neural network to reason about language.
CIDEr: Consensus-based image description evaluation
A novel paradigm for evaluating image descriptions based on human consensus is proposed, together with a new automated metric that is shown to capture human judgments of consensus better than existing metrics across sentences generated by various sources.
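The core of CIDEr can be sketched as TF-IDF weighting of n-grams, cosine similarity between a candidate and each reference, an average over references, and a uniform average over n = 1..4. This is a minimal sketch of plain CIDEr, not the widely used CIDEr-D variant: it omits the Gaussian length penalty, count clipping, and x10 scaling, and giving unseen n-grams a document frequency of 1 is an assumption that matches common practice.

```python
import math
from collections import Counter
from typing import List


def ngram_counts(tokens: List[str], n: int) -> Counter:
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def cosine(a: dict, b: dict) -> float:
    dot = sum(v * b.get(g, 0.0) for g, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a > 0 and norm_b > 0 else 0.0


def cider(candidates: List[str], references: List[List[str]], max_n: int = 4) -> List[float]:
    """Plain CIDEr: one score per image (candidates[i] vs. references[i])."""
    num_images = len(candidates)
    cands = [c.split() for c in candidates]
    refs = [[r.split() for r in rs] for rs in references]
    scores = [0.0] * num_images
    for n in range(1, max_n + 1):
        # Document frequency: in how many images' reference sets each n-gram occurs.
        df: Counter = Counter()
        for ref_set in refs:
            df.update({g for r in ref_set for g in ngram_counts(r, n)})

        def tfidf(tokens: List[str]) -> dict:
            # Raw counts suffice for TF: the cosine cancels per-sentence scaling.
            return {g: c * math.log(num_images / max(df[g], 1))
                    for g, c in ngram_counts(tokens, n).items()}

        for i in range(num_images):
            cand_vec = tfidf(cands[i])
            sims = [cosine(cand_vec, tfidf(r)) for r in refs[i]]
            scores[i] += (sum(sims) / len(sims)) / max_n  # uniform weight 1/max_n
    return scores


print(cider(["a cat sits on a mat", "a dog runs"],
            [["a cat sits on a mat", "a cat is on the mat"],
             ["a dog runs fast", "the dog is running"]]))
```

Because IDF is computed across images, n-grams shared by every image's references (such as "a" here) get zero weight; with a single image every weight is zero, so the metric is only meaningful over a corpus.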
Video Captioning With Attention-Based LSTM and Semantic Consistency
A novel end-to-end framework named aLSTMs, an attention-based LSTM model with semantic consistency, is proposed to translate videos into natural sentences, achieving results competitive with or better than state-of-the-art video captioning baselines in both BLEU and METEOR.
Multi-Attention and Incorporating Background Information Model for Chest X-Ray Image Report Generation
A new hierarchical model with multi-attention that takes background information into account is proposed; it outperforms all baselines and achieves state-of-the-art performance in terms of accuracy.
Know More Say Less: Image Captioning Based on Scene Graphs
A scene-graph-based framework for image captioning is proposed that leverages both visual features and the semantic knowledge in structured scene graphs, and introduces a hierarchical-attention-based module to learn discriminative features for word generation at each time step.