Graph R-CNN for Scene Graph Generation

@article{Yang2018GraphRF,
  title={Graph R-CNN for Scene Graph Generation},
  author={Jianwei Yang and Jiasen Lu and Stefan Lee and Dhruv Batra and Devi Parikh},
  journal={ArXiv},
  year={2018},
  volume={abs/1808.00191}
}
We propose a novel scene graph generation model called Graph R-CNN, that is both effective and efficient at detecting objects and their relations in images. […] We also propose an attentional Graph Convolutional Network (aGCN) that effectively captures contextual information between objects and relations. Finally, we introduce a new evaluation metric that is more holistic and realistic than existing metrics. We report state-of-the-art performance on scene graph generation as evaluated using both…

Fully Convolutional Scene Graph Generation

A fully convolutional scene graph generation (FCSGG) model is presented that detects objects and relations simultaneously and achieves highly competitive results on recall and zero-shot recall with significantly reduced inference time.

Relation Regularized Scene Graph Generation

A relation regularized network (R2-Net) is proposed, which predicts whether a relationship exists between two objects and encodes this relation into object feature refinement for better SGG.

Scene Graph Generation With External Knowledge and Image Reconstruction

This paper proposes a novel scene graph generation algorithm with external knowledge and an image reconstruction loss to overcome dataset issues, and extracts commonsense knowledge from an external knowledge base to refine object and phrase features, improving generalizability in scene graph generation.

Scene Graph Generation Using Depth, Spatial, and Visual Cues in 2D Images

A framework (S2G) is proposed for generating scene graphs directly from images using depth and spatial information of object pairs; evaluations on the scene graph generation model reveal that the proposed framework achieves better results than the state of the art.

DH-GCN: Saliency-Aware Complex Scene Graph Generation Using Dual-Hierarchy Graph Convolutional Network

An innovative dual-hierarchy graph convolutional network (DH-GCN) is proposed, a conceptually elegant and efficient top-down approach to graph generation that leverages a salient object detector to hierarchize objects and give gist nodes a more accurate representation.

Transformer-based Scene Graph Generation Network With Relational Attention Module

A novel transformer-based network and a training scheme with instance-level pseudo-targets are proposed, and a relational attention module is introduced to overcome the cropped feature problem; the model achieves state-of-the-art or competitive performance in all tasks.

Memory-Based Network for Scene Graph with Unbalanced Relations

This work proposes a novel scene graph generation model that effectively improves the detection of low-frequency relations by using memory features to transfer high-frequency relation features to low-frequency relations.

Attentive Gated Graph Neural Network for Image Scene Graph Generation

This work translates the scene graph into an Attentive Gated Graph Neural Network which can propagate a message by visual relationship embedding and can increase the accuracy of object classification and reduce the complexity of relationship classification.

Attentive Relational Networks for Mapping Images to Scene Graphs

A novel Attentive Relational Network, consisting of two key modules with an object detection backbone, is proposed to approach this problem; accurate scene graphs are produced by the relation inference module, which recognizes all entities and their corresponding relations.

Image Scene Graph Generation (SGG) Benchmark

A much-needed scene graph generation benchmark based on the maskrcnn-benchmark and several popular models is presented, along with a comprehensive ablation study of scene graph generation models using the Visual Genome and OpenImages Visual Relationship Detection datasets.

References

SHOWING 1-10 OF 47 REFERENCES

Pixels to Graphs by Associative Embedding

A method is presented for training a convolutional neural network that takes in an input image and produces a full graph definition, end-to-end in a single stage, with the use of associative embeddings.

Scene Graph Generation from Objects, Phrases and Region Captions

This work proposes a novel neural network model, termed as Multi-level Scene Description Network (denoted as MSDN), to solve the three vision tasks jointly in an end-to-end manner and shows the joint learning across three tasks with the proposed method can bring mutual improvements over previous models.

Scene Graph Generation by Iterative Message Passing

This work explicitly models the objects and their relationships using scene graphs, a visually grounded graphical structure of an image, and proposes a novel end-to-end model that generates such a structured scene representation from an input image.

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

This work introduces a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals, and further merges RPN and Fast R-CNN into a single network by sharing their convolutional features.

Neural Motifs: Scene Graph Parsing with Global Context

This work analyzes the role of motifs, regularly appearing substructures in scene graphs, and introduces Stacked Motif Networks, a new architecture designed to capture higher-order motifs in scene graphs that improves on the previous state of the art by an average of 3.6% relative improvement across evaluation settings.

Relationship Proposal Networks

The model is named the Relationship Proposal Network (Rel-PN), which is class-agnostic and thus scalable to an open vocabulary of objects and demonstrates the ability of the model to localize relationships with only a few thousand proposals.

Graph-Structured Representations for Visual Question Answering

This paper proposes to build graphs over the scene objects and over the question words, and describes a deep neural network that exploits the structure in these representations, and achieves significant improvements over the state-of-the-art.

Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation

This paper proposes a simple and scalable detection algorithm that improves mean average precision (mAP) by more than 30% relative to the previous best result on VOC 2012 -- achieving a mAP of 53.3%.

Image retrieval using scene graphs

A conditional random field model is developed that reasons about possible groundings of scene graphs to test images; the full model is shown to improve object localization compared to baseline methods and to outperform retrieval methods that use only objects or low-level image features.

ViP-CNN: A Visual Phrase Reasoning Convolutional Neural Network for Visual Relationship Detection

In ViP-CNN, the visual relationship is considered as a phrase with three components and a Visual Phrase Reasoning Structure (VPRS) is presented to set up the connection among the relationship components and help the model consider the three problems jointly.