Corpus ID: 236447406

Image Scene Graph Generation (SGG) Benchmark

Xiaotian Han, Jianwei Yang, Houdong Hu, Lei Zhang, Jianfeng Gao, Pengchuan Zhang
There is a surge of interest in image scene graph generation (object, attribute and relationship detection), driven by the need to build fine-grained image understanding models that go beyond object detection. Because no good benchmark exists, the results reported for different scene graph generation models are not directly comparable, which impedes research progress. We have developed a much-needed scene graph generation benchmark based on the maskrcnn-benchmark[13] and several popular models…

Graph R-CNN for Scene Graph Generation
A novel scene graph generation model called Graph R-CNN, which is both effective and efficient at detecting objects and their relations in images, is proposed, and a new evaluation metric is introduced that is more holistic and realistic than existing metrics.
Image Generation from Scene Graphs
This work proposes a method for generating images from scene graphs, enabling explicit reasoning about objects and their relationships, and validates this approach on Visual Genome and COCO-Stuff.
Scene Graph Generation by Iterative Message Passing
This work explicitly models objects and their relationships using scene graphs, a visually grounded graphical structure of an image, and proposes a novel end-to-end model that generates such a structured scene representation from an input image.
Scene Graph Generation from Objects, Phrases and Region Captions
This work proposes a novel neural network model, termed the Multi-level Scene Description Network (MSDN), to solve the three vision tasks jointly in an end-to-end manner, and shows that joint learning across the three tasks with the proposed method brings mutual improvements over previous models.
Unpaired Image Captioning via Scene Graph Alignments
This paper proposes an unsupervised feature alignment method that maps the scene graph features from the image to the sentence modality and can generate quite promising results without using any image-caption training pairs, outperforming existing methods by a wide margin.
Image Captioning with Scene-graph Based Semantic Concepts
This paper explores the co-occurrence dependency of high-level semantic concepts and proposes a novel method with a scene-graph-based semantic representation for image captioning, using a CNN-RNN-SVM framework to generate the scene-graph-based sequence.
Image retrieval using scene graphs
A conditional random field model that reasons about possible groundings of scene graphs to test images is presented; the full model can be used to improve object localization compared to baseline methods and outperforms retrieval methods that use only objects or low-level image features.
3D Scene Graph: A Structure for Unified Semantics, 3D Space, and Camera
A semi-automatic framework that employs existing detection methods and enhances them using two main constraints: framing of query images sampled on panoramas to maximize the performance of 2D detectors, and multi-view consistency enforcement across 2D detections that originate in different camera locations.
Cross-modal Scene Graph Matching for Relationship-aware Image-Text Retrieval
This work designs two scene graph encoders, one for VSG and one for TSG, that refine the representation of each graph node by aggregating neighborhood information, making it possible to evaluate the similarity of image and text at the two levels in a more plausible way.
Neural Motifs: Scene Graph Parsing with Global Context
This work analyzes the role of motifs, regularly appearing substructures in scene graphs, and introduces Stacked Motif Networks, a new architecture designed to capture higher-order motifs in scene graphs that improves on the previous state of the art by an average relative gain of 3.6% across evaluation settings.