Scene Graph Generation by Iterative Message Passing
@article{Xu2017SceneGG,
  title   = {Scene Graph Generation by Iterative Message Passing},
  author  = {Danfei Xu and Yuke Zhu and Christopher Bongsoo Choy and Li Fei-Fei},
  journal = {2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year    = {2017},
  pages   = {3097-3106}
}
Understanding a visual scene goes beyond recognizing individual objects in isolation. […] Our joint inference model can take advantage of contextual cues to make better predictions on objects and their relationships. The experiments show that our model significantly outperforms previous methods on the Visual Genome dataset, as well as on support relation inference on the NYU Depth v2 dataset.
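The joint-inference idea can be pictured as two coupled sets of GRU states, one per object proposal and one per candidate relationship, that repeatedly exchange pooled messages. The sketch below is a simplified illustration of that scheme, not the authors' exact architecture; the feature dimension, the plain-average pooling, and the two refinement iterations are placeholder assumptions.

```python
import torch
import torch.nn as nn

class IterativeMessagePassing(nn.Module):
    """Minimal sketch: node (object) and edge (relationship) GRU states
    exchange pooled messages for a few refinement iterations."""

    def __init__(self, dim=512, n_iter=2):  # dim and n_iter are placeholder choices
        super().__init__()
        self.node_gru = nn.GRUCell(dim, dim)  # refines object states
        self.edge_gru = nn.GRUCell(dim, dim)  # refines relationship states
        self.n_iter = n_iter

    def forward(self, node_feats, edge_feats, edges):
        # node_feats: (N, dim) features of object proposals
        # edge_feats: (E, dim) features of subject-object union boxes
        # edges: list of (subject_index, object_index), one per candidate relation
        h_node, h_edge = node_feats.clone(), edge_feats.clone()
        for _ in range(self.n_iter):
            # message into each node: pool the states of its incident edges
            node_msg = torch.zeros_like(h_node)
            count = torch.zeros(h_node.size(0), 1, device=h_node.device)
            for k, (s, o) in enumerate(edges):
                node_msg[s] += h_edge[k]; count[s] += 1
                node_msg[o] += h_edge[k]; count[o] += 1
            node_msg = node_msg / count.clamp(min=1)
            # message into each edge: pool its subject and object node states
            edge_msg = torch.stack([(h_node[s] + h_node[o]) / 2 for s, o in edges])
            # GRU updates couple the two sets of states across iterations
            h_node = self.node_gru(node_msg, h_node)
            h_edge = self.edge_gru(edge_msg, h_edge)
        return h_node, h_edge  # passed to object-class and predicate classifiers
```

In the paper itself the pooling of incoming messages is learned rather than a plain average, and the node and edge features come from an object-detection backbone, but the alternating GRU refinement above captures the basic loop.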
852 Citations
Iterative Scene Graph Generation with Generative Transformers
- Computer Science · ArXiv
- 2022
This work introduces a generative transformer-based approach to generating scene graphs beyond link prediction, outperforming state-of-the-art SGG approaches while offering competitive performance to unbiased SGG approaches.
Visual Graphs from Motion (VGfM): Scene understanding with object geometry reasoning
- Computer Science · ACCV
- 2018
This paper proposes a system that first computes the geometrical location of objects in a generic scene and then efficiently constructs scene graphs from video by embedding such geometric reasoning in a new model where geometric and visual features are merged using an RNN framework.
Exploring and Exploiting the Hierarchical Structure of a Scene for Scene Graph Generation
- Computer Science · 2020 25th International Conference on Pattern Recognition (ICPR)
- 2021
A novel neural network model is used to construct a hierarchical structure whose leaf nodes correspond to objects depicted in the image, and a message is passed along the estimated structure on the fly to maintain global consistency.
Scenes and Surroundings: Scene Graph Generation using Relation Transformer
- Computer Science · ArXiv
- 2021
A novel local-context-aware relation transformer architecture is proposed that also exploits complex global object-to-object and object-to-edge interactions, efficiently capturing dependencies between objects and predicting contextual relationships.
Scene Graph Generation Based on Node-Relation Context Module
- Computer Science · ICONIP
- 2018
A node-relation context module for scene graph generation is proposed that uses the GRU hidden states of nodes and edges to guide attention over subject and object regions, and it is competitive with current methods on the Visual Genome dataset.
Scene Graph Generation by Belief RNNs
- Computer Science
- 2017
A novel deep structured prediction module, Belief RNNs, is introduced that learns on large graphs in an efficient and generic way, yielding an end-to-end model that generates a scene graph from a given image.
Grounding Scene Graphs on Natural Images via Visio-Lingual Message Passing
- Computer Science · 2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)
- 2023
A framework for jointly grounding objects that follow semantic relationship constraints given in a scene graph, referred to as the Visio-Lingual Message Passing Graph Neural Network (VL-MPAG Net), is proposed and significantly outperforms the baselines on four public datasets.
Factorizable Net: An Efficient Subgraph-based Framework for Scene Graph Generation
- Computer Science · ECCV
- 2018
A subgraph-based connection graph is proposed to concisely represent the scene graph during inference and improve the efficiency of scene graph generation; the method outperforms the state of the art in both accuracy and speed.
Unconditional Scene Graph Generation
- Computer Science · 2021 IEEE/CVF International Conference on Computer Vision (ICCV)
- 2021
This work develops a deep auto-regressive model called SceneGraphGen, which directly learns the probability distribution over labelled, directed graphs using a hierarchical recurrent architecture, and demonstrates the application of the generated graphs to image synthesis, anomaly detection, and scene graph completion.
LinkNet: Relational Embedding for Scene Graph
- Computer Science · NeurIPS
- 2018
This paper designs a simple and effective relational embedding module that enables the model to jointly represent connections among all related objects, rather than focus on an object in isolation, and proves its efficacy in scene graph generation.
43 References
Characterizing structural relationships in scenes using graph kernels
- Computer Science · ACM Trans. Graph.
- 2011
This paper shows how to represent scenes as graphs that encode models and their semantic relationships, and demonstrates that incorporating structural relationships yields a more relevant set of results than previous approaches to model-context search.
Semantic Object Parsing with Graph LSTM
- Computer Science · ECCV
- 2016
The Graph Long Short-Term Memory network is proposed, which is the generalization of LSTM from sequential data or multi-dimensional data to general graph-structured data.
Image retrieval using scene graphs
- Computer Science · 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- 2015
A conditional random field model is presented that reasons about possible groundings of scene graphs to test images; the full model improves object localization compared to baseline methods and outperforms retrieval methods that use only objects or low-level image features.
Graph-Structured Representations for Visual Question Answering
- Computer Science · 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- 2017
This paper proposes to build graphs over the scene objects and over the question words, and describes a deep neural network that exploits the structure in these representations, and achieves significant improvements over the state-of-the-art.
Learning Spatial Knowledge for Text to 3D Scene Generation
- Computer Science · EMNLP
- 2014
The main innovation of this work is to show how to augment explicit constraints with learned spatial knowledge to infer missing objects and likely layouts for the objects in the scene.
Indoor Segmentation and Support Inference from RGBD Images
- Computer Science · ECCV
- 2012
The goal is to parse typical, often messy, indoor scenes into floor, walls, supporting surfaces, and object regions, and to recover support relationships, to better understand how 3D cues can best inform a structured 3D interpretation.
3D-Based Reasoning with Blocks, Support, and Stability
- Computer Science · 2013 IEEE Conference on Computer Vision and Pattern Recognition
- 2013
This work proposes a new approach for parsing RGB-D images using 3D block units for volumetric reasoning, and incorporates the intuition that a good 3D representation of the scene is the one that fits the data well, and is a stable, self-supporting arrangement of objects.
Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations
- Computer Science · International Journal of Computer Vision
- 2016
The Visual Genome dataset is presented, which contains over 108K images where each image has an average of 35 objects, 26 attributes, and 21 pairwise relationships between objects, and represents the densest and largest dataset of image descriptions, objects, attributes, relationships, and question-answer pairs.
Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials
- Computer Science · NIPS
- 2011
This paper considers fully connected CRF models defined on the complete set of pixels in an image and proposes a highly efficient approximate inference algorithm in which the pairwise edge potentials are defined by a linear combination of Gaussian kernels.
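For reference, the Gaussian edge potentials mentioned in this entry take the following standard form (standard notation: $\mathbf{x}$ is the label assignment, $\mathbf{f}_i$ the feature vector of pixel $i$, and $\mu$ a label compatibility function):

$$E(\mathbf{x}) = \sum_i \psi_u(x_i) + \sum_{i<j} \psi_p(x_i, x_j), \qquad \psi_p(x_i, x_j) = \mu(x_i, x_j)\sum_{m=1}^{K} w^{(m)} \exp\!\Big(-\tfrac{1}{2}(\mathbf{f}_i - \mathbf{f}_j)^\top \Lambda^{(m)} (\mathbf{f}_i - \mathbf{f}_j)\Big)$$

Mean-field inference in this model reduces to repeated Gaussian filtering in feature space, which is what makes the approximate inference efficient.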