Neural Motifs: Scene Graph Parsing with Global Context
@inproceedings{Zellers2017NeuralMS, title={Neural Motifs: Scene Graph Parsing with Global Context}, author={Rowan Zellers and Mark Yatskar and Sam Thomson and Yejin Choi}, booktitle={2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}, year={2018}, pages={5831-5840} }
We investigate the problem of producing structured graph representations of visual scenes. [...] Our analysis motivates a new baseline: given object detections, predict the most frequent relation between object pairs with the given labels, as seen in the training set. This baseline improves on the previous state of the art by an average relative improvement of 3.6% across evaluation settings.
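The frequency baseline described above is simple enough to sketch directly: count, over the training scene graphs, how often each predicate links a given (subject label, object label) pair, then always predict the most frequent predicate for that pair at test time. The following is a minimal, hypothetical Python illustration; the function names and triplet format are assumptions for exposition, not the authors' released code.

```python
from collections import Counter, defaultdict

def build_frequency_prior(training_triplets):
    """training_triplets: iterable of (subject_label, predicate, object_label) tuples."""
    counts = defaultdict(Counter)
    for subj, pred, obj in training_triplets:
        counts[(subj, obj)][pred] += 1
    # Keep only the single most frequent predicate per (subject, object) pair.
    return {pair: preds.most_common(1)[0][0] for pair, preds in counts.items()}

def predict_relation(prior, subject_label, object_label, fallback=None):
    """Predict the training-set most frequent relation for a detected object pair."""
    return prior.get((subject_label, object_label), fallback)

# Toy usage with made-up labels:
prior = build_frequency_prior([
    ("man", "wearing", "shirt"),
    ("man", "wearing", "shirt"),
    ("man", "holding", "shirt"),
])
assert predict_relation(prior, "man", "shirt") == "wearing"
```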
635 Citations
Structured Neural Motifs: Scene Graph Parsing via Enhanced Context
- Computer Science · MMM
- 2020
This work proposes the Structured Motif Network (StrcMN), which predicts object labels and pairwise relationships by mining more complete global context features, and significantly outperforms previous methods on the VRD and Visual Genome datasets.
Visual Relationships as Functions: Enabling Few-Shot Scene Graph Prediction
- Computer Science · 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW)
- 2019
This work introduces the first scene graph prediction model that supports few-shot learning of predicates, enabling scene graph approaches to generalize to a set of new predicates.
Scene Graph Prediction With Limited Labels
- Computer Science · 2019 IEEE/CVF International Conference on Computer Vision (ICCV)
- 2019
This paper introduces a semi-supervised method that assigns probabilistic relationship labels to a large number of unlabeled images using a few labeled examples, and defines a complexity metric for relationships that indicates the conditions under which the method outperforms transfer learning, the de facto approach for training with limited labels.
Mapping Images to Scene Graphs with Permutation-Invariant Structured Prediction
- Computer Science · NeurIPS
- 2018
This paper takes an axiomatic perspective to derive the desired properties and invariances of such a network with respect to certain input permutations, presenting a structural characterization that is provably both necessary and sufficient.
Attentive Relational Networks for Mapping Images to Scene Graphs
- Computer Science · 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- 2019
A novel Attentive Relational Network, consisting of two key modules on top of an object detection backbone, is proposed to approach this problem; accurate scene graphs are produced by its relation inference module, which recognizes all entities and their corresponding relations.
Addressing Class Imbalance in Scene Graph Parsing by Learning to Contrast and Score
- Computer Science · ACCV
- 2020
A new Contrasting Cross-Entropy loss is designed that promotes the detection of rare relations by suppressing incorrect frequent ones, and a novel scoring module, termed Scorer, is proposed that learns to rank relations based on image and relation features to improve the recall of predictions.
A unified deep sparse graph attention network for scene graph generation
- Computer Science · Pattern Recognition
- 2022
Learning Latent Scene-Graph Representations for Referring Relationships
- Computer Science · arXiv
- 2019
This work describes a family of models that use scene-graph-like representations in downstream tasks and shows how these representations can be trained from partial supervision.
Memory-Based Network for Scene Graph with Unbalanced Relations
- Computer Science · ACM Multimedia
- 2020
This work proposes a novel scene graph generation model that effectively improves the detection of low-frequency relations, using memory features to transfer high-frequency relation features to low-frequency ones.
References
Showing 1-10 of 63 references
Scene Graph Generation from Objects, Phrases and Region Captions
- Computer Science · 2017 IEEE International Conference on Computer Vision (ICCV)
- 2017
This work proposes a novel neural network model, termed Multi-level Scene Description Network (MSDN), to solve the three vision tasks jointly in an end-to-end manner, and shows that joint learning across the three tasks with the proposed method brings mutual improvements over previous models.
Scene Graph Generation by Iterative Message Passing
- Computer Science · 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- 2017
This work explicitly models objects and their relationships using scene graphs, a visually grounded graphical structure of an image, and proposes a novel end-to-end model that generates such a structured scene representation from an input image.
Detecting Visual Relationships with Deep Relational Networks
- Computer Science · 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- 2017
The proposed Deep Relational Network is a novel formulation designed specifically for exploiting the statistical dependencies between objects and their relationships, and achieves substantial improvement over the state of the art on two large datasets.
Graph-Structured Representations for Visual Question Answering
- Computer Science · 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- 2017
This paper proposes to build graphs over the scene objects and over the question words, and describes a deep neural network that exploits the structure in these representations, and achieves significant improvements over the state-of-the-art.
Deep Variation-Structured Reinforcement Learning for Visual Relationship and Attribute Detection
- Computer Science · 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- 2017
A deep Variation-structured Reinforcement Learning (VRL) framework is proposed to sequentially discover object relationships and attributes in the whole image, and an ambiguity-aware object mining scheme is used to resolve semantic ambiguity among object categories that the object detector fails to distinguish.
Deep Visual-Semantic Alignments for Generating Image Descriptions
- Computer Science · IEEE Transactions on Pattern Analysis and Machine Intelligence
- 2017
A model that generates natural language descriptions of images and their regions based on a novel combination of Convolutional Neural Networks over image regions, bidirectional Recurrent Neural networks over sentences, and a structured objective that aligns the two modalities through a multimodal embedding is presented.
Pixels to Graphs by Associative Embedding
- Computer Science · NIPS
- 2017
A method for training a convolutional neural network such that it takes in an input image and produces a full graph definition, done end-to-end in a single stage using associative embeddings.
Obj2Text: Generating Visually Descriptive Language from Object Layouts
- Computer Science · EMNLP
- 2017
OBJ2TEXT, a sequence-to-sequence model that encodes a set of objects and their locations as an input sequence using an LSTM network and decodes this representation using an LSTM language model, is explored; despite using a sequence encoder, the model is shown to effectively represent complex spatial object-object relationships.
Image retrieval using scene graphs
- Computer Science · 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- 2015
A conditional random field model that reasons about possible groundings of scene graphs to test images is introduced; the full model improves object localization compared to baseline methods and outperforms retrieval methods that use only objects or low-level image features.
Visual Relationship Detection with Language Priors
- Computer Science · ECCV
- 2016
This work proposes a model that can scale to predict thousands of types of relationships from a few examples and improves on prior work by leveraging language priors from semantic word embeddings to finetune the likelihood of a predicted relationship.