Neural Motifs: Scene Graph Parsing with Global Context

@inproceedings{Zellers2018NeuralMS,
  title={Neural Motifs: Scene Graph Parsing with Global Context},
  author={Rowan Zellers and Mark Yatskar and Sam Thomson and Yejin Choi},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2018},
  pages={5831--5840}
}
We investigate the problem of producing structured graph representations of visual scenes. […] Our analysis motivates a new baseline: given object detections, predict the most frequent relation between each pair of object labels, as seen in the training set. This baseline improves on the previous state of the art by an average of 3.6% relative improvement across evaluation settings.
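The frequency baseline described above can be sketched in a few lines: count, over the training triples, how often each predicate links a given (subject label, object label) pair, then predict the most frequent one at test time. The training triples below are hypothetical toy data, not drawn from Visual Genome.

```python
from collections import Counter, defaultdict

# Hypothetical training triples: (subject_label, predicate, object_label).
train_triples = [
    ("man", "wearing", "shirt"),
    ("man", "wearing", "hat"),
    ("man", "riding", "horse"),
    ("man", "wearing", "shirt"),
]

# For each (subject, object) label pair, count how often each predicate
# connects them in the training set.
freq = defaultdict(Counter)
for subj, pred, obj in train_triples:
    freq[(subj, obj)][pred] += 1

def predict_relation(subj_label, obj_label):
    """Return the most frequent training predicate for this label pair,
    or None if the pair was never observed."""
    counts = freq.get((subj_label, obj_label))
    return counts.most_common(1)[0][0] if counts else None

print(predict_relation("man", "shirt"))  # -> wearing
print(predict_relation("dog", "cat"))    # -> None (unseen pair)
```

Despite ignoring the image entirely once detections are given, a baseline of exactly this shape is what the paper reports as outperforming the prior state of the art, which is the motivation for modeling global context.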

Citations

Structured Neural Motifs: Scene Graph Parsing via Enhanced Context
Proposes the Structured Motif Network (StrcMN), which predicts object labels and pairwise relationships by mining more complete global context features, and significantly outperforms previous methods on the VRD and Visual Genome datasets.
Visual Relationships as Functions: Enabling Few-Shot Scene Graph Prediction
Introduces the first scene graph prediction model that supports few-shot learning of predicates, enabling scene graph approaches to generalize to a set of new predicates.
Scene Graph Prediction with Limited Labels
Introduces a semi-supervised method that assigns probabilistic relationship labels to a large number of unlabeled images using few labeled examples, and defines a complexity metric for relationships that indicates the conditions under which the method succeeds over transfer learning, the de facto approach for training with limited labels.
Mapping Images to Scene Graphs with Permutation-Invariant Structured Prediction
Takes an axiomatic perspective to derive the desired properties and invariances of such a network to certain input permutations, presenting a structural characterization that is provably both necessary and sufficient.
Attentive Relational Networks for Mapping Images to Scene Graphs
A novel Attentive Relational Network consisting of two key modules on top of an object-detection backbone; its relation inference module produces accurate scene graphs by recognizing all entities and their corresponding relations.
Learning Latent Scene-Graph Representations for Referring Relationships
Describes a family of models that use scene-graph-like representations in downstream tasks, and shows how these representations can be trained from partial supervision.
Memory-Based Network for Scene Graph with Unbalanced Relations
Proposes a novel scene graph generation model that improves the detection of low-frequency relations by using memory features to transfer high-frequency relation features to low-frequency ones.
LinkNet: Relational Embedding for Scene Graph
Designs a simple and effective relational embedding module that lets the model jointly represent connections among all related objects, rather than focusing on each object in isolation, and demonstrates its efficacy in scene graph generation.
