Corpus ID: 3607155

Mapping Images to Scene Graphs with Permutation-Invariant Structured Prediction

@inproceedings{Herzig2018MappingIT,
  title={Mapping Images to Scene Graphs with Permutation-Invariant Structured Prediction},
  author={Roei Herzig and Moshiko Raboh and Gal Chechik and Jonathan Berant and A. Globerson},
  booktitle={NeurIPS},
  year={2018}
}
Structured prediction is concerned with predicting multiple inter-dependent labels simultaneously. [...] We take an axiomatic perspective to derive the desired properties and invariances of such a network to certain input permutations, presenting a structural characterization that is provably both necessary and sufficient. We then discuss graph-permutation invariant (GPI) architectures that satisfy this characterization and explain how they can be used for deep structured prediction. We evaluate our…
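To make the invariance property concrete, here is a minimal Python sketch of one function whose per-node outputs follow any reordering of the input nodes. It is an illustration only: the functions phi, alpha and rho are hypothetical toy stand-ins (the paper would use learned networks), features are reduced to single floats, and the names gpi_forward and permute are assumptions introduced for this example, not the authors' code.

```python
import math
import random

# Hypothetical toy stand-ins for learned networks (assumptions, not from the paper).
def phi(z_i, z_ij, z_j):
    # Combines the features of node i, the pair (i, j), and node j.
    return math.tanh(z_i + 2.0 * z_ij - z_j)

def alpha(z_i, s_i):
    # Combines node i's features with its aggregated pairwise context.
    return math.tanh(0.5 * z_i + s_i)

def rho(z_k, g):
    # Produces the output for node k from its own features and the global context.
    return z_k * g + g

def gpi_forward(nodes, edges):
    """Return one output per node; sums make the result independent of node order."""
    n = len(nodes)
    # Per-node context: sum over all other nodes j != i.
    s = [sum(phi(nodes[i], edges[i][j], nodes[j]) for j in range(n) if j != i)
         for i in range(n)]
    # Global context: sum over all nodes.
    g = sum(alpha(nodes[i], s[i]) for i in range(n))
    return [rho(nodes[k], g) for k in range(n)]

def permute(nodes, edges, perm):
    """Reorder node features and the pairwise feature matrix consistently."""
    p_nodes = [nodes[p] for p in perm]
    p_edges = [[edges[a][b] for b in perm] for a in perm]
    return p_nodes, p_edges

if __name__ == "__main__":
    rng = random.Random(0)
    n = 5
    nodes = [rng.uniform(-1, 1) for _ in range(n)]
    edges = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
    perm = list(range(n))
    rng.shuffle(perm)

    out = gpi_forward(nodes, edges)
    p_nodes, p_edges = permute(nodes, edges, perm)
    p_out = gpi_forward(p_nodes, p_edges)

    # Permuting the inputs should permute the outputs in the same way
    # (compared with a tolerance, since summation order changes).
    expected = [out[p] for p in perm]
    assert all(math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-9)
               for a, b in zip(p_out, expected))
```

The property holds here because each node's context and the global context are computed with sums, which do not depend on the order of their terms; replacing a sum with an order-dependent operation (for example, a recurrent pass over the nodes) would generally break it.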
Attentive Relational Networks for Mapping Images to Scene Graphs
TLDR
A novel Attentive Relational Network, consisting of two key modules on top of an object detection backbone, is proposed to approach this problem; its relation inference module produces accurate scene graphs by recognizing all entities and their corresponding relations.
HOSE-Net: Higher Order Structure Embedded Network for Scene Graph Generation
TLDR
A novel structure-aware embedding-to-classifier (SEC) module to incorporate both local and global structural information of relationships into the output space, and a hierarchical semantic aggregation module to reduce the number of subspaces by introducing higher order structural information.
Visual Relationships as Functions: Enabling Few-Shot Scene Graph Prediction
TLDR
This work introduces the first scene graph prediction model that supports few-shot learning of predicates, enabling scene graph approaches to generalize to a set of new predicates.
Attention-Translation-Relation Network for Scalable Scene Graph Generation
TLDR
A three-stage pipeline is proposed that employs Multi-Head Attention driven by language and spatial features, Translation Embeddings and Multi-Tasking to detect an interacting pair of objects; it is able to maximize the visual features' interpretability and capture the nature of datasets of different scales.
Using Scene Graph Context to Improve Image Generation
TLDR
This paper introduces a scene graph context network that pools features generated by a graph convolutional neural network, which are then provided to both the image generation network and the adversarial loss. It also defines two novel evaluation metrics for this task, the relation score and the mean opinion relation score, that directly evaluate scene graph compliance.
Triplet-Aware Scene Graph Embeddings
TLDR
A significant performance increase is seen in both metrics that measure the quality of layout prediction, mean intersection-over-union (mIoU) and relation score, after the addition of triplet supervision and data augmentation.
Image-Graph-Image Translation via Auto-Encoding
TLDR
This work presents the first convolutional neural network that learns an image-to-graph translation task without needing external supervision, and is the first to present a self-supervised approach based on a fully-differentiable auto-encoder in which the bottleneck encodes the graph's nodes and edges.
Learning Latent Scene-Graph Representations for Referring Relationships
TLDR
This work describes a family of models that uses scene-graph-like representations in downstream tasks, and shows how these representations can be trained from partial supervision.
Scene Graph to Image Generation with Contextualized Object Layout Refinement
TLDR
This work proposes a method that alleviates problems in generated images, such as high inter-object overlap, empty areas, blurry objects, and overall compromised quality, by generating all object layouts together and reducing the reliance on supervised learning.
Attentive Gated Graph Neural Network for Image Scene Graph Generation
TLDR
This work translates the scene graph into an Attentive Gated Graph Neural Network which can propagate a message by visual relationship embedding and can increase the accuracy of object classification and reduce the complexity of relationship classification.

References

SHOWING 1-10 OF 39 REFERENCES
Neural Motifs: Scene Graph Parsing with Global Context
TLDR
This work analyzes the role of motifs, regularly appearing substructures in scene graphs, and introduces Stacked Motif Networks, a new architecture designed to capture higher order motifs in scene graphs that improves on the previous state-of-the-art by an average relative improvement of 3.6% across evaluation settings.
Image Generation from Scene Graphs
TLDR
This work proposes a method for generating images from scene graphs, enabling explicit reasoning about objects and their relationships, and validates this approach on Visual Genome and COCO-Stuff.
Deep Value Networks Learn to Evaluate and Iteratively Refine Structured Outputs
TLDR
The DVN framework achieves state-of-the-art results on multi-label prediction and image segmentation benchmarks.
Pixels to Graphs by Associative Embedding
TLDR
A method for training a convolutional neural network that takes an input image and produces a full graph definition, end-to-end in a single stage, using associative embeddings.
Discovering objects and their relations from entangled scene representations
TLDR
It is shown that RNs are capable of learning object relations from scene description data and can act as a bottleneck that induces the factorization of objects from entangled scene description inputs, and from distributed deep representations of scene images provided by a variational autoencoder.
Deeply Learning the Messages in Message Passing Inference
TLDR
A new, efficient deep structured model learning scheme, in which deep Convolutional Neural Networks can be used to directly estimate the messages in message passing inference for structured prediction with Conditional Random Fields (CRFs).
Scene Graph Generation by Iterative Message Passing
TLDR
This work explicitly models the objects and their relationships using scene graphs, a visually-grounded graphical structure of an image, and proposes a novel end-to-end model that generates such a structured scene representation from an input image.
On Support Relations and Semantic Scene Graphs
TLDR
This paper proposes a novel framework for the automatic generation of semantic scene graphs that interpret indoor environments: a Convolutional Neural Network detects objects of interest, and a semantic scene graph describing the contextual relations within a cluttered indoor scene is then constructed.
Fully Convolutional Networks for Semantic Segmentation
TLDR
It is shown that convolutional networks by themselves, trained end-to-end, pixels-to-pixels, improve on the previous best result in semantic segmentation.
Conditional Random Fields as Recurrent Neural Networks
TLDR
A new form of convolutional neural network that combines the strengths of Convolutional Neural Networks (CNNs) and Conditional Random Fields (CRFs)-based probabilistic graphical modelling is introduced, and top results are obtained on the challenging Pascal VOC 2012 segmentation benchmark.