GEMS: Scene Expansion using Generative Models of Graphs

Rishi G. Agarwal, Tirupati Saketh Chandra, Vaidehi Patil, Aniruddha Mahapatra, K. Kulkarni, Vishwa Vinay. 2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV).
Applications based on image retrieval require editing and associating in intermediate spaces that are representative of high-level concepts such as objects and their relationships, rather than dense, pixel-level representations like RGB images or semantic-label maps. We focus on one such representation, scene graphs, and propose a novel scene expansion task where we enrich an input seed graph by adding new nodes (objects) and the corresponding relationships. To this end, we formulate scene…
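To make the task concrete, here is a minimal sketch of the data structure involved: a scene graph as a set of labelled nodes (objects) and labelled, directed edges (relationships), with expansion adding nodes and edges to a seed graph. The class and labels below are illustrative, not the paper's implementation.

```python
from dataclasses import dataclass, field

@dataclass
class SceneGraph:
    """A scene graph: labelled nodes (objects) and directed, labelled edges."""
    nodes: dict = field(default_factory=dict)   # node_id -> object label
    edges: dict = field(default_factory=dict)   # (src, dst) -> predicate label

    def add_object(self, node_id, label):
        self.nodes[node_id] = label

    def add_relation(self, src, dst, predicate):
        # Both endpoints must already exist in the graph.
        assert src in self.nodes and dst in self.nodes
        self.edges[(src, dst)] = predicate

# A small seed graph: "man riding horse".
seed = SceneGraph()
seed.add_object(0, "man")
seed.add_object(1, "horse")
seed.add_relation(0, 1, "riding")

# Expansion enriches the seed with a new object and its relationship.
seed.add_object(2, "field")
seed.add_relation(1, 2, "standing on")
```

The expansion step only ever appends nodes and edges, so the original seed graph is preserved inside the enriched result.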

Unconditional Scene Graph Generation

This work develops a deep auto-regressive model called SceneGraphGen which can directly learn the probability distribution over labelled and directed graphs using a hierarchical recurrent architecture and demonstrates the application of the generated graphs in image synthesis, anomaly detection and scene graph completion.
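The auto-regressive factorization can be illustrated with a toy sampler: nodes are emitted one at a time, and each new node samples edges back to the nodes generated so far. The uniform choices below are a stand-in for the learned conditional distributions of SceneGraphGen; the vocabularies and stop token are hypothetical.

```python
import random

OBJECTS = ["man", "horse", "tree", "<stop>"]
PREDICATES = [None, "near", "riding"]

def sample_graph(rng, max_nodes=6):
    """Toy auto-regressive graph sampler: p(graph) factorizes over steps."""
    nodes, edges = [], {}
    while len(nodes) < max_nodes:
        label = rng.choice(OBJECTS)      # stand-in for p(node_t | graph so far)
        if label == "<stop>":
            break
        new_id = len(nodes)
        nodes.append(label)
        for prev_id in range(new_id):    # stand-in for p(edge | pair, graph so far)
            pred = rng.choice(PREDICATES)
            if pred is not None:
                edges[(prev_id, new_id)] = pred
    return nodes, edges

nodes, edges = sample_graph(random.Random(0))
```

Because edges are only ever drawn from earlier nodes to the newest one, every sampled graph is consistent with the generation order.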

Scene Graph Generation With External Knowledge and Image Reconstruction

This paper proposes a novel scene graph generation algorithm with external knowledge and an image reconstruction loss to overcome dataset issues, and extracts commonsense knowledge from an external knowledge base to refine object and phrase features, improving generalizability in scene graph generation.

VarScene: A Deep Generative Model for Realistic Scene Graph Synthesis

VarScene is proposed, a variational autoencoder for scene graphs, which is optimized for the maximum mean discrepancy (MMD) between the ground-truth scene graph distribution and the distribution of the generated scene graphs.
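MMD itself is easy to sketch: given two samples of scalar graph statistics (e.g. node degrees), the squared MMD compares their average pairwise kernel values. This is a generic illustration of the metric, not VarScene's training objective; the Gaussian kernel and inputs are assumptions.

```python
import math

def gaussian_kernel(x, y, sigma=1.0):
    return math.exp(-((x - y) ** 2) / (2 * sigma ** 2))

def mmd_squared(xs, ys, kernel=gaussian_kernel):
    """Biased estimate of squared MMD between two scalar samples."""
    k_xx = sum(kernel(a, b) for a in xs for b in xs) / (len(xs) ** 2)
    k_yy = sum(kernel(a, b) for a in ys for b in ys) / (len(ys) ** 2)
    k_xy = sum(kernel(a, b) for a in xs for b in ys) / (len(xs) * len(ys))
    return k_xx + k_yy - 2 * k_xy

# Identical samples give zero MMD; well-separated samples give a larger value.
same = mmd_squared([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
diff = mmd_squared([1.0, 2.0, 3.0], [10.0, 11.0, 12.0])
```

A generative model trained to drive this quantity toward zero matches the statistics of its samples to those of the data distribution.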

Energy-Based Learning for Scene Graph Generation

This work introduces a novel energy-based learning framework for generating scene graphs that allows for efficiently incorporating the structure of scene graphs in the output space and showcases the learning efficiency of the proposed framework by demonstrating superior performance in the zero- and few-shot settings where data is scarce.

Scene Graph Generation from Objects, Phrases and Region Captions

This work proposes a novel neural network model, termed the Multi-level Scene Description Network (MSDN), to solve the three vision tasks jointly in an end-to-end manner, and shows that joint learning across the three tasks brings mutual improvements over previous models.

Visual Relationships as Functions: Enabling Few-Shot Scene Graph Prediction

This work introduces the first scene graph prediction model that supports few-shot learning of predicates, enabling scene graph approaches to generalize to a set of new predicates.

Context-aware Scene Graph Generation with Seq2Seq Transformers

This work proposes an encoder-decoder model built using Transformers, where the encoder captures global context and long-range interactions, and introduces a novel reinforcement learning-based training strategy tailored to Seq2Seq scene graph generation.
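Treating graph generation as a Seq2Seq problem presupposes flattening the graph into a token sequence. A minimal sketch of one such linearization is below; the triple-per-segment scheme and the `[SEP]` token are illustrative assumptions, not the paper's exact tokenization.

```python
def linearize(objects, triples):
    """Flatten a scene graph into a token sequence for a Seq2Seq model.

    Each (subject, predicate, object) triple becomes three tokens,
    with segments separated by [SEP]; decoding inverts the mapping.
    """
    tokens = []
    for s, p, o in triples:
        tokens += [objects[s], p, objects[o], "[SEP]"]
    return tokens

objs = {0: "man", 1: "horse", 2: "field"}
seq = linearize(objs, [(0, "riding", 1), (1, "standing on", 2)])
```

Since the mapping is invertible segment by segment, a decoder emitting this sequence implicitly emits the graph.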

Neural Motifs: Scene Graph Parsing with Global Context

This work analyzes the role of motifs, regularly appearing substructures in scene graphs, and introduces Stacked Motif Networks, a new architecture designed to capture higher-order motifs in scene graphs, improving on the previous state of the art by an average relative gain of 3.6% across evaluation settings.

Bridging Knowledge Graphs to Generate Scene Graphs

This paper proposes a novel graph-based neural network that iteratively propagates information between the two graphs, as well as within each of them, gradually refining their bridge in each iteration; the resulting Graph Bridging Network achieves a new state of the art.

Scene Graph Generation by Iterative Message Passing

This work explicitly models objects and their relationships using scene graphs, a visually grounded graphical structure of an image, and proposes a novel end-to-end model that generates such structured scene representations from an input image.