Generative Compositional Augmentations for Scene Graph Prediction

@inproceedings{knyazev2021generative,
  title={Generative Compositional Augmentations for Scene Graph Prediction},
  author={Boris Knyazev and Harm de Vries and Cătălina Cangea and Graham W. Taylor and Aaron C. Courville and Eugene Belilovsky},
  booktitle={2021 IEEE/CVF International Conference on Computer Vision (ICCV)},
  year={2021}
}
Inferring objects and their relationships from an image in the form of a scene graph is useful in many applications at the intersection of vision and language. We consider the challenging problem of compositional generalization that emerges in this task due to a long-tail data distribution. Current scene graph generation models are trained on a tiny fraction of the distribution corresponding to the most frequent compositions. However, test images might contain zero- and few-shot…
Context-aware Scene Graph Generation with Seq2Seq Transformers
This work proposes an encoder-decoder model built using Transformers, where the encoder captures global context and long-range interactions, and introduces a novel reinforcement learning-based training strategy tailored to Seq2Seq scene graph generation.
SGTR: End-to-end Scene Graph Generation with Transformer
A transformer-based end-to-end framework that first generates the entity and predicate proposal set, followed by inferring directed edges to form the relation triplets, and a new entity-aware predicate representation based on a structural predicate generator that leverages the compositional property of relationships.


Graph Density-Aware Losses for Novel Compositions in Scene Graph Generation
A density-normalized edge loss is introduced, providing more than a two-fold improvement on certain generalization metrics in scene graph generation. The work also highlights the difficulty of accurately evaluating models with existing metrics, especially on zero/few-shots, and introduces a novel weighted metric.
Generative Scene Graph Networks
Generative Scene Graph Networks are proposed, the first deep generative model that learns to discover the primitive parts and infer the part-whole relationship jointly from multi-object scenes without supervision and in an end-to-end trainable way.
Visual Relationships as Functions: Enabling Few-Shot Scene Graph Prediction
This work introduces the first scene graph prediction model that supports few-shot learning of predicates, enabling scene graph approaches to generalize to a set of new predicates.
Learning Visual Commonsense for Robust Scene Graph Generation
This work proposes the first method to acquire visual commonsense such as affordance and intuitive physics automatically from data, and uses that to improve the robustness of scene understanding.
Scene Graph Prediction With Limited Labels
This paper introduces a semi-supervised method that assigns probabilistic relationship labels to a large number of unlabeled images using a few labeled examples. It also defines a complexity metric for relationships that indicates the conditions under which the method succeeds over transfer learning, the de facto approach for training with limited labels.
Energy-Based Learning for Scene Graph Generation
This work introduces a novel energy-based learning framework for generating scene graphs that allows for efficiently incorporating the structure of scene graphs in the output space and showcases the learning efficiency of the proposed framework by demonstrating superior performance in the zero- and few-shot settings where data is scarce.
Scene Graph Generation With External Knowledge and Image Reconstruction
This paper proposes a novel scene graph generation algorithm with external knowledge and an image reconstruction loss to overcome dataset issues, and extracts commonsense knowledge from an external knowledge base to refine object and phrase features, improving generalizability in scene graph generation.
Knowledge-Embedded Routing Network for Scene Graph Generation
This work finds that the statistical correlations between object pairs and their relationships can effectively regularize the semantic space and make predictions less ambiguous, thereby addressing the unbalanced-distribution issue.
PCPL: Predicate-Correlation Perception Learning for Unbiased Scene Graph Generation
A novel Predicate-Correlation Perception Learning scheme is proposed that adaptively seeks out appropriate loss weights by directly perceiving and utilizing the correlation among predicate classes; it significantly outperforms previous state-of-the-art methods.
Differentiable Scene Graphs
Differentiable Scene Graphs (DSGs) are proposed, an image representation that is amenable to differentiable end-to-end optimization, and requires supervision only from the downstream tasks.