X-GGM: Graph Generative Modeling for Out-of-distribution Generalization in Visual Question Answering

Authors: Jingjing Jiang, Zi-yi Liu, Yifan Liu, Zhixiong Nan, Nanning Zheng
Published in: Proceedings of the 29th ACM International Conference on Multimedia
Encouraging progress has been made towards Visual Question Answering (VQA) in recent years, but it remains challenging to enable VQA models to adaptively generalize to out-of-distribution (OOD) samples. Intuitively, recomposing existing visual concepts (i.e., attributes and objects) can generate compositions that are unseen in the training set, which should help VQA models generalize to OOD samples. In this paper, we formulate OOD generalization in VQA as a compositional generalization…
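The intuition that recombining existing visual concepts yields compositions absent from the training set can be illustrated with a toy enumeration. This is a hypothetical sketch, not X-GGM's graph generative model: it assumes concepts arrive as (attribute, object) pairs, and the function name `unseen_compositions` is illustrative only.

```python
from itertools import product


def unseen_compositions(training_pairs):
    """Return (attribute, object) recombinations not observed in training.

    Toy illustration only; X-GGM's graph generative model is not reproduced here.
    """
    seen = set(training_pairs)
    attributes = sorted({a for a, _ in seen})
    objects = sorted({o for _, o in seen})
    # Pair every known attribute with every known object, then drop the pairs
    # that already occur in the training set.
    return [pair for pair in product(attributes, objects) if pair not in seen]


train = [("red", "apple"), ("green", "banana"), ("yellow", "banana")]
print(unseen_compositions(train))
```

Each returned pair is built only from concepts seen in training, yet the pair itself never appeared, which is the kind of novel composition the abstract describes.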


Adversarial Regularization for Visual Question Answering: Strengths, Shortcomings, and Side Effects
The results suggest that AdvReg requires further refinement before it can be considered a viable bias mitigation technique for VQA; the authors also show that gradually introducing the regularization during training alleviates, but does not completely solve, its shortcomings and side effects.
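"Gradual introduction of regularization" usually means ramping the regularization coefficient up over training instead of applying it at full strength from the first step. The sketch below assumes a linear warm-up; the summary does not state the actual schedule, and `adv_reg_weight` with its parameters is a hypothetical helper.

```python
def adv_reg_weight(step, max_weight=1.0, warmup_steps=1000, start_step=0):
    """Hypothetical linear ramp for an adversarial-regularization coefficient.

    Returns 0.0 before start_step, then grows linearly and reaches max_weight
    once warmup_steps further steps have elapsed.
    """
    if warmup_steps <= 0:
        raise ValueError("warmup_steps must be positive")
    if step < start_step:
        return 0.0
    progress = (step - start_step) / warmup_steps
    return max_weight * min(1.0, progress)


# The coefficient would multiply the adversarial term of the training loss.
for step in (0, 500, 1000, 5000):
    print(step, adv_reg_weight(step))
```

A linear ramp is only one choice; step-wise or sigmoid schedules fit the same description.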
Loss-rescaling VQA: Revisiting Language Prior Problem from a Class-imbalance View
Proposes a novel interpretation scheme in which the losses of mis-predicted frequent and sparse answers of the same question type are examined separately during the late training phase, explicitly revealing why a VQA model tends to produce a frequent yet obviously wrong answer to a question whose correct answer is sparse in the training set.
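Viewing the language prior as class imbalance suggests reweighting the loss so that sparse answers within a question type count for more. The sketch below uses standard inverse-frequency ("balanced") weights, w = N / (K · n), computed per question type, with a single-label negative log-likelihood; it is a generic class-imbalance technique, not the paper's exact rescaling formula, and all names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def answer_weights(samples):
    """Inverse-frequency weights per (question_type, answer).

    samples: iterable of (question_type, answer) pairs from the training set.
    For a type with N answers spread over K distinct values, an answer seen
    n times gets weight N / (K * n), so sparse answers get larger weights.
    """
    counts = defaultdict(Counter)
    for qtype, answer in samples:
        counts[qtype][answer] += 1
    weights = {}
    for qtype, counter in counts.items():
        total = sum(counter.values())
        k = len(counter)
        for answer, n in counter.items():
            weights[(qtype, answer)] = total / (k * n)
    return weights


def rescaled_nll(prob_correct, qtype, answer, weights):
    # Negative log-likelihood of the ground-truth answer, scaled by its weight.
    return -weights[(qtype, answer)] * math.log(prob_correct)


samples = [("is there", "yes")] * 8 + [("is there", "no")] * 2
w = answer_weights(samples)
print(w)
```

Here the frequent answer "yes" is down-weighted to 0.625 and the sparse answer "no" is up-weighted to 2.5, so mistakes on the sparse answer cost more.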
Overcoming Language Priors with Self-supervised Learning for Visual Question Answering
This paper first automatically generates labeled data to balance the biased data, and then proposes a self-supervised auxiliary task that uses the balanced data to help the base VQA model overcome language priors.
Relation-Aware Graph Attention Network for Visual Question Answering
Proposes a Relation-aware Graph Attention Network (ReGAT), which encodes each image as a graph and models multi-type inter-object relations via a graph attention mechanism to learn question-adaptive relation representations.
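To make "question-adaptive" graph attention concrete, the sketch below runs one parameter-free attention pass over an object graph in which the question vector reweights the feature dimensions used to score each edge. It is a deliberate simplification, not ReGAT: it assumes a single relation type given as an adjacency list and omits the learned projections and multiple heads a real model would use. The function names are hypothetical.

```python
import math


def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def question_conditioned_graph_attention(nodes, adjacency, question):
    """One simplified attention pass over an object graph.

    nodes: list of feature vectors, one per detected object
    adjacency: adjacency[i] lists the neighbor indices j of node i
    question: question feature vector with the same dimension as the nodes
    Edge score e_ij = sum_k nodes[i][k] * question[k] * nodes[j][k];
    scores are softmax-normalized over each node's neighbors.
    """
    updated = []
    for i, neighbors in enumerate(adjacency):
        if not neighbors:
            # Isolated nodes keep their original features.
            updated.append(list(nodes[i]))
            continue
        scores = [
            sum(a * q * b for a, q, b in zip(nodes[i], question, nodes[j]))
            for j in neighbors
        ]
        alphas = softmax(scores)
        dim = len(nodes[i])
        # New feature = attention-weighted sum of neighbor features.
        updated.append(
            [sum(alpha * nodes[j][k] for alpha, j in zip(alphas, neighbors)) for k in range(dim)]
        )
    return updated


nodes = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
adjacency = [[1, 2], [0], []]
out = question_conditioned_graph_attention(nodes, adjacency, [1.0, 1.0])
print(out)
```

Changing the question vector changes which neighbors each object attends to, which is the sense in which the relation representation adapts to the question.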
GRAM: Scalable Generative Models for Graphs with Graph Attention Mechanism
This paper proposes GRAM, a generative model for graphs that is scalable in all three contexts, especially in training; it achieves scalability by employing a novel graph attention mechanism and by formulating the likelihood of graphs in a simple and general manner.
Exploring Visual Relationship for Image Captioning
This paper introduces a new design that explores the connections between objects for image captioning within an attention-based encoder-decoder framework, integrating both semantic and spatial object relationships into the image encoder.
Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering
This work balances the popular VQA dataset by collecting complementary images such that every question in this balanced dataset is associated with not just a single image, but rather a pair of similar images that result in two different answers to the question.
Learning to Contrast the Counterfactual Samples for Robust Visual Question Answering
This work introduces a novel self-supervised contrastive learning mechanism to learn the relationship between original samples, factual samples and counterfactual samples, and demonstrates its effectiveness by surpassing current state-of-the-art models on the VQA-CP dataset.
Don't Just Assume; Look and Answer: Overcoming Priors for Visual Question Answering
GVQA explicitly disentangles the recognition of visual concepts present in the image from the identification of plausible answer space for a given question, enabling the model to more robustly generalize across different distributions of answers.
House-GAN: Relational Generative Adversarial Networks for Graph-constrained House Layout Generation
Proposes a novel graph-constrained generative adversarial network whose generator and discriminator are built upon a relational architecture, encoding the constraint into the graph structure of its relational networks.