CLOSURE: Assessing Systematic Generalization of CLEVR Models
@article{Bahdanau2019CLOSUREAS,
  title={CLOSURE: Assessing Systematic Generalization of CLEVR Models},
  author={Dzmitry Bahdanau and Harm de Vries and Timothy J. O'Donnell and Shikhar Murty and Philippe Beaudoin and Yoshua Bengio and Aaron C. Courville},
  journal={ArXiv},
  year={2019},
  volume={abs/1912.05783}
}
The CLEVR dataset of natural-looking questions about 3D-rendered scenes has recently received much attention from the research community. A number of models have been proposed for this task, many of which achieve very high accuracies of around 97-99%. In this work, we study how systematic the generalization of such models is, that is, to what extent they are capable of handling novel combinations of known linguistic constructs. To this end, we test models' understanding of referring…
63 Citations
Found a Reason for me? Weakly-supervised Grounded Visual Question Answering using Capsules
- Computer Science
- 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- 2021
This paper proposes a visual capsule module with a query-based selection mechanism for capsule features, which allows the model to focus on relevant regions based on textual cues about visual information in the question, and shows that integrating the proposed capsule module into existing VQA systems significantly improves their performance on the weakly supervised grounding task.
CURI: A Benchmark for Productive Concept Learning Under Uncertainty
- Computer Science
- ICML
- 2021
A new few-shot meta-learning benchmark, Compositional Reasoning Under Uncertainty (CURI), is introduced, together with a model-independent "compositionality gap" for evaluating the difficulty of generalizing out-of-distribution along each of these axes.
Super-CLEVR: A Virtual Benchmark to Diagnose Domain Robustness in Visual Reasoning
- Computer Science
- ArXiv
- 2022
A virtual benchmark, Super-CLEVR, is introduced, in which different factors in VQA domain shifts can be isolated so that their effects can be studied independently; the results suggest that disentangling reasoning and perception, combined with probabilistic uncertainty, yields a strong VQA model that is more robust to domain shifts.
Multimodal Graph Networks for Compositional Generalization in Visual Question Answering
- Computer Science
- NeurIPS
- 2020
This model first creates a multimodal graph, processes it with a graph neural network to induce a factor correspondence matrix, and then outputs a symbolic program to predict answers to questions.
Latent Compositional Representations Improve Systematic Generalization in Grounded Question Answering
- Computer Science
- Transactions of the Association for Computational Linguistics
- 2021
This work proposes a model that computes a representation and denotation for all question spans in a bottom-up, compositional manner using a CKY-style parser, and shows that this inductive bias towards tree structures dramatically improves systematic generalization to out-of-distribution examples.
Structurally Diverse Sampling Reduces Spurious Correlations in Semantic Parsing Datasets
- Computer Science
- ArXiv
- 2022
This work proposes a novel algorithm for sampling a structurally diverse set of instances from a labeled instance pool with structured outputs, which leads to better generalization, and uses information theory to show that a reduction in spurious correlations between substructures may be one reason why diverse training sets improve generalization.
A causal view of compositional zero-shot recognition
- Computer Science
- NeurIPS
- 2020
A causal-inspired embedding model that learns disentangled representations of elementary components of visual objects from correlated (confounded) training data is presented, and improvements compared to strong baselines are shown.
But Should VQA expect Them To?
- Computer Science
The GQA-OOD benchmark is proposed, which measures and compares accuracy over both rare and frequent question-answer pairs, and it is argued that the former is better suited to the evaluation of reasoning abilities.
A Benchmark for Systematic Generalization in Grounded Language Understanding
- Computer Science
- NeurIPS
- 2020
A new benchmark, gSCAN, is introduced for evaluating compositional generalization in models of situated language understanding, taking inspiration from standard models of meaning composition in formal linguistics and defining a language grounded in the states of a grid world.
Improving Compositional Generalization in Semantic Parsing
- Computer Science
- FINDINGS
- 2020
This work analyzes a wide variety of models and proposes multiple extensions to the attention module of the semantic parser, aiming to improve compositional generalization in semantic parsing, as output programs are constructed from sub-components.
References
Systematic Generalization: What Is Required and Can It Be Learned?
- Computer Science
- ICLR
- 2019
The findings show that the generalization of modular models is much more systematic and that it is highly sensitive to the module layout, i.e. to how exactly the modules are connected, whereas systematic generalization in language understanding may require explicit regularizers or priors.
Measuring Compositional Generalization: A Comprehensive Method on Realistic Data
- Computer Science
- ICLR
- 2020
A novel method to systematically construct compositional generalization benchmarks by maximizing compound divergence while guaranteeing a small atom divergence between train and test sets is introduced, and it is demonstrated how this method can be used to create new compositionality benchmarks on top of the existing SCAN dataset.
Analyzing the Behavior of Visual Question Answering Models
- Computer Science
- EMNLP
- 2016
Today's VQA models are "myopic" (tend to fail on sufficiently novel instances), often "jump to conclusions" (converge on a predicted answer after 'listening' to just half the question), and are "stubborn" (do not change their answers across images).
C-VQA: A Compositional Split of the Visual Question Answering (VQA) v1.0 Dataset
- Computer Science
- ArXiv
- 2017
This paper proposes a new setting for Visual Question Answering where the test question-answer pairs are compositionally novel compared to training question-answer pairs, and presents a new compositional split of the VQA v1.0 dataset, which is called Compositional VQA (C-VQA).
Learning Visual Reasoning Without Strong Priors
- Computer Science
- ICML 2017
- 2017
This work shows that a general-purpose, Conditional Batch Normalization approach achieves state-of-the-art results on the CLEVR Visual Reasoning benchmark with a 2.4% error rate, and probes the model to shed light on how it reasons, showing it has learned a question-dependent, multi-step process.
Rearranging the Familiar: Testing Compositional Generalization in Recurrent Networks
- Computer Science
- BlackboxNLP@EMNLP
- 2018
Systematic compositionality is the ability to recombine meaningful units with regular and predictable outcomes, and it’s seen as key to the human capacity for generalization in language. Recent work…
Learning to Reason: End-to-End Module Networks for Visual Question Answering
- Computer Science
- 2017 IEEE International Conference on Computer Vision (ICCV)
- 2017
End-to-End Module Networks are proposed, which learn to reason by directly predicting instance-specific network layouts without the aid of a parser, and achieve an error reduction of nearly 50% relative to state-of-the-art attentional approaches.
ShapeWorld - A new test methodology for multimodal language understanding
- Computer Science
- ArXiv
- 2017
We introduce a novel framework for evaluating multimodal deep learning models with respect to their language understanding and generalization abilities. In this approach, artificial data is…
Neural Module Networks
- Computer Science
- 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- 2016
A procedure is described for constructing and learning neural module networks, which compose collections of jointly-trained neural "modules" into deep networks for question answering. The approach decomposes questions into their linguistic substructures and uses these structures to dynamically instantiate modular networks (with reusable components for recognizing dogs, classifying colors, etc.).
Generalization without Systematicity: On the Compositional Skills of Sequence-to-Sequence Recurrent Networks
- Computer Science
- ICML
- 2018
This paper introduces the SCAN domain, consisting of a set of simple compositional navigation commands paired with the corresponding action sequences, and tests the zero-shot generalization capabilities of a variety of recurrent neural networks trained on SCAN with sequence-to-sequence methods.