CLOSURE: Assessing Systematic Generalization of CLEVR Models
@article{Bahdanau2019CLOSUREAS, title={CLOSURE: Assessing Systematic Generalization of CLEVR Models}, author={Dzmitry Bahdanau and Harm de Vries and Timothy J. O'Donnell and Shikhar Murty and Philippe Beaudoin and Yoshua Bengio and Aaron C. Courville}, journal={ArXiv}, year={2019}, volume={abs/1912.05783} }
The CLEVR dataset of natural-looking questions about 3D-rendered scenes has recently received much attention from the research community. A number of models have been proposed for this task, many of which achieved very high accuracies of around 97-99%. In this work, we study how systematic the generalization of such models is, that is, to what extent they are capable of handling novel combinations of known linguistic constructs. To this end, we test models' understanding of referring…
62 Citations
Multimodal Graph Networks for Compositional Generalization in Visual Question Answering
- Computer Science
- NeurIPS
- 2020
This model first creates a multimodal graph, processes it with a graph neural network to induce a factor correspondence matrix, and then outputs a symbolic program to predict answers to questions.
Latent Compositional Representations Improve Systematic Generalization in Grounded Question Answering
- Computer Science
- Transactions of the Association for Computational Linguistics
- 2021
This work proposes a model that computes a representation and denotation for all question spans in a bottom-up, compositional manner using a CKY-style parser, and shows that this inductive bias towards tree structures dramatically improves systematic generalization to out-of-distribution examples.
Structurally Diverse Sampling Reduces Spurious Correlations in Semantic Parsing Datasets
- Computer Science
- ArXiv
- 2022
This work proposes a novel algorithm for sampling a structurally diverse set of instances from a labeled instance pool with structured outputs that leads to better generalization and uses information theory to show that reduction in spurious correlations between substructures may be one reason why diverse training sets improve generalization.
A causal view of compositional zero-shot recognition
- Computer Science
- NeurIPS
- 2020
A causal-inspired embedding model that learns disentangled representations of elementary components of visual objects from correlated (confounded) training data is presented, and improvements compared to strong baselines are shown.
Unobserved Local Structures Make Compositional Generalization Hard
- Computer Science
- EMNLP
- 2022
This work investigates the factors that make generalization to certain test instances challenging and proposes a criterion for the difficulty of an example: a test instance is hard if it contains a local structure that was not observed at training time.
But Should VQA expect Them To?
- Computer Science
The GQA-OOD benchmark is proposed, which measures and compares accuracy over both rare and frequent question-answer pairs, and it is argued that the former is better suited to the evaluation of reasoning abilities.
A Benchmark for Systematic Generalization in Grounded Language Understanding
- Computer Science
- NeurIPS
- 2020
A new benchmark, gSCAN, is introduced for evaluating compositional generalization in models of situated language understanding, taking inspiration from standard models of meaning composition in formal linguistics and defining a language grounded in the states of a grid world.
Variational Cross-Graph Reasoning and Adaptive Structured Semantics Learning for Compositional Temporal Grounding
- Computer Science
- ArXiv
- 2023
It is argued that the inherent structured semantics inside the videos and language are the crucial factor in achieving compositional generalization, and a variational cross-graph reasoning framework is proposed that explicitly decomposes video and language into hierarchical semantic graphs and learns semantic correspondence between the two graphs.
Improving Compositional Generalization in Semantic Parsing
- Computer Science
- FINDINGS
- 2020
This work analyzes a wide variety of models and proposes multiple extensions to the attention module of the semantic parser, aiming to improve compositional generalization in semantic parsing, as output programs are constructed from sub-components.
ReaSCAN: Compositional Reasoning in Language Grounding
- Computer Science
- NeurIPS Datasets and Benchmarks
- 2021
This work proposes ReaSCAN, a benchmark dataset that builds off gSCAN but requires compositional language interpretation and reasoning about entities and relations, and assesses two models on ReaSCAN: a multi-modal baseline and a state-of-the-art graph convolutional neural model.
References
SHOWING 1-10 OF 33 REFERENCES
Systematic Generalization: What Is Required and Can It Be Learned?
- Computer Science
- ICLR
- 2019
The findings show that the generalization of modular models is much more systematic and that it is highly sensitive to the module layout, i.e. to how exactly the modules are connected, whereas systematic generalization in language understanding may require explicit regularizers or priors.
Measuring Compositional Generalization: A Comprehensive Method on Realistic Data
- Computer Science
- ICLR
- 2020
A novel method to systematically construct compositional generalization benchmarks by maximizing compound divergence while guaranteeing a small atom divergence between train and test sets is introduced, and it is demonstrated how this method can be used to create new compositionality benchmarks on top of the existing SCAN dataset.
Analyzing the Behavior of Visual Question Answering Models
- Computer Science
- EMNLP
- 2016
Today's VQA models are "myopic" (tend to fail on sufficiently novel instances), often "jump to conclusions" (converge on a predicted answer after 'listening' to just half the question), and are "stubborn" (do not change their answers across images).
C-VQA: A Compositional Split of the Visual Question Answering (VQA) v1.0 Dataset
- Computer Science
- ArXiv
- 2017
This paper proposes a new setting for Visual Question Answering where the test question-answer pairs are compositionally novel compared to training question-answer pairs, and presents a new compositional split of the VQA v1.0 dataset, which is called Compositional VQA (C-VQA).
Learning Visual Reasoning Without Strong Priors
- Computer Science
- ICML 2017
- 2017
This work shows that a general-purpose, Conditional Batch Normalization approach achieves state-of-the-art results on the CLEVR Visual Reasoning benchmark with a 2.4% error rate, and probes the model to shed light on how it reasons, showing it has learned a question-dependent, multi-step process.
Rearranging the Familiar: Testing Compositional Generalization in Recurrent Networks
- Computer Science
- BlackboxNLP@EMNLP
- 2018
Systematic compositionality is the ability to recombine meaningful units with regular and predictable outcomes, and it’s seen as key to the human capacity for generalization in language. Recent work…
Learning to Reason: End-to-End Module Networks for Visual Question Answering
- Computer Science
- 2017 IEEE International Conference on Computer Vision (ICCV)
- 2017
End-to-End Module Networks are proposed, which learn to reason by directly predicting instance-specific network layouts without the aid of a parser, and achieve an error reduction of nearly 50% relative to state-of-the-art attentional approaches.
Neural Module Networks
- Computer Science
- 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- 2016
This work describes a procedure for constructing and learning neural module networks, which compose collections of jointly-trained neural "modules" into deep networks for question answering. The approach decomposes questions into their linguistic substructures and uses these structures to dynamically instantiate modular networks (with reusable components for recognizing dogs, classifying colors, etc.).
Generalization without Systematicity: On the Compositional Skills of Sequence-to-Sequence Recurrent Networks
- Computer Science
- ICML
- 2018
This paper introduces the SCAN domain, consisting of a set of simple compositional navigation commands paired with the corresponding action sequences, and tests the zero-shot generalization capabilities of a variety of recurrent neural networks trained on SCAN with sequence-to-sequence methods.
Compositional Attention Networks for Machine Reasoning
- Computer Science
- ICLR
- 2018
The MAC network is presented, a novel fully differentiable neural network architecture, designed to facilitate explicit and expressive reasoning that is computationally-efficient and data-efficient, in particular requiring 5x less data than existing models to achieve strong results.