Corpus ID: 203593757

On Incorporating Semantic Prior Knowledge in Deep Learning Through Embedding-Space Constraints

  title={On Incorporating Semantic Prior Knowledge in Deep Learning Through Embedding-Space Constraints},
  author={Damien Teney and Ehsan Abbasnejad and Anton van den Hengel},
The knowledge that humans hold about a problem often extends far beyond a set of training data and output labels. [...] Existing methods to use these annotations, including auxiliary losses and data augmentation, cannot guarantee the strict inclusion of these relations into the model, since they require a careful balancing against the end-to-end objective. Our method instead uses these relations to shape the embedding space of the model, treating them as strict constraints on its learned representations.
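One way to realise a strict constraint (as opposed to an auxiliary loss) is to project embeddings so that annotated equivalence relations hold exactly. The sketch below is purely illustrative and is not the paper's implementation; the function name, the averaging projection, and the toy data are assumptions chosen to make the contrast with a soft penalty concrete.

```python
import numpy as np

def constrain_embeddings(embeddings, groups):
    """Project embeddings so every member of an annotated equivalence
    class shares one representation (the class mean).  The relation is
    then satisfied exactly, instead of being merely penalised by an
    auxiliary loss balanced against the end-to-end objective."""
    out = embeddings.copy()
    for members in groups:
        out[members] = embeddings[members].mean(axis=0)
    return out

# Toy example: items 0 and 2 are annotated as semantically equivalent.
emb = np.array([[1.0, 0.0],
                [0.0, 1.0],
                [3.0, 0.0]])
projected = constrain_embeddings(emb, groups=[[0, 2]])
```

After projection, the two equivalent items have identical embeddings by construction, so no loss weighting can trade the relation away.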
Learning from Lexical Perturbations for Consistent Visual Question Answering
A novel approach based on modular networks is proposed, which creates two questions related by linguistic perturbations and regularizes the visual reasoning process between them to be consistent during training, and shows that this framework markedly improves consistency and generalization ability.
VQA With No Questions-Answers Training
  • B. Vatashsky, S. Ullman
  • Computer Science
  • 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
  • 2020
This approach is able to handle novel domains (extended question types and new object classes, properties and relations) as long as corresponding visual estimators are available and can provide explanations to its answers and suggest alternatives when questions are not grounded in the image.
Learning with Instance Bundles for Reading Comprehension
Drawing on ideas from contrastive estimation, several new supervision techniques are introduced that compare question-answer scores across multiple related instances, and normalize these scores across various neighborhoods of closely contrasting questions and/or answers.
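The core normalization step can be sketched as a softmax taken over a bundle of closely related candidates rather than over each instance in isolation. This is a minimal illustration of the contrastive-estimation idea only; the helper name and toy scores are assumptions, not the paper's code.

```python
import numpy as np

def bundle_normalize(scores):
    """Numerically stable softmax over a bundle of related
    question-answer scores, so probability mass is shared among
    closely contrasting candidates in the same neighborhood."""
    z = np.exp(scores - scores.max())
    return z / z.sum()

# Toy bundle: three contrasting answers to closely related questions.
scores = np.array([2.0, 1.0, 0.1])
probs = bundle_normalize(scores)
```

Because the bundle competes for one unit of probability mass, raising the score of a wrong contrastive neighbor directly lowers the probability of the right one.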
Learning What Makes a Difference from Counterfactual Examples and Gradient Supervision
This work proposes an auxiliary training objective that improves the generalization capabilities of neural networks by leveraging an overlooked supervisory signal found in existing datasets: counterfactual examples, which provide a signal indicative of the underlying causal structure of the task.
Unshuffling Data for Improved Generalization
This work describes a training procedure to capture the patterns that are stable across environments while discarding spurious ones, and demonstrates multiple use cases with the task of visual question answering, which is notorious for dataset biases.


Visual Question Answering as a Meta Learning Task
This work adapts a state-of-the-art VQA model with two techniques from the recent meta learning literature, namely prototypical networks and meta networks, and produces qualitatively distinct results with higher recall of rare answers, and a better sample efficiency that allows training with little initial data.
Zero-Shot Visual Question Answering
This work proposes and evaluates several strategies for achieving Zero-Shot VQA, including methods based on pretrained word embeddings, object classifiers with semantic embeddings, and test-time retrieval of example images.
Cycle-Consistency for Robust Visual Question Answering
A model-agnostic framework is proposed that trains a model to not only answer a question, but also generate a question conditioned on the answer, such that the answer predicted for the generated question is the same as the ground truth answer to the original question.
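The cycle described above can be sketched as a check applied to any answering model and question generator. The stubs below are hypothetical stand-ins (the function names and toy lambdas are assumptions for illustration); in the actual framework the two components are learned networks and the check becomes a consistency loss.

```python
def cycle_consistency_check(vqa_model, question_generator,
                            image, question, gt_answer):
    """Answer a question, generate a new question conditioned on that
    answer, then re-answer; the cycle is consistent if the second
    answer matches the original ground-truth answer."""
    a1 = vqa_model(image, question)        # answer the original question
    q2 = question_generator(image, a1)     # generate a question for that answer
    a2 = vqa_model(image, q2)              # re-answer the generated question
    return a2 == gt_answer

# Toy stand-ins for the learned models (purely illustrative).
toy_vqa = lambda image, q: "red" if "color" in q else "unknown"
toy_qgen = lambda image, a: "what color is the ball?"

consistent = cycle_consistency_check(
    toy_vqa, toy_qgen, None, "what color is the ball?", "red")
```

During training, disagreement between the re-predicted answer and the ground truth would be penalised, pushing the model toward answers that survive its own rephrasings.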
GQA: A New Dataset for Real-World Visual Reasoning and Compositional Question Answering
We introduce GQA, a new dataset for real-world visual reasoning and compositional question answering, seeking to address key shortcomings of previous VQA datasets.
Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering
This work balances the popular VQA dataset by collecting complementary images such that every question in this balanced dataset is associated with not just a single image, but rather a pair of similar images that result in two different answers to the question.
Actively Seeking and Learning From Live Data
  • Damien Teney, A. V. Hengel
  • Computer Science
  • 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
  • 2019
This work learns a set of base weights for a simple VQA model, that are specifically adapted to a given question with the information specifically retrieved for this question, and demonstrates the use of external non-VQA data using the MS COCO captioning dataset to support the answering process.
Visual question answering: A survey of methods and datasets
This survey examines the state of the art by comparing modern approaches to VQA, including the common approach of combining convolutional and recurrent neural networks to map images and questions to a common feature space.
Don't Just Assume; Look and Answer: Overcoming Priors for Visual Question Answering
GVQA explicitly disentangles the recognition of visual concepts present in the image from the identification of plausible answer space for a given question, enabling the model to more robustly generalize across different distributions of answers.
Neural Module Networks
A procedure for constructing and learning neural module networks, which compose collections of jointly-trained neural "modules" into deep networks for question answering, and uses these structures to dynamically instantiate modular networks (with reusable components for recognizing dogs, classifying colors, etc.).
Tips and Tricks for Visual Question Answering: Learnings from the 2017 Challenge
This work presents a massive exploration of the effects of the myriad architectural and hyperparameter choices that must be made in generating a state-of-the-art model and provides a detailed analysis of the impact of each choice on model performance.