Corpus ID: 224819464

Removing Bias in Multi-modal Classifiers: Regularization by Maximizing Functional Entropies

@article{Gat2020RemovingBI,
  title={Removing Bias in Multi-modal Classifiers: Regularization by Maximizing Functional Entropies},
  author={Itai Gat and Idan Schwartz and Alexander G. Schwing and Tamir Hazan},
  journal={ArXiv},
  year={2020},
  volume={abs/2010.10802}
}
Many recent datasets contain a variety of different data modalities, for instance, image, question, and answer data in visual question answering (VQA). When training deep net classifiers on those multi-modal datasets, the modalities get exploited at different scales, i.e., some modalities can more easily contribute to the classification results than others. This is suboptimal because the classifier is inherently biased towards a subset of the modalities. To alleviate this shortcoming, we…
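To make the idea behind the truncated abstract concrete, here is a minimal PyTorch sketch of a gradient-based regularizer in the spirit of the paper's functional-entropy maximization: each modality's contribution is approximated by the gradient norm of the prediction with respect to that modality's input (a functional-Fisher-information-style quantity), and the added loss term encourages these contributions to be balanced. The function name `fisher_regularizer`, the two-modality setup (`image_feat`, `question_feat`), and the weight `lambda_reg` are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn.functional as F

def fisher_regularizer(logits, modality_inputs, eps=1e-8):
    """Encourage balanced per-modality contributions (illustrative sketch).

    Approximates each modality's functional Fisher information by the squared
    gradient norm of the predicted-class probability w.r.t. that modality's
    input, then penalizes low entropy of the normalized contributions.
    """
    probs = F.softmax(logits, dim=-1)
    top_prob = probs.max(dim=-1).values.sum()  # scalar surrogate of prediction confidence
    grads = torch.autograd.grad(top_prob, modality_inputs, create_graph=True)
    # squared gradient norm per modality, averaged over the batch
    infos = torch.stack([g.pow(2).flatten(1).sum(dim=1).mean() for g in grads])
    shares = infos / (infos.sum() + eps)              # normalized contribution of each modality
    entropy = -(shares * (shares + eps).log()).sum()  # entropy over modality contributions
    return -entropy                                   # minimizing this maximizes balance

# Usage sketch (hypothetical two-modality VQA classifier):
# image_feat.requires_grad_(True); question_feat.requires_grad_(True)
# logits = model(image_feat, question_feat)
# loss = F.cross_entropy(logits, labels) \
#        + lambda_reg * fisher_regularizer(logits, [image_feat, question_feat])
# loss.backward()
```

The paper itself derives its regularizer from functional entropies, bounded via a log-Sobolev-type inequality and tensorized over modalities; the sketch above only captures the balance-the-modalities intuition under the stated assumptions.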


Perceptual Score: What Data Modalities Does Your Model Perceive?
TLDR
The perceptual score is introduced, a metric that assesses the degree to which a model relies on different subsets of the input features, i.e., modalities, and finds a surprisingly consistent trend across four popular datasets: recent, more accurate state-of-the-art multi-modal models for visual question answering or visual dialog tend to perceive the visual data less than their predecessors.
Greedy Gradient Ensemble for Robust Visual Question Answering
TLDR
A new de-biasing framework, Greedy Gradient Ensemble (GGE), combines multiple biased models for unbiased base-model learning and forces the biased models to over-fit the biased data distribution first, thereby making the base model pay more attention to examples that are hard for the biased models to solve.
Answer Questions with Right Image Regions: A Visual Attention Regularization Approach
TLDR
A novel visual attention regularization approach, AttReg, which can be integrated into most visual attention-based VQA models and requires no human attention supervision; the work empirically validates this property of visual attention and compares it with the prevalent gradient-based approaches.
Language bias in Visual Question Answering: A Survey and Taxonomy
  • Desen Yuan
  • Computer Science
  • ArXiv
  • 2021
TLDR
A comprehensive review and analysis of language bias in visual question answering is conducted for the first time, and existing methods are classified into three categories: enhancing visual information, weakening language priors, and data enhancement and training strategies.
A Review on Explainability in Multimodal Deep Neural Nets
TLDR
This paper extensively reviews the current literature to present a comprehensive survey and commentary on explainability in multimodal deep neural nets, especially for vision and language tasks.
X-GGM: Graph Generative Modeling for Out-of-distribution Generalization in Visual Question Answering
TLDR
This paper formulates OOD generalization in VQA as a compositional generalization problem and proposes a graph generative modeling-based training scheme (X-GGM) to handle the problem implicitly, alleviating the unstable training issue in graph generative modeling.
Discovering the Unknown Knowns: Turning Implicit Knowledge in the Dataset into Explicit Training Examples for Visual Question Answering
TLDR
It is found that many of the “unknowns” to the learned VQA model are indeed “known” in the dataset implicitly, and a simple data augmentation pipeline, SIMPLEAUG, is presented to turn this “known” knowledge into training examples for VQA.
Are VQA Systems RAD? Measuring Robustness to Augmented Data with Focused Interventions
TLDR
A new robustness measure, Robustness to Augmented Data (RAD), is proposed, which measures the consistency of model predictions between original and augmented examples and can quantify when state-of-the-art systems are not robust to counterfactuals.
Beyond Question-Based Biases: Assessing Multimodal Shortcut Learning in Visual Question Answering
TLDR
It is demonstrated that even state-of-the-art models perform poorly and that existing techniques to reduce biases are largely ineffective in this context.

References

SHOWING 1-10 OF 56 REFERENCES
What Makes Training Multi-Modal Classification Networks Hard?
TLDR
This paper identifies two main causes for this performance drop: first, multi-modal networks are often prone to overfitting due to increased capacity, and second, different modalities overfit and generalize at different rates, so training them jointly with a single optimization strategy is sub-optimal.
Adversarial Regularization for Visual Question Answering: Strengths, Shortcomings, and Side Effects
TLDR
The results suggest that AdvReg requires further refinement before it can be considered a viable bias mitigation technique for VQA, and it is demonstrated that gradual introduction of regularization during training helps to alleviate, but not completely solve, these issues.
Bilinear Attention Networks
TLDR
Bilinear Attention Networks (BAN) are proposed to find bilinear attention distributions that utilize given vision-language information seamlessly; the model is evaluated quantitatively and qualitatively on the visual question answering and Flickr30k Entities datasets, showing that BAN significantly outperforms previous methods and achieves new state-of-the-art results on both.
RUBi: Reducing Unimodal Biases in Visual Question Answering
TLDR
RUBi, a new learning strategy to reduce biases in any VQA model, is proposed; it reduces the importance of the most biased examples, i.e., examples that can be correctly classified without looking at the image.
Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering
TLDR
This work balances the popular VQA dataset by collecting complementary images such that every question in this balanced dataset is associated with not just a single image, but rather a pair of similar images that result in two different answers to the question.
Overcoming Language Priors in Visual Question Answering with Adversarial Regularization
TLDR
This work introduces a question-only model that takes as input the question encoding from the VQA model and must leverage language biases in order to succeed, and poses training as an adversarial game between the VQA model and this question-only adversary, discouraging the VQA model from capturing language bias in its question encoding.
REPAIR: Removing Representation Bias by Dataset Resampling
  • Y. Li, N. Vasconcelos
  • Computer Science
  • 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
  • 2019
TLDR
Experiments with synthetic and action recognition data show that dataset REPAIR can significantly reduce representation bias and lead to improved generalization of models trained on REPAIRed datasets.
Don't Take the Easy Way Out: Ensemble Based Methods for Avoiding Known Dataset Biases
TLDR
This paper trains a naive model that makes predictions exclusively based on dataset biases, and a robust model as part of an ensemble with the naive one in order to encourage it to focus on other patterns in the data that are more likely to generalize.
Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding
TLDR
This work extensively evaluates Multimodal Compact Bilinear pooling (MCB) on the visual question answering and grounding tasks and consistently shows the benefit of MCB over ablations without MCB.
A negative case analysis of visual grounding methods for VQA
TLDR
It is found that it is not actually necessary to provide proper, human-based cues; random, insensible cues also result in similar improvements, and a simpler regularization scheme is proposed that achieves near state-of-the-art performance on VQA-CPv2.