Corpus ID: 240354576

Introspective Distillation for Robust Question Answering

Yulei Niu and Hanwang Zhang. Introspective Distillation for Robust Question Answering. In Neural Information Processing Systems.
Question answering (QA) models are well-known to exploit data bias, e.g., the language prior in visual QA and the position bias in reading comprehension. Recent debiasing methods achieve good out-of-distribution (OOD) generalizability with a considerable sacrifice of the in-distribution (ID) performance. Therefore, they are only applicable in domains where the test distribution is known in advance. In this paper, we present a novel debiasing method called Introspective Distillation (IntroD) to… 
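
The core idea of blending an in-distribution (ID) teacher with an out-of-distribution (OOD) teacher can be sketched as below. This is a minimal, hypothetical reconstruction, not the paper's exact formulation: it assumes two pretrained teachers whose softmax outputs are blended per sample according to how confidently each one predicts the ground-truth answer, and the blended soft labels are then distilled into a student.

```python
import numpy as np

def introspective_blend(p_id, p_ood, labels, eps=1e-8):
    """Blend ID and OOD teacher distributions per sample.

    p_id, p_ood: (batch, num_answers) softmax outputs of the two teachers.
    labels:      (batch,) ground-truth answer indices.
    A sample the biased ID teacher already fits well (high probability on the
    label) likely reflects the bias, so it receives MORE weight from the OOD
    (debiased) teacher, and vice versa.
    """
    idx = np.arange(len(labels))
    conf_id = p_id[idx, labels]
    conf_ood = p_ood[idx, labels]
    w_id = conf_ood / (conf_id + conf_ood + eps)   # inverse-confidence weighting
    w_ood = 1.0 - w_id
    return w_id[:, None] * p_id + w_ood[:, None] * p_ood

def kd_loss(student_logits, teacher_probs):
    """Cross-entropy of the student against the blended soft labels."""
    z = student_logits - student_logits.max(axis=1, keepdims=True)
    log_q = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -(teacher_probs * log_q).sum(axis=1).mean()
```

The design point is that neither teacher is trusted unconditionally: the per-sample weights let the student inherit ID knowledge on unbiased samples and OOD knowledge on biased ones.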

Rethinking Data Augmentation for Robust Visual Question Answering

A model-agnostic DA strategy that can be seamlessly incorporated into any VQA architecture, and a knowledge distillation based answer assignment to generate pseudo answers for all composed image-question pairs, which are robust to both in-domain and out-of-distribution settings.

Understanding ME? Multimodal Evaluation for Fine-grained Visual Commonsense

A Multimodal Evaluation (ME) pipeline is presented to automatically generate question-answer pairs to test models’ understanding of the visual scene, text, and related knowledge, and shows that training with the ME data boosts models’ performance in standard VCR evaluation.

Respecting Transfer Gap in Knowledge Distillation

Inverse Probability Weighting Distillation (IPWD) is proposed that estimates the propensity score of a training sample belonging to the machine domain, and assigns its inverse amount to compensate for under-represented samples.
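
The weighting scheme in IPWD can be illustrated with the short sketch below. It is a simplified, hypothetical rendering: given an already-estimated propensity score per training sample, the per-sample distillation loss is reweighted by the inverse propensity, with clipping for stability.

```python
import numpy as np

def ipw_distill_weights(propensity, clip=0.05):
    """Inverse-propensity weights for per-sample distillation.

    propensity[i]: estimated probability that sample i belongs to (is well
    represented in) the teacher's machine domain.  Under-represented samples
    (low propensity) get larger weights to compensate for the transfer gap.
    """
    p = np.clip(propensity, clip, 1.0)   # clipping keeps weights bounded
    w = 1.0 / p
    return w / w.mean()                  # normalise so the loss scale is unchanged

def weighted_kd_loss(per_sample_kd, propensity):
    """Reweight an already-computed per-sample distillation loss."""
    return (ipw_distill_weights(propensity) * per_sample_kd).mean()
```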

Overcoming Language Priors in Visual Question Answering via Distinguishing Superficially Similar Instances

A novel training framework is proposed that explicitly encourages the VQA model to distinguish between the superficially similar instances and is forced to further focus on the other parts of the input beyond the question type, which helps to overcome the language priors.

NICEST: Noisy Label Correction and Training for Robust Scene Graph Generation

This paper proposes NICEST, a novel framework that consists of two parts, NICE and NIST, which address the noisy-label issue by generating high-quality samples and by an effective training strategy, respectively, and it further proposes a new benchmark, VG-OOD, which helps disentangle subject-object-category-based frequency biases.

Prompt-aligned Gradient for Prompt Tuning

This work presents Prompt-aligned Gradient, dubbed ProGrad, to prevent prompt tuning from forgetting the general knowledge learned from VLMs, and demonstrates the stronger few-shot generalization ability of ProGrad over state-of-the-art prompt tuning methods.

Interpolative Distillation for Unifying Biased and Debiased Recommendation

An Interpolative Distillation framework is proposed, which interpolates the biased and debiased models at user-item pair level by distilling a student model, which stands out on both tests and demonstrates remarkable gains on less popular items.

Causal Reasoning with Spatial-temporal Representation Learning: A Prospective Study

This paper conducts a comprehensive review of existing causal reasoning methods for spatial-temporal representation learning, covering fundamental theories, models, and datasets, and proposes some primary challenges, opportunities, and future research directions for benchmarking causal reasoning algorithms in spatial-temporal representation learning.

Causal Reasoning Meets Visual Representation Learning: A Prospective Study

This paper conducts a comprehensive review of existing causal reasoning methods for visual representation learning, covering fundamental theories, models, and datasets, and proposes some prospective challenges, opportunities, and future research directions for benchmarking causal reasoning algorithms in visual representation learning.

Cross-Domain Empirical Risk Minimization for Unbiased Long-tailed Classification

Cross-Domain Empirical Risk Minimization (xERM) is proposed for training an unbiased test-agnostic model to achieve strong performances on both test distributions, which empirically demonstrates that xERM fundamentally improves the classification by learning better feature representation rather than the "head vs. tail" game.

MUTANT: A Training Paradigm for Out-of-Distribution Generalization in Visual Question Answering

This work presents MUTANT, a training paradigm that exposes the model to perceptually similar, yet semantically distinct mutations of the input, to improve OOD generalization, such as the VQA-CP challenge.

RUBi: Reducing Unimodal Biases in Visual Question Answering

RUBi, a new learning strategy to reduce biases in any VQA model, is proposed, which reduces the importance of the most biased examples, i.e. examples that can be correctly classified without looking at the image.
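The fusion at the heart of RUBi can be sketched in a few lines; this is a simplified illustration, assuming a question-only branch whose logits gate the base model's logits during training (at test time the base model is used alone).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def rubi_fuse(vqa_logits, question_only_logits):
    """RUBi-style training fusion.

    Masking the base logits with a sigmoid of the question-only logits
    shrinks the loss gradient on examples that are answerable from the
    question alone, reducing their influence on the base model.
    """
    return vqa_logits * sigmoid(question_only_logits)
```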

Counterfactual VQA: A Cause-Effect Look at Language Bias

A novel counterfactual inference framework is proposed, which enables the language bias to be captured as the direct causal effect of questions on answers and reduced by subtracting the direct language effect from the total causal effect.
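In its simplest instantiation, the subtraction step looks like the sketch below. This is a deliberately reduced illustration, not the full causal-graph formulation: the total effect is approximated by the full model's logits and the direct language effect by a question-only branch.

```python
import numpy as np

def counterfactual_debias(total_logits, language_only_logits):
    """Subtract the direct language effect from the total effect.

    The debiased score approximates the indirect (vision-mediated) effect
    of the question on the answer, removing the shortcut that maps the
    question straight to an answer.
    """
    return total_logits - language_only_logits
```

For example, if the language prior alone strongly favors one answer, subtracting it can flip the prediction to the answer actually supported by the image.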

Look at the First Sentence: Position Bias in Question Answering

It is found that using the prior distribution of answer positions as a bias model is very effective at reducing position bias, recovering the performance of BERT from 35.24% to 81.17% when trained on a biased SQuAD dataset.

SQuAD: 100,000+ Questions for Machine Comprehension of Text

A strong logistic regression model is built, which achieves an F1 score of 51.0%, a significant improvement over a simple baseline (20%).

Counterfactual Samples Synthesizing for Robust Visual Question Answering

A model-agnostic Counterfactual Samples Synthesizing (CSS) training scheme that significantly improves both visual-explainable and question-sensitive abilities of VQA models and, in return, the performance of these models is further boosted.

PyTorch: An Imperative Style, High-Performance Deep Learning Library

This paper details the principles that drove the implementation of PyTorch and how they are reflected in its architecture, and explains how the careful and pragmatic implementation of the key components of its runtime enables them to work together to achieve compelling performance.

Don’t Take the Easy Way Out: Ensemble Based Methods for Avoiding Known Dataset Biases

This paper trains a naive model that makes predictions exclusively based on dataset biases, and a robust model as part of an ensemble with the naive one in order to encourage it to focus on other patterns in the data that are more likely to generalize.
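One common variant of this ensemble, the bias product, can be sketched as follows; this is an illustrative reduction, assuming a frozen bias-only model whose log-probabilities are added to the robust model's during training (the robust model is used alone at test time).

```python
import numpy as np

def log_softmax(x):
    z = x - x.max(axis=1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))

def bias_product_loss(robust_logits, bias_logits, labels):
    """Bias-product ensemble loss.

    Training through the combined log-probabilities lets the frozen bias
    model 'explain away' biased patterns, so gradients push the robust
    model toward the remaining, more generalizable signal.
    """
    combined = log_softmax(robust_logits) + log_softmax(bias_logits)
    log_p = log_softmax(combined)
    idx = np.arange(len(labels))
    return -log_p[idx, labels].mean()
```

When the bias model is confident and correct on a sample, the combined prediction is already near the label, so the robust model receives little gradient from that sample.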

XLNet: Generalized Autoregressive Pretraining for Language Understanding

XLNet is proposed, a generalized autoregressive pretraining method that enables learning bidirectional contexts by maximizing the expected likelihood over all permutations of the factorization order and overcomes the limitations of BERT thanks to its autoregressive formulation.

Don't Just Assume; Look and Answer: Overcoming Priors for Visual Question Answering

GVQA explicitly disentangles the recognition of visual concepts present in the image from the identification of plausible answer space for a given question, enabling the model to more robustly generalize across different distributions of answers.