Learning by Asking Questions

@inproceedings{Misra2018LearningBA,
  title={Learning by Asking Questions},
  author={Ishan Misra and Ross B. Girshick and Rob Fergus and Martial Hebert and Abhinav Kumar Gupta and Laurens van der Maaten},
  booktitle={2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2018}
}
We introduce an interactive learning framework for the development and testing of intelligent visual systems, called learning-by-asking (LBA). We explore LBA in the context of the Visual Question Answering (VQA) task. LBA differs from standard VQA training in that most questions are not observed at training time, and the learner must ask for the answers to questions it wants answered. Thus, LBA more closely mimics natural learning and has the potential to be more data-efficient than the traditional VQA setting…
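
The LBA loop described in the abstract can be sketched in a few lines. This is a minimal, hypothetical illustration under toy assumptions — the names (`ToyLBALearner`, `lba_round`) and the 0/1 uncertainty rule are invented here; the paper's actual learner is a neural VQA model with a learned question proposal module and curiosity-driven selection:

```python
# Toy sketch of the learning-by-asking (LBA) loop: the learner picks the
# question it is most uncertain about, queries an oracle for the answer,
# and updates itself with the acquired question-answer pair.

class ToyLBALearner:
    """Learner that tracks which question-answer pairs it has acquired."""

    def __init__(self):
        self.memory = {}  # question -> answer learned so far

    def uncertainty(self, question):
        # Toy rule: fully uncertain about unseen questions, certain otherwise.
        return 0.0 if question in self.memory else 1.0

    def select_question(self, pool):
        # Ask the question the learner is currently most uncertain about.
        return max(pool, key=self.uncertainty)

    def update(self, question, answer):
        self.memory[question] = answer


def lba_round(learner, pool, oracle):
    """One LBA interaction: pick a question, query the oracle, update."""
    question = learner.select_question(pool)
    learner.update(question, oracle(question))
    return question
```

The key contrast with standard VQA training is visible even in this toy: the training signal is chosen by the learner, one question at a time, rather than drawn from a fixed labeled set.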


Learning to Caption Images Through a Lifetime by Asking Questions

Inspired by a student learning in a classroom, this work presents an agent that continuously learns by posing natural language questions to humans, achieving better performance with less human supervision than the baselines on the challenging MSCOCO dataset.

Learning to Ask Informative Sub-Questions for Visual Question Answering

This work proposes a novel VQA model that generates questions to actively obtain auxiliary perceptual information useful for correct reasoning, and shows that by feeding the generated questions and their answers back to the VQA model as additional information, it can predict the answer more accurately than the baseline model.

Learning to Retrieve Videos by Asking Questions

This work proposes a novel framework for Video Retrieval using Dialog (ViReD), which enables the user to interact with an AI agent via multiple rounds of dialog, along with an Information-Guided Supervision (IGS) scheme that guides the question generator to ask questions that boost subsequent video retrieval accuracy.

Learning to Ask for Conversational Machine Learning

A reinforcement learning framework that allows learning classifiers from a blend of strategies, including learning from observations, explanations and clarifications, and shows that learned question-asking strategies expedite classifier training by asking appropriate questions at different points in the learning process.

A Competence-aware Curriculum for Visual Concepts Learning via Question Answering

This work designs a neural-symbolic concept learner for learning the visual concepts and a multi-dimensional Item Response Theory (mIRT) model for guiding the learning process with an adaptive curriculum.

A Dataset and Baselines for Visual Question Answering on Art

This work introduces the first attempt towards building a new dataset, coined AQUA (Art QUestion Answering), where question-answer (QA) pairs are automatically generated using state-of-the-art question generation methods based on paintings and comments provided in an existing art understanding dataset.

Curriculum Learning Effectively Improves Low Data VQA

This paper proposes a curriculum-based learning (CL) regime to increase the accuracy of VQA models trained on small datasets, offering three criteria to rank the samples in these datasets and a training strategy for each criterion.
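
The common core of these curriculum schemes is ordering training samples by a scalar difficulty score and consuming them easiest-first. The sketch below is purely illustrative: `word_count` is a stand-in criterion invented here, not one of the paper's three ranking criteria:

```python
def curriculum_order(samples, difficulty):
    """Return samples sorted easiest-first under the given difficulty score."""
    return sorted(samples, key=difficulty)


def word_count(question):
    # Stand-in difficulty criterion: longer questions are treated as harder.
    # A real criterion might use model loss, answer entropy, or annotator data.
    return len(question.split())
```

Training then iterates over `curriculum_order(train_set, criterion)`, optionally re-ranking between epochs as the model's notion of difficulty changes.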

Cycle-Consistency for Robust Visual Question Answering

A model-agnostic framework is proposed that trains a model to not only answer a question, but also generate a question conditioned on the answer, such that the answer predicted for the generated question is the same as the ground truth answer to the original question.
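
That consistency constraint can be written down as a toy objective. The version below is a hedged illustration only — the real VQA and question-generation models are neural networks trained with differentiable losses, whereas here dictionary-lookup models and 0/1 losses stand in for them:

```python
def cycle_consistency_loss(vqa, vqg, image, question, gt_answer):
    """Toy 0/1 rendering of the cycle-consistency objective:
    answer the original question, generate a rephrased question from the
    ground-truth answer, and require the model's answer to that generated
    question to match the ground truth as well."""
    # Loss for answering the original question.
    answering_loss = 0.0 if vqa(image, question) == gt_answer else 1.0
    # Generate a question conditioned on the ground-truth answer...
    generated_question = vqg(image, gt_answer)
    # ...and penalize the model if it answers its own question differently.
    cycle_loss = 0.0 if vqa(image, generated_question) == gt_answer else 1.0
    return answering_loss + cycle_loss
```

A model that answers a question correctly but flips its answer on a rephrasing of that question is penalized by the second term, which is the robustness property the framework targets.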

Ask Before You Act: Generalising to Novel Environments by Asking Questions

This work investigates the ability of an RL agent to learn to ask natural language questions as a tool to understand its environment, endowing the agent with the ability to ask "yes-no" questions of an all-knowing Oracle and thereby achieving greater generalisation performance in novel, temporally-extended environments.
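
Why yes-no oracle queries are powerful can be seen in a toy hypothesis-elimination sketch. This is purely illustrative and is not the paper's method — the paper's agent learns to phrase such questions in natural language via RL, whereas here the oracle answers set-membership queries directly:

```python
def identify_by_yes_no(hypotheses, oracle):
    """Narrow a finite hypothesis set with yes/no questions to an oracle.

    oracle(subset) answers "is the true hypothesis in this subset?".
    Each question halves the candidate set, so |H| hypotheses need only
    about log2(|H|) questions.
    """
    candidates = list(hypotheses)
    questions_asked = 0
    while len(candidates) > 1:
        first_half = set(candidates[: len(candidates) // 2])
        in_first = oracle(first_half)
        # Keep exactly the half consistent with the oracle's answer.
        candidates = [h for h in candidates if (h in first_half) == in_first]
        questions_asked += 1
    return candidates[0], questions_asked
```

The logarithmic query count is the intuition behind letting an agent ask rather than explore: a few well-chosen binary questions can disambiguate an environment that would take many episodes to identify from reward alone.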

Visual Curiosity: Learning to Ask Questions to Learn Visual Recognition

This work develops an agent empowered with visual curiosity, i.e. the ability to ask questions of an Oracle and build a visual recognition model based on the answers received, and proposes a novel framework that formulates the learning of visual curiosity as a reinforcement learning problem.

Revisiting Visual Question Answering Baselines

The results suggest that a key problem of current VQA systems lies in the lack of visual grounding and localization of concepts that occur in the questions and answers, and a simple alternative model based on binary classification is developed.

Easy Questions First? A Case Study on Curriculum Learning for Question Answering

This work compares a number of curriculum learning proposals in the context of four non-convex models for QA and shows that they lead to real improvements in each of them.

Question Relevance in VQA: Identifying Non-Visual And False-Premise Questions

These approaches, based on LSTM-RNNs, VQA model uncertainty, and caption-question similarity, are able to outperform strong baselines on both relevance tasks and are shown to be more intelligent, reasonable, and human-like than previous approaches.

Generating Natural Questions About an Image

This paper introduces the novel task of Visual Question Generation, where the system is tasked with asking a natural and engaging question when shown an image, and provides three datasets which cover a variety of images from object-centric to event-centric.

Question Asking as Program Generation

A cognitive model capable of constructing human-like questions is introduced that predicts what questions people will ask, and can creatively produce novel questions that were not present in the training set.

VQA: Visual Question Answering

We propose the task of free-form and open-ended Visual Question Answering (VQA). Given an image and a natural language question about the image, the task is to provide an accurate natural language answer…

Learning to Reason: End-to-End Module Networks for Visual Question Answering

End-to-End Module Networks are proposed, which learn to reason by directly predicting instance-specific network layouts without the aid of a parser, and achieve an error reduction of nearly 50% relative to state-of-the-art attentional approaches.

A Joint Model for Question Answering and Question Generation

A generative machine comprehension model that learns jointly to ask and answer questions based on documents, using a sequence-to-sequence framework that encodes the document and generates a question given an answer.

Yin and Yang: Balancing and Answering Binary Visual Questions

This paper addresses binary Visual Question Answering on abstract scenes as visual verification of concepts inquired in the questions by converting the question to a tuple that concisely summarizes the visual concept to be detected in the image.

Visual7W: Grounded Question Answering in Images

A semantic link between textual descriptions and image regions by object-level grounding enables a new type of QA with visual answers, in addition to textual answers used in previous work, and proposes a novel LSTM model with spatial attention to tackle the 7W QA tasks.