Corpus ID: 202712902

Learning Sparse Mixture of Experts for Visual Question Answering

  title={Learning Sparse Mixture of Experts for Visual Question Answering},
  author={Vardaan Pahuja and Jie Fu and C. Pal},
There has been a rapid progress in the task of Visual Question Answering with improved model architectures. Unfortunately, these models are usually computationally intensive due to their sheer size which poses a serious challenge for deployment. We aim to tackle this issue for the specific task of Visual Question Answering (VQA). A Convolutional Neural Network (CNN) is an integral part of the visual processing pipeline of a VQA model (assuming the CNN is trained along with entire VQA model). In… Expand
AdaTag: Multi-Attribute Value Extraction from Product Profiles with Adaptive Decoding
This paper parameterize the decoder with pretrained attribute embeddings, through a hypernetwork and a Mixture-of-Experts (MoE) module, which allows for separate, but semantically correlated, decoders to be generated on the fly for different attributes. Expand


ABC-CNN: An Attention Based Convolutional Neural Network for Visual Question Answering
The proposed ABC-CNN architecture for visual question answering task (VQA) achieves significant improvements over state-of-the-art methods on three benchmark VQA datasets and is shown to reflect the regions that are highly relevant to the questions. Expand
Stacked Attention Networks for Image Question Answering
A multiple-layer SAN is developed in which an image is queried multiple times to infer the answer progressively, and the progress that the SAN locates the relevant visual clues that lead to the answer of the question layer-by-layer. Expand
Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering
This work balances the popular VQA dataset by collecting complementary images such that every question in this balanced dataset is associated with not just a single image, but rather a pair of similar images that result in two different answers to the question. Expand
Image Question Answering Using Convolutional Neural Network with Dynamic Parameter Prediction
The proposed network-joint network with the CNN for ImageQA and the parameter prediction network-is trained end-to-end through back-propagation, where its weights are initialized using a pre-trained CNN and GRU. Expand
Tips and Tricks for Visual Question Answering: Learnings from the 2017 Challenge
This work presents a massive exploration of the effects of the myriad architectural and hyperparameter choices that must be made in generating a state-of-the-art model and provides a detailed analysis of the impact of each choice on model performance. Expand
Going deeper with convolutions
We propose a deep convolutional neural network architecture codenamed Inception that achieves the new state of the art for classification and detection in the ImageNet Large-Scale Visual RecognitionExpand
Very Deep Convolutional Networks for Large-Scale Image Recognition
This work investigates the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting using an architecture with very small convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers. Expand
A simple neural network module for relational reasoning
This work shows how a deep learning architecture equipped with an RN module can implicitly discover and learn to reason about entities and their relations. Expand
Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
This work introduces a Sparsely-Gated Mixture-of-Experts layer (MoE), consisting of up to thousands of feed-forward sub-networks, and applies the MoE to the tasks of language modeling and machine translation, where model capacity is critical for absorbing the vast quantities of knowledge available in the training corpora. Expand
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
This work introduces a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals and further merge RPN and Fast R-CNN into a single network by sharing their convolutionAL features. Expand