Corpus ID: 1657806

Multimodal Residual Learning for Visual QA

@inproceedings{Kim2016MultimodalRL,
  title={Multimodal Residual Learning for Visual QA},
  author={J. Kim and Sang-Woo Lee and Dong-Hyun Kwak and Min-Oh Heo and Jung-Woo Ha and B. Zhang},
  booktitle={NIPS},
  year={2016}
}
Deep neural networks continue to advance the state of the art in image recognition tasks with various methods. However, applications of these methods to multimodality remain limited. We present Multimodal Residual Networks (MRN) for the multimodal residual learning of visual question-answering, which extends the idea of deep residual learning. Unlike deep residual learning, MRN effectively learns the joint representation from vision and language information. The main idea is to use…
Multi-modal Factorized Bilinear Pooling with Co-attention Learning for Visual Question Answering
Residual Self-Attention for Visual Question Answering
Learning a Recurrent Residual Fusion Network for Multimodal Matching
DRAU: Dual Recurrent Attention Units for Visual Question Answering
Dual Recurrent Attention Units for Visual Question Answering
Visual Explanations from Hadamard Product in Multimodal Deep Networks
Co-Attention Network With Question Type for Visual Question Answering
Multi-Channel Co-Attention Network for Visual Question Answering
Beyond Bilinear: Generalized Multimodal Factorized High-Order Pooling for Visual Question Answering

References

SHOWING 1-10 OF 39 REFERENCES
Deep Residual Learning for Image Recognition
Multimodal Deep Learning
Ask Your Neurons: A Neural-Based Approach to Answering Questions about Images
Multimodal learning with deep Boltzmann machines
Very Deep Convolutional Networks for Large-Scale Image Recognition
A Focused Dynamic Attention Model for Visual Question Answering
Deep Visual-Semantic Alignments for Generating Image Descriptions
  • A. Karpathy, Li Fei-Fei
  • Computer Science, Medicine
  • IEEE Transactions on Pattern Analysis and Machine Intelligence
  • 2017
Dynamic Memory Networks for Visual and Textual Question Answering
Exploring Models and Data for Image Question Answering