MemexQA: Visual Memex Question Answering
@article{Jiang2017MemexQAVM,
  title={MemexQA: Visual Memex Question Answering},
  author={Lu Jiang and Junwei Liang and Liangliang Cao and Yannis Kalantidis and Sachin Sudhakar Farfade and Alexander Hauptmann},
  journal={ArXiv},
  year={2017},
  volume={abs/1708.01336}
}
This paper proposes a new task, MemexQA: given a collection of photos or videos from a user, the goal is to automatically answer questions that help users recover their memory about events captured in the collection. Experimental results on the MemexQA dataset demonstrate that MemexNet outperforms strong baselines and yields the state of the art on this novel and challenging task. The promising results on TextQA and VideoQA suggest MemexNet's efficacy and scalability across various QA tasks.
24 Citations
Focal Visual-Text Attention for Memex Question Answering
- Computer Science, IEEE Transactions on Pattern Analysis and Machine Intelligence
- 2019
The MemexQA dataset is presented, the first publicly available multimodal question answering dataset consisting of real personal photo albums, together with an end-to-end trainable network that uses a hierarchical process to dynamically determine which media, and which time, to focus on in the sequential data to answer the question.
Visual Question Answering using Deep Learning: A Survey and Performance Analysis
- Computer Science, CVIP
- 2020
This survey covers and discusses recent datasets released in the VQA domain that deal with various question formats and enable robustness of machine-learning models, and presents and discusses results computed by the authors over the vanilla VQA models, the Stacked Attention Network, and the VQA Challenge 2017 winner model.
Focal Visual-Text Attention for Visual Question Answering
- Computer Science, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition
- 2018
A novel neural network called Focal Visual-Text Attention network (FVTA) is described for collective reasoning in visual question answering, where both visual and text sequence information such as images and text metadata are presented.
Video Question-Answering Techniques, Benchmark Datasets and Evaluation Metrics Leveraging Video Captioning: A Comprehensive Survey
- Computer Science, IEEE Access
- 2021
The presented survey shows that recent works on Memory Networks, Generative Adversarial Networks, and Reinforced Decoders, have the capability to handle the complexities and challenges of video-QA.
Diverse Visuo-Linguistic Question Answering (DVLQA) Challenge
- Computer Science, ArXiv
- 2020
A Diverse Visuo-Linguistic Question Answering (DVLQA) challenge corpus is presented, where the task is to derive joint inference about the given image-text modality in a question answering setting, along with a modular method that demonstrates slightly better baseline performance and offers more transparency for interpreting intermediate outputs.
Semantic Reanalysis of Scene Words in Visual Question Answering
- Computer Science, PRCV
- 2019
A new image and sentence similarity matching model is proposed, which outputs the correct image representation by learning the semantic concept and improves the accuracy by nearly 10%.
Progressive Attention Memory Network for Movie Story Question Answering
- Computer Science, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- 2019
Experiments on publicly available benchmark datasets, MovieQA and TVQA, demonstrate that each feature contributes to the movie story QA architecture, PAMN, and improves performance to achieve the state-of-the-art result.
Photo Stream Question Answer
- Computer Science, ACM Multimedia
- 2020
This paper presents a new visual question answering (VQA) task, Photo Stream QA, which aims to answer open-ended questions about a narrative photo stream, and proposes an end-to-end baseline (E-TAA) that provides promising results, outperforming all the other baseline methods.
A survey of methods, datasets and evaluation metrics for visual question answering
- Computer Science, Image Vis. Comput.
- 2021
References
Showing 1-10 of 36 references
Dynamic Memory Networks for Visual and Textual Question Answering
- Computer Science, ICML
- 2016
The new DMN+ model improves the state of the art on both the Visual Question Answering dataset and the bAbI-10k text question-answering dataset without supporting fact supervision.
Ask Your Neurons: A Neural-Based Approach to Answering Questions about Images
- Computer Science, 2015 IEEE International Conference on Computer Vision (ICCV)
- 2015
We address a question answering task on real-world images that is set up as a Visual Turing Test. By combining latest advances in image representation and natural language processing, we propose…
Hierarchical Question-Image Co-Attention for Visual Question Answering
- Computer Science, NIPS
- 2016
This paper presents a novel co-attention model for VQA that jointly reasons about image and question attention in a hierarchical fashion via a novel one-dimensional convolutional neural network (CNN).
Stacked Attention Networks for Image Question Answering
- Computer Science, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- 2016
A multiple-layer SAN is developed in which an image is queried multiple times to infer the answer progressively, with the SAN locating, layer by layer, the relevant visual clues that lead to the answer of the question.
Dynamic Coattention Networks For Question Answering
- Computer Science, ICLR
- 2017
The Dynamic Coattention Network (DCN) for question answering first fuses co-dependent representations of the question and the document in order to focus on relevant parts of both, then a dynamic pointing decoder iterates over potential answer spans to recover from initial local maxima corresponding to incorrect answers.
Neural Module Networks
- Computer Science, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- 2016
A procedure for constructing and learning neural module networks, which compose collections of jointly-trained neural "modules" into deep networks for question answering, and uses these structures to dynamically instantiate modular networks (with reusable components for recognizing dogs, classifying colors, etc.).
Gated-Attention Readers for Text Comprehension
- Computer Science, ACL
- 2017
The Gated-Attention (GA) Reader integrates a multi-hop architecture with a novel attention mechanism based on multiplicative interactions between the query embedding and the intermediate states of a recurrent neural network document reader, enabling the reader to build query-specific representations of tokens in the document for accurate answer selection.
SQuAD: 100,000+ Questions for Machine Comprehension of Text
- Computer Science, EMNLP
- 2016
A strong logistic regression model is built, which achieves an F1 score of 51.0%, a significant improvement over a simple baseline (20%).
VQA: Visual Question Answering
- Computer Science, 2015 IEEE International Conference on Computer Vision (ICCV)
- 2015
We propose the task of free-form and open-ended Visual Question Answering (VQA). Given an image and a natural language question about the image, the task is to provide an accurate natural language…
Machine Comprehension Using Match-LSTM and Answer Pointer
- Computer Science, ICLR
- 2017
This work proposes an end-to-end neural architecture for the Stanford Question Answering Dataset (SQuAD), based on match-LSTM, a model previously proposed for textual entailment, and Pointer Net, a sequence-to-sequence model proposed by Vinyals et al. (2015), to constrain the output tokens to come from the input sequences.