An Improved Attention for Visual Question Answering

Authors: Tanzila Rahman, Shih-Han Chou, Leonid Sigal, Giuseppe Carenini
Venue: 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)
We consider the problem of Visual Question Answering (VQA). Given an image and a free-form, open-ended question expressed in natural language, the goal of a VQA system is to provide an accurate answer to this question with respect to the image. The task is challenging because it requires simultaneous and intricate understanding of both visual and textual information. Attention, which captures intra- and inter-modal dependencies, has emerged as perhaps the most widely used mechanism for addressing…

