
Document Visual Question Answering Challenge 2020

@article{Mathew2020DocumentVQ,
  title={Document Visual Question Answering Challenge 2020},
  author={Minesh Mathew and Rub{\`e}n P{\'e}rez Tito and Dimosthenis Karatzas and R. Manmatha and C. V. Jawahar},
  journal={ArXiv},
  year={2020},
  volume={abs/2008.08899}
}
This paper presents the results of the Document Visual Question Answering Challenge, organized as part of the "Text and Documents in the Deep Learning Era" workshop at CVPR 2020. The challenge introduces a new problem: Visual Question Answering on document images. The challenge comprised two tasks. The first task concerns asking questions on a single document image, while the second is set up as a retrieval task where the question is posed over a collection of images. For Task 1, a… 
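Task 1 in this challenge family is scored with Average Normalized Levenshtein Similarity (ANLS), the metric proposed for ST-VQA (see References). Below is a minimal sketch of the commonly described formulation, assuming case-insensitive matching and a 0.5 threshold; it is illustrative, not the official evaluation script.

```python
# Minimal ANLS sketch: per question, take the best score over all ground-truth
# answers, where score = 1 - normalized edit distance, zeroed above a threshold.

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def anls(predictions, ground_truths, threshold=0.5):
    """predictions: list of strings; ground_truths: list of lists of strings."""
    total = 0.0
    for pred, answers in zip(predictions, ground_truths):
        best = 0.0
        for ans in answers:
            p, a = pred.strip().lower(), ans.strip().lower()
            nl = levenshtein(p, a) / max(len(p), len(a), 1)
            best = max(best, 1.0 - nl if nl < threshold else 0.0)
        total += best
    return total / len(predictions)

print(anls(["$45.00"], [["$45.00", "45.00"]]))  # 1.0
```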

Citations

Document Collection Visual Question Answering

This work introduces Document Collection Visual Question Answering (DocCVQA), a new dataset and related task, where questions are posed over a whole collection of document images and the goal is not only to provide the answer to the given question, but also to retrieve the set of documents that contain the information needed to infer the answer.
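Since DocCVQA asks systems both to answer and to retrieve the supporting documents, the retrieval side can be scored with a ranking metric such as mean average precision (MAP). The sketch below illustrates MAP over per-question ranked lists; the data layout is assumed for illustration, and this is not the benchmark's official scorer.

```python
# MAP over ranked document lists, one plausible scorer for the retrieval side.

def average_precision(ranked_ids, relevant_ids):
    """AP for one query: mean of precision@k at each relevant hit."""
    hits, precisions = 0, []
    for k, doc_id in enumerate(ranked_ids, 1):
        if doc_id in relevant_ids:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(relevant_ids) if relevant_ids else 0.0

def mean_average_precision(runs):
    """runs: list of (ranked_ids, relevant_ids) pairs, one per question."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)

runs = [(["d3", "d7", "d1"], {"d3", "d1"}),   # AP = (1/1 + 2/3) / 2
        (["d2", "d5"], {"d5"})]               # AP = 1/2
print(mean_average_precision(runs))           # ≈ 0.667
```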

Asking questions on handwritten document collections

It is argued that the recognition-free approach is suitable for handwritten documents and historical collections where robust text recognition is often difficult and for human users, document image snippets containing answers act as a valid alternative to textual answers.
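A minimal sketch of the recognition-free retrieval core this summary describes: the textual query and the word-image snippets are embedded into a shared space and compared by similarity, so no transcription is ever produced. The `embed_query` function and the precomputed snippet embeddings below are placeholders, not the paper's actual models.

```python
# Recognition-free matching: compare a query embedding against precomputed,
# L2-normalized embeddings of word-image snippets and return the top matches.
import numpy as np

def embed_query(text: str) -> np.ndarray:
    """Placeholder for a learned text-to-embedding model (e.g., PHOC-style)."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.normal(size=128)
    return v / np.linalg.norm(v)

def top_snippets(query: str, snippet_embs: np.ndarray, k: int = 5):
    """snippet_embs: (n, d) normalized snippet embeddings.
    Returns indices of the k snippets most similar to the query."""
    sims = snippet_embs @ embed_query(query)   # cosine similarity
    return np.argsort(-sims)[:k]

snippets = np.random.default_rng(0).normal(size=(1000, 128))
snippets /= np.linalg.norm(snippets, axis=1, keepdims=True)
print(top_snippets("1890", snippets, k=3))    # candidate answer snippets
```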

ICDAR 2021 Competition on Document Visual Question Answering

Results of the ICDAR 2021 edition of the Document Visual Question Answering Challenges are presented, together with a newly introduced task on Infographics VQA, based on a new dataset of more than 5,000 infographics images and 30,000 question-answer pairs.

References


DocVQA: A Dataset for VQA on Document Images

Although the existing models perform reasonably well on certain types of questions, there is a large performance gap compared to human performance (94.36% accuracy).

Scene Text Visual Question Answering

A new dataset, ST-VQA, is presented that aims to highlight the importance of exploiting high-level semantic information present in images as textual cues in the Visual Question Answering process, and a new evaluation metric is proposed for these tasks to account for both reasoning errors and shortcomings of the text recognition module.

ICDAR 2019 Competition on Scene Text Visual Question Answering

This paper presents the final results of the ICDAR 2019 Scene Text Visual Question Answering competition (ST-VQA), which introduces a new dataset comprising 23,038 images annotated with 31,791 question/answer pairs, where the answer is always grounded on text instances present in the image.

Towards VQA Models That Can Read

A novel model architecture is introduced that reads text in the image, reasons about it in the context of the image and the question, and predicts an answer which might be a deduction based on the text and the image, or composed of strings found in the image.
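The "deduction or copied string" behaviour can be pictured as a joint argmax over a fixed answer vocabulary and the OCR tokens found in the image. The toy sketch below uses random placeholder weights and features to show the scoring structure only; it is not the published model.

```python
# Toy "answer from vocabulary OR copy from OCR" head: score both candidate
# lists against a fused question+image feature and pick the overall best.
import numpy as np

rng = np.random.default_rng(42)
vocab = ["yes", "no", "stop", "coca cola"]
ocr_tokens = ["STOP", "35", "MAIN"]                  # text read from the image

fused = rng.normal(size=64)                          # fused feature (placeholder)
W_vocab = rng.normal(size=(len(vocab), 64))          # classifier over fixed answers
ocr_feats = rng.normal(size=(len(ocr_tokens), 64))   # per-OCR-token features

vocab_scores = W_vocab @ fused                       # one score per vocab answer
copy_scores = ocr_feats @ fused                      # one score per OCR token

scores = np.concatenate([vocab_scores, copy_scores])
best = int(np.argmax(scores))
answer = vocab[best] if best < len(vocab) else ocr_tokens[best - len(vocab)]
print(answer)
```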

VizWiz Grand Challenge: Answering Visual Questions from Blind People

VizWiz is introduced to encourage a larger community to develop more generalized algorithms that can assist blind people; evaluation of modern algorithms for answering visual questions and for deciding whether a visual question is answerable reveals that it is a challenging dataset.

TextCaps: Handwritten Character Recognition With Very Small Datasets

This work introduces a technique for generating new training samples from existing samples, with realistic augmentations that reflect actual variations present in human handwriting, by adding controlled random noise to their corresponding instantiation parameters.
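A hedged sketch of that augmentation idea: perturb a capsule's instantiation parameters with small, bounded noise and decode the result into a new image. The `encoder`/`decoder` referenced in the comments stand in for a trained capsule network and are assumptions here.

```python
# Perturb instantiation parameters with controlled noise to synthesize
# realistic variations of a training sample (TextCaps-style augmentation).
import numpy as np

def augment(instantiation: np.ndarray, noise_scale: float = 0.05,
            rng=np.random.default_rng(0)) -> np.ndarray:
    """Add controlled uniform noise to each instantiation parameter."""
    noise = rng.uniform(-noise_scale, noise_scale, size=instantiation.shape)
    return instantiation + noise

# With a trained capsule network one would do, roughly (hypothetical names):
#   params = encoder(image)               # instantiation vector per class
#   new_image = decoder(augment(params))  # decoded, slightly varied character
params = np.zeros(16)                     # toy 16-D instantiation vector
print(augment(params)[:4])                # perturbed parameters
```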