PlotQA: Reasoning over Scientific Plots

@inproceedings{Methani2020PlotQARO,
  title={PlotQA: Reasoning over Scientific Plots},
  author={Nitesh Methani and Pritha Ganguly and Mitesh M. Khapra and Pratyush Kumar},
  booktitle={2020 IEEE Winter Conference on Applications of Computer Vision (WACV)},
  year={2020},
  pages={1516-1525}
}
Existing synthetic datasets (FigureQA, DVQA) for reasoning over plots do not contain variability in data labels, real-valued data, or complex reasoning questions. Consequently, models proposed for these datasets do not fully address the challenge of reasoning over plots. In particular, they assume that the answer comes either from a small fixed-size vocabulary or from a bounding box within the image. However, in practice this is an unrealistic assumption because many questions require…
MultiModalQA: Complex Question Answering over Text, Tables and Images
TLDR
This paper creates MMQA, a challenging question answering dataset that requires joint reasoning over text, tables, and images, and defines a formal language that allows questions answerable from a single modality to be combined into cross-modal questions.
Classification-Regression for Chart Comprehension
TLDR
This work proposes a new model that jointly learns classification and regression for chart question answering with out-of-vocabulary answers, outperforming previous approaches by a large margin and showing competitive performance on FigureQA.
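As an illustrative aside, the joint classification-regression idea can be sketched roughly as follows. This is a minimal hypothetical PyTorch sketch, not the architecture from that paper: the module name DualHeadAnswerer, the dimensions, and the gating head are assumptions made for illustration only. The point is that a fixed-vocabulary classification head and a real-valued regression head can share one fused question-chart representation, so answers outside the vocabulary remain reachable.

# Hypothetical sketch (not from the cited paper): a dual-head chart-QA output
# module that combines a fixed-vocabulary classifier with a regressor, so
# real-valued answers outside the answer vocabulary can still be predicted.
import torch
import torch.nn as nn

class DualHeadAnswerer(nn.Module):
    def __init__(self, fused_dim: int, vocab_size: int):
        super().__init__()
        # Classification over a fixed answer vocabulary (e.g. "yes"/"no", legend labels).
        self.cls_head = nn.Linear(fused_dim, vocab_size)
        # Regression for out-of-vocabulary, real-valued answers read off the axes.
        self.reg_head = nn.Linear(fused_dim, 1)
        # Gate deciding whether a question should be answered by classification or regression.
        self.gate = nn.Linear(fused_dim, 2)

    def forward(self, fused: torch.Tensor) -> dict:
        return {
            "class_logits": self.cls_head(fused),       # (batch, vocab_size)
            "value": self.reg_head(fused).squeeze(-1),  # (batch,)
            "head_logits": self.gate(fused),            # (batch, 2): 0 = classify, 1 = regress
        }

# Usage with random fused question+chart features (batch of 4, 512-dim):
model = DualHeadAnswerer(fused_dim=512, vocab_size=1000)
out = model(torch.randn(4, 512))
print(out["class_logits"].shape, out["value"].shape, out["head_logits"].shape)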
MuMuQA: Multimedia Multi-Hop News Question Answering via Cross-Media Knowledge Extraction and Grounding
TLDR
A new QA evaluation benchmark with 1,384 questions over news articles that require cross-media grounding of objects in images onto text is presented, along with a novel multimedia data augmentation framework, based on cross-media knowledge extraction and synthetic question-answer generation, that automatically augments data to provide weak supervision for this task.
DUE: End-to-End Document Understanding Benchmark
Understanding documents with rich layouts plays a vital role in digitization and hyper-automation but remains a challenging topic in the NLP research community. Additionally, the lack of a commonly…
VisQA: Quantifying Information Visualisation Recallability via Question Answering
TLDR
This work proposes a visual question answering (VQA) paradigm to study visualisation recallability and presents VisQA, a novel VQA dataset consisting of 200 visualisations annotated with crowd-sourced human recallability scores obtained from 1,000 questions of five question types.
STL-CQA: Structure-based Transformers with Localization and Encoding for Chart Question Answering
TLDR
STL-CQA is proposed, which improves chart question answering through sequential element localization, question encoding, and a structural transformer-based learning approach; it shows a significant accuracy improvement over state-of-the-art approaches on various chart QA datasets and even outperforms the human baseline on the DVQA dataset.
BarChartAnalyzer: Digitizing Images of Bar Charts
TLDR
This work narrows the scope to bar charts and proposes a semi-automated workflow, BarChartAnalyzer, for data extraction from chart images, which can effectively and accurately extract data from images of different resolutions and different subtypes of bar charts.
TextOCR: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text
TLDR
This work proposes TextOCR, an arbitrary-shaped scene text detection and recognition dataset with 900k annotated words collected on real images from the TextVQA dataset, and uses a TextOCR-trained OCR model to create the PixelM4C model, which can do scene-text-based reasoning on an image in an end-to-end fashion.
ScatterPlotAnalyzer: Digitizing Images of Charts Using Tensor-Based Computational Model
TLDR
This work narrows down the scope to scatter plots and proposes a semi-automated algorithm, ScatterPlotAnalyzer, for data extraction from chart images, designed around the use of second-order tensor fields to model the chart image.
AI4VIS: Survey on Artificial Intelligence Approaches for Data Visualization.
  • Aoyu Wu, Yun Wang, +5 authors Huamin Qu
  • Computer Science, Medicine
    IEEE transactions on visualization and computer graphics
  • 2021
TLDR
This survey probes the underlying vision of formalizing visualizations as an emerging data format, reviews recent advances in applying AI techniques to visualization data (AI4VIS), defines visualization data as the digital representation of visualizations in computers, and focuses on data visualization.

References

SHOWING 1-10 OF 42 REFERENCES
FigureQA: An Annotated Figure Dataset for Visual Reasoning
TLDR
FigureQA is envisioned as a first step towards developing models that can intuitively recognize patterns from visual representations of data, and preliminary results indicate that the task poses a significant machine learning challenge.
Towards VQA Models That Can Read
TLDR
A novel model architecture is introduced that reads text in the image, reasons about it in the context of the image and the question, and predicts an answer which might be a deduction based on the text and the image or composed of the strings found in the images.
Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering
TLDR
This work balances the popular VQA dataset by collecting complementary images such that every question in this balanced dataset is associated with not just a single image, but rather a pair of similar images that result in two different answers to the question.
VQA: Visual Question Answering
We propose the task of free-form and open-ended Visual Question Answering (VQA). Given an image and a natural language question about the image, the task is to provide an accurate natural language answer…
FigureSeer: Parsing Result-Figures in Research Papers
TLDR
This paper introduces FigureSeer, an end-to-end framework for parsing result-figures that enables powerful search and retrieval of results in research papers, and formulates a novel graph-based reasoning approach using a CNN-based similarity metric.
CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning
TLDR
This work presents a diagnostic dataset that tests a range of visual reasoning abilities and uses this dataset to analyze a variety of modern visual reasoning systems, providing novel insights into their abilities and limitations.
Hierarchical Question-Image Co-Attention for Visual Question Answering
TLDR
This paper presents a novel co-attention model for VQA that jointly reasons about image and question attention in a hierarchical fashion via a novel 1-dimensional convolutional neural network (CNN).
Stacked Attention Networks for Image Question Answering
TLDR
A multiple-layer SAN is developed in which an image is queried multiple times to infer the answer progressively, and the SAN is shown to locate, layer by layer, the relevant visual clues that lead to the answer of the question.
DVQA: Understanding Data Visualizations via Question Answering
TLDR
DVQA, a dataset that tests many aspects of bar chart understanding in a question answering framework, is presented, and two strong baselines are proposed that perform considerably better than current VQA algorithms.
A Multi-World Approach to Question Answering about Real-World Scenes based on Uncertain Input
TLDR
This work proposes a method for automatically answering questions about images that brings together recent advances from natural language processing and computer vision via a multi-world approach that represents uncertainty about the perceived world in a Bayesian framework.