Dynamic Graph Generation Network: Generating Relational Knowledge from Diagrams

@inproceedings{Kim2018DynamicGG,
  title={Dynamic Graph Generation Network: Generating Relational Knowledge from Diagrams},
  author={Daesik Kim and Young Joon Yoo and Jeesoo Kim and Sangkuk Lee and Nojun Kwak},
  booktitle={2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2018},
  pages={4167--4175}
}
  • Published 27 November 2017
In this work, we introduce a new algorithm for analyzing a diagram, which contains visual and textual information in an abstract and integrated way. Specifically, we propose a dynamic graph-generation network that is based on dynamic memory and graph theory. We explore the dynamics of information in a diagram with activation of gates in gated recurrent unit (GRU) cells. On publicly available diagram datasets, our model demonstrates a state-of-the-art result that outperforms other baselines…
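The abstract's mechanism rests on standard GRU gate activations. A minimal NumPy sketch of one GRU step is below; the names, dimensions, and random weights are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_cell(x, h_prev, W, U, b):
    """One GRU step: W, U, b each stack the update (z), reset (r),
    and candidate (n) parameters along the first axis."""
    Wz, Wr, Wn = W
    Uz, Ur, Un = U
    bz, br, bn = b
    z = sigmoid(Wz @ x + Uz @ h_prev + bz)        # update gate
    r = sigmoid(Wr @ x + Ur @ h_prev + br)        # reset gate
    n = np.tanh(Wn @ x + Un @ (r * h_prev) + bn)  # candidate state
    return (1 - z) * n + z * h_prev               # new hidden state

rng = np.random.default_rng(0)
d_in, d_h = 4, 3
W = rng.normal(size=(3, d_h, d_in))
U = rng.normal(size=(3, d_h, d_h))
b = np.zeros((3, d_h))
h = gru_cell(rng.normal(size=d_in), np.zeros(d_h), W, U, b)
print(h.shape)  # (3,)
```

In the paper's setting, the gate activations z and r are what is inspected to trace how information flows through the diagram's memory.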

Textbook Question Answering with Knowledge Graph Understanding and Unsupervised Open-set Text Comprehension

A novel algorithm for solving the textbook question answering (TQA) task is introduced, which addresses more realistic QA problems than other recent tasks and significantly outperforms prior state-of-the-art methods.

Textbook Question Answering with Multi-modal Context Graph Understanding and Self-supervised Open-set Comprehension

A novel algorithm for solving the textbook question answering (TQA) task is introduced, which addresses more realistic QA problems than other recent tasks, along with a novel self-supervised open-set learning process that requires no annotations.

Classifying Diagrams and Their Parts using Graph Neural Networks: A Comparison of Crowd-Sourced and Expert Annotations

The results show that the identity of diagram elements can be learned from their layout features, while the expert annotations provide better representations of diagram types.

Hierarchical Multi-Task Learning for Diagram Question Answering with Multi-Modal Transformer

This paper proposes a novel structural parsing-integrated Hierarchical Multi-Task Learning (HMTL) model for diagram question answering based on a multi-modal transformer framework and demonstrates the effectiveness of the proposed HMTL over other state-of-the-art methods.

It's Not About the Journey; It's About the Destination: Following Soft Paths Under Question-Guidance for Visual Reasoning

It is shown that finding relevant semantic structures facilitates generalization to new tasks, by introducing a novel problem of knowledge transfer: training on one question type and answering questions from a different domain without any training data.

Transfer Learning in Visual and Relational Reasoning

A new, end-to-end differentiable recurrent model (SAMNet) is introduced, which shows state-of-the-art accuracy and better performance in transfer learning on both datasets.

Figuring out Figures: Using Textual References to Caption Scientific Figures

This work uses the SciCap dataset curated by Hsu et al. and a variant of a CLIP+GPT-2 encoder-decoder model with cross-attention to generate captions conditioned on the image; SciBERT encodes the textual metadata, and this encoding is used alongside the figure embedding.

AI2D-RST: A multimodal corpus of 1000 primary school science diagrams

A multi-layer annotation schema that provides a rich description of the grouping of diagram elements into perceptual units, the connections set up by diagrammatic elements such as arrows and lines, and the discourse relations between diagram elements, which are described using Rhetorical Structure Theory (RST).

Adversarial Multimodal Network for Movie Story Question Answering

In AMN, a self-attention mechanism is developed to enforce the newly introduced consistency constraint in order to preserve the self-correlation between the visual cues of the original video clips in the learned multimodal representations.

OsGG-Net: One-step Graph Generation Network for Unbiased Head Pose Estimation

OsGG-Net, a One-step Graph Generation Network for estimating head poses from a single image, is proposed; it generates a landmark-connection graph to robustly model the 3D angles associated with the landmark distribution. The UnBiased Head Pose Dataset (UBHPD) and a new unbiased metric, UBMAE, are also proposed.

References


Gated Graph Sequence Neural Networks

This work studies feature learning techniques for graph-structured inputs and achieves state-of-the-art performance on a problem from program verification, in which subgraphs need to be matched to abstract data structures.

A Diagram is Worth a Dozen Images

An LSTM-based method for syntactic parsing of diagrams and a DPG-based attention model for diagram question answering are devised and a new dataset of diagrams with exhaustive annotations of constituents and relationships is compiled.

Scene Graph Generation by Iterative Message Passing

This work explicitly model the objects and their relationships using scene graphs, a visually-grounded graphical structure of an image, and proposes a novel end-to-end model that generates such structured scene representation from an input image.

Dynamic Memory Networks for Visual and Textual Question Answering

The new DMN+ model improves the state of the art on both the Visual Question Answering dataset and the bAbI-10k text question-answering dataset without supporting fact supervision.

Memory Networks

This work describes a new class of learning models called memory networks, which reason with inference components combined with a long-term memory component; they learn how to use these jointly.

The Graph Neural Network Model

A new neural network model, called the graph neural network (GNN) model, that extends existing neural network methods for processing data represented in graph domains, and implements a function τ(G, n) ∈ ℝ^m that maps a graph G and one of its nodes n into an m-dimensional Euclidean space.
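The mapping τ(G, n) ∈ ℝ^m can be sketched as a single neighborhood-aggregation step; the toy graph, weights, and function below are illustrative assumptions, not Scarselli et al.'s exact transition function:

```python
import numpy as np

def tau(adj, feats, W_self, W_nbr, node):
    """Map node `node` of graph (adj, feats) to an m-dimensional
    vector by combining its features with its neighbors' mean."""
    nbrs = np.nonzero(adj[node])[0]
    nbr_mean = feats[nbrs].mean(axis=0) if len(nbrs) else np.zeros(feats.shape[1])
    return np.tanh(W_self @ feats[node] + W_nbr @ nbr_mean)

# Hypothetical graph: 3 nodes on a path, 2-d features, m = 4.
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]])
feats = np.array([[1.0, 0.0],
                  [0.0, 1.0],
                  [1.0, 1.0]])
rng = np.random.default_rng(1)
W_self = rng.normal(size=(4, 2))
W_nbr = rng.normal(size=(4, 2))
v = tau(adj, feats, W_self, W_nbr, node=1)
print(v.shape)  # (4,)
```

The full GNN iterates such updates to a fixed point; this single step only shows the graph-plus-node-to-vector signature.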

Fully convolutional networks for semantic segmentation

The key insight is to build “fully convolutional” networks that take input of arbitrary size and produce correspondingly-sized output with efficient inference and learning.
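The "arbitrary input size, correspondingly-sized output" property can be illustrated with one same-padded convolution; this NumPy sketch with a hypothetical 3×3 kernel stands in for a learned layer, and a real FCN stacks many such layers:

```python
import numpy as np

def conv2d_same(img, kernel):
    """'Same'-padded 2-D convolution: output spatial size equals input
    size, so the layer accepts any input resolution."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(img, ((ph, ph), (pw, pw)))
    out = np.empty_like(img, dtype=float)
    H, W = img.shape
    for i in range(H):
        for j in range(W):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    return out

kernel = np.ones((3, 3)) / 9.0  # mean filter as a stand-in for learned weights
for shape in [(8, 8), (13, 21)]:  # no fixed input size required
    result = conv2d_same(np.ones(shape), kernel)
    print(result.shape)  # matches the input shape
```

Because no fully connected layer fixes the spatial dimensions, the same weights apply to 8×8 and 13×21 inputs alike.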

Ask Me Anything: Dynamic Memory Networks for Natural Language Processing

The dynamic memory network (DMN), a neural network architecture which processes input sequences and questions, forms episodic memories, and generates relevant answers, is introduced.
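A toy version of the episodic-memory idea (attention-weighted pooling of input facts into a memory vector, repeated over passes) is sketched below; the scoring function and dimensions are simplifying assumptions, not the DMN's full gated attention:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def episode(facts, question, memory):
    """Attend over fact vectors using the question and current memory,
    then pool them into an updated memory vector."""
    scores = facts @ (question + memory)  # simple relevance score
    weights = softmax(scores)
    context = weights @ facts             # attention-weighted sum
    return np.tanh(context + memory)      # updated memory

rng = np.random.default_rng(2)
facts = rng.normal(size=(5, 8))  # five 8-d "sentence" encodings
q = rng.normal(size=8)
m = q.copy()                     # initialize memory with the question
for _ in range(3):               # a few episodic passes
    m = episode(facts, q, m)
print(m.shape)  # (8,)
```

Repeated passes let the memory re-attend to facts in light of what it has already gathered, which is the transitive-reasoning behavior the DMN papers emphasize.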

The More You Know: Using Knowledge Graphs for Image Classification

This paper investigates the use of structured prior knowledge in the form of knowledge graphs and shows that using this knowledge improves performance on image classification, and introduces the Graph Search Neural Network as a way of efficiently incorporating large knowledge graphs into a vision classification pipeline.

ViP-CNN: A Visual Phrase Reasoning Convolutional Neural Network for Visual Relationship Detection

In ViP-CNN, the visual relationship is considered as a phrase with three components and a Visual Phrase Reasoning Structure (VPRS) is presented to set up the connection among the relationship components and help the model consider the three problems jointly.