Diagram Understanding in Geometry Questions

Minjoon Seo, Hannaneh Hajishirzi, Ali Farhadi, Oren Etzioni
Automatically solving geometry questions is a long-standing AI problem. A geometry question typically includes a textual description accompanied by a diagram. The first step in solving geometry questions is diagram understanding, which consists of identifying visual elements in the diagram, their locations, their geometric properties, and aligning them to corresponding textual descriptions. In this paper, we present a method for diagram understanding that identifies visual elements in a… 

Understanding Plane Geometry Problems by Integrating Relations Extracted from Text and Diagram

The proposed method can mine geometric relations with high accuracy, and it can understand some problems that cannot be understood from the text alone or from the diagram alone.

Solving Geometry Problems: Combining Text and Diagram Interpretation

GEOS is introduced, the first automated system to solve unaltered SAT geometry questions by combining text understanding and diagram interpretation, and it is shown that by integrating textual and visual information, GEOS boosts the accuracy of dependency and semantic parsing of the question text.

PGDP5K: A Diagram Parsing Dataset for Plane Geometry Problems

This work proposes a new large-scale geometry diagram dataset named PGDP5K and a novel annotation method that can generate intelligible geometric propositions automatically and uniquely.

Computer Science Diagram Understanding with Topology Parsing

This paper constructs the first dataset of geometric-type diagrams in the Computer Science field, which have more abstract expressions and complex logical relations, and proposes the Diagram Parsing Net (DPN), which focuses on analyzing the topological structure and text information of diagrams.

A Diagram is Worth a Dozen Images

An LSTM-based method for syntactic parsing of diagrams and a DPG-based attention model for diagram question answering are devised and a new dataset of diagrams with exhaustive annotations of constituents and relationships is compiled.

Plane Geometry Diagram Parsing

A modified instance segmentation method is proposed to extract geometric primitives, and the graph neural network (GNN) is leveraged to realize relation parsing and primitive classification incorporating geometric features and prior knowledge.

GeoQA: A Geometric Question Answering Benchmark Towards Multimodal Numerical Reasoning

A Neural Geometric Solver (NGS) is introduced to address geometric problems by comprehensively parsing multimodal information and generating interpretable programs, and multiple self-supervised auxiliary tasks on NGS are added to enhance cross-modal semantic representation.

Structured Set Matching Networks for One-Shot Part Labeling

The Structured Set Matching Network (SSMN), a structured prediction model that incorporates convolutional neural networks, is introduced for the problem of one-shot part labeling: labeling multiple parts of an object in a target image given only a single source image of that category.

Recognition and Modeling of Planar Mechanical Linkages from Images Using Symbolic and Behavioral Cues

A computational method capable of quickly generating accurate kinematic models from images of planar mechanical linkages is created and a novel metric called the user effort ratio is introduced to compare the overall performance of different algorithms and assess the benefit of automatic recognition over manual model construction.

Inter-GPS: Interpretable Geometry Problem Solving with Formal Language and Symbolic Reasoning

This work constructs a new large-scale benchmark, Geometry3K, consisting of 3,002 geometry problems with dense annotation in formal language, and proposes a novel geometry solving approach with formal language and symbolic reasoning, called Interpretable Geometry Problem Solver (Inter-GPS).



GeoRep: A Flexible Tool for Spatial Representation of Line Drawings

GeoRep is a spatial reasoning engine that generates qualitative spatial descriptions from line drawings; it has been used successfully in several research projects, including cognitive simulation studies of human vision and military courses of action.

Understanding text with an accompanying diagram

A program which understands elementary physics problems, parsing the text and picture components of the problem together, and produces an abstract model of the information contained within.

Understanding Machines from Text and Diagrams

Every Picture Tells a Story: Generating Sentences from Images

A system that can compute a score linking an image to a sentence, which can be used to attach a descriptive sentence to a given image, or to obtain images that illustrate a given sentence.

Computational models for integrating linguistic and visual information: A survey

R. Srihari. Computer Science. Artificial Intelligence Review, 2004.
An important contribution of this paper is to categorize existing research based on inputs and objectives, namely how to associate visual events with words and vice versa.

Efficient diagram understanding with characteristic pattern detection

Telling juxtapositions: Using repetition and alignable difference in diagram understanding

This paper shows how MAGI, the model of repetition and symmetry detection, can model the cognitive processes humans use when reading repetition-based diagrams, and describes JUXTA, which uses this insight to critique a class of diagrams that juxtapose similar scenes to demonstrate physical laws.

Understanding Natural Language with Diagrams

A program, BEATRIX, is described that can understand textbook physics problems specified by a combination of English text and a diagram by establishing coreference, that is, determining when parts of the text and diagram refer to the same object.

Baby Talk: Understanding and Generating Image Descriptions

A system to automatically generate natural language descriptions from images that exploits both statistics gleaned from parsing large quantities of text data and recognition algorithms from computer vision that is very effective at producing relevant sentences for images.

Baby talk: Understanding and generating simple image descriptions

A system to automatically generate natural language descriptions from images that exploits both statistics gleaned from parsing large quantities of text data and recognition algorithms from computer vision that is very effective at producing relevant sentences for images.