A Dataset for Tracking Entities in Open Domain Procedural Text

Niket Tandon, Keisuke Sakaguchi, Bhavana Dalvi, Dheeraj Rajagopal, Peter Clark, Michal Guerquin, Kyle Richardson, Eduard H. Hovy
We present the first dataset for tracking state changes in procedural text from arbitrary domains by using an unrestricted (open) vocabulary. For example, in a text describing fog removal using potatoes, a car window may transition between being foggy, sticky, opaque, and clear. Previous formulations of this task provide the text and entities involved, and ask how those entities change for just a small, pre-defined set of attributes (e.g., location), limiting their fidelity. Our solution is a… 

Coalescing Global and Local Information for Procedural Text Understanding

A new model is proposed that builds entity- and timestep-aware input representations (local input) considering the whole context (global input) and jointly models the entity states with a structured prediction objective (global output), optimizing for both precision and recall.

SOCCER: An Information-Sparse Discourse State Tracking Collection in the Sports Commentary Domain

This paper proposes a new task formulation where, given paragraphs of commentary of a game at different timestamps, the system is asked to recognize the occurrence of in-game events, which allows for rich descriptions of state while avoiding the complexities of many other real-world settings.

proScript: Partially Ordered Scripts Generation

This work demonstrates for the first time that pre-trained neural language models can be finetuned to generate high-quality scripts, at varying levels of granularity, for a wide range of everyday scenarios (e.g., bake a cake).

Modeling Temporal-Modal Entity Graph for Procedural Multimodal Machine Comprehension

A novel Temporal-Modal Entity Graph (TMEG) is proposed to capture textual and visual entities and trace their temporal-modal evolution, and a graph aggregation module is introduced to conduct graph encoding and reasoning.

Temporal Reasoning on Implicit Events from Distant Supervision

A neuro-symbolic temporal reasoning model, SymTime, is proposed; it exploits distant supervision signals from large-scale text, uses temporal rules to combine start times and durations to infer end times, and generalizes to other temporal reasoning tasks.

Visual Recipe Flow: A Dataset for Learning Visual State Changes of Objects with Recipe Flows

A new multimodal dataset called Visual Recipe Flow is presented, which makes it possible to learn the result of each cooking action in a recipe text; it consists of object state changes and the workflow of the recipe text.

PASTA: A Dataset for Modeling Participant States in Narratives

The events in a narrative can be understood as a coherent whole via the underlying states of its participants. Often, these participant states are not explicitly mentioned in the narrative, left to

A Neural Edge-Editing Approach for Document-Level Relation Graph Extraction

The experimental results show the effectiveness of the novel edge-editing approach to extract relation information from a document in editing the graphs initialized by the authors' in-house rule-based system and empty graphs.

Process-Level Representation of Scientific Protocols with Interactive Annotation

Graph-prediction models are used to develop Process Execution Graphs, finding them to be good at entity identification and local relation extraction, while the corpus facilitates further exploration of challenging long-range relations.

Multimedia Generative Script Learning for Task Planning

This work proposes a new task, Multimedia Generative Script Learning, to generate subsequent steps by tracking historical states in both text and vision modalities, and presents the first benchmark containing 2,338 tasks and 31,496 steps with descriptive images.



Reasoning about Actions and State Changes by Injecting Commonsense Knowledge

This paper shows how the predicted effects of actions in the context of a paragraph can be improved in two ways: by incorporating global, commonsense constraints (e.g., a non-existent entity cannot be destroyed), and by biasing reading with preferences from large-scale corpora.

Tracking the World State with Recurrent Entity Networks

The EntNet sets a new state-of-the-art on the bAbI tasks, and is the first method to solve all the tasks in the 10k training examples setting, and can generalize past its training horizon.

VirtualHome: Simulating Household Activities Via Programs

This paper crowd-sources programs for a variety of activities that happen in people's homes, via a game-like interface used for teaching kids how to code, implements the most common atomic actions in the Unity3D game engine, and uses them to "drive" an artificial agent to execute tasks in a simulated household environment.

COMET: Commonsense Transformers for Automatic Knowledge Graph Construction

This investigation reveals promising results when implicit knowledge from deep pre-trained language models is transferred to generate explicit knowledge in commonsense knowledge graphs, and suggests that using generative commonsense models for automatic commonsense KB completion could soon be a plausible alternative to extractive methods.

Discovering states and transformations in image collections

A dataset of objects, scenes, and materials, each of which is found in a variety of transformed states, is introduced, and, given a novel collection of images, it is shown how to explain the collection in terms of the states and transformations it depicts.

Image specificity

  • M. Jas, Devi Parikh
  • Computer Science
    2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
  • 2015
This paper introduces the notion of image specificity and presents two mechanisms to measure specificity given multiple descriptions of an image: an automated measure and a measure that relies on human judgement.

Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks

This work argues for the usefulness of a set of proxy tasks that evaluate reading comprehension via question answering, and classifies these tasks into skill sets so that researchers can identify (and then rectify) the failings of their systems.

ALFRED: A Benchmark for Interpreting Grounded Instructions for Everyday Tasks

It is shown that a baseline model based on recent embodied vision-and-language tasks performs poorly on ALFRED, suggesting that there is significant room for developing innovative grounded visual language understanding models with this benchmark.

ATOMIC: An Atlas of Machine Commonsense for If-Then Reasoning

Experimental results demonstrate that multitask models that incorporate the hierarchical structure of if-then relation types lead to more accurate inference compared to models trained in isolation, as measured by both automatic and human evaluation.

The Materials Science Procedural Text Corpus: Annotating Materials Synthesis Procedures with Shallow Semantic Structures

This work introduces a dataset of 230 synthesis procedures annotated by domain experts with labeled graphs that express the semantics of the synthesis sentences, and makes the corpus available to the community to promote further research and development of scientific information extraction systems.