Benchmarking Hierarchical Script Knowledge

Yonatan Bisk, Jan Buys, Karl Pichotta, Yejin Choi
Understanding procedural language requires reasoning about both hierarchical and temporal relations between events. For example, “boiling pasta” is a sub-event of “making a pasta dish”, typically happens before “draining pasta,” and requires the use of omitted tools (e.g. a strainer, sink...). While people are able to choose when and how to use abstract versus concrete instructions, the NLP community lacks corpora and tasks for evaluating whether our models can do the same. In this paper, we… 


Script Parsing with Hierarchical Sequence Modelling

This model improves the state of the art of event parsing by over 16 F-score points and, for the first time, accurately tags script participants.

Reading between the Lines: Exploring Infilling in Visual Narratives

This paper presents a new large scale visual procedure telling (ViPT) dataset with a total of 46,200 procedures and around 340k pairwise images and textual descriptions that is rich in such contextual dependencies.

PIQA: Reasoning about Physical Commonsense in Natural Language

The task of physical commonsense reasoning and a corresponding benchmark dataset Physical Interaction: Question Answering or PIQA are introduced and analysis about the dimensions of knowledge that existing models lack are provided, which offers significant opportunities for future research.

Reasoning about Procedures with Natural Language Processing: A Tutorial

This tutorial provides a comprehensive and in-depth view of the research on procedures, primarily in Natural Language Processing, by discussing established approaches to collecting procedures, whether by human annotation or by extraction from web resources.

Learning to Segment Actions from Observation and Narration

A generative segmental model of task structure, guided by narration, is applied to action segmentation in video, and it is found that both task structure and narrative language provide large benefits in segmentation quality.

Pretraining on Interactions for Learning Grounded Affordance Representations

A neural network is trained to predict objects’ trajectories in a simulated interaction, and it is shown that the network’s latent representations differentiate between both observed and unobserved affordances.

Event Representation with Sequential, Semi-Supervised Discrete Variables

A sequential neural variational autoencoder is constructed, which uses Gumbel-Softmax reparametrization within a carefully defined encoder, to allow for successful backpropagation during training.

Integrating Text and Image: Determining Multimodal Document Intent in Instagram Posts

A multimodal dataset of 1299 Instagram posts labeled for three orthogonal taxonomies is introduced, showing that employing both text and image improves intent detection by 9.6 compared to using only the image modality, demonstrating the commonality of non-intersective meaning multiplication.

Unsupervised Learning of Narrative Event Chains

A three-step process for learning narrative event chains is presented, using unsupervised distributional methods to learn narrative relations between events sharing coreferring arguments, and two evaluations are introduced: the narrative cloze, to evaluate event relatedness, and an order coherence task, to evaluate narrative order.

Generating Coherent Event Schemas at Scale

This work presents a novel approach to inducing open-domain event schemas that overcomes limitations of Chambers and Jurafsky’s (2009) schemas and uses co-occurrence statistics of semantically typed relational triples, which it calls Rel-grams (relational n-grams).

Learning to predict script events from domain-specific text

The automatic induction of scripts (Schank and Abelson, 1977) has been the focus of many recent works. In this paper, we employ a variety of these methods to learn Schank and Abelson’s canonical

Probabilistic Frame Induction

This paper proposes the first probabilistic approach to frame induction, which incorporates frames, events, and participants as latent topics and learns those frame and event transitions that best explain the text.

Hierarchical Quantized Representations for Script Generation

An autoencoder model is proposed with a latent space defined by a hierarchy of categorical variables, utilizing a recently proposed vector-quantization-based approach that allows continuous embeddings to be associated with each latent variable value.

Event Schema Induction with a Probabilistic Entity-Driven Model

This paper presents the first generative model for schema induction that integrates coreference chains into learning, and matches the pipeline’s performance, and outperforms the HMM by 7 F1 points.

Behind the Scenes of an Evolving Event Cloze Test

It is argued that the narrative event cloze test has slowly and unknowingly been altered to accommodate LMs, and recommendations on how to return to the test’s original intent are offered.

Deep Contextualized Word Representations

A new type of deep contextualized word representation is introduced that models both complex characteristics of word use and how these uses vary across linguistic contexts, allowing downstream models to mix different types of semi-supervision signals.

Scripts, plans, goals and understanding: an inquiry into human knowledge structures

For both people and machines, each in their own way, there is a serious problem in common of making sense out of what they hear, see, or are told about the world. The conceptual apparatus necessary

Viterbi Training Improves Unsupervised Dependency Parsing

We show that Viterbi (or "hard") EM is well-suited to unsupervised grammar induction. It is more accurate than standard inside-outside re-estimation (classic EM), significantly faster, and simpler.