Unsupervised Alignment of Natural Language Instructions with Video Segments

Abstract

We propose an unsupervised learning algorithm for automatically inferring the mappings between English nouns and corresponding video objects. Given a sequence of natural language instructions and an unaligned video recording, we simultaneously align each instruction to its corresponding video segment, and also align nouns in each instruction to their… (More)

Topics

6 Figures and Tables

Statistics

0102030201520162017
Citations per Year

Citation Velocity: 13

Averaging 13 citations per year over the last 3 years.

Learn more about how we calculate this metric in our FAQ.

Slides referencing similar topics