Unsupervised Alignment of Natural Language Instructions with Video Segments


We propose an unsupervised learning algorithm for automatically inferring the mappings between English nouns and corresponding video objects. Given a sequence of natural language instructions and an unaligned video recording, we simultaneously align each instruction to its corresponding video segment, and also align nouns in each instruction to their…


6 Figures and Tables


