Seeing What You're Told: Sentence-Guided Activity Recognition in Video

Abstract

We present a system that demonstrates how the compositional structure of events, in concert with the compositional structure of language, can interplay with the underlying focusing mechanisms in video action recognition, providing a medium for top-down and bottom-up integration as well as multi-modal integration between vision and language. We show how the…
DOI: 10.1109/CVPR.2014.99

6 Figures and Tables

Statistics

[Chart: Citations per Year; y-axis ticks 0, 20, 40; x-axis 2015, 2016, 2017]

Citation Velocity: 16

Averaging 16 citations per year over the last 3 years.