Look, Listen and Learn

We consider the question: what can be learnt by looking at and listening to a large number of unlabelled videos? There is a valuable, but so far untapped, source of information contained in the video itself – the correspondence between the visual and the audio streams, and we introduce a novel “Audio-Visual Correspondence” learning task that makes use of…

14 Figures & Tables

Statistics

[Chart: Citations per Year, 2017 and 2018; vertical axis from 0 to 60]

Citation Velocity: 33

Averaging 33 citations per year over the last 2 years.
