Video Moment Localization using Object Evidence and Reverse Captioning

Madhawa Vidanapathirana, Supriya Pandhre, Sonia Raychaudhuri, Anjali Khurana
We address the problem of language-based temporal localization of moments in untrimmed videos. Compared to temporal localization with fixed categories, this problem is more challenging because language-based queries have no predefined activity classes and may contain complex descriptions. The current state-of-the-art model, MAC, addresses it by mining activity concepts from both the video and language modalities. This method encodes the semantic activity concepts from the verb/object pair in a…
