Disentangling Motion, Foreground and Background Features in Videos


This paper introduces an unsupervised framework to extract semantically rich features for video representation. Inspired by how the human visual system groups objects based on motion cues, we propose a deep convolutional neural network that disentangles motion, foreground and background information. The proposed architecture consists of a 3D convolutional…


4 Figures and Tables
