Learn More
We propose a deep neural network for the prediction of future frames in natural video sequences. To effectively handle complex evolution of pixels in videos, we propose to decompose the motion and content, two key components generating dynamics in videos. Our model is built upon the Encoder-Decoder Convolutional Neural Network and Convolutional LSTM for(More)
We propose a hierarchical approach for making long-term predictions of future frames. To avoid inherent compounding errors in recursive pixellevel prediction, we propose to first estimate highlevel structure in the input frames, then predict how that structure evolves in the future, and finally by observing a single frame from the past and the predicted(More)
This paper introduces an unsupervised framework to extract semantically rich features for video representation. Inspired by how the human visual system groups objects based on motion cues, we propose a deep convolutional neural network that disentangles motion, foreground and background information. The proposed architecture consists of a 3D convolutional(More)
  • 1