Learning Features by Watching Objects Move

@inproceedings{Pathak2017LearningFB,
  title={Learning Features by Watching Objects Move},
  author={Deepak Pathak and Ross B. Girshick and Piotr Doll{\'a}r and Trevor Darrell and Bharath Hariharan},
  booktitle={2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2017},
  pages={6024--6033}
}
This paper presents a novel yet intuitive approach to unsupervised feature learning. [...] Specifically, we use unsupervised motion-based segmentation on videos to obtain segments, which we use as pseudo ground truth to train a convolutional network to segment objects from a single frame. Given the extensive evidence that motion plays a key role in the development of the human visual system, we hope that this straightforward approach to unsupervised learning will be more effective than cleverly […]
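The pipeline the abstract describes — motion cues in video turned into pseudo ground-truth masks for a segmentation network — can be illustrated with simple frame differencing. This is only an illustrative stand-in: the paper uses a dedicated video segmentation method, and `motion_pseudo_mask` is a hypothetical helper name, not the authors' code.

```python
import numpy as np

def motion_pseudo_mask(frame_t, frame_t1, thresh=0.1):
    """Crude stand-in for motion-based segmentation: mark pixels whose
    intensity changes between consecutive frames as 'moving object'.
    The resulting binary mask plays the role of pseudo ground truth."""
    diff = np.abs(frame_t1.astype(np.float32) - frame_t.astype(np.float32))
    return (diff > thresh).astype(np.uint8)

# Toy example: a bright 3x3 square moves one pixel to the right.
f0 = np.zeros((8, 8), dtype=np.float32)
f1 = np.zeros((8, 8), dtype=np.float32)
f0[2:5, 2:5] = 1.0
f1[2:5, 3:6] = 1.0
mask = motion_pseudo_mask(f0, f1)  # nonzero only at the trailing/leading edges
```

In the paper, masks like these (produced by a proper video segmentation algorithm) supervise a ConvNet that must predict the segment from a single frame, with no motion available at test time.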
OCEAN: Object-centric arranging network for self-supervised visual representations learning
This paper learns the correct arrangement of object proposals to represent an image using a convolutional neural network without any manual annotations, and discovers a representation that captures overlap, inclusion, and exclusion relationships among proposals as well as their relative positions.
A Classification approach towards Unsupervised Learning of Visual Representations
A model is trained for a foreground/background classification task, learning visual representations in the process; it comes close to the best-performing unsupervised feature learning technique and outperforms many other proposed algorithms.
Unsupervised Learning from Video to Detect Foreground Objects in Single Images
A student pathway is trained: a deep neural network that learns to predict, from a single input image, the output of a teacher pathway performing unsupervised object discovery in video. It achieves state-of-the-art results on two current benchmarks, the YouTube Objects and Object Discovery datasets.
CortexNet: a Generic Network Family for Robust Visual Temporal Representations
Inspired by the human visual system, a deep neural network family, CortexNet, is proposed; it features not only bottom-up feed-forward connections but also models the abundant top-down feedback and lateral connections present in the visual cortex.
Learning Visual Features Under Motion Invariance
It is claimed that processing visual streams naturally leads to formulating the motion invariance principle, which enables the construction of a new theory of learning that originates from variational principles, just as in physics.
Disentangling Motion, Foreground and Background Features in Videos
Qualitative results indicate that the network can successfully update the foreground appearance based on pure-motion features, and the benefits of these learned features are shown in a discriminative classification task.
Momentum Contrast for Unsupervised Visual Representation Learning
We present Momentum Contrast (MoCo) for unsupervised visual representation learning. From a perspective on contrastive learning as dictionary look-up, we build a dynamic dictionary with a queue and a […]
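The core mechanism the MoCo summary names — a dynamic dictionary maintained as a FIFO queue of encoded keys, with a slowly-updated key encoder — can be sketched in a few lines. This is a simplified NumPy illustration of the idea, not the paper's implementation; `KeyQueue` is an invented name, and real code stores L2-normalized feature vectors from the key encoder.

```python
import numpy as np

def momentum_update(query_params, key_params, m=0.999):
    """Key-encoder parameters trail the query encoder via an
    exponential moving average (MoCo's momentum update)."""
    return [m * k + (1 - m) * q for q, k in zip(query_params, key_params)]

class KeyQueue:
    """Fixed-size FIFO dictionary of encoded keys: new key batches
    overwrite the oldest entries, so the dictionary can be much
    larger than a mini-batch while staying cheap to maintain."""
    def __init__(self, dim, size):
        self.keys = np.zeros((size, dim), dtype=np.float32)
        self.ptr = 0
        self.size = size

    def enqueue(self, batch):
        # Overwrite the oldest slots, wrapping around the buffer.
        idx = (self.ptr + np.arange(len(batch))) % self.size
        self.keys[idx] = batch
        self.ptr = (self.ptr + len(batch)) % self.size

queue = KeyQueue(dim=2, size=4)
queue.enqueue(np.ones((3, 2), dtype=np.float32))  # 3 of 4 slots now filled
```

The queue decouples dictionary size from batch size, while the momentum update keeps the keys in the queue approximately consistent with each other.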
Unsupervised Representation Learning by Predicting Image Rotations
This work proposes to learn image features by training ConvNets to recognize the 2d rotation that is applied to the input image, and demonstrates both qualitatively and quantitatively that this apparently simple task actually provides a very powerful supervisory signal for semantic feature learning.
Integrating low-level motion cues in deep video saliency
This thesis investigates the importance of motion when predicting saliency in videos and proposes a simple implementation for the generation of saliency maps using previously extracted static and dynamic information from the images.
Self-Supervised Visual Representation Learning from Hierarchical Grouping
We create a framework for bootstrapping visual representation learning from a primitive visual grouping capability. We operationalize grouping via a contour detector that partitions an image into […]

References

Showing 1–10 of 65 references.
Unsupervised Visual Representation Learning by Context Prediction
It is demonstrated that the feature representation learned using this within-image context indeed captures visual similarity across images and allows us to perform unsupervised visual discovery of objects like cats, people, and even birds from the Pascal VOC 2011 detection dataset.
Learning Image Representations Tied to Ego-Motion
This work proposes to exploit proprioceptive motor signals to provide unsupervised regularization in convolutional neural networks that learn visual representations from egocentric video, enforcing that the learned features exhibit equivariance, i.e., they respond predictably to transformations associated with distinct ego-motions.
Learning to See by Moving
It is found that, using the same number of training images, features learnt using egomotion as supervision compare favourably to features learnt with class-label supervision on the tasks of scene recognition, object recognition, visual odometry, and keypoint matching.
Shuffle and Learn: Unsupervised Learning Using Temporal Order Verification
This paper presents an approach for learning a visual representation from the raw spatiotemporal signals in videos using a convolutional neural network, and shows that this method captures information that is temporally varying, such as human pose.
Segmentation of Moving Objects by Long Term Video Analysis
This paper demonstrates that motion is exploited most effectively when it is considered over larger time windows, and suggests a paradigm that starts with semi-dense motion cues and then fills in textureless areas based on color.
Unsupervised Learning of Edges
This work presents a simple yet effective approach for training edge detectors without human supervision, and shows that when using a deep network for the edge detector, this approach provides a novel pre-training scheme for object detection.
Unsupervised Learning of Visual Representations by Solving Jigsaw Puzzles
A novel unsupervised learning approach builds features suitable for object detection and classification; to facilitate the transfer of features to other tasks, the context-free network (CFN), a siamese-ennead convolutional neural network, is introduced.
DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition
DeCAF, an open-source implementation of deep convolutional activation features, is released along with all associated network parameters, enabling vision researchers to experiment with deep representations across a range of visual concept learning paradigms.
Learning to Segment Object Candidates
A new way to generate object proposals is proposed, introducing an approach based on a discriminative convolutional network that obtains substantially higher object recall using fewer proposals and is able to generalize to categories it has not seen during training.
Unsupervised Learning of Visual Representations using Videos
This is a review of unsupervised learning applied to videos with the aim of learning visual representations. We look at different realizations of the notion of temporal coherence across various […]