Unsupervised Learning of Visual Representations Using Videos

@article{Wang2015UnsupervisedLO,
  title={Unsupervised Learning of Visual Representations Using Videos},
  author={X. Wang and A. Gupta},
  journal={2015 IEEE International Conference on Computer Vision (ICCV)},
  year={2015},
  pages={2794-2802}
}
  • X. Wang, A. Gupta
  • Published 2015
  • Computer Science
  • 2015 IEEE International Conference on Computer Vision (ICCV)
  • Is strong supervision necessary for learning a good visual representation. [...] Key Method We design a Siamese-triplet network with a ranking loss function to train this CNN representation. Without using a single image from ImageNet, just using 100K unlabeled videos and the VOC 2012 dataset, we train an ensemble of unsupervised networks that achieves 52% mAP (no bounding box regression). This performance comes tantalizingly close to its ImageNet-supervised counterpart, an ensemble which achieves a mAP of 54.4…Expand Abstract
    340 Citations
    OCEAN: Object-centric arranging network for self-supervised visual representations learning
    • 2
    • Highly Influenced
    • PDF
    Unsupervised learning from videos using temporal coherency deep networks
    • 7
    • Highly Influenced
    • PDF
    Leveraging Inexpensive Supervision Signals for Visual Learning
    • PDF
    Unsupervised Learning using Sequential Verification for Action Recognition
    • 48
    • Highly Influenced
    • PDF
    Deep unsupervised learning of visual similarities
    • 24
    • Highly Influenced
    • PDF
    Learning Deep Representations Using Convolutional Auto-Encoders with Symmetric Skip Connections
    • 13
    • Highly Influenced
    • PDF
    Unsupervised Deep Learning by Neighbourhood Discovery
    • 35
    • PDF
    A Classification approach towards Unsupervised Learning of Visual Representations
    • Highly Influenced
    • PDF

    References

    SHOWING 1-10 OF 62 REFERENCES
    Unsupervised Visual Representation Learning by Context Prediction
    • 1,051
    • PDF
    Unsupervised Learning of Video Representations using LSTMs
    • 1,564
    • PDF
    Building high-level features using large scale unsupervised learning
    • 1,943
    • PDF
    Deep Metric Learning Using Triplet Network
    • 866
    • PDF
    Computational Baby Learning
    • 27
    • Highly Influential
    • PDF
    Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations
    • 2,296
    • PDF
    Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation
    • 12,834
    • Highly Influential
    • PDF
    Convolutional Learning of Spatio-temporal Features
    • 551
    • PDF
    Unsupervised learning of invariant features using video
    • David Stavens, S. Thrun
    • Computer Science
    • 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition
    • 2010
    • 45
    • PDF