Publications
Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation
TLDR
This paper proposes a simple and scalable detection algorithm that improves mean average precision (mAP) by more than 30% relative to the previous best result on VOC 2012, achieving a mAP of 53.3%.
Caffe: Convolutional Architecture for Fast Feature Embedding
TLDR
Caffe provides multimedia scientists and practitioners with a clean and modifiable framework for state-of-the-art deep learning algorithms and a collection of reference models for training and deploying general-purpose convolutional neural networks and other deep models efficiently on commodity architectures.
Long-term recurrent convolutional networks for visual recognition and description
TLDR
A novel recurrent convolutional architecture for large-scale visual learning that is end-to-end trainable; the paper shows that such models have distinct advantages over state-of-the-art recognition or generation models that are defined and/or optimized separately.
Large Scale GAN Training for High Fidelity Natural Image Synthesis
TLDR
It is found that applying orthogonal regularization to the generator renders it amenable to a simple "truncation trick," allowing fine control over the trade-off between sample fidelity and variety by reducing the variance of the generator's input.
DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition
TLDR
DeCAF, an open-source implementation of deep convolutional activation features, is released along with all associated network parameters to enable vision researchers to experiment with deep representations across a range of visual concept learning paradigms.
Context Encoders: Feature Learning by Inpainting
TLDR
It is found that a context encoder learns a representation that captures not just appearance but also the semantics of visual structures, and can be used for semantic inpainting tasks, either stand-alone or as initialization for non-parametric methods.
Sequence to Sequence -- Video to Text
TLDR
A novel end-to-end sequence-to-sequence model that generates captions for videos and naturally learns both the temporal structure of the sequence of frames and a sequence model of the generated sentences, i.e., a language model.
Adversarial Feature Learning
TLDR
Bidirectional Generative Adversarial Networks are proposed as a means of learning the inverse mapping of GANs, and it is demonstrated that the resulting learned feature representation is useful for auxiliary supervised discrimination tasks, competitive with contemporary approaches to unsupervised and self-supervised feature learning.
Region-Based Convolutional Networks for Accurate Object Detection and Segmentation
TLDR
A simple and scalable detection algorithm that improves mean average precision (mAP) by more than 50 percent relative to the previous best result on VOC 2012, achieving a mAP of 62.4 percent.
Part-Based R-CNNs for Fine-Grained Category Detection
TLDR
This work proposes a model for fine-grained categorization that overcomes limitations by leveraging deep convolutional features computed on bottom-up region proposals. It learns whole-object and part detectors, enforces learned geometric constraints between them, and predicts a fine-grained category from a pose-normalized representation.