Publications
Going deeper with convolutions
We propose a deep convolutional neural network architecture codenamed Inception that achieves the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14).
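To make the architectural idea concrete, here is a minimal PyTorch sketch of an Inception-style module: parallel 1x1, 3x3, and 5x5 convolutions plus pooling, concatenated along the channel axis. The channel counts are illustrative placeholders, not the paper's actual configuration.

```python
import torch
import torch.nn as nn

class InceptionModule(nn.Module):
    """Parallel 1x1 / 3x3 / 5x5 convolutions and pooling, concatenated.

    Channel counts are illustrative placeholders, not the paper's values.
    """
    def __init__(self, in_ch):
        super().__init__()
        self.branch1 = nn.Conv2d(in_ch, 64, kernel_size=1)
        # 1x1 convolutions reduce dimensionality before the larger filters.
        self.branch3 = nn.Sequential(
            nn.Conv2d(in_ch, 96, kernel_size=1),
            nn.Conv2d(96, 128, kernel_size=3, padding=1),
        )
        self.branch5 = nn.Sequential(
            nn.Conv2d(in_ch, 16, kernel_size=1),
            nn.Conv2d(16, 32, kernel_size=5, padding=2),
        )
        self.branch_pool = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(in_ch, 32, kernel_size=1),
        )

    def forward(self, x):
        # All branches preserve spatial size, so outputs concatenate cleanly.
        return torch.cat(
            [self.branch1(x), self.branch3(x), self.branch5(x), self.branch_pool(x)],
            dim=1,
        )

x = torch.randn(1, 192, 28, 28)
print(InceptionModule(192)(x).shape)  # torch.Size([1, 256, 28, 28])
```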
SSD: Single Shot MultiBox Detector
TLDR
The approach, named SSD, discretizes the output space of bounding boxes into a set of default boxes over different aspect ratios and scales per feature map location, which makes SSD easy to train and straightforward to integrate into systems that require a detection component.
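As a sketch of the default-box idea, the snippet below enumerates boxes of several aspect ratios at every cell of one feature map. The scale and aspect-ratio values are illustrative stand-ins, not SSD's published schedule.

```python
import itertools
import math

def default_boxes(fmap_size, scale, aspect_ratios):
    """Generate SSD-style default boxes (cx, cy, w, h) in [0, 1] coordinates.

    One box per aspect ratio at every feature-map cell; `scale` and
    `aspect_ratios` are illustrative, not the paper's exact schedule.
    """
    boxes = []
    for i, j in itertools.product(range(fmap_size), repeat=2):
        cx = (j + 0.5) / fmap_size  # cell center, normalized to [0, 1]
        cy = (i + 0.5) / fmap_size
        for ar in aspect_ratios:
            w = scale * math.sqrt(ar)  # wider boxes for larger aspect ratios
            h = scale / math.sqrt(ar)
            boxes.append((cx, cy, w, h))
    return boxes

# A coarse 8x8 map with three aspect ratios -> 8*8*3 = 192 default boxes.
print(len(default_boxes(8, scale=0.4, aspect_ratios=(0.5, 1.0, 2.0))))
```

In the full detector, several feature maps of decreasing resolution each get their own scale, so coarse maps handle large objects and fine maps handle small ones.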
Generative Adversarial Text to Image Synthesis
TLDR
A novel deep architecture and GAN formulation is developed to effectively bridge advances in text and image modeling, translating visual concepts from characters to pixels.
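The TLDR leaves the mechanism implicit, so here is a toy sketch of a text-conditional generator in that spirit: a noise vector concatenated with a text embedding is decoded to pixels. All dimensions and the MLP decoder are hypothetical placeholders; the paper itself uses a deconvolutional generator.

```python
import torch
import torch.nn as nn

class TextConditionalGenerator(nn.Module):
    """Toy text-conditional GAN generator: noise + text embedding -> image.

    Dimensions and the MLP decoder are placeholders, not the paper's
    architecture.
    """
    def __init__(self, z_dim=100, txt_dim=128, img_pixels=64 * 64 * 3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(z_dim + txt_dim, 512),
            nn.ReLU(),
            nn.Linear(512, img_pixels),
            nn.Tanh(),  # pixel values in [-1, 1]
        )

    def forward(self, z, txt_emb):
        # Conditioning: the text embedding is concatenated with the noise.
        return self.net(torch.cat([z, txt_emb], dim=1))

g = TextConditionalGenerator()
img = g(torch.randn(2, 100), torch.randn(2, 128))
print(img.shape)  # torch.Size([2, 12288])
```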
Evaluation of output embeddings for fine-grained image classification
TLDR
This project shows that compelling classification performance can be achieved on fine-grained categories even without labeled training data, and establishes a substantially improved state-of-the-art on the Animals with Attributes and Caltech-UCSD Birds datasets.
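A toy sketch of the bilinear-compatibility idea behind such output-embedding classifiers follows: image features are scored against class output embeddings (e.g. attribute vectors), so a class needs no labeled images, only an embedding. The dimensions and random matrices are stand-ins; in the paper the compatibility matrix is learned with a ranking objective.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: image features (d_img) and class output
# embeddings such as attribute vectors (d_cls); values are random stand-ins.
d_img, d_cls, n_classes = 512, 85, 10
W = rng.normal(size=(d_img, d_cls))                     # learned in practice
class_embeddings = rng.normal(size=(n_classes, d_cls))  # e.g. attributes

def predict(image_feature):
    """Score every class via the bilinear compatibility theta(x)^T W phi(y)
    and return the best-matching class; no labeled images of that class
    are needed, only its output embedding."""
    scores = image_feature @ W @ class_embeddings.T
    return int(np.argmax(scores))

print(predict(rng.normal(size=d_img)))
```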
Training Deep Neural Networks on Noisy Labels with Bootstrapping
TLDR
A generic way to handle noisy and incomplete labeling is proposed: the prediction objective is augmented with a notion of consistency, under which a prediction is considered consistent if the same prediction is made given similar percepts, with similarity measured between deep network features computed from the input data.
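A minimal PyTorch sketch of the "soft" bootstrapping loss from this line of work follows: the noisy one-hot targets are blended with the network's own predictions, so a sufficiently consistent self-prediction can outweigh an inconsistent label. The blending weight follows the paper's soft variant; treat the rest as an illustrative implementation choice.

```python
import torch
import torch.nn.functional as F

def soft_bootstrap_loss(logits, noisy_targets, beta=0.95):
    """'Soft' bootstrapping: blend (possibly noisy) labels with the
    network's own predictions before taking the cross-entropy."""
    probs = F.softmax(logits, dim=1)
    one_hot = F.one_hot(noisy_targets, num_classes=logits.size(1)).float()
    # Detach so the blended target is treated as fixed for this step.
    blended = beta * one_hot + (1.0 - beta) * probs.detach()
    return -(blended * F.log_softmax(logits, dim=1)).sum(dim=1).mean()

logits = torch.randn(4, 10, requires_grad=True)
targets = torch.randint(0, 10, (4,))
print(soft_bootstrap_loss(logits, targets))
```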
Learning Deep Representations of Fine-Grained Visual Descriptions
TLDR
This model achieves strong performance on zero-shot text-based image retrieval and significantly outperforms the attribute-based state of the art for zero-shot classification on the Caltech-UCSD Birds 200-2011 dataset.
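Here is a toy sketch of the underlying joint-embedding idea: both modalities are projected into a shared space and compatibility is scored by inner product. The linear encoders and dimensions are placeholders; the paper uses convolutional and recurrent text encoders trained with a symmetric ranking loss.

```python
import torch
import torch.nn as nn

class JointEmbedding(nn.Module):
    """Toy joint image-text embedding: project both modalities into a
    shared space and score compatibility by inner product. Encoders here
    are placeholder linear layers."""
    def __init__(self, img_dim=1024, txt_dim=300, emb_dim=256):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, emb_dim)
        self.txt_proj = nn.Linear(txt_dim, emb_dim)

    def score(self, img_feat, txt_feat):
        # Higher score = better image-text match; retrieval ranks by score.
        return (self.img_proj(img_feat) * self.txt_proj(txt_feat)).sum(-1)

model = JointEmbedding()
img = torch.randn(5, 1024)  # pretrained image features (stand-ins)
txt = torch.randn(5, 300)   # encoded descriptions (stand-ins)
print(model.score(img, txt))
```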
Deep Visual Analogy-Making
TLDR
A novel deep network is developed and trained end-to-end to perform visual analogy-making: transforming a query image according to an example pair of related images.
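A minimal sketch of the additive variant of visual analogy-making follows: encode a, b, and c, add the transformation vector f(b) - f(a) to f(c), and decode the result. The linear encoder and decoder are placeholders for the paper's convolutional networks.

```python
import torch
import torch.nn as nn

# Toy sketch of additive deep visual analogy-making: encode, apply the
# transformation vector f(b) - f(a) to the query embedding, then decode.
# Placeholder MLPs over flat vectors stand in for the paper's conv nets.
enc = nn.Linear(64, 32)  # f: image -> embedding
dec = nn.Linear(32, 64)  # g: embedding -> image

def make_analogy(a, b, c):
    """Return d such that a : b :: c : d, via vector addition in
    embedding space."""
    return dec(enc(c) + (enc(b) - enc(a)))

a, b, c = (torch.randn(1, 64) for _ in range(3))
print(make_analogy(a, b, c).shape)  # torch.Size([1, 64])
```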
Learning What and Where to Draw
TLDR
This work proposes a new model, the Generative Adversarial What-Where Network (GAWWN), which synthesizes images given instructions describing what content to draw in which location, and shows high-quality 128 x 128 image synthesis on the Caltech-UCSD Birds dataset.
Neural Arithmetic Logic Units
TLDR
Experiments show that NALU-enhanced neural networks can learn to track time, perform arithmetic over images of numbers, translate numerical language into real-valued scalars, execute computer code, and count objects in images.
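Since the TLDR names the unit without its mechanics, here is a compact sketch following the paper's formulation: a NALU gates between an additive neural accumulator (NAC) and the same accumulator applied in log space, which turns addition into multiplication.

```python
import torch
import torch.nn as nn

class NALU(nn.Module):
    """Sketch of a Neural Arithmetic Logic Unit: a gated mix of an
    additive neural accumulator (NAC) and a multiplicative path that
    runs the same NAC in log space."""
    def __init__(self, in_dim, out_dim, eps=1e-7):
        super().__init__()
        self.W_hat = nn.Parameter(torch.randn(out_dim, in_dim) * 0.1)
        self.M_hat = nn.Parameter(torch.randn(out_dim, in_dim) * 0.1)
        self.G = nn.Parameter(torch.randn(out_dim, in_dim) * 0.1)
        self.eps = eps

    def forward(self, x):
        # NAC weights are pushed toward {-1, 0, 1}, biasing the unit to
        # add, subtract, or ignore inputs rather than rescale them.
        W = torch.tanh(self.W_hat) * torch.sigmoid(self.M_hat)
        add = x @ W.t()                                         # additive path
        mul = torch.exp(torch.log(x.abs() + self.eps) @ W.t())  # log-space path
        g = torch.sigmoid(x @ self.G.t())                       # learned gate
        return g * add + (1 - g) * mul

print(NALU(2, 1)(torch.tensor([[3.0, 4.0]])).shape)  # torch.Size([1, 1])
```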
Scalable, High-Quality Object Detection
TLDR
It is demonstrated that learning-based proposal methods can effectively match the performance of hand-engineered methods while allowing for very efficient runtime-quality trade-offs.
...