DeepFace: Closing the Gap to Human-Level Performance in Face Verification
TLDR
This work revisits both the alignment step and the representation step: it employs explicit 3D face modeling to apply a piecewise affine transformation, and derives a face representation from a nine-layer deep neural network.
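A minimal numpy sketch of the per-triangle solve behind such a piecewise affine warp: each triangle of the fitted mesh gets its own 2x3 affine map. The function name and (3, 2) point layout are illustrative; the paper's full pipeline first fits a 3D model to detected fiducial points, which is not shown.

    import numpy as np

    def affine_from_triangle(src, dst):
        # src, dst: (3, 2) arrays of corresponding triangle vertices from
        # the source image and the frontalized reference. Solves for the
        # 2x3 matrix A with A @ [x, y, 1]^T = dst point; applying one such
        # A per mesh triangle yields the piecewise affine warp.
        S = np.hstack([src, np.ones((3, 1))])   # (3, 3): rows [x, y, 1]
        return np.linalg.solve(S, dst).T        # (2, 3) affine matrix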
Word Translation Without Parallel Data
TLDR
It is shown that a bilingual dictionary can be built between two languages without using any parallel corpora, by aligning monolingual word embedding spaces in an unsupervised way.
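The refinement step of this kind of alignment has a closed form: given a (possibly induced) dictionary pairing rows of two embedding matrices, the best orthogonal map is obtained from an SVD (orthogonal Procrustes). A sketch assuming numpy and row-wise embeddings; the adversarial initialization that removes the need for any seed dictionary is not shown.

    import numpy as np

    def procrustes_align(X, Y):
        # X: (n, d) source-language embeddings, Y: (n, d) target-language
        # embeddings for n hypothesized translation pairs. Returns the
        # orthogonal W minimizing ||X @ W - Y||_F, so that a source word
        # vector x maps into the target space as x @ W.
        U, _, Vt = np.linalg.svd(X.T @ Y)
        return U @ Vt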
Large Scale Distributed Deep Networks
TLDR
This paper considers the problem of training a deep network with billions of parameters using tens of thousands of CPU cores and develops two algorithms for large-scale distributed training, Downpour SGD and Sandblaster L-BFGS, which increase the scale and speed of deep network training.
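A toy, single-process sketch of the Downpour SGD pattern: model replicas pull parameters, compute gradients on their own data shard, and push updates asynchronously, tolerating staleness. Class and method names here are illustrative stand-ins, not the paper's API, and the real system shards the server across many machines.

    import numpy as np

    class ParameterServer:
        # Stand-in for one shard of the (in reality distributed) server.
        def __init__(self, dim, lr=0.01):
            self.params = np.zeros(dim)
            self.lr = lr

        def pull(self):
            return self.params.copy()

        def push(self, grad):
            # Updates are applied as they arrive from replicas, so each
            # gradient may have been computed on slightly stale params.
            self.params -= self.lr * grad

    def replica(server, batches, grad_fn, steps, fetch_every=5):
        local = server.pull()
        for t in range(steps):
            if t % fetch_every == 0:          # periodically refresh copy
                local = server.pull()
            server.push(grad_fn(local, next(batches)))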
DeViSE: A Deep Visual-Semantic Embedding Model
TLDR
This paper presents a new deep visual-semantic embedding model trained to identify visual objects using both labeled image data and semantic information gleaned from unannotated text, and shows that this semantic information can be exploited to make predictions about tens of thousands of image labels not observed during training.
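The training objective is a hinge rank loss that pushes the projected image feature closer (in dot product) to the correct label's word embedding than to sampled wrong labels. A numpy sketch; argument names are of my own choosing.

    import numpy as np

    def hinge_rank_loss(v_img, t_true, T_wrong, M, margin=0.1):
        # v_img: (d_img,) image-network feature; M: (d_text, d_img)
        # learned projection; t_true: (d_text,) embedding of the correct
        # label; T_wrong: (k, d_text) embeddings of sampled wrong labels.
        proj = M @ v_img
        return np.sum(np.maximum(0.0, margin - t_true @ proj + T_wrong @ proj))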
Gradient Episodic Memory for Continual Learning
TLDR
A model for continual learning, called Gradient Episodic Memory (GEM), is proposed that alleviates forgetting while allowing beneficial transfer of knowledge to previous tasks.
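GEM keeps a small episodic memory per past task and requires the new gradient g to satisfy <g, g_k> >= 0 for every stored task gradient g_k; when a constraint is violated, g is projected by solving a small quadratic program. A sketch of the feasibility check only (the QP itself is omitted):

    import numpy as np

    def gem_violated_tasks(g, G_mem):
        # g: (p,) flattened gradient on the current batch;
        # G_mem: (T, p) gradients computed on the episodic memories of
        # the T past tasks. Negative inner products mark tasks whose loss
        # the raw step would increase, which triggers the projection.
        return np.flatnonzero(G_mem @ g < 0)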
Sequence Level Training with Recurrent Neural Networks
TLDR
This work proposes a novel sequence level training algorithm that directly optimizes the metric used at test time, such as BLEU or ROUGE, and outperforms several strong baselines for greedy generation.
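Optimizing a test-time metric like BLEU directly is done with a REINFORCE-style surrogate: sample a sequence from the model, score it with the metric, and weight its log-likelihood by the baseline-subtracted reward. A minimal sketch of that surrogate; the paper additionally anneals from cross-entropy training into this loss, which is not shown.

    import numpy as np

    def reinforce_surrogate(log_probs, reward, baseline):
        # log_probs: (T,) values of log p(w_t | w_<t) for a sequence
        # *sampled* from the model; reward: sentence-level metric (e.g.,
        # BLEU) of that sample; baseline: estimated expected reward used
        # to reduce variance. Minimizing this surrogate yields the
        # REINFORCE gradient estimate.
        return -(reward - baseline) * np.sum(log_probs)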
Unsupervised Machine Translation Using Monolingual Corpora Only
TLDR
This work proposes a model that takes sentences from monolingual corpora in two different languages and maps them into the same latent space and effectively learns to translate without using any labeled data.
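Training alternates between denoising reconstruction in each language and on-the-fly back-translation, with both directions coupled through the shared latent space. A schematic sketch only: `model`, `noise`, and the `train_step`/`translate` interface are hypothetical placeholders, not the paper's code; in the paper a single shared encoder serves both languages.

    def train_epoch(model, mono_src, mono_tgt, noise):
        for x in mono_src:
            model.train_step(inp=noise(x), out=x, out_lang="src")  # denoise
            y_hat = model.translate(x, to_lang="tgt")   # synthetic pair
            model.train_step(inp=y_hat, out=x, out_lang="src")  # back-trans.
        for y in mono_tgt:
            model.train_step(inp=noise(y), out=y, out_lang="tgt")
            x_hat = model.translate(y, to_lang="src")
            model.train_step(inp=x_hat, out=y, out_lang="tgt")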
What is the best multi-stage architecture for object recognition?
TLDR
It is shown that using non-linearities that include rectification and local contrast normalization is the single most important ingredient for good accuracy on object recognition benchmarks and that two stages of feature extraction yield better accuracy than one.
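A sketch of that rectification plus local contrast normalization stage: take absolute values, subtract a Gaussian-weighted local mean, then divide by the local standard deviation, floored so that flat regions are not amplified. Assumes scipy is available; the parameter choices are illustrative approximations of the paper's setup.

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def rectify_and_lcn(fmap, sigma=2.0, eps=1e-4):
        # fmap: 2D feature map. Rectification followed by subtractive
        # and divisive local contrast normalization over a Gaussian
        # neighborhood of scale sigma.
        x = np.abs(fmap)                              # rectification
        x = x - gaussian_filter(x, sigma)             # subtractive step
        std = np.sqrt(gaussian_filter(x * x, sigma))  # local std
        return x / np.maximum(std, max(std.mean(), eps))  # divisive step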
Efficient Lifelong Learning with A-GEM
TLDR
An improved version of GEM is proposed, dubbed Averaged GEM (A-GEM), which matches or exceeds the performance of GEM while being almost as computationally and memory efficient as EWC and other regularization-based methods.
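A-GEM replaces GEM's per-task QP with a single constraint against one reference gradient averaged over a random sample of the episodic memory, which makes the projection closed-form. A numpy sketch of that update:

    import numpy as np

    def agem_project(g, g_ref):
        # g: (p,) gradient on the current task; g_ref: (p,) gradient
        # averaged over a sample of episodic memory from past tasks.
        # If the raw step would increase the average past-task loss
        # (negative dot product), project g onto the half-space where
        # it does not.
        dot = g @ g_ref
        if dot >= 0:
            return g
        return g - (dot / (g_ref @ g_ref)) * g_ref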
Phrase-Based & Neural Unsupervised Machine Translation
TLDR
This work investigates how to learn to translate when only large monolingual corpora in each language are available, and proposes two model variants, one neural and one phrase-based, both significantly better than methods from the literature while being simpler and having fewer hyper-parameters.
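For the phrase-based variant, one initialization the paper describes is scoring candidate word translations from embeddings already aligned into a shared space (as in the unsupervised alignment above), using a softmax over cosine similarities. A numpy sketch; the `temperature` and `topk` settings are illustrative.

    import numpy as np

    def init_translation_table(src_emb, tgt_emb, temperature=30.0, topk=10):
        # src_emb: (V_src, d) and tgt_emb: (V_tgt, d) word embeddings
        # already mapped into a shared space. Returns, per source word,
        # the indices of its top-k target candidates and softmax scores
        # over their cosine similarities.
        s = src_emb / np.linalg.norm(src_emb, axis=1, keepdims=True)
        t = tgt_emb / np.linalg.norm(tgt_emb, axis=1, keepdims=True)
        sim = s @ t.T                              # (V_src, V_tgt) cosines
        idx = np.argsort(-sim, axis=1)[:, :topk]   # top-k targets per word
        scores = np.take_along_axis(sim, idx, axis=1)
        p = np.exp(temperature * scores)
        return idx, p / p.sum(axis=1, keepdims=True)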