DeepFace: Closing the Gap to Human-Level Performance in Face Verification
- Yaniv Taigman, Ming Yang, M. Ranzato, Lior Wolf
- Computer Science, IEEE Conference on Computer Vision and Pattern…
- 1 June 2014
This work revisits both the alignment step and the representation step: it employs explicit 3D face modeling in order to apply a piecewise affine transformation, and derives a face representation from a nine-layer deep neural network.
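The alignment idea can be sketched in miniature: each piece of a piecewise affine warp maps a triangle of detected landmarks onto canonical (frontalized) positions. The sketch below solves a single 2-D affine piece from three point correspondences with toy coordinates (the paper fits such transforms over triangles of a 3D model mesh; the landmark values here are invented for illustration).

```python
import numpy as np

# Toy sketch: solve one 2-D affine piece that maps three detected landmark
# points onto three canonical (frontal) positions. Coordinates are made up.
src = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])   # detected landmarks
dst = np.array([[0.1, 0.2], [1.1, 0.2], [0.1, 1.2]])   # canonical positions

A = np.hstack([src, np.ones((3, 1))])        # homogeneous coordinates
M = np.linalg.lstsq(A, dst, rcond=None)[0]   # 3x2 affine parameters
warped = A @ M                               # apply the fitted transform
print(np.allclose(warped, dst))              # landmarks land on their targets
```

With three non-collinear correspondences the affine system is exactly determined, so the warped landmarks coincide with the canonical targets.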
Word Translation Without Parallel Data
- Alexis Conneau, Guillaume Lample, M. Ranzato, Ludovic Denoyer, Hervé Jégou
- Computer Science, International Conference on Learning…
- 11 October 2017
It is shown that a bilingual dictionary can be built between two languages without using any parallel corpora, by aligning monolingual word embedding spaces in an unsupervised way.
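One building block of this alignment is the orthogonal Procrustes solution, which the authors use as a refinement step once candidate pairs are available. The sketch below recovers a hidden rotation between two toy embedding spaces from paired vectors (the synthetic data stands in for word embeddings; the paper obtains the initial pairs without supervision).

```python
import numpy as np

# Sketch of the orthogonal Procrustes refinement used when aligning two
# monolingual embedding spaces: find orthogonal W minimizing ||X W - Y||_F.
# X, Y are synthetic stand-ins for paired source/target word embeddings.
def procrustes_align(X, Y):
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

rng = np.random.default_rng(0)
W_true, _ = np.linalg.qr(rng.normal(size=(5, 5)))  # hidden rotation
X = rng.normal(size=(100, 5))                      # "source" embeddings
Y = X @ W_true                                     # "target" embeddings
W = procrustes_align(X, Y)
print(np.allclose(X @ W, Y))  # the mapping aligns the two spaces
```

Because the toy target space is an exact rotation of the source, the SVD-based solution recovers the mapping exactly; with real embeddings the fit is only approximate.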
Large Scale Distributed Deep Networks
- J. Dean, G. Corrado, A. Ng
- Computer Science, NIPS
- 3 December 2012
This paper considers the problem of training a deep network with billions of parameters using tens of thousands of CPU cores and develops two algorithms for large-scale distributed training, Downpour SGD and Sandblaster L-BFGS, which increase the scale and speed of deep network training.
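The core Downpour SGD idea can be illustrated in-process: independent workers fetch the shared parameters, compute a gradient on their own data, and push updates back asynchronously without locking the whole model. The threaded toy below is only a sketch of that pattern (the paper uses a distributed parameter server across machines; the quadratic loss and all values here are invented).

```python
import threading
import numpy as np

# Toy sketch of the Downpour SGD pattern: workers read a shared parameter
# vector, compute a (possibly stale) gradient, and apply updates in place
# asynchronously. The loss 0.5*||w - target||^2 is a stand-in objective.
params = np.zeros(3)                 # "parameter server" state
target = np.array([1.0, 2.0, 3.0])  # optimum of the toy loss

def worker(steps, lr=0.1):
    for _ in range(steps):
        local = params.copy()                     # fetch current parameters
        grad = local - target                     # gradient of the toy loss
        np.subtract(params, lr * grad, out=params)  # racy in-place push

threads = [threading.Thread(target=worker, args=(200,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(np.linalg.norm(params - target) < 1e-2)  # converges despite staleness
```

Even though workers may act on slightly stale parameters, the updates still contract toward the optimum, which is the behavior the asynchronous design relies on.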
Gradient Episodic Memory for Continual Learning
- David Lopez-Paz, M. Ranzato
- Computer Science, NIPS
- 1 June 2017
A model for continual learning, called Gradient Episodic Memory (GEM), is proposed that alleviates forgetting while allowing beneficial transfer of knowledge to previous tasks.
DeViSE: A Deep Visual-Semantic Embedding Model
- Andrea Frome, G. Corrado, Tomas Mikolov
- Computer Science, NIPS
- 5 December 2013
This paper presents a new deep visual-semantic embedding model trained to identify visual objects using both labeled image data and semantic information gleaned from unannotated text, and shows that the semantic information can be exploited to make predictions about tens of thousands of image labels not observed during training.
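The zero-shot inference step can be sketched simply: a visual embedding is compared against word vectors for all labels, including labels never seen with images, and the nearest label vector is predicted. The toy vectors below are invented for illustration and only show the nearest-neighbor lookup in the joint space.

```python
import numpy as np

# Sketch of inference in a visual-semantic embedding space: pick the label
# whose word vector is closest (by cosine) to the image's embedding. The
# 2-D vectors are toy values; "tiger" plays the role of an unseen label.
def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

label_vecs = {
    "cat": np.array([1.0, 0.1]),
    "tiger": np.array([0.9, 0.3]),   # no training images for this label
    "car": np.array([-1.0, 0.2]),
}
image_embedding = np.array([0.95, 0.25])  # output of the visual model
pred = max(label_vecs, key=lambda w: cosine(image_embedding, label_vecs[w]))
print(pred)
```

Because label similarity lives in the word-embedding space, an unseen label like "tiger" can win the nearest-neighbor comparison even though the visual model never trained on it.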
Sequence Level Training with Recurrent Neural Networks
- M. Ranzato, S. Chopra, Michael Auli, Wojciech Zaremba
- Computer Science, International Conference on Learning…
- 20 November 2015
This work proposes a novel sequence level training algorithm that directly optimizes the metric used at test time, such as BLEU or ROUGE, and outperforms several strong baselines for greedy generation.
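The key mechanism is policy-gradient training on the test-time metric: sample an output from the model, score it with the metric, and reinforce high-scoring samples. The sketch below applies REINFORCE to a three-way categorical "model" with a stand-in reward in place of BLEU/ROUGE; the actual algorithm (MIXER) operates on token-by-token sequence generation.

```python
import numpy as np

# Toy REINFORCE sketch of sequence-level training: sample a candidate,
# score it with the evaluation metric (a stand-in reward here, not real
# BLEU), and move probability mass toward high-reward candidates.
rng = np.random.default_rng(0)
logits = np.zeros(3)                 # 3 candidate "sequences"
reward = np.array([0.0, 0.0, 1.0])   # stand-in for the test-time metric

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

for _ in range(500):
    p = softmax(logits)
    a = rng.choice(3, p=p)
    # REINFORCE gradient of E[reward] w.r.t. logits: (onehot(a) - p) * r(a)
    grad = (np.eye(3)[a] - p) * reward[a]
    logits += 0.5 * grad
print(softmax(logits).argmax())  # the high-reward candidate dominates
```

Only sampled candidates with nonzero reward trigger an update, so training directly targets the metric rather than per-token likelihood.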
Efficient Lifelong Learning with A-GEM
- Arslan Chaudhry, M. Ranzato, Marcus Rohrbach, Mohamed Elhoseiny
- Computer Science, International Conference on Learning…
- 27 September 2018
An improved version of GEM is proposed, dubbed Averaged GEM (A-GEM), which matches or exceeds the performance of GEM while being almost as computationally and memory efficient as EWC and other regularization-based methods.
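A-GEM's efficiency comes from replacing GEM's per-task constraints with a single averaged reference gradient: if the current gradient conflicts with it, the gradient is projected so it no longer increases the average past-task loss. A minimal sketch of that projection, with toy 2-D gradients:

```python
import numpy as np

# Sketch of the A-GEM gradient projection: when the current task's gradient
# g conflicts with the average replay gradient g_ref (negative dot product),
# project g onto the closest vector with g~ . g_ref >= 0. Toy vectors.
def a_gem_project(g, g_ref):
    dot = g @ g_ref
    if dot >= 0:                       # no interference: keep g unchanged
        return g
    return g - (dot / (g_ref @ g_ref)) * g_ref

g = np.array([1.0, -1.0])      # current-task gradient (toy)
g_ref = np.array([0.0, 1.0])   # averaged past-task gradient (toy)
g_tilde = a_gem_project(g, g_ref)
print(g_tilde, g_tilde @ g_ref >= 0)
```

The projected update keeps the useful component of the new-task gradient while removing the part that would undo past-task progress, which is why A-GEM needs only one dot product and one subtraction per step.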
Unsupervised Machine Translation Using Monolingual Corpora Only
- Guillaume Lample, Ludovic Denoyer, M. Ranzato
- Computer Science, International Conference on Learning…
- 31 October 2017
This work proposes a model that takes sentences from monolingual corpora in two different languages and maps them into the same latent space, effectively learning to translate without using any labeled data.
What is the best multi-stage architecture for object recognition?
- Kevin Jarrett, K. Kavukcuoglu, M. Ranzato, Yann LeCun
- Computer Science, IEEE International Conference on Computer Vision
- 1 September 2009
It is shown that using non-linearities that include rectification and local contrast normalization is the single most important ingredient for good accuracy on object recognition benchmarks and that two stages of feature extraction yield better accuracy than one.
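The two non-linearities the paper singles out can be sketched directly: rectification (absolute value, as in the paper's abs-rectification layer) followed by local contrast normalization over a neighborhood. The toy below uses a 1-D sliding window for brevity; the paper applies the normalization over 2-D spatial neighborhoods of feature maps.

```python
import numpy as np

# Sketch of the two non-linearities: abs-rectification, then a simplified
# local contrast normalization (subtract the local mean, divide by the
# local std) over a 1-D sliding window. Feature values are toy data.
def rectify(x):
    return np.abs(x)

def local_contrast_norm(x, k=3, eps=1e-5):
    out = np.empty_like(x)
    for i in range(len(x)):
        w = x[max(0, i - k): i + k + 1]           # local neighborhood
        out[i] = (x[i] - w.mean()) / (w.std() + eps)
    return out

feat = np.array([-2.0, 1.0, 0.5, -0.5, 3.0])
normed = local_contrast_norm(rectify(feat))
print(normed)
```

The normalization makes each response relative to its neighborhood, which is the locally competitive behavior the paper credits for much of the accuracy gain.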
Phrase-Based & Neural Unsupervised Machine Translation
- Guillaume Lample, Myle Ott, Alexis Conneau, Ludovic Denoyer, M. Ranzato
- Computer Science, Conference on Empirical Methods in Natural…
- 20 April 2018
This work investigates how to learn to translate when having access to only large monolingual corpora in each language, and proposes two model variants, a neural and a phrase-based model, which are significantly better than methods from the literature, while being simpler and having fewer hyper-parameters.
...