Publications
An overview of gradient descent optimization algorithms
TLDR
This article looks at different variants of gradient descent, summarizes challenges, introduces the most common optimization algorithms, reviews architectures in a parallel and distributed setting, and investigates additional strategies for optimizing gradient descent.
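As a rough illustration of the kind of update rules the article surveys, here is a minimal NumPy sketch of vanilla SGD versus momentum; the quadratic loss, step size, and momentum coefficient are illustrative assumptions, not taken from the article.

```python
import numpy as np

def grad(theta):
    # Gradient of an illustrative quadratic loss L(theta) = 0.5 * ||theta||^2,
    # whose minimum is at the origin.
    return theta

theta = np.array([2.0, -3.0])
velocity = np.zeros_like(theta)
lr, momentum = 0.1, 0.9

for _ in range(100):
    g = grad(theta)
    # Vanilla SGD would simply do: theta -= lr * g
    # Momentum instead accumulates an exponentially decaying average of gradients.
    velocity = momentum * velocity + lr * g
    theta = theta - velocity

print(theta)  # close to [0, 0]
```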
Universal Language Model Fine-tuning for Text Classification
TLDR
This work proposes Universal Language Model Fine-tuning (ULMFiT), an effective transfer learning method that can be applied to any task in NLP, and introduces techniques that are key for fine-tuning a language model.
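One of the paper's key techniques is discriminative fine-tuning: lower layers, which capture more general features, receive smaller learning rates than higher ones. A minimal PyTorch sketch with a stand-in model; the decay factor of 2.6 follows the paper, while the layer stack and sizes are illustrative assumptions.

```python
import torch.nn as nn
from torch.optim import Adam

# Stand-in for a pretrained language model: stacked layers plus a task head.
model = nn.Sequential(
    nn.Linear(128, 128),  # layer 0 (closest to the input, most general)
    nn.Linear(128, 128),  # layer 1
    nn.Linear(128, 128),  # layer 2
    nn.Linear(128, 2),    # task-specific head
)

top_lr, decay = 1e-3, 2.6  # ULMFiT suggests eta^{l-1} = eta^l / 2.6
n = len(model)
param_groups = [
    # Each layer below the top gets its learning rate divided by another 2.6.
    {"params": layer.parameters(), "lr": top_lr / decay ** (n - 1 - i)}
    for i, layer in enumerate(model)
]
optimizer = Adam(param_groups)
```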
An Overview of Multi-Task Learning in Deep Neural Networks
TLDR
This article seeks to help ML practitioners apply MTL by shedding light on how MTL works and providing guidelines for choosing appropriate auxiliary tasks, particularly in deep neural networks.
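The most common deep MTL setup this survey discusses is hard parameter sharing: one encoder shared across all tasks, with task-specific output heads. A minimal PyTorch sketch; the dimensions and two-task setup are illustrative assumptions.

```python
import torch
import torch.nn as nn

class HardSharingMTL(nn.Module):
    """Hard parameter sharing: a shared encoder with one head per task."""

    def __init__(self, in_dim=64, hidden=128, task_out_dims=(2, 5)):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.heads = nn.ModuleList(nn.Linear(hidden, d) for d in task_out_dims)

    def forward(self, x, task_id):
        # All tasks share the encoder parameters; only the head differs.
        return self.heads[task_id](self.encoder(x))

model = HardSharingMTL()
x = torch.randn(8, 64)
logits = model(x, task_id=0)  # shape (8, 2) for task 0
```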
XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalization
TLDR
The Cross-lingual TRansfer Evaluation of Multilingual Encoders (XTREME) benchmark is introduced, a multi-task benchmark for evaluating the cross-lingual generalization capabilities of multilingual representations across 40 languages and 9 tasks.
On the Cross-lingual Transferability of Monolingual Representations
TLDR
This work designs an alternative approach that transfers a monolingual model to new languages at the lexical level and shows that it is competitive with multilingual BERT on standard cross-lingual classification benchmarks and on a new Cross-lingual Question Answering Dataset (XQuAD).
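Concretely, transfer at the lexical level can be read as learning a new embedding matrix for the target language while keeping the pretrained body of the model frozen. A minimal PyTorch sketch of that freezing pattern; the stand-in body and sizes are assumptions, not the paper's actual architecture.

```python
import torch.nn as nn

dim, new_vocab = 128, 32000

# Stand-in for the pretrained model body, kept frozen during transfer.
body = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
for p in body.parameters():
    p.requires_grad = False

# Fresh lexical (embedding) layer for the new language: the only trainable part.
new_embeddings = nn.Embedding(new_vocab, dim)
trainable = list(new_embeddings.parameters())  # what the optimizer would see
```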
On the Limitations of Unsupervised Bilingual Dictionary Induction
TLDR
It is shown that a simple trick, exploiting a weak supervision signal from identical words, enables more robust induction, and a near-perfect correlation is established between unsupervised bilingual dictionary induction performance and a previously unexplored graph similarity metric.
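The weak supervision signal is simply the set of identically spelled words occurring in both vocabularies, which serves as a noisy seed dictionary. A minimal sketch with made-up toy vocabularies:

```python
# Toy vocabularies; in practice these come from the two monolingual corpora.
src_vocab = {"computer", "hund", "haus", "taxi", "berlin"}
tgt_vocab = {"computer", "dog", "house", "taxi", "berlin"}

# Identically spelled strings form a noisy seed dictionary that can
# bootstrap the cross-lingual mapping far more robustly than a fully
# unsupervised initialization.
seed_dictionary = [(w, w) for w in sorted(src_vocab & tgt_vocab)]
print(seed_dictionary)  # [('berlin', 'berlin'), ('computer', 'computer'), ('taxi', 'taxi')]
```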
A Survey of Cross-lingual Word Embedding Models
TLDR
A comprehensive typology of cross-lingual word embedding models is provided, showing that many of the models presented in the literature optimize for the same objectives and that seemingly different models are often equivalent modulo optimization strategies, hyper-parameters, and other such details.
Fine-tuned Language Models for Text Classification
TLDR
This work proposes Fine-tuned Language Models (FitLaM), an effective transfer learning method that can be applied to any task in NLP, and introduces techniques that are key for fine-tuning a state-of-the-art language model.
To Tune or Not to Tune? Adapting Pretrained Representations to Diverse Tasks
TLDR
The empirical results across diverse NLP tasks with two state-of-the-art models show that the relative performance of fine-tuning vs. feature extraction depends on the similarity of the pretraining and target tasks.
Episodic Memory in Lifelong Language Learning
TLDR
This work proposes an episodic memory model that performs sparse experience replay and local adaptation to mitigate catastrophic forgetting in a lifelong language learning setup where a model needs to learn from a stream of text examples without any dataset identifier.
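A rough sketch of the sparse experience replay component: keep a fixed-size episodic memory of past examples (reservoir sampling keeps it a uniform sample of the stream) and replay a small batch at sparse intervals. Capacity, interval, and batch size here are illustrative assumptions, not the paper's settings.

```python
import random

class EpisodicMemory:
    """Fixed-capacity buffer filled by reservoir sampling over the stream."""

    def __init__(self, capacity=100):
        self.capacity, self.buffer, self.seen = capacity, [], 0

    def write(self, example):
        self.seen += 1
        if len(self.buffer) < self.capacity:
            self.buffer.append(example)
        else:
            i = random.randrange(self.seen)
            if i < self.capacity:
                self.buffer[i] = example  # keeps a uniform sample of all seen examples

    def sample(self, k):
        return random.sample(self.buffer, min(k, len(self.buffer)))

memory = EpisodicMemory()
stream = ((f"text_{i}", i % 2) for i in range(1000))  # stand-in text stream
for step, example in enumerate(stream):
    memory.write(example)  # training on `example` would happen here
    if step % 50 == 0 and memory.buffer:
        replay_batch = memory.sample(8)  # sparse replay to combat forgetting
```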
…