Character-Level Language Modeling with Deeper Self-Attention
TLDR
In this paper, we show that a deep (64-layer) transformer model with fixed context outperforms RNN variants by a large margin, achieving state-of-the-art results on two popular benchmarks.
Multilingual Universal Sentence Encoder for Semantic Retrieval
TLDR
We introduce two pre-trained retrieval-focused multilingual sentence encoding models, based respectively on the Transformer and CNN architectures.
Effective Parallel Corpus Mining using Bilingual Sentence Embeddings
TLDR
We propose a novel method for training bilingual sentence embeddings that proves useful for parallel corpus mining.
Improving Multilingual Sentence Embedding using Bi-directional Dual Encoder with Additive Margin Softmax
TLDR
We present an approach to learn multilingual sentence embeddings using a bi-directional dual-encoder with additive margin softmax.
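The additive margin softmax idea above can be sketched as follows: score translation pairs with an in-batch softmax over cosine similarities, but subtract a margin from each true pair's score so the model must separate positives from negatives by at least that margin. This is a minimal NumPy illustration, not the paper's implementation; the margin value and the absence of a temperature/scaling factor are simplifying assumptions.

```python
import numpy as np

def additive_margin_softmax_loss(src_emb, tgt_emb, margin=0.3):
    """Bidirectional in-batch softmax loss for a dual encoder, with an
    additive margin subtracted from each positive pair's similarity.
    A sketch: margin=0.3 is illustrative, not the paper's setting."""
    # L2-normalize so dot products are cosine similarities.
    src = src_emb / np.linalg.norm(src_emb, axis=1, keepdims=True)
    tgt = tgt_emb / np.linalg.norm(tgt_emb, axis=1, keepdims=True)
    scores = src @ tgt.T                              # (B, B) similarities
    # Subtract the margin on the diagonal (the true translation pairs),
    # making the positive scores harder to rank first.
    scores = scores - margin * np.eye(len(scores))

    def softmax_ce(s):
        # Cross-entropy with the diagonal entry as the correct class.
        logp = s - np.log(np.exp(s).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(logp))

    # Bi-directional: rank targets given sources, and sources given targets.
    return softmax_ce(scores) + softmax_ce(scores.T)
```

Because the margin only lowers the positive scores, the loss with a positive margin is always at least the margin-free loss, which is a quick sanity check on an implementation.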
Hierarchical Document Encoder for Parallel Corpus Mining
TLDR
We explore using multilingual document embeddings for nearest neighbor mining of parallel data.
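Nearest-neighbor mining of parallel data, as described above, amounts to embedding documents on both sides, finding each source document's closest target by cosine similarity, and keeping pairs above a score threshold. The sketch below uses brute-force search and an illustrative threshold; the function name and threshold are assumptions, and a real system would use an approximate nearest-neighbor index at scale.

```python
import numpy as np

def mine_parallel_pairs(src_emb, tgt_emb, threshold=0.8):
    """Brute-force nearest-neighbor mining sketch: for each source
    embedding, take the most similar target and keep the pair if its
    cosine similarity clears the threshold (threshold is illustrative)."""
    # L2-normalize so dot products are cosine similarities.
    src = src_emb / np.linalg.norm(src_emb, axis=1, keepdims=True)
    tgt = tgt_emb / np.linalg.norm(tgt_emb, axis=1, keepdims=True)
    sims = src @ tgt.T                    # (num_src, num_tgt) similarities
    best = sims.argmax(axis=1)            # nearest target per source
    return [(i, j) for i, j in enumerate(best) if sims[i, j] >= threshold]
```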
MultiReQA: A Cross-Domain Evaluation for Retrieval Question Answering Models
TLDR
We provide the first systematic retrieval-based evaluation over these datasets using two supervised neural models, based on fine-tuning BERT and USE-QA, as well as a surprisingly strong information retrieval baseline, BM25.
Bridging the Gap for Tokenizer-Free Language Models
TLDR
We train a vanilla transformer network with 40 self-attention layers on the One Billion Word (lm1b) benchmark and achieve a new state of the art for tokenizer-free LMs, pushing these models to be on par with their word-based counterparts.
Wiki-40B: Multilingual Language Model Dataset
TLDR
We propose a new multilingual language model benchmark that is composed of 40+ languages spanning several scripts and linguistic families.
TextSETTR: Label-Free Text Style Extraction and Tunable Targeted Restyling
TLDR
We present a novel approach to the problem of text style transfer.