Publications
Attention is All you Need
TLDR: We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely.
  • Citations: 15,924
  • Highly influential citations: 3,898
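As a reference point, the following is a minimal NumPy sketch of the scaled dot-product attention the Transformer is built on. It is an illustrative example rather than the paper's reference implementation (see Tensor2Tensor below for that), and all names and shapes here are made up:

    import numpy as np

    def scaled_dot_product_attention(Q, K, V):
        # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V, per the paper.
        d_k = Q.shape[-1]
        scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_k)          # (batch, q_len, k_len)
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights = weights / weights.sum(axis=-1, keepdims=True)   # softmax over key positions
        return weights @ V                                         # (batch, q_len, d_v)

    # Illustrative shapes only: batch of 2, 4 queries, 6 keys/values.
    Q = np.random.randn(2, 4, 8)
    K = np.random.randn(2, 6, 8)
    V = np.random.randn(2, 6, 16)
    print(scaled_dot_product_attention(Q, K, V).shape)  # (2, 4, 16)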
Natural Questions: A Benchmark for Question Answering Research
TLDR: We present the Natural Questions corpus, a question-answering dataset.
  • Citations: 322
  • Highly influential citations: 47
Tensor2Tensor for Neural Machine Translation
TLDR: Tensor2Tensor is a library of deep learning models well suited for neural machine translation, and it includes the reference implementation of the state-of-the-art Transformer model.
  • Citations: 287
  • Highly influential citations: 32
Character-Level Language Modeling with Deeper Self-Attention
TLDR: We show that a deep (64-layer) Transformer model with fixed context outperforms RNN variants by a large margin, achieving state-of-the-art results on two popular benchmarks.
  • Citations: 128
  • Highly influential citations: 17
WikiReading: A Novel Large-scale Language Understanding Task over Wikipedia
TLDR: We present WIKIREADING, a large-scale natural language understanding task and publicly available dataset with 18 million instances.
  • Citations: 106
  • Highly influential citations: 17
The Best of Both Worlds: Combining Recent Advances in Neural Machine Translation
TLDR: The past year has witnessed rapid advances in sequence-to-sequence (seq2seq) modeling for machine translation (MT); we identify the key techniques behind these advances and combine recurrent and self-attentive components into hybrid architectures that improve over both.
  • Citations: 243
  • Highly influential citations: 16
Lingvo: a Modular and Scalable Framework for Sequence-to-Sequence Modeling
TLDR: Lingvo is a TensorFlow framework offering a complete solution for collaborative deep learning research, with a particular focus on sequence-to-sequence models.
  • Citations: 82
  • Highly influential citations: 10
One Model To Learn Them All
TLDR: We present a single model that simultaneously learns multiple tasks from various domains, yielding good results on a number of problems spanning those domains.
  • Citations: 189
  • Highly influential citations: 8
ProtTrans: Towards Cracking the Language of Life’s Code Through Self-Supervised Deep Learning and High Performance Computing
TLDR: We trained two auto-regressive language models (Transformer-XL, XLNet) on data from UniRef and BFD containing up to 393 billion amino acids (words) from 2.1 billion protein sequences.
  • Citations: 24
  • Highly influential citations: 3
Accurate Supervised and Semi-Supervised Machine Reading for Long Documents
TLDR: We introduce a hierarchical architecture for machine reading capable of extracting precise information from long documents.
  • Citations: 20
  • Highly influential citations: 1