Publications
Context-Aware Neural Machine Translation Learns Anaphora Resolution
TLDR: We introduce a context-aware neural machine translation model designed in such a way that the flow of information from the extended context to the translation model can be controlled and analyzed.
Citations: 127 · Highly influential: 39
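The stated design goal, context flow that can be controlled and analyzed, is typically realized with a gated sum of a source-sentence representation and a context representation; inspecting the gate then shows how much context is used. A minimal sketch of that gating idea (not the paper's code; all names and shapes are illustrative):

```python
# Illustrative sketch: a sigmoid gate mixes the source representation s with
# the context representation c, so context flow can be read off the gate g.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_context(s, c, Wg, bg):
    """s, c: (d,) vectors; Wg: (d, 2d); bg: (d,). Returns the gated mix."""
    g = sigmoid(Wg @ np.concatenate([s, c]) + bg)  # per-unit gate in (0, 1)
    return g * s + (1.0 - g) * c                   # g -> 1 shuts context out

rng = np.random.default_rng(0)
d = 8
s, c = rng.normal(size=d), rng.normal(size=d)
Wg, bg = rng.normal(size=(d, 2 * d)) * 0.1, np.zeros(d)
print(gated_context(s, c, Wg, bg))
```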
Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned
TLDR: We evaluate the contribution made by individual attention heads to the overall performance of the model and analyze the roles they play in the encoder.
Citations: 182 · Highly influential: 25
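One concrete reading of "contribution of individual heads" is head ablation: scale each head's output by a gate and zero the gates one at a time. The paper itself goes further and prunes heads with learned stochastic gates under an L0-style penalty; the sketch below only illustrates the simpler gate-and-ablate idea, with all shapes illustrative:

```python
# Minimal multi-head self-attention with explicit per-head gates, so a head's
# contribution can be measured by zeroing its gate and comparing outputs.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def mha(x, Wq, Wk, Wv, Wo, gates):
    """x: (seq, d_model); Wq/Wk/Wv: per-head projections; gates: (n_heads,)."""
    heads = []
    for h in range(len(gates)):
        q, k, v = x @ Wq[h], x @ Wk[h], x @ Wv[h]
        attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))
        heads.append(gates[h] * (attn @ v))        # gate scales the head
    return np.concatenate(heads, axis=-1) @ Wo

rng = np.random.default_rng(0)
seq, d_model, n_heads = 5, 16, 4
d_head = d_model // n_heads
Wq, Wk, Wv = (rng.normal(size=(n_heads, d_model, d_head)) for _ in range(3))
Wo = rng.normal(size=(d_model, d_model))
x = rng.normal(size=(seq, d_model))

full = mha(x, Wq, Wk, Wv, Wo, np.ones(n_heads))
for h in range(n_heads):                           # ablate one head at a time
    gates = np.ones(n_heads); gates[h] = 0.0
    drop = np.linalg.norm(full - mha(x, Wq, Wk, Wv, Wo, gates))
    print(f"head {h}: output change {drop:.2f}")
```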
When a Good Translation is Wrong in Context: Context-Aware Machine Translation Improves on Deixis, Ellipsis, and Lexical Cohesion
TLDR: We perform a human study on an English-Russian subtitles dataset and identify deixis, ellipsis, and lexical cohesion as the three main sources of inconsistency.
Citations: 50 · Highly influential: 16
BPE-Dropout: Simple and Effective Subword Regularization
TLDR: We introduce BPE-dropout, a simple and effective subword regularization method based on and compatible with conventional BPE.
Citations: 31 · Highly influential: 10
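The method itself is compact enough to sketch: standard BPE segmentation, except that during training each applicable merge is skipped with probability p, so the same word yields different subword splits. A simplified sketch with a hypothetical merge table:

```python
import random

def bpe_dropout_encode(word, merges, p=0.1, rng=random):
    """Segment `word` with BPE, dropping each applicable merge with prob p.

    merges: list of (left, right) string pairs in priority order.
    """
    rank = {pair: i for i, pair in enumerate(merges)}
    tokens = list(word)
    while True:
        # Applicable merges over adjacent pairs, each kept with prob 1 - p.
        candidates = [(rank[pair], i)
                      for i, pair in enumerate(zip(tokens, tokens[1:]))
                      if pair in rank and rng.random() >= p]
        if not candidates:
            break
        _, i = min(candidates)  # apply the highest-priority surviving merge
        tokens[i:i + 2] = [tokens[i] + tokens[i + 1]]
    return tokens

# Hypothetical merge table; with p > 0 the same word segments differently.
merges = [("l", "o"), ("lo", "w"), ("e", "r"), ("low", "er")]
random.seed(1)
for _ in range(3):
    print(bpe_dropout_encode("lower", merges, p=0.3))
```

With p=0 this reduces to deterministic BPE; at inference time the paper uses the standard segmentation.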
Context-Aware Monolingual Repair for Neural Machine Translation
TLDR: We propose a monolingual DocRepair model that corrects inconsistencies between sentence-level translations, refining translations of sentences in the context of each other.
Citations: 26 · Highly influential: 8
A Large-Scale Test Set for the Evaluation of Context-Aware Pronoun Translation in Neural Machine Translation
TLDR: The translation of pronouns remains a special challenge for machine translation, since it often requires context beyond the current sentence.
Citations: 48 · Highly influential: 7
The Bottom-up Evolution of Representations in the Transformer: A Study with Machine Translation and Language Modeling Objectives
TLDR: We seek to understand how the representations of individual tokens and the structure of the learned feature space evolve between layers in deep neural networks under different learning objectives.
Citations: 39 · Highly influential: 7
Information-Theoretic Probing with Minimum Description Length
TLDR: We propose an alternative to standard probes: information-theoretic probing with minimum description length (MDL).
Citations: 33 · Highly influential: 4
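The online (prequential) variant of MDL probing is easy to sketch: transmit the labels block by block, paying for each block with a probe trained on everything before it, so a representation that makes the labels easy to learn yields a short code. Everything below (data, block boundaries, the logistic-regression probe) is illustrative; the paper also describes a variational estimator:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def online_codelength(X, y, fractions=(0.1, 0.2, 0.4, 0.8, 1.0)):
    """Prequential codelength in bits for labels y given representations X."""
    n = len(y)
    # The first block is transmitted with the uniform code over the classes.
    first = int(fractions[0] * n)
    total_bits = first * np.log2(len(set(y)))
    for lo, hi in zip(fractions, fractions[1:]):
        lo_i, hi_i = int(lo * n), int(hi * n)
        probe = LogisticRegression(max_iter=1000).fit(X[:lo_i], y[:lo_i])
        proba = probe.predict_proba(X[lo_i:hi_i])
        # Bits needed to transmit the next block under the current probe.
        idx = [list(probe.classes_).index(c) for c in y[lo_i:hi_i]]
        total_bits -= np.sum(np.log2(proba[np.arange(hi_i - lo_i), idx]))
    return total_bits

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 16))
y = (X[:, 0] + 0.1 * rng.normal(size=1000) > 0).astype(int)  # toy labels
print(f"codelength: {online_codelength(X, y):.0f} bits")
```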
Sequence Modeling with Unconstrained Generation Order
TLDR: We propose a neural sequence model that generates the output sequence by inserting tokens in arbitrary order via iterative insertion operations.
Citations: 6
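The decoding scheme is easy to state as a loop: at every step, score all (slot, token) insertions into the current partial sequence and apply the best one until the model chooses to stop. In the sketch below only that loop structure reflects the setup; the scorer is a random stand-in for a trained model, and the vocabulary and stop rule are hypothetical:

```python
import random

VOCAB = ["a", "b", "c", "<stop>"]

def toy_scorer(seq, slot, token, rng):
    # Stand-in for a trained model: random scores, with "<stop>" becoming
    # more attractive as the partial sequence grows.
    bonus = 2.0 * (len(seq) - 3) if token == "<stop>" else 0.0
    return rng.gauss(0.0, 1.0) + bonus

def insertion_decode(rng, max_steps=10):
    seq = []
    for _ in range(max_steps):
        # Score every (slot, token) pair; slot i inserts before position i.
        _, slot, token = max(
            (toy_scorer(seq, i, t, rng), i, t)
            for i in range(len(seq) + 1) for t in VOCAB)
        if token == "<stop>":
            break
        seq.insert(slot, token)
    return seq

print(insertion_decode(random.Random(0)))
```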
Embedding Words in Non-Vector Space with Unsupervised Graph Learning
TLDR: We introduce GraphGlove: unsupervised graph word representations which are learned end-to-end.