Publications
Learning Accurate, Compact, and Interpretable Tree Annotation
We present an automatic approach to tree annotation in which basic nonterminal symbols are alternately split and merged to maximize the likelihood of a training treebank. Starting with a simple X-bar grammar, we learn a new grammar whose nonterminals are subsymbols of the original nonterminals.
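As an illustration of the splitting step only, here is a minimal sketch assuming a toy tuple-based tree representation; the split_labels helper is hypothetical and merely refines symbols, whereas the actual method re-estimates subsymbol parameters with EM and merges back splits that do not improve treebank likelihood.

    import random

    # Toy treebank tree: (label, children), where children are subtrees or word strings.
    TREE = ("S", [("NP", ["the", "dog"]), ("VP", [("V", ["barks"])])])

    def split_labels(tree, n_splits=2):
        """Refine each nonterminal X into a latent subsymbol X_k (random k here).
        In split-merge training the subsymbol assignments are soft and are
        re-estimated with EM; only the symbol refinement itself is shown."""
        label, children = tree
        refined = "%s_%d" % (label, random.randrange(n_splits))
        return (refined, [split_labels(c, n_splits) if isinstance(c, tuple) else c
                          for c in children])

    print(split_labels(TREE))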
Universal Dependencies v1: A Multilingual Treebank Collection
TLDR
This paper describes v1 of the universal guidelines, the underlying design principles, and the currently available treebanks for 33 languages, and highlights the need for sound comparative evaluation and cross-lingual learning experiments.
Natural Questions: A Benchmark for Question Answering Research
TLDR
The Natural Questions corpus, a question answering data set, is presented, introducing robust metrics for evaluating question answering systems, demonstrating high human upper bounds on these metrics, and establishing baseline results using competitive methods drawn from related literature.
A Universal Part-of-Speech Tagset
TLDR
This work proposes a tagset that consists of twelve universal part-of-speech categories and develops a mapping from 25 different treebank tagsets to this universal set, which, when combined with the original treebank data, produces a dataset consisting of common parts of speech for 22 different languages.
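For concreteness, a small sketch of such a mapping, using a handful of Penn Treebank tags as the source tagset; the table below is only an illustrative subset, not the published mapping files.

    # Illustrative subset of a fine-grained-to-universal tag mapping (Penn Treebank source).
    PTB_TO_UNIVERSAL = {
        "NN": "NOUN", "NNS": "NOUN", "NNP": "NOUN",
        "VB": "VERB", "VBD": "VERB", "VBP": "VERB", "VBZ": "VERB",
        "JJ": "ADJ", "RB": "ADV", "DT": "DET", "IN": "ADP",
        "CC": "CONJ", "PRP": "PRON", "CD": "NUM", "RP": "PRT", ".": ".",
    }

    def to_universal(tagged_sentence):
        """Map (word, fine_tag) pairs to the coarse universal categories;
        unknown tags fall back to the catch-all X category."""
        return [(word, PTB_TO_UNIVERSAL.get(tag, "X")) for word, tag in tagged_sentence]

    print(to_universal([("The", "DT"), ("dogs", "NNS"), ("bark", "VBP")]))
    # [('The', 'DET'), ('dogs', 'NOUN'), ('bark', 'VERB')]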
Grammar as a Foreign Language
TLDR
The domain-agnostic, attention-enhanced sequence-to-sequence model achieves state-of-the-art results on the most widely used syntactic constituency parsing dataset when trained on a large synthetic corpus that was annotated using existing parsers.
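The key trick is representing a parse tree as a flat token sequence so that an off-the-shelf sequence-to-sequence model can predict it; the sketch below uses a hypothetical tuple-based tree and a simplified bracket format that differs in details from the paper's linearization.

    def linearize(tree):
        """Flatten a constituency tree into a bracketed token sequence.
        Toy representation: a tree is (label, children); leaves are word strings."""
        label, children = tree
        tokens = ["(" + label]
        for child in children:
            tokens.extend(linearize(child) if isinstance(child, tuple) else [child])
        tokens.append(label + ")")
        return tokens

    print(" ".join(linearize(("S", [("NP", ["the", "dog"]), ("VP", ["barks"])]))))
    # (S (NP the dog NP) (VP barks VP) S)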
Globally Normalized Transition-Based Neural Networks
We introduce a globally normalized transition-based neural network model that achieves state-of-the-art part-of-speech tagging, dependency parsing and sentence compression results. Our model is a simple feed-forward neural network that operates on a task-specific transition system, yet achieves comparable or better accuracies than recurrent models.
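The contrast with locally normalized models can be sketched numerically; the scores below are made up, and the real model replaces them with a feed-forward network over parser states and approximates the global partition function with beam search.

    import itertools
    import math

    DECISIONS = ("SHIFT", "REDUCE-L", "REDUCE-R")

    def step_score(prev, d):
        """Hypothetical unnormalized score of decision d given the previous decision,
        standing in for the network's output at a transition-system state."""
        base = {"SHIFT": 0.5, "REDUCE-L": 0.3, "REDUCE-R": 0.2}
        bonus = {("SHIFT", "REDUCE-L"): 1.2, ("REDUCE-L", "REDUCE-R"): 0.9}
        return base[d] + bonus.get((prev, d), 0.0)

    def sequence_score(seq, start="SHIFT"):
        prev, total = start, 0.0
        for d in seq:
            total += step_score(prev, d)
            prev = d
        return total

    def local_log_prob(seq, start="SHIFT"):
        """Locally normalized: a softmax over candidate decisions at every step."""
        prev, total = start, 0.0
        for d in seq:
            log_z = math.log(sum(math.exp(step_score(prev, a)) for a in DECISIONS))
            total += step_score(prev, d) - log_z
            prev = d
        return total

    def global_log_prob(seq, start="SHIFT"):
        """Globally normalized: a single partition function over whole decision
        sequences (enumerated exhaustively here; the paper uses a beam)."""
        log_z = math.log(sum(math.exp(sequence_score(s, start))
                             for s in itertools.product(DECISIONS, repeat=len(seq))))
        return sequence_score(seq, start) - log_z

    seq = ("SHIFT", "REDUCE-L", "REDUCE-R")
    print(local_log_prob(seq), global_log_prob(seq))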
Universal Dependency Annotation for Multilingual Parsing
TLDR
A new collection of treebanks with homogeneous syntactic dependency annotation for six languages (German, English, Swedish, Spanish, French, and Korean) is presented and made freely available in order to facilitate research on multilingual dependency parsing.
Improved Inference for Unlexicalized Parsing
TLDR
A novel coarse-to-fine method is presented in which a grammar’s own hierarchical projections are used for incremental pruning, including a method for efficiently computing projections of a grammar without a treebank.
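A sketch of the pruning idea alone, with made-up coarse-pass posteriors and threshold; in the actual parser the posteriors come from inside-outside scores under a projected (coarsened) grammar.

    # Hypothetical coarse-pass posteriors for labelled spans (start, end, coarse label).
    COARSE_POSTERIORS = {
        (0, 2, "NP"): 0.91,
        (0, 3, "NP"): 0.02,
        (2, 5, "VP"): 0.84,
        (3, 5, "VP"): 0.004,
    }

    def surviving_items(posteriors, threshold=1e-2):
        """Keep only chart items whose coarse posterior clears the threshold;
        the fine pass then builds refined labels only over these spans."""
        return {item for item, p in posteriors.items() if p >= threshold}

    print(sorted(surviving_items(COARSE_POSTERIORS)))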
Multi-Source Transfer of Delexicalized Dependency Parsers
TLDR
This work demonstrates that delexicalized parsers can be directly transferred between languages, producing significantly higher accuracies than unsupervised parsers, and shows that simple methods for introducing multiple source languages can significantly improve the overall quality of the resulting parsers.
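The delexicalization step itself is simple to sketch; the token format below is hypothetical, and a real setup would keep additional non-lexical features alongside the universal POS tags.

    # A German sentence with universal POS tags (hypothetical token format).
    SOURCE_SENTENCE = [
        {"form": "Der", "upos": "DET"},
        {"form": "Hund", "upos": "NOUN"},
        {"form": "bellt", "upos": "VERB"},
    ]

    def delexicalize(sentence):
        """Drop the word forms so the parser sees only language-independent features;
        a parser trained this way can be applied directly to another language with
        compatible POS annotation."""
        return [token["upos"] for token in sentence]

    print(delexicalize(SOURCE_SENTENCE))  # ['DET', 'NOUN', 'VERB']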
CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies
TLDR
This overview paper defines the task and the updated evaluation methodology, describes data preparation, reports and analyzes the main results, and provides a brief categorization of the different approaches of the participating systems.
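As an aside on the evaluation methodology, the headline metric is labelled attachment score; a minimal sketch is shown below under the simplifying assumption of pre-aligned tokens (the official evaluation additionally aligns system and gold tokenizations).

    def labelled_attachment_score(gold, system):
        """Fraction of words whose predicted head index and dependency label both
        match the gold annotation, assuming one-to-one aligned tokens."""
        assert len(gold) == len(system)
        correct = sum(1 for (gh, gl), (sh, sl) in zip(gold, system)
                      if gh == sh and gl == sl)
        return correct / len(gold)

    # (head_index, relation) per word for a toy three-word sentence.
    GOLD = [(2, "nsubj"), (0, "root"), (2, "obj")]
    PRED = [(2, "nsubj"), (0, "root"), (1, "obj")]
    print(labelled_attachment_score(GOLD, PRED))  # 0.666...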
...