Graph-Based Universal Dependency Parsing in the Age of the Transformer: What Works, and What Doesn't
TLDR
It is found that pre-trained embeddings have by far the greatest and most clear-cut impact on parser performance, while the choice of factorized vs. unfactorized architecture and the use of a multi-task training setup affect parsing accuracy in more subtle ways.
RobertNLP at the IWPT 2020 Shared Task: Surprisingly Simple Enhanced UD Parsing for English
TLDR
This paper presents a biaffine classifier architecture which operates directly on finetuned RoBERTa embeddings and achieves a very high parsing accuracy, ranking 1st out of 10 with an ELAS F1 score of 88.94%.
Applying Occam’s Razor to Transformer-Based Dependency Parsing: What Works, What Doesn’t, and What is Really Necessary
TLDR
STEPS, a new modular graph-based dependency parser, is introduced. The choice of pre-trained embeddings is found to have by far the greatest impact on parser performance, and XLM-R is identified as a robust choice across the languages in this study.
Unifying the Treatment of Preposition-Determiner Contractions in German Universal Dependencies Treebanks
TLDR
It is shown that harmonizing corpora in their treatment of preposition-determiner contractions, by representing them as multi-word tokens with the help of a lookup table, leads to a considerable increase in automatic parsing performance.
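The lookup-table idea can be illustrated with a minimal sketch. The table below is hypothetical: its entries are common German contractions, but the actual table used in the paper is not reproduced in this listing, and the function name `expand_contractions` is invented for illustration. A plain lookup also ignores context, so ambiguous forms would need extra handling in practice.

```python
# Hypothetical lookup table of German preposition-determiner contractions.
# The entries are standard German; the paper's own table is not shown here.
CONTRACTIONS = {
    "am": ("an", "dem"),
    "beim": ("bei", "dem"),
    "im": ("in", "dem"),
    "ins": ("in", "das"),
    "vom": ("von", "dem"),
    "zum": ("zu", "dem"),
    "zur": ("zu", "der"),
}


def expand_contractions(tokens):
    """Pair each surface token with the syntactic words it spans.

    A contraction becomes a multi-word token spanning two words;
    every other token spans just itself.
    """
    result = []
    for token in tokens:
        # Lookup is case-insensitive; the expanded words come out lowercased.
        parts = CONTRACTIONS.get(token.lower())
        if parts is None:
            result.append((token, [token]))
        else:
            result.append((token, list(parts)))
    return result
```

Calling `expand_contractions(["Wir", "gehen", "zum", "Bahnhof"])` keeps the surface form "zum" and attaches the two words "zu" and "dem" to it, which mirrors how a multi-word token is laid out in a UD treebank.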
RobertNLP at the IWPT 2021 Shared Task: Simple Enhanced UD Parsing for 17 Languages
TLDR
This system consists of an unfactorized biaffine classifier that operates directly on fine-tuned XLM-R embeddings and generates enhanced UD graphs by predicting the best dependency label (or absence of a dependency) for each pair of tokens.
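The pairwise decision described above can be sketched in plain Python. This is a toy illustration, not the system's implementation: the vectors stand in for fine-tuned XLM-R embeddings, the weights are supplied by the caller rather than trained, and the names `NO_EDGE`, `biaffine_score`, and `predict_graph` are hypothetical. Skipping self-loops is also an assumption of this sketch.

```python
NO_EDGE = "<none>"  # hypothetical label meaning "no dependency"


def biaffine_score(head, dep, weight, bias):
    """Simplified biaffine score: head^T * weight * dep + bias."""
    total = bias
    for a, h in enumerate(head):
        for c, d in enumerate(dep):
            total += h * weight[a][c] * d
    return total


def predict_graph(head_vecs, dep_vecs, weights, biases, labels):
    """Pick the best label, or no edge, for every ordered token pair.

    weights[k] and biases[k] belong to labels[k]. Because NO_EDGE competes
    with the real labels in a single argmax, edge existence and edge label
    are decided jointly (unfactorized).
    """
    edges = {}
    for i, head in enumerate(head_vecs):
        for j, dep in enumerate(dep_vecs):
            if i == j:
                continue  # assumption: a token never depends on itself
            scores = [
                biaffine_score(head, dep, weights[k], biases[k])
                for k in range(len(labels))
            ]
            best = labels[max(range(len(labels)), key=scores.__getitem__)]
            if best != NO_EDGE:
                edges[(i, j)] = best
    return edges
```

Since each pair is classified independently, a token can receive several heads or none, so the output is a graph rather than a tree, as enhanced UD requires.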
Generalized chart constraints for efficient PCFG and TAG parsing
TLDR
These constraints accelerate both PCFG and TAG parsing, and combine effectively with other pruning techniques (coarse-to-fine and supertagging) for an overall speedup of two orders of magnitude, while improving accuracy.
A Corpus Study of Creating Rule-Based Enhanced Universal Dependencies for German
TLDR
This paper develops a rule-based system for deriving enhanced dependencies from the basic layer, covering three linguistic phenomena: relative clauses, coordination, and raising/control, and shows that the English system is in general applicable to German data, but that adapting to the particularities of the German treebanks and language increases precision and recall by up to 10%.
Annotation and Classification of Locations in Folktales
In the context of a software project dedicated to the automated classification of folk and fairy tales, we focused on their segmentation by scenes and their respective locations. In contrast to