Corpus ID: 237532574

Revisiting Tri-training of Dependency Parsers

Joachim Wagner and Jennifer Foster
We compare two orthogonal semi-supervised learning techniques, namely tri-training and pretrained word embeddings, in the task of dependency parsing. We explore language-specific FastText and ELMo embeddings and multilingual BERT embeddings. We focus on a low-resource scenario, as semi-supervised learning can be expected to have the most impact here. Based on treebank size and available ELMo models, we select Hungarian, Uyghur (a zero-shot language for mBERT) and Vietnamese. Furthermore, we…

Evaluating Contextualized Embeddings on 54 Languages in POS Tagging, Lemmatization and Dependency Parsing
We present an extensive evaluation of three recently proposed methods for contextualized embeddings on 89 corpora in 54 languages of the Universal Dependencies 2.3 in three tasks: POS tagging,…
Towards Better UD Parsing: Deep Contextualized Word Embeddings, Ensemble, and Treebank Concatenation
This paper describes the system (HIT-SCIR) submitted to the CoNLL 2018 shared task on Multilingual Parsing from Raw Text to Universal Dependencies, which was ranked first according to LAS and outperformed the other systems by a large margin.
Deep Contextualized Word Representations
A new type of deep contextualized word representation is introduced that models both complex characteristics of word use and how these uses vary across linguistic contexts, allowing downstream models to mix different types of semi-supervision signals.
CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies
The task and evaluation methodology are defined, the preparation of the data sets is described, the main results are reported and analyzed, and a brief categorization of the approaches of the participating systems is provided.
Ensemble Self-Training for Low-Resource Languages: Grapheme-to-Phoneme Conversion and Morphological Inflection
An iterative data augmentation framework is proposed that trains and searches for an optimal ensemble while simultaneously annotating new training data in a self-training style; it works especially well on low-resource languages.
Semi-Supervised Sequence Modeling with Cross-View Training
Cross-View Training (CVT), a semi-supervised learning algorithm that improves the representations of a Bi-LSTM sentence encoder using a mix of labeled and unlabeled data, is proposed and evaluated, achieving state-of-the-art results.
CoNLL-X Shared Task on Multilingual Dependency Parsing
The conversion of treebanks for 13 languages into the same dependency format and the measurement of parsing performance are described, and general conclusions about multilingual parsing are drawn.
Semi-Supervised Learning on Meta Structure: Multi-Task Tagging and Parsing in Low-Resource Scenarios
This work proposes a semi-supervised learning approach based on multi-view models through consensus promotion, and investigates whether this improves overall performance, and proposes a single-view model on top of the unified representation.
Mind the Gap: Data Enrichment in Dependency Parsing of Elliptical Constructions
This paper reports on several experiments in the enrichment of training data for elliptical constructions and demonstrates small improvements over the winning system of the CoNLL-17 parsing shared task for four of the five languages, and not only on the elliptical constructions.
Universal dependencies for Uyghur
This paper presents the mapping of the Uyghur dependency treebank’s labelling scheme to the UD scheme, along with a clear description of the structural changes required in this conversion.