Corpus ID: 237485445

Genre as Weak Supervision for Cross-lingual Dependency Parsing

Max Muller-Eberstein, Rob van der Goot, Barbara Plank
Recent work has shown that monolingual masked language models learn to represent data-driven notions of language variation which can be used for domain-targeted training data selection. Dataset genre labels are already frequently available, yet remain largely unexplored in cross-lingual setups. We harness this genre metadata as a weak supervision signal for targeted data selection in zero-shot dependency parsing. Specifically, we project treebank-level genre information to the finer-grained…

Universal Dependencies v2: An Evergrowing Multilingual Treebank Collection
Universal Dependencies is an open community effort to create cross-linguistically consistent treebank annotation for many languages within a dependency-based lexicalist framework. The annotation…
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
A new language representation model, BERT, designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
Universal Dependencies v1: A multilingual
  • 2016
Presenting TWITTIRÒ-UD: An Italian Twitter Treebank in Universal Dependencies
Adding the Universal Dependencies format to the fine-grained irony annotation previously applied to TWITTIRÒ might meaningfully help in investigating possible relationships between the syntax and semantics of figurative language use, irony in particular.
PoSTWITA-UD: an Italian Twitter Treebank in Universal Dependencies
The development of PoSTWITA-UD is proposed: a collection of tweets annotated in Universal Dependencies, a well-known dependency-based annotation format, creating a resource that can be exploited for training NLP systems to enhance their performance on social media texts.
The First Komi-Zyrian Universal Dependencies Treebanks
Two Komi-Zyrian treebanks were included in the Universal Dependencies 2.2 release. This article contextualizes the treebanks, discusses the process through which they were created, and outlines the…
Automatic Detection of Text Genre
A theory of genres as bundles of facets, which correlate with various surface cues, is proposed, and it is argued that genre detection based on surface cues is as successful as detection based on deeper structural properties.
Massive Choice, Ample Tasks (MaChAmp): A Toolkit for Multi-task Learning in NLP
MaChAmp is presented, a toolkit for easy fine-tuning of contextualized embeddings in multi-task settings. Its benefits are its flexible configuration options and its support for a variety of natural language processing tasks in a uniform toolkit.
Font Awesome Icons
  • CC-BY 4.0 License.
  • 2021
Cost-effective Selection of Pretraining Data: A Case Study of Pretraining BERT on Social Media
This work pretrains two models on tweets and forum text, respectively, empirically demonstrates the effectiveness of these two resources, and investigates how similarity measures can be used to nominate in-domain pretraining data.