Universal Dependencies 2.1
TLDR
The annotation scheme is based on (universal) Stanford dependencies, Google universal part-of-speech tags, and the Interset interlingua for morphosyntactic tagsets.
Universal Dependencies 2.0 – CoNLL 2017 Shared Task Development and Test Data
TLDR
This release contains the test data used in the CoNLL 2017 shared task on parsing Universal Dependencies, and complements the UD 2.0 release with 18 new parallel test sets and 4 test sets in surprise languages.
Edition 1.1 of the PARSEME Shared Task on Automatic Identification of Verbal Multiword Expressions
This paper describes the PARSEME Shared Task 1.1 on automatic identification of verbal multi-word expressions. We present the annotation methodology, focusing on changes from last year's shared task.
The JOS Linguistically Tagged Corpus of Slovene
TLDR
The paper introduces these components of the JOS language resources, and concentrates on jos100k, a 100,000-word sampled, balanced monolingual Slovene corpus, manually annotated for three levels of linguistic description.
Cross-lingual Dependency Parsing of Related Languages with Rich Morphosyntactic Tagsets
TLDR
The paper argues for the benefits of using rich morphosyntactic tagsets in cross-lingual parsing and empirically supports the claim by showing large improvements over an impoverished common feature representation in the form of a reduced part-of-speech tagset.
Universal Dependencies 1.4
TLDR
The annotation scheme is based on (universal) Stanford dependencies, Google universal part-of-speech tags, and the Interset interlingua for morphosyntactic tagsets.
A Quantitative Evaluation of Word Sketches
TLDR
This paper presents a formal evaluation for Dutch, English, Japanese and Slovene of word sketches used for lexicography by a number of publishers.
Compilation, transcription and usage of a reference speech corpus: the case of the Slovene corpus GOS
TLDR
The corpus structure and fieldwork experiences with recording, the labelling system, and two levels of transcription (pronunciation-based and standardized) are described, as well as the main characteristics of the corpus interface (web concordancer) and the availability of the original corpus files.
Automation of lexicographic work: an opportunity for both lexicographers and crowd-sourcing
TLDR
The paper proposes a slightly revised version of the approach envisaged by Rundell and Kilgarriff, in which the validation of data is left to lower-level linguists or crowd-sourcing, whereas high-level tasks such as meaning description remain the domain of lexicographers.
A Multilingual Social Media Linguistic Corpus
This paper focuses on multilingual social media and introduces the xLiMe Twitter Corpus that contains messages in German, Italian and Spanish manually annotated with Part-of-Speech, Named Entities, ...