Findings of the VarDial Evaluation Campaign 2017
We present the results of the VarDial Evaluation Campaign on Natural Language Processing (NLP) for Similar Languages, Varieties and Dialects, which we organized as part of the fourth edition of the …
  • 114
  • 16
Compilation of a Swiss German Dialect Corpus and its Application to PoS Tagging
Swiss German is a dialect continuum whose dialects are very different from Standard German, the official language of the German part of Switzerland. However, dealing with Swiss German in natural …
  • 20
  • 2
Building a Parallel Corpus on the World's Oldest Banking Magazine
We report on our processing steps to build a diachronic parallel corpus based on the world's oldest banking magazine. The magazine has been published since 1895 in German, with translations in French …
  • 7
  • 2
A Resource for Natural Language Processing of Swiss German Dialects
Since there are only a few resources for Swiss German dialects, we compiled a corpus of 115,000 tokens, manually annotated with PoS tags. The goal is to provide a basic data set for developing NLP …
  • 16
  • 1
Part-of-Speech Tag Disambiguation by Cross-Linguistic Majority Vote
In this paper, we present an approach to developing resources for a low-resource language, taking advantage of the fact that it is closely related to languages with more resources. In particular, we …
  • 8
  • 1
Parsing Approaches for Swiss German
This paper presents different approaches towards universal dependency parsing for Swiss German. Dealing with dialects is a challenging task in Natural Language Processing because of the huge …
Abstracts of Oral and Poster Presentations of the 3rd Swiss Text Analytics Conference (SwissText 2018)
Reconstructing Complete Lemmas for Incomplete German Compounds
This paper discusses elliptical compounds, which are frequently used in German in order to avoid repetitions. This phenomenon involves truncated words, mostly truncated compounds. These words pose a …
Approaching SMM4H with Merged Models and Multi-task Learning
We describe our submissions to the 4th edition of the Social Media Mining for Health Applications (SMM4H) shared task. Our team (UZH) participated in two sub-tasks: Automatic classifications of …
Crowdsourcing Swiss Dialect Transcriptions for Assessing Factors in Writing Variations
In this paper, we systematically analyze writing variations of Swiss German in two existing corpora with standard German glosses, a corpus of 10,000 short text messages and a corpus of transcribed …