Chinese Whispers - an Efficient Graph Clustering Algorithm and its Application to Natural Language Processing Problems
The performance of Chinese Whispers is measured on Natural Language Processing (NLP) problems as diverse as language separation, acquisition of syntactic word classes and word sense disambiguation.
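Chinese Whispers is a randomized label-propagation procedure. Every node starts in its own class, and in each iteration the nodes are visited in random order and adopt the class with the highest total edge weight among their neighbours. The Python sketch below illustrates that idea and is not the authors' implementation. The function name `chinese_whispers` and its parameters are hypothetical. The original algorithm breaks ties randomly, but this sketch keeps a node's current class on a tie and otherwise takes the first maximum, as a simplification.

```python
import random
from collections import defaultdict


def chinese_whispers(edges, iterations=20, seed=0):
    """Cluster an undirected graph given as (u, v, weight) triples; returns {node: class}."""
    rng = random.Random(seed)
    neighbors = defaultdict(dict)
    for u, v, w in edges:
        if u == v:
            continue  # ignore self-loops
        neighbors[u][v] = neighbors[u].get(v, 0.0) + w
        neighbors[v][u] = neighbors[v].get(u, 0.0) + w

    nodes = list(neighbors)
    label = {node: i for i, node in enumerate(nodes)}  # each node starts in its own class

    for _ in range(iterations):
        rng.shuffle(nodes)  # visit nodes in random order
        changed = False
        for node in nodes:
            # Sum edge weights per class over the node's neighbours.
            scores = defaultdict(float)
            for nb, w in neighbors[node].items():
                scores[label[nb]] += w
            best_score = max(scores.values())
            if scores.get(label[node], 0.0) == best_score:
                continue  # simplification: keep the current class on a tie
            label[node] = max(scores, key=scores.get)  # first class with the top weight
            changed = True
        if not changed:
            break  # converged: no node changed class in this sweep
    return label
```

For two disconnected triangles, each triangle collapses into a single class, and the two classes stay distinct because a class label can only spread along edges.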
UKP: Computing Semantic Textual Similarity by Combining Multiple Content Similarity Measures
This work uses a simple log-linear regression model, trained on the training data, to combine multiple text similarity measures of varying complexity, which range from simple character and word n-grams and common subsequences to complex features such as Explicit Semantic Analysis vector comparisons and aggregation of word similarity based on lexical-semantic resources.
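The simplest measures named above, character and word n-gram overlap, can be sketched as a small feature extractor. This Python sketch is illustrative only. The function names, the chosen n values, and the use of Jaccard overlap as the score are all assumptions. The trained log-linear regression that combines the measures in the paper is not reproduced here.

```python
def ngrams(seq, n):
    """Set of contiguous n-grams of a string (characters) or a list (tokens)."""
    return {tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)}


def jaccard(a, b):
    """Jaccard overlap of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_features(s1, s2, char_ns=(2, 3), word_ns=(1, 2)):
    """Hypothetical feature vector: character n-gram overlaps, then word n-gram overlaps."""
    c1, c2 = s1.lower(), s2.lower()
    w1, w2 = c1.split(), c2.split()  # naive whitespace tokenization
    features = [jaccard(ngrams(c1, n), ngrams(c2, n)) for n in char_ns]
    features += [jaccard(ngrams(w1, n), ngrams(w2, n)) for n in word_ns]
    return features
```

Each text pair yields one vector. In a setup like the one described, such vectors would serve as inputs to a regression model fitted on gold similarity scores.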
Do Supervised Distributional Methods Really Learn Lexical Inference Relations?
This work investigates a collection of distributional representations of words used in supervised settings for recognizing lexical inference relations between word pairs, and shows that they do not actually learn a relation between two words, but an independent property of a single word in the pair.
WebAnno: A Flexible, Web-based and Visually Supported System for Distributed Annotations
WebAnno offers annotation project management, freely configurable tagsets, and management of users in different roles. Its architecture also allows new modes of visualization and editing to be added when new kinds of annotations need to be supported.
TopicTiling: A Text Segmentation Algorithm based on LDA
This work presents TopicTiling, a text segmentation algorithm based on the well-known TextTiling algorithm. It segments documents using the Latent Dirichlet Allocation (LDA) topic model and is computationally less expensive than other LDA-based segmentation methods.
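TextTiling-style segmentation compares adjacent blocks of text at every sentence gap and places boundaries where similarity dips most sharply, as measured by a "depth score". The sketch below applies that scheme to topic IDs instead of words, in the spirit of TopicTiling. It makes three assumptions: each sentence has already been mapped to a list of LDA topic IDs (a hypothetical input format), similarity is cosine similarity between topic-count vectors, and the number of boundaries is supplied by the caller. It is not the authors' implementation.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity of two sparse count vectors (Counters)."""
    dot = sum(a[k] * b[k] for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def gap_similarities(sent_topics, window=2):
    """Similarity between the `window` sentences before and after each gap."""
    sims = []
    for gap in range(1, len(sent_topics)):
        left = Counter(t for s in sent_topics[max(0, gap - window):gap] for t in s)
        right = Counter(t for s in sent_topics[gap:gap + window] for t in s)
        sims.append(cosine(left, right))
    return sims


def depth_scores(sims):
    """Depth of each similarity valley: climb to the highest peak on each side."""
    depths = []
    for i, c in enumerate(sims):
        hl, j = c, i
        while j > 0 and sims[j - 1] >= hl:
            hl, j = sims[j - 1], j - 1
        hr, j = c, i
        while j < len(sims) - 1 and sims[j + 1] >= hr:
            hr, j = sims[j + 1], j + 1
        depths.append((hl - c) + (hr - c))
    return depths


def segment(sent_topics, n_boundaries, window=2):
    """Return sentence indices where a new segment starts (hypothetical API)."""
    depths = depth_scores(gap_similarities(sent_topics, window))
    ranked = sorted(range(len(depths)), key=lambda i: depths[i], reverse=True)
    return sorted(i + 1 for i in ranked[:n_boundaries])
```

For example, with three sentences dominated by topic 0 followed by three dominated by topic 1, the deepest valley falls at the gap before sentence 3.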
Corpus Portal for Search in Monolingual Corpora
A simple and flexible schema for storing and presenting monolingual language resources is proposed to ease the application of algorithms for monolingual and interlingual studies.
A Report on the Complex Word Identification Shared Task 2018
The second CWI shared task featured multilingual and multi-genre datasets divided into four tracks and two tasks, binary classification and probabilistic classification. A total of 12 teams submitted results in different task/track combinations.
NoSta-D Named Entity Annotation for German: Guidelines and Dataset
This work describes an approach to creating annotation guidelines based on linguistic and semantic considerations, and explains how the guidelines were iteratively refined and tested in the early stages of annotation. The result is the largest publicly available dataset for German NER, consisting of over 31,000 manually annotated sentences from German Wikipedia and German online news.
GermEval 2014 Named Entity Recognition Shared Task
This paper describes the GermEval 2014 Named Entity Recognition (NER) Shared Task workshop at KONVENS. It provides background information on the motivation of this task, the dataset, the evaluation …
TAXI at SemEval-2016 Task 13: a Taxonomy Induction Method based on Lexico-Syntactic Patterns, Substrings and Focused Crawling
This work presents a system for taxonomy construction that took first place in all subtasks of the SemEval 2016 challenge on Taxonomy Extraction Evaluation. It also shows that the method outperforms more complex and knowledge-rich approaches on most domains and languages.
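One of the signals named in the TAXI title is substring matching. A multiword term often has a shorter term it contains as its hypernym; for example, "apple juice" is a kind of "juice". The sketch below implements only a simplified, hypothetical variant of that idea: it matches the longest head-final suffix of each term against the vocabulary. It does not reproduce TAXI's actual substring scoring, its lexico-syntactic pattern features, or its focused crawling. The function name is an assumption.

```python
def substring_hypernyms(terms):
    """Hypothetical heuristic: pair each multiword term with its longest suffix in the vocabulary."""
    vocabulary = set(terms)
    pairs = []
    for hyponym in terms:
        words = hyponym.split()
        # Try suffixes from longest to shortest, e.g. "apple juice" before "juice".
        for i in range(1, len(words)):
            head = " ".join(words[i:])
            if head in vocabulary:
                pairs.append((hyponym, head))
                break  # keep only the longest matching suffix
    return pairs
```

Assuming head-final compounds, as in English, this yields pairs such as ("fresh apple juice", "apple juice"). Languages with different compounding patterns would need other matching rules.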