Simon Clematide

In this paper, we describe MLSA, a publicly available multi-layered reference corpus for German-language sentiment analysis. The corpus is built by manually annotating 270 German-language sentences at three different layers of granularity. The sentence-layer annotation, as the most coarse-grained annotation, focuses on …
We have explored and implemented different approaches to named entity recognition in German, a difficult task in this language since both regular nouns and proper names are capitalized. Our goal is to identify and recognise person names, geographical names and company names in a computer magazine corpus. Our geographical name classifier works with …
We describe a system for the detection of mentions of protein-protein interactions in the biomedical scientific literature. The original system was developed as a part of the OntoGene project, which focuses on using advanced computational linguistic techniques for text mining applications in the biomedical domain. In this paper, we focus in particular on …
We introduce our incremental coreference resolution system for the BioNLP 2011 Shared Task on Protein/Gene interaction. The benefits of an incremental architecture over a mention-pair model are a reduction in the number of candidate pairs, a means to overcome the problem of underspecified items in pair-wise classification, and the natural integration of …
Automatic extraction of biological network information is one of the most desired and most complex tasks in biological text mining. BioCreative Track 4 provides training data and an evaluation environment for the extraction of causal relationships in Biological Expression Language (BEL). BEL is a modeling language that is easily editable by humans or by …
This paper presents the submissions by the University of Zurich to the SIGMORPHON 2017 shared task on morphological reinflection. The task is to predict the inflected form given a lemma and a set of morpho-syntactic features. We focus on neural network approaches that can tackle the task in a limited-resource setting. As the transduction of the lemma into …
The BioCreative challenge evaluation is a community-wide effort for evaluating text mining and information extraction systems applied to the biological domain. The biocurator community, as an active user of biomedical literature, provides a diverse and engaged end-user group for text mining tools. Earlier BioCreative challenges involved many text mining …
Research scientists and companies working in the domains of biomedicine and genomics are increasingly faced with the problem of efficiently locating, within the vast body of published scientific findings, the critical pieces of information needed to direct current and future research investment. In this report, we describe approaches taken within …
The need for efficient text-mining tools that support curation of the biomedical literature is ever increasing. In this article, we describe an experiment aimed at verifying whether a text-mining tool capable of extracting meaningful relationships among domain entities can be successfully integrated into the curation workflow of a major biological database. …
This article describes the approaches taken by the OntoGene group at the University of Zurich in dealing with two tasks of the BioCreative III competition: classification of articles that contain curatable protein-protein interactions (PPI-ACT) and extraction of experimental methods (PPI-IMT). Two main achievements are described in this paper: (a) a system …