Roman Schneider

The goal of this workshop is to create a forum for researchers interested in the use of semantic annotations for information retrieval. By semantic annotations we refer to linguistic annotations (such as named entities, semantic classes, etc.) as well as user annotations such as microformats, RDF, tags, etc. The aim of this workshop is not semantic…
This paper describes a new approach to improve the analysis and categorization of web documents using statistical methods for template-based clustering as well as semantic analysis based on terminological ontologies. A domain-specific environment serves as a proof of concept. In order to demonstrate the widespread practical benefit of our approach, we…
Summary: The Online-Wortschatz-Informationssystem Deutsch (OWID) is a digital dictionary portal of the Institut für Deutsche Sprache. All lexicographic data brought together in it is structured in fine-grained XML. Storage, management, and retrieval of this data are handled by the Oracle-based Electronic Dictionary…
The compilation of terminological vocabularies plays a central role in the organization and retrieval of scientific texts. Both simple keyword lists and sophisticated models of relationships between terminological concepts can make a valuable contribution to the analysis, classification, and retrieval of appropriate digital documents, either…
A new diffractometer for microcrystallography has been developed for the three macromolecular crystallography beamlines of the Swiss Light Source. Building upon and critically extending previous developments realised for the high-resolution endstations of the two undulator beamlines X06SA and X10SA, as well as the super-bend dipole beamline X06DA, the new…
We present a novel NLP resource for the explanation of linguistic phenomena, built and evaluated by exploring very large annotated language corpora. For the compilation, we use the German Reference Corpus (DeReKo) with more than 5 billion word forms, which is the largest linguistic resource worldwide for the study of contemporary written German. The result is…
Linguistic query systems are special-purpose IR applications. We present a novel approach for the efficient exploitation of very large linguistic corpora, combining the advantages of relational database management systems (RDBMS) with the functional MapReduce programming model. Our implementation uses the German DEREKO reference corpus with…