• Corpus ID: 16959374

Ontology-Based Interface Specifications for a NLP Pipeline Architecture

Ekaterina Buyko, C. Chiarcos, Antonio Pareja-Lora
International Conference on Language Resources and Evaluation

The high level of heterogeneity between linguistic annotations usually complicates the interoperability of processing modules within an NLP pipeline. In this paper, a framework for the interoperation of NLP components, based on a data-driven architecture, is presented. Here, ontologies of linguistic annotation are employed to provide a conceptual basis for the tagset-neutral processing of linguistic annotations. The framework proposed here is based on a set of structured OWL ontologies: a… 
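
The idea of tagset-neutral processing described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's OWL implementation: the concept names, the hierarchy, and the mapping tables below are assumptions made for illustration, and plain dictionaries stand in for the ontologies. Only the Penn Treebank and STTS tags themselves are real.

```python
# Illustrative only: concept names and mapping tables are assumptions,
# not the identifiers used in the paper's OWL ontologies.

# Hypothetical concept hierarchy: child concept -> parent concept.
SUBCLASS_OF = {
    "CommonNoun": "Noun",
    "ProperNoun": "Noun",
    "Noun": "PartOfSpeech",
    "Verb": "PartOfSpeech",
}

# Per-tagset mappings from concrete tags to concepts
# (Penn Treebank for English, STTS for German).
TAG_TO_CONCEPT = {
    "penn": {"NN": "CommonNoun", "NNP": "ProperNoun", "VB": "Verb"},
    "stts": {"NN": "CommonNoun", "NE": "ProperNoun", "VVFIN": "Verb"},
}


def ancestors(concept):
    """Return the concept together with all of its superconcepts."""
    result = [concept]
    while concept in SUBCLASS_OF:
        concept = SUBCLASS_OF[concept]
        result.append(concept)
    return result


def is_a(tagset, tag, concept):
    """Check whether a tag from the given tagset denotes the concept."""
    mapped = TAG_TO_CONCEPT[tagset].get(tag)
    return mapped is not None and concept in ancestors(mapped)
```

A downstream component can then ask `is_a("penn", "NNP", "Noun")` or `is_a("stts", "NE", "Noun")` without knowing either tagset, which is the kind of decoupling the abstract refers to.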

An Ontology-based Approach To Automatic Part-of-Speech Tagging Using Heterogeneously Annotated Corpora

This work successfully trains an ontology-based POS tagger on corpora with different tag sets of divergent granularity and partially compatible annotations, and shows how training on heterogeneously annotated data produces richer morphosyntactic annotation with no or only marginal loss of precision.

Towards Robust Multi-Tool Tagging. An OWL/DL-Based Approach

It is shown how annotations created by seven NLP tools are mapped onto tool-independent descriptions that are defined with reference to an ontology of linguistic annotations, and how a majority vote and ontological consistency constraints can be used to integrate multiple alternative analyses of the same token in a consistent way.
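
A rough Python sketch of combining a vote with consistency constraints follows. It is one possible reading of the summary above, not the authors' algorithm: the concept names, the disjointness pairs, and the greedy acceptance order (with alphabetical tie-breaking) are all assumptions, and a set of disjoint pairs stands in for OWL/DL reasoning.

```python
from collections import Counter

# Hypothetical disjointness constraints between concepts (assumed for
# illustration; a real system would derive these from the ontology).
DISJOINT = {
    frozenset({"Noun", "Verb"}),
    frozenset({"CommonNoun", "ProperNoun"}),
}


def consistent(concepts):
    """True if no two concepts in the set are declared disjoint."""
    return not any(pair <= concepts for pair in DISJOINT)


def integrate(analyses):
    """Merge the concept sets that several tools propose for one token.

    Concepts are ranked by the number of tools proposing them. Each
    concept is accepted only if it does not contradict the concepts
    accepted before it. Ties are broken alphabetically, which is an
    arbitrary choice made here for determinism.
    """
    votes = Counter(c for concepts in analyses for c in set(concepts))
    accepted = set()
    for concept, _ in sorted(votes.items(), key=lambda kv: (-kv[1], kv[0])):
        if consistent(accepted | {concept}):
            accepted.add(concept)
    return accepted


# Four hypothetical tools analysing the same token.
merged = integrate([
    {"Noun", "CommonNoun"},
    {"Noun", "CommonNoun"},
    {"Noun", "ProperNoun"},
    {"Verb"},
])
```

Here `merged` contains `Noun` and `CommonNoun`: the minority proposals `ProperNoun` and `Verb` are rejected because they are disjoint with concepts that received more votes.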

OLiA - Ontologies of Linguistic Annotation

The OLiA ontologies serve as a reference hub for annotation terminology for linguistic phenomena across a wide range of languages and have been used to facilitate interoperability and information integration of linguistic annotations in corpora, NLP pipelines, and lexical-semantic resources.

OWL/DL formalization of the MULTEXT-East morphosyntactic specifications

This paper describes the modeling of the morphosyntactic annotations of the MULTEXT-East corpora and lexicons as an OWL/DL ontology and shows that this approach provides a top-down perspective on a large set of morphosyntactic specifications for multiple languages.

Making UIMA Truly Interoperable with SPARQL

A conversion mechanism based on SPARQL, a query language for the retrieval and manipulation of RDF graph data, is introduced, providing a UIMA component that serialises data coming from a source component into RDF, executes a user-defined type-conversion query, and deserialises the updated graph into a target component.

The Pragmatic Level of OntoLingAnnot’s Ontologies and Their Use in Pragmatic Annotation for Language Teaching

This chapter presents the different units, values, attributes and relations that constitute the pragmatic level of these ontologies, which have been devised for the annotation of dialogues and texts in different contexts (e.g., the development of corpora or language teaching).

Ontologies of Linguistic Annotation: Survey and perspectives

The OLiA ontologies represent a repository of annotation terminology for various linguistic phenomena across a wide range of languages and are summarized in this paper.

Modelling Discourse-related terminology in OntoLingAnnot’s ontologies

This paper shows the different units, values, attributes, relations, layers and strata included in the discourse annotation level of the OntoLingAnnot model, within which these ontologies are included, used and evaluated.

RANLP 2015 Second Workshop on Natural Language Processing and Linked Open Data (NLP&LOD2)

A framework for the interoperable semantic interpretation of mentions of events, participants, locations and time, as well as the relations between them is developed, using a common RDF model to represent instances of events and normalised entities and dates.

Interoperability of Corpora and Annotations

  • C. Chiarcos
  • Computer Science, Linguistics
    Linked Data in Linguistics
  • 2012
The application of OWL and RDF to address the interoperability of linguistic corpora and linguistic annotations within such corpora is described.

Avoiding Data Graveyards: Deriving an Ontology for Accessing Heterogeneous Data Collections (Extended Abstract)

In this paper, I describe derivation and practical application of an ontology of word classes manually derived from four different sources: – the EAGLES recommendations for the morphosyntactic

Incremental formalization of document annotations through ontology-based paraphrasing

This work describes an implemented approach to help users create semi-structured semantic annotations for a document according to an extensible OWL ontology, and uses a combination of off-the-shelf parsing tools and breadth-first search of expressions in the ontology to help users create valid annotations starting from free text.

Ontology-Based XQuery’ing of XML-Encoded Language Resources on Multiple Annotation Layers

An approach is presented for querying collections of heterogeneous linguistic corpora that are annotated on multiple layers using arbitrary XML-based markup languages, together with a highly flexible web-based graphical interface that can be used to query corpora with regard to several different linguistic properties.

OntoTag's Linguistic Ontologies: Enhancing Higher Level and Semantic Web Annotations

These capabilities are derived from the incorporation into the platform of a set of linguistic ontologies, which are also the main referent for the generation of multi-levelled and standardized annotations of Semantic Web documents within OntoTag.

The Semantics of Markup: Mapping Legacy Markup Schemas to a Common Semantics

A method for mapping linguistic descriptions in plain XML into semantically rich RDF/OWL is outlined and demonstrated, and the General Ontology for Linguistic Description of Farrar and Langendoen (2003) is employed as the target semantic schema.

OntoTag: XML/RDF(S)/OWL Semantic Web Page Annotation in ContentWeb

This paper presents how this process of standardisation and integration is being achieved in ContentWeb by means of OntoTag, a multi-level (also multi-purpose and possibly multi-language) hybrid (ontological and linguistic) platform for Semantic Web annotation.


The purpose of the research presented in this proposal is to prove that integration of results in both fields is not only possible, but also highly useful in order to make Semantic Web pages more machine-readable.

Formalising Multi-layer Corpora in OWL DL - Lexicon Modelling, Querying and Consistency Control

We present a general approach to formally modelling corpora with multi-layered annotation, thereby inducing a lexicon model in a typed logical representation language, OWL DL. This model can be

An Annotation Type System for a Data-Driven NLP Pipeline

An annotation type system for a data-driven NLP core system that covers formal document structure and document meta information, as well as the linguistic levels of morphology, syntax and semantics is introduced.

A Registry of Standard Data Categories for Linguistic Annotation

The most recent work within ISO TC37/SC 4, and in particular the development of a Data Category Registry (DCR) component of the Linguistic Annotation Framework, is described.