• Corpus ID: 8869247

The ALVIS Format for Linguistically Annotated Documents

Adeline Nazarenko, Érick Alphonse, Julien Derivière, Thierry Hamon, Guillaume Vauvert, Davy Weissenbacher
The paper describes the ALVIS annotation format and discusses the problems we encountered in indexing large collections of documents for topic-specific search engines. The discussion is illustrated with the biological domain and MedLine abstracts, since developing a specialized search engine for biologists is one of the ALVIS case studies. The ALVIS approach to linguistic annotation builds on existing work and proposed standards. We chose stand-off annotations rather… 
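The abstract states that ALVIS uses stand-off annotations. In a stand-off scheme the source text is left untouched and each annotation layer points back into it (here by character offsets), so layers produced by different tools can be added independently. The Python sketch below illustrates only that general idea; the class names `Token` and `NamedEntity`, the `token0`… identifier scheme and the whitespace tokenizer are hypothetical simplifications and do not reproduce the actual ALVIS schema.

```python
from dataclasses import dataclass
from typing import List, Tuple


# Hypothetical, simplified stand-off layers: the text itself is never edited;
# annotations only store references (offsets or ids) into it.
@dataclass(frozen=True)
class Token:
    id: str
    start: int  # inclusive character offset into the source text
    end: int    # exclusive character offset into the source text


@dataclass(frozen=True)
class NamedEntity:
    id: str
    token_ids: Tuple[str, ...]  # refers to tokens by id instead of copying text
    label: str


def tokenize(text: str) -> List[Token]:
    """Naive whitespace tokenizer that records offsets instead of copying strings."""
    tokens: List[Token] = []
    pos = 0
    for i, word in enumerate(text.split()):
        start = text.index(word, pos)  # search from the end of the previous token
        end = start + len(word)
        tokens.append(Token(f"token{i}", start, end))
        pos = end
    return tokens


def span_text(text: str, tokens: List[Token], token_ids: Tuple[str, ...]) -> str:
    """Recover the surface string covered by a sequence of token ids."""
    by_id = {t.id: t for t in tokens}
    first, last = by_id[token_ids[0]], by_id[token_ids[-1]]
    return text[first.start:last.end]


text = "The sigK gene of Bacillus subtilis"
tokens = tokenize(text)
entity = NamedEntity("ne0", ("token4", "token5"), "species")
print(span_text(text, tokens, entity.token_ids))  # → Bacillus subtilis
```

Because tokens hold only offsets and entities hold only token ids, another tool could add, for instance, a part-of-speech layer that refers to the same token ids without modifying the text or the entity layer.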


A robust linguistic infrastructure for efficient web content analysis: the ALVIS project

This paper focuses on the design and development of a text processing architecture that exploits specialized NLP tools to produce linguistically annotated documents, reusing existing NLP modules and resources that need to be tuned to specific domains.

A Scalable and Distributed NLP Architecture for Web Document Annotation

This paper presents an NLP architecture for linguistically annotating large collections of web documents. It focuses on the efficiency of the platform, distributing linguistic processing over several machines to address the scalability of Natural Language Processing.

Semantic Annotation in the Alvis Project

The poor quality of the semantic annotation of documents hinders the development of new services for intelligent access to documents, including information extraction (IE), question answering (Q/A) and summarization. More sophisticated linguistic processing is now recognized as necessary to meet the needs of specific domains, such as retrieving relevant documents and extracting focused information.

Developing a platform dedicated to the annotation of web documents: a case study

The annotation throughput for web documents is compatible with the speed of document crawling, and the paper explains how the Ogmios platform has been integrated into a specialised search engine.

A Robust Linguistic Platform for Efficient and Domain specific Web Content Analysis

This paper focuses on the design and development of Ogmios, a text processing platform developed in the ALVIS project, and shows how the three constraints of genericity, domain semantic awareness and performance can be handled together.

Une infrastructure pour l'annotation linguistique de documents issus du web : le projet ALVIS (An infrastructure for the linguistic annotation of web documents: the ALVIS project)

The Ogmios platform can be adapted to the target domain, and it robustly analyses heterogeneous document collections, heterogeneity being characteristic of collections built from the web.

Influence des annotations imparfaites sur les systèmes de Traitement Automatique des Langues, un cadre applicatif: la résolution de l'anaphore pronominale. (Effects of imperfect annotations on Natural Language Processing systems, an applicative case study: the pronominal anaphora resolution)

To address this, we proposed a probabilistic inference model based on Bayesian networks (BNs), a formalism suited to working with imperfect data, applied to English texts, and we validated the model by evaluating two BNs on different corpora.

OGMIOS : une plate-forme d’annotation linguistique de collection de documents issus du Web (OGMIOS: a platform for the linguistic annotation of web document collections)

OGMIOS, a platform for the linguistic enrichment of web documents that exploits existing NLP tools, makes it possible to analyse large volumes of web data, which are by nature highly heterogeneous.

Ogmios: a scalable NLP platform for annotating large web document collections

This work proposes a configurable platform to enrich very large collections of French and English specialised documents and focuses on the robustness of the annotation process to help the creation of annotated corpora from the web.



The ALVIS Document Model for a Semantic Search Engine

This paper describes the ALVIS document processing architecture, which supports document processing for a semantic search engine. It is intended to operate with heterogeneous search servers, using query topics as a routing mechanism and distributed methods for ranking and semantic-based processing.

Annotation graphs as a framework for multidimensional linguistic data analysis

This work motivates and illustrates the approach using discourse-level annotations of text and speech data drawn from the CALLHOME, COCONUT, MUC-7, DAMSL and TRAINS annotation schemes to show how annotation graphs can represent hybrid multi-level structures which derive from a diverse set of file formats.

International Standard for a Linguistic Annotation Framework

The Linguistic Annotation Framework is intended to serve as a basis for harmonizing existing language resources as well as developing new ones within ISO TC37 SC4 WG1.

Tipster architecture design document

  • 1997

Tipster architecture design document version 2

  • 1997

Tipster architecture design document version 2.3

  • Technical report, DARPA
  • 1997

Report on augmented document representations

  • Deliverable 5.1, ALVIS.
  • 2004

Annual European Semantic Web Conference

Tipster architecture design document version 2.3

  • Technical report, DARPA
  • 1997