A formal framework for linguistic annotation

@article{Bird2001AFF,
  title={A formal framework for linguistic annotation},
  author={Steven Bird and Mark Y. Liberman},
  journal={ArXiv},
  year={2001},
  volume={cs.CL/9903003}
}
Abstract 'Linguistic annotation' covers any descriptive or analytic notations applied to raw language data. The basic data may be in the form of time functions – audio, video and/or physiological recordings – or it may be textual. The added notations may include transcriptions of all sorts (from phonetic features to discourse structures), part-of-speech and sense tagging, syntactic analysis, 'named entity' identification, coreference annotation, and so on. While there are several ongoing…
Annotation Graphs: A Foundation for Integrating Tools, Formats and Corpora
TLDR
It is argued that a minimal formalization of this basic set of practices is a directed graph with fielded records on the arcs and optional time references on the nodes, which has sufficient expressive capacity to encode, in a reasonably intuitive way, all of the kinds of linguistic annotations in use today.
ATLAS: A Flexible and Extensible Architecture for Linguistic Annotation
TLDR
A formal model for annotating linguistic artifacts is described, from which an application programming interface (API) to a suite of tools for manipulating these annotations is derived, and current efforts towards implementing key pieces of this architecture are reviewed.
An Efficient and Flexible Format for Linguistic and Semantic Annotation
TLDR
An XML annotation format and tool developed within the MUCHMORE project, which is conceptually related to stand-off annotation, and the tool for automatic semantic annotation are described.
SusTEInability of linguistic resources through feature structures
TLDR
This article shows that the TEI tag set for feature structures can be adopted to represent a heterogeneous set of linguistic corpora, and the mapping process and representational issues are discussed as well as the advantages and drawbacks associated with the use of the TEI tag set as a storage and exchange format for linguistically annotated data.
Annotation graphs as a framework for multidimensional linguistic data analysis
TLDR
This work motivates and illustrates the approach using discourse-level annotations of text and speech data drawn from the CALLHOME, COCONUT, MUC-7, DAMSL and TRAINS annotation schemes to show how annotation graphs can represent hybrid multi-level structures which derive from a diverse set of file formats.
Concept-based Queries: Combining and Reusing Linguistic Corpus Formats and Query Languages
TLDR
This paper proposes a methodology for querying linguistic data represented in different corpus formats and describes an approach for overcoming these problems and a sample application.
A formal framework for linguistic tree query
TLDR
This thesis identifies a comprehensive set of linguistic tree query requirements and the level of expressiveness needed to implement them and studies formalisms used by linguists and database theorists to describe tree structured data.
Toward a format-neutral annotation store
  • R. Fromont
  • Computer Science
  • Comput. Speech Lang.
  • 2017
TLDR
LaBB-CAT uses Annotation Graphs to formalise annotation structure and incorporates some extensions to the model, which handle the remaining unmet requirements and create the possibility of defining an annotation API that makes automation of conversion, querying, and manipulation of annotations easier.
A Tool for Feature-Structure Stand-Off-Annotation on Transcriptions of Spoken Discourse
TLDR
The paper presents a solution consisting of a data model and an annotation tool that tries to fill this gap between „annotation science“ and the practice of transcribing spoken language in the area of discourse analysis and pragmatics, where the lack of ready-to-use annotation solutions is especially remarkable.
A Flexible Representation of Heterogeneous Annotation Data
TLDR
A new flexible representation for the annotation of complex structures of metadata over heterogeneous data collections containing text and other types of media such as images or audio files is described.
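
Several entries above describe the annotation graph model as a directed graph with fielded records on the arcs and optional time references on the nodes. A minimal Python sketch of that data structure might look like the following. The class names, field names, node ids, labels, and time values are all assumptions made for illustration; they are not taken from the paper or from any annotation toolkit.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """A graph node; its time reference is optional."""
    id: str
    time: Optional[float] = None  # offset into the signal, if known


@dataclass
class Arc:
    """A directed arc carrying a fielded record (here: type and label)."""
    src: str
    dst: str
    record: Dict[str, str] = field(default_factory=dict)


@dataclass
class AnnotationGraph:
    nodes: Dict[str, Node] = field(default_factory=dict)
    arcs: List[Arc] = field(default_factory=list)

    def add_node(self, node_id: str, time: Optional[float] = None) -> None:
        self.nodes[node_id] = Node(node_id, time)

    def add_arc(self, src: str, dst: str, **record: str) -> None:
        # Both endpoints must already exist in the graph.
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError("both endpoints must be added before the arc")
        self.arcs.append(Arc(src, dst, dict(record)))

    def arcs_of_type(self, arc_type: str) -> List[Arc]:
        """Return arcs whose record has the given 'type' field, in insertion order."""
        return [a for a in self.arcs if a.record.get("type") == arc_type]


def build_example() -> AnnotationGraph:
    """Two word arcs and two phone arcs over invented times (hypothetical data)."""
    g = AnnotationGraph()
    g.add_node("n0", time=0.00)
    g.add_node("n1")             # untimed node: only its order is known
    g.add_node("n2", time=0.32)
    g.add_node("n3", time=0.61)
    g.add_arc("n0", "n2", type="word", label="she")
    g.add_arc("n2", "n3", type="word", label="had")
    g.add_arc("n0", "n1", type="phone", label="sh")
    g.add_arc("n1", "n2", type="phone", label="iy")
    return g
```

In this sketch the word and phone layers share nodes `n0` and `n2`, so the two levels of annotation are aligned without either one being nested inside the other, and node `n1` shows that a node can take part in the graph without a time reference.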

References

ATLAS: A Flexible and Extensible Architecture for Linguistic Annotation
TLDR
A formal model for annotating linguistic artifacts is described, from which an application programming interface (API) to a suite of tools for manipulating these annotations is derived, and current efforts towards implementing key pieces of this architecture are reviewed.
Annotation graphs as a framework for multidimensional linguistic data analysis
TLDR
This work motivates and illustrates the approach using discourse-level annotations of text and speech data drawn from the CALLHOME, COCONUT, MUC-7, DAMSL and TRAINS annotation schemes to show how annotation graphs can represent hybrid multi-level structures which derive from a diverse set of file formats.
Linguistic documents synchronizing sound and text
Abstract The goal of the Langues et Civilisations à Tradition Orale (LACITO) Linguistic Archive project is to conserve and disseminate recorded and transcribed oral literature and other linguistic…
The MATE workbench - An annotation tool for XML coded speech corpora
TLDR
The MATE workbench is a program which provides support for the annotation of speech and text, and provides facilities for flexible display and editing of such annotations, and complex querying of a resulting corpus.
Multi-level annotation in the Emu speech database management system
TLDR
This paper discusses the design of the Emu system, giving a detailed description of the annotation structures that it supports, and argues that these structures are sufficiently general to allow Emu to read potentially any time-aligned linguistic annotation.
Multi-level Annotation of Speech: An Overview of The Emu Speech Database Management System
TLDR
The design of the Emu system is discussed, giving a detailed description of the annotation structures that it supports, and it is argued that these structures are sufficiently general to potentially allow Emu to read any time-aligned linguistic annotation.
Heterogeneous relation graphs as a formalism for representing linguistic information
TLDR
This paper explains the HRG formalism in detail, and shows why it is superior to the types of “multi-level” formats normally used in speech synthesis and database annotation.
The logic of typed feature structures
TLDR
The Logic of Typed Feature Structures is the first monograph that brings all the main theoretical ideas into one place where they can be related and compared in a unified setting and is an indispensable compendium for the researcher or graduate student working on constraint-based grammatical formalisms.
Querying databases of annotated speech
  • S. Cassidy, Steven Bird
  • Computer Science
  • Proceedings 11th Australasian Database Conference. ADC 2000 (Cat. No.PR00528)
  • 2000
TLDR
This paper presents and harmonises two independent efforts to model annotated speech databases, one at Macquarie University, and one at the University of Pennsylvania, and describes various query languages along with illustrative applications to a variety of analytical problems.
Transcriber: Development and use of a tool for assisting speech corpora production
TLDR
Transcriber was designed for the manual segmentation and transcription of long duration broadcast news recordings, including annotation of speech turns, topics and acoustic conditions, and has been tested on various Unix systems and Windows.