Discourse segmentation is the division of a text into minimal discourse segments, which form the leaves of the trees used to represent discourse structure. A definition of elementary discourse segments in German is provided by adapting widely used segmentation principles for English minimal units, while considering punctuation, morphology, syntax, …
The quality of manual annotations of linguistic data depends on the use of reliable coding schemas as well as on the ability of human annotators to handle them appropriately. As a wide range of previous experience has shown, annotations using highly complex coding schemas often lead to unacceptable annotation quality. Reducing complexity might make …
A text parsing component designed to be part of a system that assists students in academic reading and writing is presented. The parser can automatically add a relational discourse structure annotation to a scientific article that a user wants to explore. The discourse structure employed is defined in an XML format and is based on Rhetorical Structure Theory …
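As an illustration of what a relational, RST-based discourse annotation in XML might look like, consider the following sketch; the element and attribute names here are invented for illustration and are not the actual format defined by the system:

```xml
<!-- Hypothetical sketch: element and attribute names are illustrative,
     not the parser's actual schema. -->
<rst-tree>
  <relation name="evidence">
    <nucleus>
      <segment id="s1">The parser adds discourse annotations automatically.</segment>
    </nucleus>
    <satellite>
      <segment id="s2">It was evaluated on a corpus of scientific articles.</segment>
    </satellite>
  </relation>
</rst-tree>
```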
We present an approach to investigating what kinds of semantic information are regularly associated with the structural markup of scientific articles. This approach addresses the need for an explicit formal description of the semantics of text-oriented XML documents. The domain of our investigation is a corpus of scientific articles from psychology and …
1 Project framework and goals. Wordnets are lexical reference systems that follow the design principles of the Princeton WordNet project (Fellbaum). Domain ontologies (domain-specific ontologies such as GOLD or the Gene Ontology) represent knowledge about a specific domain in a format that supports automated reasoning about the objects in …
We describe a general two-stage procedure for re-using a custom corpus for spoken language system development, involving a transformation from character-based markup to XML, followed by DSSSL stylesheet-driven XML markup enhancement with multiple lexical tag trees. The procedure was used to generate a fully tagged corpus; alternatively, with greater economy of …
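The first stage, converting character-based markup to XML, can be sketched as follows; the corpus format and tag names here are invented for illustration and do not reflect the actual custom corpus:

```python
import re

def to_xml(line: str) -> str:
    """Sketch of a character-based-markup-to-XML conversion.

    Assumes a hypothetical convention where non-speech events are
    marked with square brackets, e.g. "[cough]"; the real corpus
    format differs.
    """
    # Map a bracketed event marker to an empty XML element.
    line = re.sub(r'\[(\w+)\]', r'<event type="\1"/>', line)
    # Wrap the whole utterance in an <utt> element.
    return f"<utt>{line}</utt>"

print(to_xml("hello [cough] world"))
```

The second, stylesheet-driven stage would then operate on such XML uniformly, which is the point of the two-stage design: all format-specific knowledge is confined to the first step.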