Harald Lüngen

Discourse segmentation is the division of a text into minimal discourse segments, which form the leaves of the trees used to represent discourse structures. A definition of elementary discourse segments in German is provided by adapting widely used segmentation principles for English minimal units, while considering punctuation, morphology, syntax, …
The quality of manual annotations of linguistic data depends on the use of reliable coding schemas as well as on the ability of human annotators to handle them appropriately. As is well known from a wide range of previous experiences, annotations using highly complex coding schemas often lead to unacceptable annotation quality. Reducing complexity might make …
Annotating multiple hierarchies with SGML-based markup systems is still one of the fundamental problems of text-technological research. Up to now, several solutions have been discussed (e.g. chapter 31 of the TEI Guidelines (Sperberg-McQueen and Burnard 1994) and Barnard et al. (1995)). Furthermore, some non-SGML-based approaches have been proposed (cf. …
We present an approach to investigating what kind of semantic information is regularly associated with the structural markup of scientific articles. This approach addresses the need for an explicit formal description of the semantics of text-oriented XML documents. The domain of our investigation is a corpus of scientific articles from psychology and …
A text parsing component designed to be part of a system that assists students in academic reading and writing is presented. The parser can automatically add a relational discourse structure annotation to a scientific article that a user wants to explore. The discourse structure employed is defined in an XML format and is based on the Rhetorical Structure …
In the project SemDok (Generic document structures in linearly organised texts), funded by the German Research Foundation (DFG), a discourse parser for a complex type (scientific articles, for example) is being developed. Discourse parsing (henceforth DP) according to Rhetorical Structure Theory (RST) (Mann and Taboada, 2005; Marcu, 2000) deals with …
Wordnets are lexical reference systems that follow the design principles of the Princeton WordNet project (Fellbaum, ). Domain ontologies (or domain-specific ontologies such as GOLD, or the Gene Ontology) represent knowledge about a specific domain in a format that supports automated reasoning about the objects in that domain and the relations between …
We describe a general two-stage procedure for re-using a custom corpus for spoken language system development, involving a transformation from character-based markup to XML and DSSSL stylesheet-driven XML markup enhancement with multiple lexical tag trees. The procedure was used to generate a fully tagged corpus; alternatively with greater economy of …