Data Set Used
We provide a robust and detailed annotation scheme for information status, which is easy to use, follows a semantic rather than cognitive motivation, and achieves reasonable inter-annotator scores. Our annotation scheme is based on two main assumptions: firstly, that information status strongly depends on (in)definiteness, and secondly, that it ought to be…
We investigate the influence of information status (IS) on constituent order in German, and integrate our findings into a log-linear surface realisation ranking model. We show that the distribution of pairs of IS categories is strongly asymmetric. Moreover, each category is correlated with morphosyntactic features, which can be automatically detected. We…
In this article we discuss some empirical results concerning the impact of different levels of information status (i.e. referents and words, respectively) on the prosodic realization of referential expressions in annotated corpora of read and spontaneous speech. Both at the referential and at the lexical level not only given and new but also intermediate…
We discuss and combine representation formats for discourse structure, in particular 'd-trees' from QUD theory and SDRT graphs. QUD trees are derived from SDRT graphs, while changes must apply to QUD theory in order to allow for representations of naturalistic data. We discuss whether QUDs can replace discourse relations. We apply a new method for the…
The main objective of the paper is to show that for an adequate analysis of an item's information status in spoken language two levels of givenness have to be investigated: a referential and a lexical level. This separation is a crucial step towards our goal to arrive at the best possible classification of nominal expressions occurring in natural discourse…
In this paper we present DIRNDL, an annotated corpus resource comprising syntactic annotations as well as information status labels and prosodic information. We introduce each annotation layer and then focus on the linking of the data in a standoff approach. The corpus is based on data from radio news broadcasts, i.e. two sets of primary data: spoken radio…
This article presents a survey of and an investigation into the notion of information status. Based on insights from DRT and presupposition theory, a new variant of IS taxonomy is developed, considering issues such as accommodation and underspecification of text with regard to hearer knowledge.
The article discusses several issues relevant for the annotation of written and spoken corpus data with information structure. We discuss ways to identify focus top-down (via Questions under Discussion) or bottom-up (starting from pitch accents). We introduce a two-dimensional labelling scheme for information status and propose a way to distinguish between…
We present a model for automatically predicting information status labels for German referring expressions. We train a CRF on manually annotated phrases, and predict a fine-grained set of labels. We achieve an accuracy score of 69.56% on our most detailed label set, and 76.62% when gold-standard coreference is available.
The notions accommodation and binding of presuppositions, as used in the DRT-based framework of Van der Sandt and Geurts, are critically assessed. Examples are presented which suggest the need for a narrower interpretation of, in particular, the term accommodation and the differentiation between accommodation proper and the process of presupposing…