This paper investigates the automatic identification of Information Structure (IS) in texts. The experiments use the Prague Dependency Treebank, which is annotated with IS following the Praguian approach of Topic-Focus Articulation. We automatically detect t(opic) and f(ocus), using node attributes from the treebank as basic features and derived features…
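The detection described above could be sketched, purely illustratively, as a rule-based labeler over treebank-like nodes. The node attributes and the heuristic used here (pronominality and pre-verbal position as cues for topic) are hypothetical stand-ins, not the paper's actual PDT feature set:

```python
# Hedged sketch: a toy rule-based t(opic)/f(ocus) labeler.
# Attribute names ("position", "is_pronoun") and the heuristic are
# illustrative assumptions, not the actual PDT attributes or classifier.

def classify_tfa(node, verb_position):
    """Label a node t(opic) or f(ocus) from simple surface cues."""
    # Heuristic: pronominal or pre-verbal items tend to be contextually
    # bound (topic); post-verbal content words tend to carry focus.
    if node["is_pronoun"] or node["position"] < verb_position:
        return "t"
    return "f"

sentence = [
    {"form": "He",    "position": 0, "is_pronoun": True},
    {"form": "reads", "position": 1, "is_pronoun": False},
    {"form": "books", "position": 2, "is_pronoun": False},
]
verb_pos = 1
labels = [classify_tfa(n, verb_pos)
          for n in sentence if n["position"] != verb_pos]
print(labels)  # ['t', 'f']
```

A real system would replace this single heuristic with a trained classifier over many basic and derived node attributes.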
In this paper we describe a method for obtaining summaries focused on chosen characters of a free text. Summaries are extracted from discourse structures resembling rhetorical trees, which are obtained by exploiting the cohesion and coherence properties of the text. The evaluation aims to show the contribution of each module to the final result.
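As a rough, purely illustrative stand-in for the extraction step, a character-focused summary can be sketched as selecting the sentences that mention the chosen character. The function name and the crude mention test are assumptions; the paper's actual method scores nodes of a discourse tree rather than raw sentences:

```python
# Hedged sketch: keep sentences mentioning the chosen character.
# This substitutes a simple string-match filter for the paper's
# discourse-tree extraction; all names here are illustrative.

def character_summary(sentences, character, max_sents=2):
    """Return up to max_sents sentences that mention the character."""
    hits = [s for s in sentences if character.lower() in s.lower()]
    return " ".join(hits[:max_sents])

text = [
    "Alice met Bob at the station.",
    "The train was late.",
    "Alice decided to walk home.",
]
print(character_summary(text, "Alice"))
# Alice met Bob at the station. Alice decided to walk home.
```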
The paper presents a framework that allows the design, realisation and validation of different anaphora resolution models on real texts. The processing implemented by the engine is incremental, simulating the reading of texts by humans. Advanced behaviours such as postponed resolution and accumulation of values for features of the discourse…
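The incremental processing with postponed resolution mentioned above can be sketched, under heavily simplified assumptions, as a single left-to-right pass in which pronouns lacking an antecedent are queued and linked once a candidate entity arrives. Token structure and the recency heuristic are illustrative, not the framework's actual model:

```python
# Hedged sketch of incremental anaphora resolution with postponed
# resolution: a pronoun seen before any entity (cataphora) waits in a
# queue and is linked when an entity appears. Purely illustrative.

def resolve_incrementally(tokens):
    """One left-to-right pass; returns {pronoun_index: antecedent_form}."""
    entities, pending, links = [], [], {}
    for i, tok in enumerate(tokens):
        if tok["kind"] == "entity":
            entities.append(tok["form"])
            # Postponed resolution: link any pronouns still waiting.
            for j in pending:
                links[j] = tok["form"]
            pending.clear()
        elif tok["kind"] == "pronoun":
            if entities:
                links[i] = entities[-1]   # most recent antecedent
            else:
                pending.append(i)         # postpone until one appears
    return links

toks = [
    {"kind": "pronoun", "form": "She"},   # cataphoric pronoun
    {"kind": "entity",  "form": "Mary"},
    {"kind": "pronoun", "form": "her"},
]
print(resolve_incrementally(toks))  # {0: 'Mary', 2: 'Mary'}
```

The accumulation of feature values the abstract mentions would, in a fuller model, refine each queued pronoun's constraints (gender, number, salience) as reading proceeds.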
We introduce a modular, dependency-based formalization of Information Structure (IS) based on Steedman's prosodic account [1, 2]. We state it in terms of Extensible Dependency Grammar (XDG), introducing two new dimensions modeling 1) prosodic structure and 2) the theme/rheme and focus/background partitionings. The approach does without a non-standard…