• Corpus ID: 16072411

The Leeds Arabic Discourse Treebank: Annotating Discourse Connectives for Arabic

Amal Al-Saif and Katja Markert
We present the first effort towards producing an Arabic Discourse Treebank, a news corpus where all discourse connectives are identified and annotated with the discourse relations they convey as well as with the two arguments they relate. We present a dedicated discourse annotation tool for Arabic and a large-scale annotation study. We show that both the human identification of discourse connectives and the determination of the discourse relations they convey are reliable. Our current annotated…
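The abstract reports that the annotation is reliable but does not say which statistic was used here. Agreement between two annotators on categorical labels is commonly measured with Cohen's kappa, which corrects raw agreement for agreement expected by chance. The following is a minimal sketch under that assumption; the relation labels are hypothetical and are not taken from the treebank.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given identical labels.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, from each annotator's own label distribution.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used the same single label throughout
    return (p_o - p_e) / (1 - p_e)


# Hypothetical relation labels assigned by two annotators to five connectives.
ann1 = ["causal", "contrast", "temporal", "causal", "contrast"]
ann2 = ["causal", "contrast", "causal", "causal", "temporal"]
print(round(cohen_kappa(ann1, ann2), 3))  # prints 0.375
```

Here the annotators agree on 3 of 5 items (0.6), but their label distributions alone predict 0.36 agreement by chance, so kappa is (0.6 - 0.36) / (1 - 0.36) = 0.375.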


Modelling Discourse Relations for Arabic
The first algorithms to automatically identify explicit discourse connectives and the relations they signal for Arabic text are presented, and the algorithm for recognizing discourse relations performs significantly better than a baseline based on the connective surface string alone and therefore reduces the ambiguity in explicit connective interpretation.
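The summary compares the relation classifier against a baseline that uses only the connective's surface string. One plausible form of such a baseline, assumed here rather than taken from the paper, assigns each connective the relation it signals most often in the training data. The connective/relation pairs below are hypothetical examples, and `train_connective_baseline` and `predict` are illustrative names.

```python
from collections import Counter, defaultdict


def train_connective_baseline(examples):
    """Map each connective string to its most frequent relation.

    examples: iterable of (connective, relation) pairs from annotated data.
    """
    counts = defaultdict(Counter)
    for connective, relation in examples:
        counts[connective][relation] += 1
    return {conn: rels.most_common(1)[0][0] for conn, rels in counts.items()}


def predict(model, connective, fallback="unknown"):
    # Connectives never seen in training get a default label.
    return model.get(connective, fallback)


# Hypothetical training pairs (connective, relation).
train = [
    ("لكن", "contrast"), ("لكن", "contrast"), ("لكن", "concession"),
    ("ثم", "temporal"), ("ثم", "temporal"),
    ("ف", "causal"), ("ف", "temporal"), ("ف", "causal"),
]
model = train_connective_baseline(train)
print(predict(model, "لكن"))  # contrast
print(predict(model, "ف"))    # causal
print(predict(model, "إذا"))  # unknown
```

Because a string such as ف can signal several relations, this baseline is wrong whenever the connective is used in a less frequent sense. That is the ambiguity a classifier using additional features aims to reduce.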
Cross-Lingual Identification of Ambiguous Discourse Connectives for Resource-Poor Language
This paper presents the first effort towards recognizing ambiguities of discourse connectives, a task fundamental to discourse classification for resource-poor languages such as Chinese. It also presents a discourse corpus for Chinese, which will soon become the first publicly available Chinese discourse corpus.
The Chinese Discourse TreeBank: a Chinese corpus annotated with discourse relations
The paper first characterizes the syntactic and statistical distributions of Chinese discourse connectives, as well as the role of Chinese punctuation marks in discourse annotation, and then describes how the annotation procedure is designed based on this characterization.
Persian Discourse Treebank and coreference corpus
This research investigates intra-document relations using two major approaches, discourse analysis and coreference resolution, which results in building the first Persian…
The CUHK Discourse TreeBank for Chinese: Annotating Explicit Discourse Connectives for the Chinese TreeBank
This work presents the first open discourse treebank for Chinese, the Discourse Treebank for Chinese (DTBC). It makes adjustments to three essential aspects based on previous studies of Chinese linguistics, and shows that the annotation scheme achieves highly reliable results.
TCL - a Lexicon of Turkish Discourse Connectives
It is known that discourse connectives are the most salient indicators of discourse relations. State-of-the-art parsers being developed to predict explicit discourse connectives exploit annotated…
Inducing Discourse Resources Using Annotation Projection
This work proposes an approach that automatically creates two types of discourse resources from parallel texts, PDTB-style discourse-annotated corpora and lexicons of discourse connectives, as well as a novel approach to annotation projection that is independent of statistical word-alignment models.
Automatic Disambiguation of French Discourse Connectives
The results with the French Discourse Treebank show that syntactic and lexical features developed for English texts are as effective for French and allow the disambiguation of French discourse connectives with an accuracy of 94.2%.
The Penn Discourse Treebank: An Annotated Corpus of Discourse Relations
This chapter presents a case study of the Penn Discourse Treebank, focusing in particular on the problem of characterizing and identifying, via annotation, explicit as well as implicit signals of discourse relations, and of designing the overall annotation workflow.
Genres in the Prague Discourse Treebank
The motivation and concept of the genre annotation are described, and the process of manually annotating genres in the treebank is explained in detail, from the annotators' manual work to post-annotation checks and inter-annotator agreement measurements.


A Discourse Resource for Turkish: Annotating Discourse Connectives in the METU Corpus
This paper describes first steps towards extending the METU Turkish Corpus from a sentence-level language resource to a discourse-level resource by annotating its discourse connectives and their arguments, with particular attention to Turkish free word order and punctuation.
Annotating Discourse Connectives in the Chinese Treebank
It is shown that one of the most challenging issues in this type of discourse annotation is determining the textual spans of the arguments and this is partly due to the hierarchical nature of discourse relations.
Towards an Annotated Corpus of Discourse Relations in Hindi
We describe our initial efforts towards developing a large-scale corpus of Hindi texts annotated with discourse relations. Adopting the lexically grounded approach of the Penn Discourse Treebank…
Easily Identifiable Discourse Relations
We present a corpus study of local discourse relations based on the Penn Discourse Tree Bank, a large manually annotated corpus of explicitly or implicitly realized relations. We show that while…
Morphological Annotation of Quranic Arabic
The paper discusses how the unique challenge of morphologically annotating Quranic Arabic is addressed with a multi-stage approach that includes automatic morphological tagging using diacritic edit-distance, two-pass manual verification, and online collaborative annotation.
Developing an Arabic Treebank: Methods, Guidelines, Procedures, and Tools
Drawing on the experience of developing a large-scale corpus of Arabic text annotated for morphological information, part-of-speech, English gloss, and syntactic structure, this paper addresses the following questions: …
Automated Discourse Generation Using Discourse Structure Relations
Towards a Rhetorical Parsing of Arabic Text
  • Waleed Al-Sanie, A. Touir, H. Mathkour
  • Computer Science
    International Conference on Computational Intelligence for Modelling, Control and Automation and International Conference on Intelligent Agents, Web Technologies and Internet Commerce (CIMCA-IAWTIC'06)
  • 2005
This paper presents a framework for applying Rhetorical Structure Theory (RST) to Arabic in order to rhetorically parse and understand Arabic texts.
Discourse Relations: A Structural and Presuppositional Account Using Lexicalised TAG
We show that discourse structure need not bear the full burden of conveying discourse relations by showing that many of them can be explained nonstructurally in terms of the grounding of anaphoric…