• Publications
The Penn Arabic Treebank: Building a Large-Scale Annotated Arabic Corpus
From our three-year experience of developing a large-scale corpus of annotated Arabic text, our paper will address the following: (a) review pertinent Arabic language issues as they relate to…
  • 324
  • 82
Developing an Arabic Treebank: Methods, Guidelines, Procedures, and Tools
In this paper we address the following questions from our experience of the last two and a half years in developing a large-scale corpus of Arabic text annotated for morphological information,…
  • 126
  • 16
A Pilot Arabic Propbank
In this paper, we present the details of creating a pilot Arabic proposition bank (Propbank).
  • 54
  • 5
Diacritization: A Challenge to Arabic Treebank Annotation and Parsing
Arabic diacritization (sometimes referred to as vocalization or vowelling), defined as the full or partial representation of short vowels, shadda (consonantal length or gemination), tanween…
  • 50
  • 5
Enhancing the Arabic Treebank: a Collaborative Effort toward New Annotation Guidelines
The Arabic Treebank team at the Linguistic Data Consortium has significantly revised and enhanced its annotation guidelines and procedure over the past year.
  • 39
  • 4
Enhanced Annotation and Parsing of the Arabic Treebank
We propose an automatic procedure that more closely aligns the POS tags and the Treebank annotation, leading to improved parsing results and additionally providing the annotation pipeline with improved error checking and quality control.
  • 21
  • 3
Developing an Egyptian Arabic Treebank: Impact of Dialectal Morphology on Annotation and Tool Development
This paper describes the parallel development of an Egyptian Arabic Treebank and a morphological analyzer for Egyptian Arabic (CALIMA).
  • 40
  • 3
Dialectal Arabic Telephone Speech Corpus: Principles, Tool Design, and Transcription Conventions
This paper presents the experience gained at LDC in the collection and transcription of a corpus of conversational telephone speech in dialectal Arabic. The paper will cover the following: (a)…
  • 24
  • 2
Consistent and Flexible Integration of Morphological Annotation in the Arabic Treebank
The Arabic Treebank provides a resource relating the different forms of the same underlying token with varying degrees of vocalization, in terms of how they relate (1) to each other, (2) to the syntactic structure.
  • 18
  • 2