A Multi-Dialect, Multi-Genre Corpus of Informal Written Arabic

Ryan Cotterell and Chris Callison-Burch
This paper presents a multi-dialect, multi-genre, human-annotated corpus of dialectal Arabic, with data obtained from both online newspaper commentary and Twitter. Most Arabic corpora are small and focus on Modern Standard Arabic (MSA). There has been recent interest, however, in the construction of dialectal Arabic corpora (Zaidan and Callison-Burch, 2011a; Al-Sabbagh and Girju, 2012). This work differs from previously constructed corpora in two ways. First, we include coverage of five dialects…




