Arabic Dialect Identification in Speech Transcripts


In this paper we describe a system developed to identify a set of four regional Arabic dialects (Egyptian, Gulf, Levantine, North African) and Modern Standard Arabic (MSA) in a transcribed speech corpus. We competed under the team name MAZA in the Arabic Dialect Identification sub-task of the 2016 Discriminating between Similar Languages (DSL) shared task. Our system achieved an F1-score of 0.51 in the closed training track, ranking first among the 18 teams that participated in the sub-task. Our system utilizes a classifier ensemble with a set of linear models as base classifiers. We experimented with three different ensemble fusion strategies, with the mean probability approach providing the best performance.

7 Figures and Tables

Citations per Year

Citation Velocity: 6

Averaging 6 citations per year over the last 3 years.

Learn more about how we calculate this metric in our FAQ.

Cite this paper

@inproceedings{Malmasi2016ArabicDI, title={Arabic Dialect Identification in Speech Transcripts}, author={Shervin Malmasi and Marcos Zampieri}, year={2016} }