Farasa: A New Fast and Accurate Arabic Word Segmenter

Abstract

In this paper, we present Farasa ("insight" in Arabic), a fast and accurate Arabic segmenter. Segmentation involves breaking Arabic words into their constituent clitics. Our approach is based on SVM using linear kernels. The features we use account for: the likelihood of stems, prefixes, suffixes, and their combinations; presence in…
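To make the idea of segmentation concrete, the following is a minimal sketch, not Farasa's implementation. It enumerates candidate prefix + stem + suffix splits of a word and picks the one with the highest linear score over stem, prefix, and suffix likelihoods. Everything in it is an assumption made for illustration: the affix inventories, the log-probability tables, the hand-set weights (a trained linear SVM would learn these), and the use of Buckwalter transliteration for the example word.

```python
# Toy affix inventories in Buckwalter transliteration.
# These lists are illustrative assumptions, not Farasa's actual resources.
PREFIXES = ["", "w", "Al", "wAl"]
SUFFIXES = ["", "h", "hA", "wn"]

# Hypothetical log-likelihoods; a real system would estimate them from data.
PREFIX_LOGP = {"": -0.5, "w": -1.0, "Al": -1.0, "wAl": -1.5}
SUFFIX_LOGP = {"": -0.5, "h": -1.2, "hA": -1.2, "wn": -1.5}
STEM_LOGP = {"ktAb": -2.0}
UNSEEN_LOGP = -10.0  # penalty for stems or affixes missing from the tables

# Hand-set linear weights standing in for learned model parameters.
WEIGHTS = (1.0, 1.0, 1.0)


def candidates(word):
    """Enumerate every (prefix, stem, suffix) split with a non-empty stem."""
    splits = []
    for prefix in PREFIXES:
        for suffix in SUFFIXES:
            fits = word.startswith(prefix) and word.endswith(suffix)
            if fits and len(prefix) + len(suffix) < len(word):
                stem = word[len(prefix):len(word) - len(suffix)]
                splits.append((prefix, stem, suffix))
    return splits


def features(prefix, stem, suffix):
    """Feature vector: log-likelihood of the prefix, the stem, and the suffix."""
    return (
        PREFIX_LOGP.get(prefix, UNSEEN_LOGP),
        STEM_LOGP.get(stem, UNSEEN_LOGP),
        SUFFIX_LOGP.get(suffix, UNSEEN_LOGP),
    )


def score(split):
    """Linear score: dot product of the weights and the feature vector."""
    return sum(w * f for w, f in zip(WEIGHTS, features(*split)))


def segment(word):
    """Return the highest-scoring candidate split of the word."""
    return max(candidates(word), key=score)


print(segment("wAlktAb"))  # ('wAl', 'ktAb', '')
```

With these toy tables, the word wAlktAb ("and the book") has three candidate splits, and the one with the known stem ktAb wins because the unseen stems are heavily penalized.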

3 Figures and Tables

Statistics

Citations per Year (chart: y-axis 0 to 100, x-axis 2016 to 2017)

Citation Velocity: 28

Averaging 28 citations per year over the last 2 years.
