Exploiting Wikipedia as a Knowledge Base for the Extraction of Linguistic Resources: Application on Arabic-French Comparable Corpora and Bilingual Lexicons


We present simple and effective methods for extracting comparable corpora and bilingual lexicons from Wikipedia. We exploit the large scale and the structure of Wikipedia articles to extract two resources that are useful for natural language applications. We build a comparable corpus from Wikipedia using categories as topic restrictions and…

