Benoît Crabbé

We first describe the automatic conversion of the French Treebank (Abeillé and Barrier, 2004), a constituency treebank, into typed projective dependency trees. In order to evaluate the overall quality of the resulting dependency treebank, and to quantify the cases where the projectivity constraint leads to wrong dependencies, we compare a subset of the…
We present a semi-supervised method to improve statistical parsing performance. We focus on the well-known problem of lexical data sparseness and present experiments on word clustering prior to parsing. We use a combination of lexicon-aided morphological clustering that preserves tagging ambiguity, and unsupervised word clustering, trained on a large…
In this paper we introduce a general framework for describing the lexicon of a lexicalised grammar by means of elementary descriptive fragments. The system described hereafter consists of two main components: a control device aimed at controlling how fragments are to be combined together in order to describe meaningful lexical descriptions and a composition…
This paper reports results on grammatical induction for French. We investigate how to best train a parser on the French Treebank (Abeillé et al., 2003), viewing the task as a trade-off between generalizability and interpretability. We compare, for French, a supervised lexicalized parsing algorithm with a semi-supervised unlexicalized algorithm (Petrov et…
This paper is dedicated to the compact representation of Tree Adjoining Grammars. We provide a methodology for grammatical development with eXtensible MetaGrammar (XMG). The provided methodology has been set up together with the development of a large French TAG. Furthermore, the grammatical representation language and the assorted development methodology…
This article introduces a novel transition system for discontinuous lexicalized constituent parsing called SR-GAP. It is an extension of the shift-reduce algorithm with an additional gap transition. Evaluation on two German treebanks shows that SR-GAP outperforms the previous best transition-based discontinuous parser (Maier, 2015) by a large margin (it is…
Towards a treebank of spoken French. We present the first results of an attempt to build a spoken treebank for French. It has been conducted as part of the ANR project Etape (resp. G. Gravier). Contrary to other languages such as English (see the Switchboard treebank (Meteer, 1995)), there is no sizable spoken corpus for French annotated for syntactic…