We first describe the automatic conversion of the French Treebank (Abeillé and Barrier, 2004), a constituency treebank, into typed projective dependency trees. In order to evaluate the overall quality of the resulting dependency treebank, and to quantify the cases where the projectivity constraint leads to wrong dependencies, we compare a subset of the …
We present a semi-supervised method to improve statistical parsing performance. We focus on the well-known problem of lexical data sparseness and present experiments of word clustering prior to parsing. We use a combination of lexicon-aided morphological clustering that preserves tagging ambiguity, and unsupervised word clustering, trained on a large …
This paper reports results on grammatical induction for French. We investigate how to best train a parser on the French Treebank (Abeillé et al., 2003), viewing the task as a trade-off between generalizability and interpretability. We compare, for French, a supervised lexicalized parsing algorithm with a semi-supervised unlexicalized algorithm (Petrov et …
In this paper we introduce a general framework for describing the lexicon of a lexicalised grammar by means of elementary descriptive fragments. The system described hereafter consists of two main components: a control device aimed at controlling how fragments are to be combined together in order to describe meaningful lexical descriptions and a composition …
This paper presents preliminary investigations on the statistical parsing of French by bringing a complete evaluation on French data of the main probabilistic lexicalized and unlexicalized parsers first designed on the Penn Treebank. We adapted the parsers on the two existing treebanks of French (Abeillé et al., 2003; Schluter and van Genabith, 2007). To …
This paper investigates how to extend coverage of a domain independent lexicon tailored for natural language understanding. We introduce two algorithms for adding lexical entries from VERBNET to the lexicon of the TRIPS spoken dialogue system. We report results on the efficiency of the method, discussing in particular precision versus coverage issues and …
In this article, we introduce eXtensible MetaGrammar (XMG), a framework for specifying tree-based grammars such as Feature-Based Lexicalised Tree-Adjoining Grammars (FB-LTAG) and Interaction Grammars (IG). We argue that XMG displays three features which facilitate both grammar writing and a fast prototyping of tree-based grammars. Firstly, XMG is fully …
It has been extensively observed that languages minimise the distance between two related words. Dependency length minimisation effects are explained as a means to reduce memory load and for effective communication. In this paper, we ask whether they hold in typically short spans, such as noun phrases, which could be thought of being less subject to …
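As an illustration of the "distance between two related words" mentioned in the last abstract, the sketch below computes dependency lengths for a single sentence. It is only a minimal example under stated assumptions, not the measure used in the paper: the head-index encoding (1-based positions, 0 for the root) and the function names `dependency_lengths` and `total_dependency_length` are hypothetical choices made here.

```python
from typing import List


def dependency_lengths(heads: List[int]) -> List[int]:
    """Return the linear distance between each word and its head.

    Assumption (not from the source): heads[i] is the 1-based position
    of the head of the word at position i + 1, and 0 marks the root,
    which has no incoming dependency and is skipped.
    """
    return [
        abs(head - position)
        for position, head in enumerate(heads, start=1)
        if head != 0
    ]


def total_dependency_length(heads: List[int]) -> int:
    """Sum of all dependency lengths in one sentence."""
    return sum(dependency_lengths(heads))


# Toy sentence "the big dog barks":
# the -> dog, big -> dog, dog -> barks, barks is the root.
example_heads = [3, 3, 4, 0]
print(dependency_lengths(example_heads))
print(total_dependency_length(example_heads))
```

Under this encoding, a word order that places dependents closer to their heads yields a smaller total, which is the quantity that dependency length minimisation claims languages tend to keep low.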