Anne Abeillé

We present a treebank project for French. We have annotated a newspaper corpus of 1 million words with parts of speech, inflection, compounds, lemmas and constituency. We describe the tagging and parsing phases of the project, and for each, the automatic tools, the guidelines and the validation process. We then present some uses of the corpus as well as some…
In this paper, we present a parsing strategy that arose from the development of an Earley-type parsing algorithm for TAGs (Schabes and Joshi 1988) and from some recent linguistic work in TAGs (Abeillé 1988a). In our approach, each elementary structure is systematically associated with a lexical head. These structures specify extended domains of locality…
This paper presents the current status of the French treebank developed at Paris 7 (Abeillé et al., 2003a). The corpus comprises 1 million words from the newspaper Le Monde, fully annotated and disambiguated for parts of speech, inflectional morphology, compounds and lemmas, and syntactic constituents. It is representative of contemporary normalized written…
We show how idioms can be parsed in lexicalized TAGs. We rely on extensive studies of frozen phrases pursued at L.A.D.L. that show that idioms are pervasive in natural language and obey, generally speaking, the same morphological and syntactical patterns as 'free' structures. By idiom we mean a structure in which some items are lexically frozen and have a…
according to this definition. Each elementary tree is constrained to have at least one terminal at its frontier which serves as 'head' (or 'anchor'). Sentences of a TAG language are derived from the composition of an S-rooted initial tree with other elementary trees by two operations: substitution (the same operation used by context-free grammars) or…
From the parsing point of view, the derivation tree in TAG [hereafter DT] is seen as the "history" of the derivation but also as a linguistic representation, closer to semantics, that can be the basis of a further analysis. Because in TAG the elementary trees are lexicalized and localize the predicate-argument relations, several works have compared the DT…
On the basis of the ordering of bare complements, modifying adjectives and certain adverbs in French, we show that certain constituents are more constrained than others, and we explain this situation in terms of weight, as one of the factors which determine word order. In addition to the distinction between heavy and non-heavy constituents, we propose that…
TreeLex is a subcategorization lexicon of French, automatically extracted from a syntactically annotated corpus. The lexicon comprises 2006 verbs (25076 occurrences). The goal of the project is to obtain a list of subcategorization frames of contemporary French verbs and to estimate the number of different verb frames available in French in general. A few…