This paper describes the MulTra project, which aims to develop an efficient multilingual translation technology based on an abstract, generic linguistic model and on object-oriented software design. In particular, we address the rapid growth of both the transfer modules and the bilingual databases. For the latter, we…
The development of robust "deep" linguistic parsers is known to be a difficult task. Few such systems can claim to satisfy the needs of large-scale NLP applications in terms of robustness, efficiency, granularity or precision. Adapting such systems to more than one language makes the task even more challenging. This paper describes some of the properties…
The IPS system is a large-scale interactive GB-based parsing system (English, French) under development at the University of Geneva. This paper starts with an overview of the system, discussing some of its basic features as well as its general architecture. We then turn to a more detailed discussion of the "right corner" parsing strategy developed for…
This paper presents a method for extracting multi-word collocations (MWCs) from text corpora, which is based on the previous extraction of syntactically bound collocation bi-grams. We describe an iterative word linking procedure which relies on a syntactic criterion and aims at building up arbitrarily long expressions that represent multi-word collocation…
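As a rough illustration of the iterative linking idea in the abstract above, the following sketch grows longer expressions by repeatedly attaching a bi-gram to any chain that ends in the bi-gram's first word. It is not the paper's procedure: the function name `link_bigrams`, the representation of bi-grams as ordered lemma pairs, and the use of a shared word as a stand-in for the syntactic criterion are all assumptions made for this example.

```python
from typing import Set, Tuple

Bigram = Tuple[str, str]
Chain = Tuple[str, ...]


def link_bigrams(bigrams: Set[Bigram], max_rounds: int = 10) -> Set[Chain]:
    """Hypothetical sketch: iteratively link bi-grams into longer chains.

    A chain is extended with a bi-gram when the chain's last word equals
    the bi-gram's first word. This shared-word test is an assumed
    placeholder for the syntactic criterion mentioned in the abstract.
    """
    chains: Set[Chain] = {tuple(b) for b in bigrams}
    for _ in range(max_rounds):
        new_chains: Set[Chain] = set()
        for chain in chains:
            for first, second in bigrams:
                # Skip words already in the chain so cycles cannot loop forever.
                if chain[-1] == first and second not in chain:
                    new_chains.add(chain + (second,))
        if new_chains <= chains:
            break  # fixed point: no new expression was built this round
        chains |= new_chains
    return chains


# Toy input (invented for illustration, not taken from any corpus).
toy_bigrams = {("weapon", "mass"), ("mass", "destruction")}
toy_result = link_bigrams(toy_bigrams)
```

On the toy input, the result contains the two original pairs plus the three-word chain `("weapon", "mass", "destruction")`. A real system would replace the shared-word test with a check on the syntactic relation between the linked items.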
An impressive amount of work has been devoted to collocation extraction over the past few decades. The state of the art shows a sustained interest in the morphosyntactic preprocessing of texts in order to better identify candidate expressions; however, the treatment performed is, in most cases, limited (lemmatization, POS-tagging, or shallow…
Although considerable effort has gone into extracting collocations from text corpora, only a minority of studies also address making the extraction results ready for use in the NLP applications that could benefit from them, such as machine translation. This article describes an accurate method…
SwissAdmin is a new multilingual corpus of press releases from the Swiss Federal Administration, available in German, French, Italian and English. We provide SwissAdmin in three versions: (i) plain texts of approximately 6 to 8 million words per language; (ii) sentence-aligned bilingual texts for each language pair; (iii) a part-of-speech-tagged version…