Fabien Poulard

In this paper we describe recent work carried out in the context of the TTC project towards the automatic construction of comparable corpora for multilingual terminology extraction. We focus on the communicative intention as the variable of discourse analysis that is best suited to selecting Web documents valuable for terminology applications and propose a…
This paper explores the detection of derivation links between texts (otherwise called plagiarism, near-duplication, revision, etc.) at the document level. We evaluate the use of textual elements implementing the ideas of specificity and invariance, as well as their combination, to characterize derivatives. We built a French press corpus based on Wikinews…