Ludovic Tanguy

Query difficulty can be linked to a number of causes. Some of these causes can be related to the query expression itself, and can therefore be detected through a linguistic analysis of the query text. Using 16 different linguistic features, automatically computed on TREC queries, we looked for significant correlations between these features and the average(More)
We describe the Annodis corpus of discourse structures for French. The corpus combines two perspectives on discourse across a variety of textual genres: a bottom-up approach and a top-down approach. The bottom-up view incrementally builds a structure from elementary discourse units, while the top-down view focuses on the selective annotation of multi-level(More)
This paper reports on the procedure and learning models we adopted for the ‘PAN 2011 Author Identification’ challenge targeting real-world email messages. The novelty of our approach lies in a design which combines shallow characteristics of the emails (word and trigram frequencies) with a large number of ad hoc linguistically-rich features addressing(More)
Distributional semantics models can be built using a simple bag-of-words representation of a word’s contexts (window-based) or using more complex syntactic information (syntax-based). Previous studies have compared their relative efficiency without coming to a definitive conclusion, but such an examination has never been performed on small and specialised corpora.(More)
Most information retrieval systems transform users’ natural-language queries into bags of words that are matched to documents, also represented as bags of words. Through this process, the richness of the query is lost. In this paper we show that linguistic features of a query are good indicators for predicting a system’s failure to answer it. The experiments(More)
ABSTRACT: Information retrieval systems aim to optimize the results they return in response to a user’s query. The performance of these systems is generally measured against shared test collections, such as the TREC (Text REtrieval Conference) collections. This evaluation is carried out globally,(More)