Huyen-Trang Vu

This paper investigates the effect of performance measures and relevance functions on the comparison of retrieval systems in INEX, an evaluation forum dedicated to XML retrieval. We focus on two interdependent challenges that arise when evaluating XML retrieval systems, namely the weak ordering of retrieved lists and multivalued relevance scales. Our analysis…
Test collections are essential for evaluating Information Retrieval (IR) systems. The relevance assessment set has been recognized as the key bottleneck in building test collections, especially for very large document collections. This paper addresses the problem of efficiently selecting documents to be included in the assessment set. We will show how…
ABSTRACT. Building evaluation corpora is an essential step in evaluating the performance of information retrieval systems. The cost of developing such corpora is generally quite high, in particular because of the human effort needed to assess the relevance of documents for each query. This difficulty becomes a…
We present a learning model for categorization of structured documents that takes into account both structural information and textual information. We first define a generative model of structured documents using belief networks. Then we transform the generative model into a discriminant one using the Fisher kernel. Finally, we describe an instance of…
We present a Bayesian framework for XML document retrieval. This framework allows us to consider content only. We perform the retrieval task using inference in our network. Our model can adapt to a specific corpus through parameter learning and uses a grammar to speed up the retrieval process in large or distributed databases. We also experimented list…
This paper studies the sensitivity of four metasearch engines under different situations. The focus of this analysis is on trainable metasearch engines. Our main contribution is a large-scale systematic analysis of the performance and behavior of these methods on several corpora. Firstly, we analyze how the choice and normalization of the relevance score…
In the XML retrieval paradigm, document fragments may be returned as answers to a user query. Because this information is more specific than whole documents, it may reduce the user effort needed to find relevant information. However, since XML documents are composed of nested elements, many of which may be relevant to the user's information need,…