Data Set Used
Automated annotation of protein function is challenging. As the number of sequenced genomes rapidly grows, the overwhelming majority of protein products can only be annotated computationally. If computational predictions are to be relied upon, it is crucial that the accuracy of these methods be high. Here we report the results from the first large-scale…
In this paper, Garnata, an information retrieval system for XML documents, is presented. This system is specifically designed for implementing Bayesian network-based models for structured documents. We describe its architecture and evaluate its performance for both indexing and retrieval, concluding that the system is flexible and fast.
We propose a method which, given a document to be classified, automatically generates an ordered set of appropriate descriptors extracted from a thesaurus. The method creates a Bayesian network to model the thesaurus and uses probabilistic inference to select the set of descriptors having high posterior probability of being relevant given the available…
A major bottleneck in our understanding of the molecular underpinnings of life is the assignment of function to proteins. While molecular experiments provide the most reliable annotation of proteins, their relatively low throughput and restricted purview have led to an increasing role for computational function prediction. However, assessing methods for…
Product or company names used in this set are for identification purposes only. Inclusion of the names of the products or companies does not indicate a claim of ownership by IGI Global of the trademark or registered trademark. Handbook of research on text and web mining technologies / Min Song and Yi-Fang Wu, editors. p. cm. Includes bibliographical…
In this paper, we propose the V-index (Virtuosity index), a novel metric for assessing the scientific virtuosity of academics. The index can be applied to journals as well as to individual researchers. In particular, we show that the V-index fills a gap left by the h-index and similar metrics by accounting for the self-citations of authors or journals. The paper provides…
OBJECTIVE: In the context of "network medicine", gene prioritization methods are one of the main tools for discovering candidate disease genes, exploiting the large amount of data covering different types of functional relationships between genes. Several works have proposed integrating multiple sources of data to improve disease gene prioritization, but to…
This paper presents the results of our participation in INEX'06. We submitted two runs to the Ad Hoc Thorough track, both obtained with Garnata, our information retrieval system for structured documents. We implemented two different models based on Influence Diagrams, the SID and CID models. The results of this first participation were very poor. In the…
In this work we propose new utility models for the structured information retrieval system Garnata, and present the results of our participation in the Ad Hoc track at INEX'08 using this system.