Many bioinformatics applications would benefit from comparing proteins based on their biological role rather than their sequence. This manuscript adds two new contributions. First, a study of the correlation between Gene Ontology (GO) terms and family similarity demonstrates that protein families constitute an appropriate baseline for validating GO…
Many Web pages are rich in geographic information and primarily relevant to geographically limited communities. However, existing IR systems only recently began to offer local services and largely ignore geo-spatial information. This paper presents our work on automatically identifying the geographical scope of Web documents, which provides the means to…
Many bioinformatics applications would benefit from comparing proteins based on their biological role rather than their sequence. In most biological databases, proteins are already annotated with ontology terms. Previous studies identified a correlation between the sequence similarity and the semantic similarity of proteins. The semantic similarity of…
This paper addresses document indexing and retrieval using geographical location. It discusses possible indexing structures and result ranking algorithms, surveying known approaches and showing how they can be combined to build an effective Geo-IR system.
This paper discusses the problem of automatically identifying the language of a given Web document. Previous experiments in language guessing focused on analyzing "coherent" text sentences, whereas this work was validated on texts from the Web, often presenting harder problems. Our language "guessing" software uses a well-known n-gram based…
In this paper, we introduce a geographic similarity operator that computes the relatedness between two geographic places and describe how it is combined with textual ranking. The effectiveness of the geographic ranking is evaluated on the GeoCLEF 2005 collection. We considered various strategies for query formulation and for combining textual ant…
This article presents a characterization of the community Web of the people of Portugal. We defined criteria for delimiting this Web based on our past experience of crawling pages related to Portugal and collected over 3.2 million documents from 46,000 sites satisfying those criteria. Our characterization was derived from this crawl. We describe the rules…
This paper discusses evaluation of Geo-IR systems, arguing for a separate study of the different algorithmic components involved. It presents existing resources for evaluating the different components, together with a review of previous results in the area.