Mário J. Silva

Many bioinformatics applications would benefit from comparing proteins based on their biological role rather than their sequence. This manuscript adds two new contributions. First, a study of the correlation between Gene Ontology (GO) terms and family similarity demonstrates that protein families constitute an appropriate baseline for validating GO…
Many bioinformatics applications would benefit from comparing proteins based on their biological role rather than their sequence. In most biological databases, proteins are already annotated with ontology terms. Previous studies identified a correlation between the sequence similarity and the semantic similarity of proteins. The semantic similarity of…
Many Web pages are rich in geographic information and primarily relevant to geographically limited communities. However, existing IR systems only recently began to offer local services and largely ignore geo-spatial information. This paper presents our work on automatically identifying the geographical scope of Web documents, which provides the means to…
This paper discusses the problem of automatically identifying the language of a given Web document. Previous experiments in language guessing focused on analyzing "coherent" text sentences, whereas this work was validated on texts from the Web, often presenting harder problems. Our language "guessing" software uses a well-known n-gram based…
Models of web data persistency are essential tools for the design of efficient information extraction systems that repeatedly collect and process the data. This study models the persistence of web data through the measurement of URL and content persistence across several snapshots of a national community web, collected for 3 years. We found that the lifetimes of…
A new formulation for finding the existence of a Boolean match between two functions with don’t cares is presented. An algorithm for Boolean matching is developed based on this new formulation and is used within a technology mapper as a substitute for tree matching algorithms. The new algorithm is fast and uses symmetries of the gates in the library to…
This article presents a characterization of the community Web of the people of Portugal. We defined criteria for delimiting this Web based on our past experience of crawling pages related to Portugal and collected over 3.2 million documents from 46,000 sites satisfying those criteria. Our characterization was derived from this crawl. We describe the rules…
We introduce a deep neural network for automated sarcasm detection. Recent work has emphasized the need for models to capitalize on contextual features, beyond lexical and syntactic cues present in utterances. For example, different speakers will tend to employ sarcasm regarding different subjects and, thus, sarcasm detection models ought to encode such…
In this paper, we introduce a geographic similarity operator that computes the relatedness between two geographic places and describe how it is combined with textual ranking. The effectiveness of the geographic ranking is evaluated on the GeoCLEF 2005 collection. We considered various strategies for query formulation and for combining textual and…