Carina F. Dorneles

The Web can be considered a vast repository of temporal information, as it receives a huge number of new pages every day. Generally, users are interested in information related to a specific temporal interval. In the information retrieval area, researchers have recently incorporated the temporal dimension into search engines. This paper presents a comprehensive...
In this paper, we propose a set of similarity metrics for manipulating collections of values occurring in XML documents. Following the data model presented in the TAX algebra, we treat an XML element as a labeled ordered rooted tree. Consider that XML nodes can be either atomic, i.e., they may contain single values such as short character strings, dates, etc., or...
XML has been explored by both research and industry communities. More than 5500 papers were published on different aspects of XML. With so many publications, it is hard for someone to decide where to start. Hence, this paper presents some of the research topics on XML, namely: XML on relational databases, query processing, views, data matching, and schema...
The Web is the largest repository of data available, with over 150 million high-quality tables. Several works have sought to allow queries on these tables, but challenges remain, such as the variety of structures found on the Web. In this paper, we propose a taxonomy for the tabular structures and formalize the ones used with...
In this paper, we propose an approach for providing support to temporal queries on XML keyword search engines. Our proposal is based on identifying temporal constraints in a keyword query and intercepting the query processing, executed by a conventional XML search engine, in order to evaluate those constraints. Our approach allows users to find the temporal...
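As a rough illustration of what "identifying temporal constraints in a keyword query" could look like, the sketch below separates four-digit year tokens from the remaining keywords. The abstract above is truncated and does not describe the paper's actual technique, so the function name `split_temporal`, the year-only pattern, and the returned shape are all assumptions made for illustration.

```python
import re

# Assumption: a temporal constraint is just a four-digit year (1000-2099).
# Real systems would also need to handle ranges, months, relative dates, etc.
YEAR = re.compile(r"\b(1[0-9]{3}|20[0-9]{2})\b")


def split_temporal(query):
    """Split a keyword query into (plain keywords, year constraints)."""
    years = [int(y) for y in YEAR.findall(query)]
    keywords = YEAR.sub(" ", query).split()
    return keywords, years


if __name__ == "__main__":
    print(split_temporal("xml conference 2009"))
```

In a design along the lines the abstract describes, the plain keywords would go to the conventional search engine unchanged, and the extracted constraints would be evaluated separately against its results.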
The goal of approximate data matching is to assess whether two distinct data instances represent the same real world object. This is usually achieved through the use of a similarity function, which returns a score that defines how similar two data instances are. If this score surpasses a given threshold, both data instances are considered as representing...
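The score-and-threshold scheme described above can be sketched in a few lines. This is a minimal illustration, not the paper's method: it assumes the data instances are strings, uses the standard library's `difflib.SequenceMatcher` as a stand-in similarity function, and picks an arbitrary threshold of 0.8; the names `similarity` and `is_match` are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a score in [0, 1]; 1.0 means the strings are identical
    after lowercasing. SequenceMatcher is a stand-in metric."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def is_match(a: str, b: str, threshold: float = 0.8) -> bool:
    """Treat two instances as the same real-world object when the
    similarity score surpasses the (assumed) threshold."""
    return similarity(a, b) > threshold


if __name__ == "__main__":
    print(is_match("Carina Dorneles", "carina dorneles"))
    print(is_match("XML", "relational"))
```

The threshold trades precision against recall: raising it rejects more true matches, lowering it accepts more false ones, so in practice it is tuned per data set.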
Document classification is critical to optimizing information retrieval tasks, especially over the Web. In this environment, the open-domain nature and growing volume of available data remain a challenge for the classification task. In this paper, we deal with these problems by using only knowledge resources. Our approach relies on concept instances derived...