Giansalvatore Mecca

The paper investigates techniques for extracting data from HTML sites through the use of automatically generated wrappers. To automate the wrapper generation and the data extraction process, the paper develops a novel technique to compare HTML pages and generate a wrapper based on their similarities and differences. Experimental results on real-life …
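As a rough illustration of the page-comparison idea (a toy sketch with invented helper names and a simplifying equal-length token alignment, not the paper's actual algorithm), one can align two tokenized pages, keep the tokens they agree on as the wrapper template, and treat every mismatch as a data slot:

import re

def tokenize(html):
    # Split a page into tags and text chunks.
    return [t for t in re.split(r'(<[^>]+>)', html) if t.strip()]

def infer_template(page_a, page_b):
    # Positions where the two pages agree become template text;
    # positions where they differ become data slots.
    return [a if a == b else '#DATA#'
            for a, b in zip(tokenize(page_a), tokenize(page_b))]

def extract(template, page):
    # Return the text found at the data-slot positions.
    return [t.strip() for slot, t in zip(template, tokenize(page))
            if slot == '#DATA#']

page1 = '<html><b>Title:</b> Databases <i>1997</i></html>'
page2 = '<html><b>Title:</b> Web Data <i>1999</i></html>'
tmpl = infer_template(page1, page2)
print(extract(tmpl, page1))  # ['Databases', '1997'] under the assumptions above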
Many Web sites include significant and substantial pieces of information, in a way that is often difficult to share, correlate and maintain. In many cases the management of a Web site can greatly benefit from the adoption of methods and techniques borrowed from the database field. This paper introduces a methodology for designing and maintaining large Web sites …
The paper discusses the issue of views in the Web context. We introduce a set of languages for managing and restructuring data coming from the World Wide Web. We present a specific data model, called the ARANEUS Data Model, inspired by the structures typically present in Web sites. The model allows us to describe the scheme of a Web hypertext, in the spirit …
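Purely as an illustration of describing a page scheme in the database spirit (hypothetical names throughout; this is not the ADM syntax itself), one might model pages, their attributes, and their links roughly as follows:

from dataclasses import dataclass, field

@dataclass
class Attribute:
    name: str
    type: str            # e.g. 'text', 'image', 'list'

@dataclass
class LinkTo:
    label: str
    target_scheme: str   # name of the page scheme the link points to

@dataclass
class PageScheme:
    name: str
    attributes: list = field(default_factory=list)
    links: list = field(default_factory=list)

# An author page linking to paper pages.
author = PageScheme(
    name='AuthorPage',
    attributes=[Attribute('Name', 'text'), Attribute('Affiliation', 'text')],
    links=[LinkTo('PaperList', 'PaperPage')],
)
print(author)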
The paper describes the ARANEUS Web-Base Management System [1, 5, 4, 6], a system developed at Università di Roma Tre, which represents a proposal towards the definition of a new kind of data repository, designed to manage Web data in the database style. We call a WebBase a collection of data of heterogeneous nature, and more specifically: (i) highly …
Data extraction from web pages is performed by software modules called wrappers. Recently, some systems for the automatic generation of wrappers have been proposed in the literature. These systems are based on unsupervised inference techniques: taking as input a small set of sample pages, they can produce a common wrapper to extract relevant data. However, …
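In the same toy spirit as the sketch above (again an assumption-laden illustration, not how any of the cited systems actually works), a common wrapper for a small set of sample pages can be obtained by repeatedly refining a template, turning every position that varies across the samples into a data slot:

import re

def tokenize(html):
    return [t for t in re.split(r'(<[^>]+>)', html) if t.strip()]

def refine(template, page):
    # Keep a token only if this sample agrees with the template so far;
    # otherwise mark the position as a data slot. Equal-length token
    # streams are assumed for simplicity.
    return [slot if slot == tok else '#DATA#'
            for slot, tok in zip(template, tokenize(page))]

def common_wrapper(samples):
    template = tokenize(samples[0])
    for page in samples[1:]:
        template = refine(template, page)
    return template

samples = [
    '<li><b>Smith</b> (2001)</li>',
    '<li><b>Jones</b> (1999)</li>',
    '<li><b>Mecca</b> (1998)</li>',
]
print(common_wrapper(samples))
# ['<li>', '<b>', '#DATA#', '</b>', '#DATA#', '</li>']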
Data-intensive Web sites are large sites based on a back-end database, with a fairly complex hypertext structure. The paper develops two main contributions: (a) a specific design methodology for data-intensive Web sites, composed of a set of steps and design transformations that lead from a conceptual specification of the domain of interest to the actual(More)
Data-cleaning (or data-repairing) is considered a crucial problem in many database-related tasks. It consists in making a database consistent with respect to a set of given constraints. In recent years, repairing methods have been proposed for several classes of constraints. However, these methods rely on ad hoc decisions and tend to hard-code the strategy …
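For concreteness (a minimal sketch with invented data, not the paper's repair framework), consider detecting violations of a functional dependency and repairing them with one naive, hard-coded strategy; such hard-coded choices are exactly what the abstract says declarative approaches try to avoid:

from collections import Counter, defaultdict

# Repair violations of the functional dependency Zip -> City by majority
# vote: a deliberately ad hoc strategy, shown only to illustrate the problem.
rows = [
    {'zip': '00146', 'city': 'Rome'},
    {'zip': '00146', 'city': 'Roma'},   # violates Zip -> City
    {'zip': '20121', 'city': 'Milan'},
]

groups = defaultdict(list)
for r in rows:
    groups[r['zip']].append(r['city'])

for zip_code, cities in groups.items():
    if len(set(cities)) > 1:                            # FD violation
        repair = Counter(cities).most_common(1)[0][0]   # majority (ad hoc)
        for r in rows:
            if r['zip'] == zip_code:
                r['city'] = repair

print(rows)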
The paper develops Editor, a language for manipulating semi-structured documents, such as the ones typically available on the Web. Editor programs are based on two simple ideas, taken from text editors: "search" instructions are used to select regions of interest in a document, and "cut & paste" to restructure them. We study the expressive power and the …
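As a toy analogue of the two primitives (not Editor's actual syntax), a regex "search" can select a region of the document, and string slicing can "cut" it out and "paste" it elsewhere:

import re

# Toy analogue of search / cut & paste restructuring; hypothetical document.
doc = "AUTHOR: Mecca TITLE: Editor PAPER-END"

# "search": select the region holding the title.
m = re.search(r'TITLE: (\S+) ', doc)
title = m.group(1)

# "cut": remove the region; "paste": reinsert it at the front.
cut = doc[:m.start()] + doc[m.end():]
restructured = f"TITLE: {title} " + cut
print(restructured)  # TITLE: Editor AUTHOR: Mecca PAPER-END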
Schema mapping algorithms rely on value correspondences - i.e., correspondences among semantically related attributes - to produce complex transformations among data sources. These correspondences are either manually specified or suggested by separate modules called schema matchers. The quality of mappings produced by a mapping generation tool strongly …
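To make the notion of a value correspondence concrete (an illustrative sketch with invented attribute names, not a real mapping tool's API), a correspondence can be seen as a pair of source and target attributes driving a simple tuple-level transformation:

# Value correspondences as (source attribute -> target attribute) pairs,
# used to translate source tuples into the target schema. Real mapping
# systems generate far richer transformations from these inputs.
correspondences = {
    'emp_name': 'name',
    'emp_salary': 'salary',
}

source_rows = [
    {'emp_name': 'Ada', 'emp_salary': 50000, 'emp_dept': 'R&D'},
    {'emp_name': 'Alan', 'emp_salary': 48000, 'emp_dept': 'QA'},
]

target_rows = [
    {tgt: row[src] for src, tgt in correspondences.items()}
    for row in source_rows
]
print(target_rows)  # [{'name': 'Ada', 'salary': 50000}, ...]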