Chris Wolf

Learn More
OpenII ( is a collaborative effort to create a suite of open-source tools for information integration (II). The project is leveraging the latest developments in II research to create a platform on which integration tools can be built and further research conducted. In addition to a scalable, extensible platform, OpenII includes(More)
A key aspect of any data integration endeavor is determining the relationships between the source schemata and the target schema. This schema integration task must be tackled regardless of the integration architecture or mapping formalism. In this paper, we provide a task model for schema integration. We use this breakdown to motivate a workbench for schema(More)
Large, dynamic, and ad-hoc organizations must frequently initiate data integration and sharing efforts with insufficient awareness of how organizational data sources are related. Decision makers need to reason about data model interactions much as they do about data instance interactions in OLAP: at multiple levels of granularity. We demonstrate an(More)
This paper describes a Name Matching Evaluation Laboratory that is a joint effort across multiple projects. The lab houses our evaluation infrastructure as well as multiple name matching engines and customized analytical tools. Included is an explanation of the methodology used by the lab to carry out evaluations. This methodology is based on standard(More)
We have analyzed system rankings for person name search algorithms using a data set for which several versions of ground truth were developed by employing different means of resolving adjudicator conflicts. Thirteen algorithms were ranked by F-score, using bootstrap resampling for significance testing, on a dataset containing 70,000 romanized names from(More)
Many data sharing communities create data standards (" hub " schemata) to speed information integration by increasing reuse of both data definitions and mappings. Unfortunately, creation of these standards and the mappings to the enterprise's implemented systems is both time consuming and expensive. This paper presents Unity, a novel tool for speeding the(More)
This demonstration presents Galaxy, a schema manager that facilitates easy and correct data sharing among autonomous but related, evolving data sources. Galaxy reduces heterogeneity by helping database developers identify, reuse, customize, and advertise related schema components. The central idea is that as sche-mata are customized, Galaxy maintains a(More)
Whereas strategies for discovering content on the surface web are commonplace, similar strategies for the private web are nonexistent. In this paper we first establish a formal framework for advertising the existence of private web resources that subsumes many existing summarization strategies based on succinct statistical summaries (which we call digests).(More)
The functional simulator Simics provides a co-simulation integration path with a SystemC simulation environment to create Virtual Platforms. With increasing complexity of the SystemC models, this platform suffers from performance degradation due to the single threaded nature of the integrated Virtual Platform. In this paper, we present a multi-threaded(More)
  • 1