OpenII (openintegration.org) is a collaborative effort to create a suite of open-source tools for information integration (II). The project is leveraging the latest developments in II research to create a platform on which integration tools can be built and further research conducted. In addition to a scalable, extensible platform, OpenII includes …
A key aspect of any data integration endeavor is determining the relationships between the source schemata and the target schema. This schema integration task must be tackled regardless of the integration architecture or mapping formalism. In this paper, we provide a task model for schema integration. We use this breakdown to motivate a workbench for schema …
This demonstration presents Galaxy, a schema manager that facilitates easy and correct data sharing among autonomous but related, evolving data sources. Galaxy reduces heterogeneity by helping database developers identify, reuse, customize, and advertise related schema components. The central idea is that as schemata are customized, Galaxy maintains a …
Large, dynamic, and ad-hoc organizations must frequently initiate data integration and sharing efforts with insufficient awareness of how organizational data sources are related. Decision makers need to reason about data model interactions much as they do about data instance interactions in OLAP: at multiple levels of granularity. We demonstrate an …
This paper describes a Name Matching Evaluation Laboratory that is a joint effort across multiple projects. The lab houses our evaluation infrastructure as well as multiple name matching engines and customized analytical tools. Included is an explanation of the methodology used by the lab to carry out evaluations. This methodology is based on standard …
Many data sharing communities create data standards (“hub” schemata) to speed information integration by increasing reuse of both data definitions and mappings. Unfortunately, creation of these standards and the mappings to the enterprise's implemented systems is both time consuming and expensive. This paper presents Unity, a novel tool for …
In this demonstration, we exhibit a new type of provenance system, one that is not tied to any particular domain, closed-world system, or use. The PLUS provenance system was inspired by government requirements to enable provenance capture, storage, and use across multi-organizational systems. PLUS is general enough to interact across open-world distributed …
We have analyzed system rankings for person name search algorithms using a data set for which several versions of ground truth were developed by employing different means of resolving adjudicator conflicts. Thirteen algorithms were ranked by F-score, using bootstrap resampling for significance testing, on a dataset containing 70,000 romanized names from …
This paper presents a methodology for extracting useful spatial signals using the motor drive as the sensor during servo operation. Spatially dependent phenomena yield nonstationary frequency content during servo operation. With variable frequency and frequency-dependent amplitude in the current and torque signals, the underlying valuable spatial …
Whereas strategies for discovering content on the surface web are commonplace, similar strategies for the private web are nonexistent. In this paper we first establish a formal framework for advertising the existence of private web resources that subsumes many existing summarization strategies based on succinct statistical summaries (which we call digests). …