Edit-distance-based string similarity join is a fundamental operator in string databases. Increasingly, many applications in data cleaning, data integration, and scientific computing have to deal with fuzzy information in string attributes. Despite the intensive efforts devoted to processing (deterministic) string joins and managing probabilistic data …
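To make the operator concrete, here is a minimal sketch of a deterministic edit-distance similarity join: it returns every pair (r, s) whose Levenshtein distance is at most a threshold `tau`. It assumes plain (non-probabilistic) strings and uses a naive nested loop with a simple length filter; the function names `edit_distance` and `similarity_join` are hypothetical and this is not the method of the abstracted work, which targets probabilistic strings.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a two-row dynamic program."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, cb in enumerate(b, start=1):
            curr[j] = min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            )
        prev = curr
    return prev[-1]


def similarity_join(left, right, tau: int):
    """Naive nested-loop join returning pairs within edit distance tau.

    Length filter: if two strings differ in length by more than tau,
    their edit distance must exceed tau, so the pair is skipped.
    """
    result = []
    for r in left:
        for s in right:
            if abs(len(r) - len(s)) > tau:
                continue
            if edit_distance(r, s) <= tau:
                result.append((r, s))
    return result
```

For example, `similarity_join(["data"], ["date", "dato", "xyz"], 1)` keeps the two one-substitution matches and drops `"xyz"`. Practical systems replace the nested loop with signature-based filtering (e.g., q-gram indexes) so that most non-matching pairs are never verified.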
The problem of scaling up data integration, such that new sources can be quickly utilized as they are discovered, remains elusive: global schemas for integrated data are difficult to develop and expand, and schema and record matching techniques are limited by the fact that data and metadata are often under-specified and must be disambiguated by data …
The database field has increasingly broadened past carefully controlled, closed-world data to consider the much more complex space of data resources on the Web. In this area of "open" Web and contributed data, there are vast quantities of raw data, but there is a limited understanding of real or realistic usage scenarios and problems. In turn, this …
Provenance is well understood for relational query operators. Increasingly, however, data analytics incorporates operations expressed through linear algebra: machine learning operations, network centrality measures, and so on. In this paper, we study provenance information for matrix data and linear algebra operations. Our core technique builds upon …