The problem of scaling up data integration, such that new sources can be quickly utilized as they are discovered, remains elusive: global schemas for integrated data are difficult to develop and expand, and schema and record matching techniques are limited by the fact that data and metadata are often under-specified and must be disambiguated by data…
The database field has increasingly broadened past carefully controlled, closed-world data to consider the much more complex space of data resources on the Web. In this area of “open” Web and contributed data, there are vast quantities of raw data, but there is a limited understanding of real or realistic usage scenarios and problems. In turn, this has…
Provenance is well understood for relational query operators. Increasingly, however, data analytics is incorporating operations expressed through linear algebra: machine learning operations, network centrality measures, and so on. In this paper, we study provenance information for matrix data and linear algebra operations. Our core technique builds upon…