Approximate Data Exchange

@inproceedings{Rougemont2007ApproximateDE,
  title={Approximate Data Exchange},
  author={Michel de Rougemont and Adrien Vieilleribi{\`e}re},
  booktitle={ICDT},
  year={2007}
}
We introduce approximate data exchange, by relaxing classical data exchange problems such as Consistency and Typechecking to their approximate versions based on Property Testing. It provides a natural framework for consistency and safety questions, which first considers approximate solutions and then exact solutions obtained with a Corrector. We consider a model based on transducers of words and trees, and study ε-Consistency, i.e., the problem of deciding whether a given source instance I… 
Data exchange in the presence of arithmetic comparisons
TLDR
A novel chase procedure called AC-chase, which is a tree, is defined, and it is proved to produce a universal solution (appropriately defined to deal with arithmetic comparisons), which is the right tool for query answering in the case of unions of CQACs.
Data Exchange with Arithmetic Comparisons
TLDR
It is shown that AC-chase computes a universal solution which can be used to compute certain answers for unions of conjunctive queries with arithmetic comparisons (UCQAC), and the complexity of existence of a solution is shown to be in NP.
Approximate Structural Consistency
TLDR
An approximate algorithm is described which decides if I is close to a target regular schema (DTD). This property is testable, i.e., it can be decided in time independent of the size of the input document, by just sampling I.
Approximate consistency for transformations on words and trees
Approximate Membership for Words and Trees
TLDR
The objective is to obtain sublinear algorithms that approximately decide properties of XML databases, by investigating whether an unranked tree is valid for a DTD or, more generally, whether it is recognized by a tree automaton.
Providing best-effort services in dataspace systems
TLDR
This dissertation studies how to provide best-effort search, querying and browsing services in a dataspace system, even when precise schema mappings are not present, and proposes the concept of probabilistic schema mapping, with which it can return approximate answers even if precise mappings do not exist.
Query Relaxation across Heterogeneous Data Sources
TLDR
This paper proposes a technique to compute query relaxations of an input query that can be rewritten and evaluated in an environment of collaborating autonomous and heterogeneous data sources, and proposes both an exhaustive and an optimized heuristic algorithm to compute and evaluate these relaxations.
Approximate Validity of XML Streaming Data
TLDR
A SAX implementation of the statistical embedding associated with XML data makes it possible to efficiently decide eps-validity with respect to any DTD or Schema, for the Edit Distance with Moves, and associates a generalized k-gram with unranked labelled trees, from which any regular property can be approximately decided.
Approximate Queries on Big Heterogeneous Data
  • Verena Kantere
  • Computer Science
    2015 IEEE International Congress on Big Data
  • 2015
TLDR
Traditional techniques for query rewriting are extended, and heuristic algorithms to compute query approximations of an input query that can be rewritten and evaluated in an environment of collaborating autonomous and heterogeneous big data sources are proposed.
Data integration with uncertainty
TLDR
The concept of probabilistic schema mappings is introduced and it is shown that there are two possible semantics for such mappings: by-table semantics assumes that there exists a correct mapping but the author does not know what it is; by-tuple semantics assumes that the correct mapping may depend on the particular tuple in the source data.

References

Data exchange: semantics and query answering
TLDR
This paper gives an algebraic specification that selects, among all solutions to the data exchange problem, a special class of solutions that is called universal and shows that a universal solution has no more and no less data than required for data exchange and that it represents the entire space of possible solutions.
XML data exchange: consistency and query answering
TLDR
This paper starts looking into the basic properties of XML data exchange, that is, restructuring of XML documents that conform to a source DTD under a target DTD, and answering queries written over the target schema, and proves a dichotomy theorem that classifies data exchange settings into those over which query answering is tractable, and those over which it is coNP-complete.
Approximate Satisfiability and Equivalence
TLDR
The geometric embedding, and hence the tester algorithms, are extended to infinite regular languages and to context-free languages, and the equivalence between two regular properties on words, defined by monadic second-order formulas, can also be tested.
Regular languages are testable with a constant number of queries
TLDR
This paper discusses testability of more complex languages and shows that the query complexity required for testing context-free languages cannot be bounded by any function of ε.
Word problems requiring exponential time (Preliminary Report)
TLDR
A number of similar decidable word problems from automata theory and logic whose inherent computational complexity can be precisely characterized in terms of time or space requirements on deterministic or nondeterministic Turing machines are considered.
Composing schema mappings: Second-order dependencies to the rescue
TLDR
It is shown that the composition of finite sets of source-to-target tgds is always definable by a second-order tgd, and that second-order tgds possess good properties for data exchange. The paper introduces this class of existential second-order formulas with function symbols and makes a case that they are the "right" language for composing schema mappings.
Correctors for XML Data
TLDR
It is shown how testers and correctors for regular trees can be used to estimate distances between a document and a set of DTDs, a useful operation to rank XML documents.
Property testing and its connection to learning and approximation
TLDR
The authors study the question of determining whether an unknown function has a particular property or is ε-far from any function with that property, and devise algorithms to test whether a graph has properties such as being k-colorable or having a ρ-clique.
XML stream processing using tree-edit distance embeddings
TLDR
These are the first algorithmic results on low-distortion embeddings for tree-edit distance metrics, and on correlating XML data in the streaming model.
Robust Characterizations of Polynomials with Applications to Program Testing
TLDR
The characterizations provide results in the area of coding theory by giving extremely fast and efficient error-detecting schemes for some well-known codes and play a crucial role in subsequent results on the hardness of approximating some NP-optimization problems.