Information describing the origin of data, generally referred to as <i>provenance</i>, is important in scientific and curated databases where it is the basis for the trust one puts in their contents. Since such databases are constructed using operations of both query and update languages, it is of paramount importance to describe the effect of these…
Inferring an appropriate DTD or XML Schema Definition (XSD) for a given collection of XML documents essentially reduces to learning <i>deterministic</i> regular expressions from sets of positive example words. Unfortunately, there is no algorithm capable of learning the complete class of deterministic regular expressions from positive examples only, as we…
An intrinsic part of information extraction is the creation and manipulation of relations extracted from text. In this paper, we develop a foundational framework where the central construct is what we call a <i>spanner</i>. A spanner maps an input string into relations over the spans (intervals specified by bounding indices) of the string. The focus of this…
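To make the notion of a spanner concrete, here is a minimal sketch, not the framework from the paper. It assumes spans are 0-based half-open `(start, end)` index pairs (Python's convention) and uses a regular expression with capture groups as the extractor. The helper name `regex_spanner` and the sample text are hypothetical. Note that `re.finditer` returns only non-overlapping matches, which is a simplification.

```python
import re

def regex_spanner(pattern, text):
    """Map a string to a relation over its spans.

    Each match contributes one tuple; each capture group contributes
    one column holding the (start, end) span of that group.
    """
    compiled = re.compile(pattern)
    columns = range(1, compiled.groups + 1)
    return [tuple(m.span(g) for g in columns) for m in compiled.finditer(text)]

# Hypothetical input: extract (user, host) span pairs.
text = "ann@x.org, bob@y.com"
relation = regex_spanner(r"(\w+)@([\w.]+)", text)

for user_span, host_span in relation:
    # Spans are positions in the string, not substrings; slice to read them.
    print(user_span, host_span, text[slice(*user_span)], text[slice(*host_span)])
```

The result is a relation over positions rather than over substrings, so two occurrences of the same word at different places remain distinct tuples.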
Motivated by both established and new applications, we study navigational query languages for graphs (binary relations). The simplest language has only the two operators union and composition, together with the identity relation. We make more powerful languages by adding any of the following operators: intersection; set difference; projection; coprojection;…
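The simplest language described above (union, composition, and the identity relation) can be sketched as follows, assuming a graph is modeled as a Python set of `(source, target)` pairs. The function names and the sample graph are illustrative assumptions, not notation from the paper.

```python
def identity(nodes):
    """Identity relation: every node is related to itself."""
    return {(v, v) for v in nodes}

def union(r, s):
    """Pairs that occur in either relation."""
    return r | s

def compose(r, s):
    """Pairs (a, c) such that r contains (a, b) and s contains (b, c)."""
    return {(a, c) for (a, b) in r for (b2, c) in s if b == b2}

# Hypothetical graph: 1 -> 2 -> 3
nodes = {1, 2, 3}
edges = {(1, 2), (2, 3)}

two_step = compose(edges, edges)  # paths of exactly two edges
within_two = union(identity(nodes), union(edges, two_step))  # paths of at most two edges
```

Each expression in this language denotes a binary relation, so expressions nest freely, as `within_two` shows.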
We consider the problem of inferring a concise Document Type Definition (DTD) for a given set of XML documents, a problem that basically reduces to learning <i>concise</i> regular expressions from positive example strings. We identify two classes of concise regular expressions: the single occurrence regular expressions (SOREs) and the chain regular…
Science, industry, and society are being revolutionized by radical new capabilities for information sharing, distributed computation, and collaboration offered by the World Wide Web. This revolution promises dramatic benefits but also poses serious risks due to the fluid nature of digital information. One important cross-cutting issue is managing and…
Provenance has been studied extensively in both database and workflow management systems, so far with little convergence of definitions or models. Provenance in databases has generally been defined for relational or complex object data, by propagating fine-grained annotations or algebraic expressions from the input to the output. This kind of provenance has…