Information describing the origin of data, generally referred to as <i>provenance</i>, is important in scientific and curated databases where it is the basis for the trust one puts in their contents. Since such databases are constructed using operations of both query and update languages, it is of paramount importance to describe the effect of these ...
If the XML data file doesn't refer to a schema, Excel infers the schema from the XML ... Note: If you're importing multiple XML files that don't define a namespace, these ... In this case, Excel doesn't infer a schema, and you can't use an XML Map. ... SchemaScope: a system for inferring and cleaning XML schemas. SIGMOD Conference. Inferring XML Schema Definitions from ...
Inferring an appropriate DTD or XML Schema Definition (XSD) for a given collection of XML documents essentially reduces to learning <i>deterministic</i> regular expressions from sets of positive example words. Unfortunately, there is no algorithm capable of learning the complete class of deterministic regular expressions from positive examples only, as we ...
Science, industry, and society are being revolutionized by radical new capabilities for information sharing, distributed computation, and collaboration offered by the World Wide Web. This revolution promises dramatic benefits but also poses serious risks due to the fluid nature of digital information. One important cross-cutting issue is managing and ...
Regular expression patterns provide a natural, declarative way to express constraints on semistructured data and to extract relevant information from it. Indeed, they are a core feature of the programming language Perl, surface in various UNIX tools such as sed and awk, and have recently been proposed in the context of the XML programming language XDuce. Since ...
An intrinsic part of information extraction is the creation and manipulation of relations extracted from text. In this paper, we develop a foundational framework where the central construct is what we call a <i>spanner</i>. A spanner maps an input string into relations over the spans (intervals specified by bounding indices) of the string. The focus of this ...
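To make the idea of a spanner concrete, here is a minimal sketch in Python. It is not the formalism from the paper: it assumes that a regular expression with named capture groups can stand in for a spanner, that spans are 0-based half-open (start, end) pairs, and that only the non-overlapping matches found by `re.finditer` are reported. The function name `extract_spans` and the sample text are hypothetical.

```python
import re

def extract_spans(pattern, text):
    """Return a relation as a list of rows; each row maps a named
    capture group (a variable) to the (start, end) span it matched."""
    regex = re.compile(pattern)
    relation = []
    # Assumption: non-overlapping, left-to-right matches only.
    for match in regex.finditer(text):
        row = {name: match.span(name) for name in regex.groupindex}
        relation.append(row)
    return relation

# Hypothetical input: two address-like tokens in one string.
text = "alice@example.org, bob@example.com"
rows = extract_spans(r"(?P<user>\w+)@(?P<host>[\w.]+)", text)

for row in rows:
    start, end = row["user"]
    print(row, text[start:end])
```

Each row records positions rather than substrings, so two occurrences of the same text at different places in the input remain distinct tuples of the relation.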
Motivated by both established and new applications, we study navigational query languages for graphs (binary relations). The simplest language has only the two operators union and composition, together with the identity relation. We make more powerful languages by adding any of the following operators: intersection; set difference; projection; coprojection; ...
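The operators named above can be illustrated on a small graph. The following is a sketch, assuming a graph is represented as a Python set of (source, target) pairs; union, intersection, and set difference are then the built-in set operators `|`, `&`, and `-`. The definitions of `projection` and `coprojection` follow one common reading (nodes that have, respectively lack, an outgoing edge, returned as identity pairs) and are an assumption rather than the paper's exact definitions. All function names are hypothetical.

```python
def identity(nodes):
    """Identity relation: every node paired with itself."""
    return {(x, x) for x in nodes}

def compose(r, s):
    """Composition: (x, z) whenever (x, y) is in r and (y, z) is in s."""
    return {(x, z) for (x, y) in r for (y2, z) in s if y == y2}

def projection(r):
    """Assumed reading: (x, x) for each x with an outgoing r-edge."""
    return {(x, x) for (x, _) in r}

def coprojection(r, nodes):
    """Assumed reading: (x, x) for each node x without an outgoing r-edge."""
    return identity(nodes) - projection(r)

# Hypothetical three-node path graph 1 -> 2 -> 3.
nodes = {1, 2, 3}
edges = {(1, 2), (2, 3)}

two_steps = compose(edges, edges)             # pairs joined by a path of length 2
at_most_two = identity(nodes) | edges | two_steps
sinks = coprojection(edges, nodes)            # nodes with no outgoing edge
print(two_steps, sinks)
```

Composing with the identity relation leaves a relation unchanged, which is why the identity can serve as the neutral starting point of a navigation.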