Information describing the origin of data, generally referred to as <i>provenance</i>, is important in scientific and curated databases where it is the basis for the trust one puts in their contents. Since such databases are constructed using operations of both query and update languages, it is of paramount importance to describe the effect of these …
If the XML data file doesn't refer to a schema, Excel infers the schema from the XML … Note: If you're importing multiple XML files that don't define a namespace, these … In this case, Excel doesn't infer a schema, and you can't use an XML Map. SchemaScope: a system for inferring and cleaning XML schemas. SIGMOD Conference. Inferring XML Schema Definitions from …
Inferring an appropriate DTD or XML Schema Definition (XSD) for a given collection of XML documents essentially reduces to learning <i>deterministic</i> regular expressions from sets of positive example words. Unfortunately, there is no algorithm capable of learning the complete class of deterministic regular expressions from positive examples only, as we …
Science, industry, and society are being revolutionized by radical new capabilities for information sharing, distributed computation, and collaboration offered by the World Wide Web. This revolution promises dramatic benefits but also poses serious risks due to the fluid nature of digital information. One important cross-cutting issue is managing and …
An intrinsic part of information extraction is the creation and manipulation of relations extracted from text. In this paper, we develop a foundational framework where the central construct is what we call a <i>spanner</i>. A spanner maps an input string into relations over the spans (intervals specified by bounding indices) of the string. The focus of this …
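To make the notion of a spanner concrete, here is a minimal sketch of a one-variable spanner in Python. It is an illustration only, not the formalism of the paper: the function name `unary_spanner` is hypothetical, spans are represented as 0-based half-open pairs `(start, end)` (an assumption of this sketch), and the spanner simply returns every span whose substring fully matches a given regular expression.

```python
import re
from typing import Set, Tuple

# Assumption of this sketch: a span is a half-open interval [start, end)
# of 0-based positions in the input string.
Span = Tuple[int, int]


def unary_spanner(pattern: str, text: str) -> Set[Span]:
    """Return the unary relation of all spans of `text` whose substring
    fully matches `pattern` (hypothetical helper, for illustration)."""
    compiled = re.compile(pattern)
    return {
        (i, j)
        for i in range(len(text) + 1)
        for j in range(i, len(text) + 1)
        # fullmatch with pos/endpos tests exactly the substring text[i:j]
        if compiled.fullmatch(text, i, j)
    }


spans = unary_spanner(r"a+", "aab")
print(sorted(spans))
```

Unlike `re.finditer`, which reports only non-overlapping matches, this brute-force enumeration returns all matching spans, including overlapping ones, which is closer to the relational reading described above. It runs in quadratic many regex checks and is meant only for small inputs.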
Motivated by both established and new applications, we study navigational query languages for graphs (binary relations). The simplest language has only the two operators union and composition, together with the identity relation. We make more powerful languages by adding any of the following operators: intersection; set difference; projection; coprojection; …
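The simplest language mentioned above (union, composition, and the identity relation) can be sketched over Python sets of node pairs. The function names and the representation of a graph as a set of `(source, target)` tuples are assumptions made for this illustration, not notation from the paper.

```python
from typing import Hashable, Iterable, Set, Tuple

# Assumption of this sketch: a graph is a binary relation, stored as a
# set of (source, target) pairs.
Relation = Set[Tuple[Hashable, Hashable]]


def identity(nodes: Iterable[Hashable]) -> Relation:
    """The identity relation on the given nodes."""
    return {(v, v) for v in nodes}


def union(r: Relation, s: Relation) -> Relation:
    """All pairs that are in r or in s."""
    return r | s


def compose(r: Relation, s: Relation) -> Relation:
    """Pairs (a, c) such that r goes from a to some b and s from b to c."""
    return {(a, c) for (a, b) in r for (b2, c) in s if b == b2}


edges = {(1, 2), (2, 3)}
# Nodes reachable in exactly two steps, then in zero or one step.
two_steps = compose(edges, edges)
at_most_one_step = union(identity({1, 2, 3}), edges)
print(two_steps, at_most_one_step)
```

Intersection and set difference correspond directly to Python's `&` and `-` on sets; the remaining operators in the list are left out of this sketch.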
We consider the problem of inferring a concise Document Type Definition (DTD) for a given set of XML documents, a problem that basically reduces to learning <i>concise</i> regular expressions from positive example strings. We identify two classes of concise regular expressions, the single occurrence regular expressions (SOREs) and the chain regular …
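As a rough illustration of the single occurrence property, the sketch below checks whether every element name appears at most once in a content-model expression. It assumes the usual reading of the name (each symbol occurs at most once in the expression); the helper `is_single_occurrence` and the tokenization rule for element names are hypothetical and only approximate DTD syntax.

```python
import re
from collections import Counter


def is_single_occurrence(expr: str) -> bool:
    """Return True if no element name occurs more than once in `expr`.

    Assumption of this sketch: element names start with a letter or
    underscore and continue with letters, digits, '_', '.', or '-';
    everything else (parentheses, |, ?, *, +, commas) is an operator.
    """
    names = re.findall(r"[A-Za-z_][A-Za-z0-9_.-]*", expr)
    return all(count == 1 for count in Counter(names).values())


print(is_single_occurrence("((b?(a|c))+d)+e"))  # every name occurs once
print(is_single_occurrence("a(a|b)*"))          # 'a' occurs twice
```

The check is purely syntactic: it inspects the expression itself, not the language it denotes, so an expression that fails the test may still be equivalent to one that passes.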