Curated databases are databases that are populated and updated with a great deal of human effort. Most reference works that one traditionally found on the reference shelves of libraries – dictionaries, encyclopedias, gazetteers, etc. – are now curated databases. Since it is now easy to publish databases on the web, there has been an explosion in the number…
Information describing the origin of data, generally referred to as provenance, is important in scientific and curated databases where it is the basis for the trust one puts in their contents. Since such databases are constructed using operations of both query and update languages, it is of paramount importance to describe the effect of these…
If the XML data file doesn't refer to a schema, Excel infers the schema from the XML … Note: If you're importing multiple XML files that don't define a namespace, these … In this case, Excel doesn't infer a schema, and you can't use an XML Map. … SchemaScope: a system for inferring and cleaning XML schemas. SIGMOD Conference … Inferring XML Schema Definitions from…
Inferring an appropriate DTD or XML Schema Definition (XSD) for a given collection of XML documents essentially reduces to learning deterministic regular expressions from sets of positive example words. Unfortunately, there is no algorithm capable of learning the complete class of deterministic regular expressions from positive examples only, as we…
An intrinsic part of information extraction is the creation and manipulation of relations extracted from text. In this paper, we develop a foundational framework where the central construct is what we call a spanner. A spanner maps an input string into relations over the spans (intervals specified by bounding indices) of the string. The focus of this…
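To make the notion of a spanner concrete, here is a minimal sketch that uses a regular expression with named capture groups to map a string to a relation over spans. It is only an illustration under simplifying assumptions, not the formalism of the paper: the helper name `regex_spanner` is hypothetical, spans are Python's 0-based half-open `(start, end)` offsets, and `re.finditer` reports only non-overlapping leftmost matches rather than every possible match.

```python
import re

def regex_spanner(pattern, text):
    """Map `text` to a relation over spans (hypothetical helper).

    Each named group in `pattern` acts as a column; each match
    contributes one tuple of (start, end) spans, one per column.
    """
    compiled = re.compile(pattern)
    # Column names, ordered by their position in the pattern.
    names = sorted(compiled.groupindex, key=compiled.groupindex.get)
    rows = set()
    for match in compiled.finditer(text):
        rows.add(tuple(match.span(name) for name in names))
    return names, rows

text = "Ann:3 Bob:14"
names, rows = regex_spanner(r"(?P<name>[A-Z][a-z]+):(?P<num>\d+)", text)
# names == ['name', 'num']
# rows  == {((0, 3), (4, 5)), ((6, 9), (10, 12))}

# A span can be turned back into the substring it denotes.
substrings = {tuple(text[s:e] for (s, e) in row) for row in rows}
# substrings == {('Ann', '3'), ('Bob', '14')}
```

The result is a relation whose entries are positions in the input rather than copies of the text, which is the distinction the abstract draws between spans and the strings they cover.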
Motivated by both established and new applications, we study navigational query languages for graphs (binary relations). The simplest language has only the two operators union and composition, together with the identity relation. We make more powerful languages by adding any of the following operators: intersection; set difference; projection; coprojection;…
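The operators named above can be sketched over graphs represented as Python sets of `(source, target)` pairs. This is an illustrative reading, not code from the paper: the function names are hypothetical, and the definitions of projection (nodes with at least one outgoing pair, returned as self-loops) and coprojection (nodes with none) are assumptions about the intended semantics. Union, intersection, and set difference are simply Python's `|`, `&`, and `-` on sets.

```python
def identity(nodes):
    """Identity relation on a set of nodes."""
    return {(x, x) for x in nodes}

def compose(r, s):
    """Composition: pairs (x, z) with (x, y) in r and (y, z) in s."""
    return {(x, z) for (x, y) in r for (y2, z) in s if y == y2}

def projection(r):
    """Self-loops on nodes that have at least one outgoing pair in r (assumed semantics)."""
    return {(x, x) for (x, _) in r}

def coprojection(r, nodes):
    """Self-loops on nodes that have no outgoing pair in r (assumed semantics)."""
    return identity(nodes) - projection(r)

nodes = {1, 2, 3}
edges = {(1, 2), (2, 3)}

two_steps = compose(edges, edges)               # {(1, 3)}
at_most_one_step = edges | identity(nodes)      # union with the identity relation
has_successor = projection(edges)               # {(1, 1), (2, 2)}
no_successor = coprojection(edges, nodes)       # {(3, 3)}
```

Note that coprojection needs the node set as an extra argument, since the nodes without outgoing edges cannot be read off the relation alone.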
We consider the problem of inferring a concise Document Type Definition (DTD) for a given set of XML documents, a problem that basically reduces to learning concise regular expressions from positive example strings. We identify two classes of concise regular expressions: the single occurrence regular expressions (SOREs) and the chain regular…
Science, industry, and society are being revolutionized by radical new capabilities for information sharing, distributed computation, and collaboration offered by the World Wide Web. This revolution promises dramatic benefits but also poses serious risks due to the fluid nature of digital information. One important cross-cutting issue is managing and…