- Full text PDF available (12)
The task Task: Map a finite string α over an infinite alphabet onto a finite word w over a binary alphabet such that w reflects the structure of α "optimally".
Most modern implementations of regular expression engines allow the use of variables (also called backreferences). The resulting extended regular expressions (which, in the literature, are also called practical regular expressions, rewbr, or regex) are able to express non-regular languages. The present paper demonstrates that extended regular-expressions… (More)
We study the expressiveness and the complexity of static analysis of extended conjunctive regular path queries (ECRPQs), introduced by Barceló et al. (PODS '10). ECRPQs are an extension of con-junctive regular path queries (CRPQs), a well-studied language for querying graph structured databases. Our first main result shows that query containment and… (More)
We study the problem of generalizing from a finite sample to a language taken from a predefined language class. The two language classes we consider are subsets of the regular languages and have significance in the specification of XML documents (the classes corresponding to so called <i>chain regular expressions</i>, Chares, and to <i>single occurrence… (More)
We examine document spanners, a formal framework for information extraction that was introduced by Fagin, Kimelfeld, Reiss, and Vansummeren (PODS 2013, JACM 2015). A document spanner is a function that maps an input string to a relation over spans (intervals of positions of the string). We focus on document spanners that are defined by regex formulas, which… (More)
• This is the author's version of a work that was accepted for publication in the journal, Theoretical Computer Science. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reected in this document. Changes may have been made to this work since it… (More)
Document spanners are a formal framework for information extraction that was introduced by Fagin, Kimelfeld, Reiss, and Vansummeren (PODS 2013, JACM 2015). One of the central models in this framework are core spanners, which are based on regular expressions with variables that are then extended with an algebra. As shown by Freydenberger and Holldack (ICDT… (More)