A methodology for measuring structure similarity of fuzzy XML documents
In this paper, we focus on XML data integration by studying rewritings of XML target schemas in terms of source schemas. Rewriting is very important in data integration systems where the system is asked to find and assemble XML documents from the data sources and produce documents which satisfy a target schema. As schema representation, we consider Visibly Pushdown Automata (VPAs) which accept Visibly Pushdown Languages (VPLs). The latter have been shown to coincide with the family of (word-encoded) regular tree languages which are the basis of formalisms for specifying XML schemas. Furthermore, practical semi-formal XML schema specifications (defined by simple pattern conditions on XML) compile into VPAs which are exponentially more concise than other representations based on tree automata. Notably, VPLs enjoy a "well-behavedness" which facilitates us in addressing rewriting problems for XML data integration. Based on VPAs, we positively solve these problems, and present detailed complexity analyses.