Gregory Cobena

We present a change-centric method to manage versions in a web warehouse of XML data. The starting point is a sequence of snapshots of XML documents obtained from the web. By running a diff algorithm, we compute the changes between two consecutive versions. We then represent the sequence using a novel representation of changes based on completed deltas...
The advent of XML as a universal exchange format, and of web services as a basis for distributed computing, has fostered the emergence of a new class of documents: dynamic XML documents. These are XML documents in which some data is given explicitly, while other parts are given only intensionally by means of embedded calls to web services that can be...
We consider the monitoring of a flow of incoming documents. More precisely, we present the monitoring used in a very large warehouse built from XML documents found on the web. The flow of documents consists of XML pages (which are warehoused) and HTML pages (which are not). Our contributions are the following: a subscription language which...
The web is an increasingly valuable source of information, and organizations archive (portions of) it for various purposes, e.g., the Internet Archive (www.archive.org). A new mission of the French National Library (BnF) is the “dépôt légal” (legal deposit) of the French web. We describe here some preliminary work on the topic conducted by...
<realEstate>
  <property ID="101" type="studio">
    <location> Rue de Clignancourt, Paris 18</location>
    <surface> 30 sqm </surface>
    <descr> Nice view on Montmartre </descr>
    <price> 40.000</price>
    <sc>getTransportInfo("../location")</sc>
    <transport>Nearest metro station: Barbes Buses: 68
      <sc>getRoute("68")</sc>
    </transport>
  </property>
  ...
</realEstate>
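As a rough illustration of how such a document mixes explicit data with embedded calls, the sketch below lists the calls found in a shortened copy of the example. It assumes that each `sc` element holds one service call written as text; the helper name `embedded_calls` is hypothetical, and the code only locates the calls, it does not invoke any service.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the example document (assumption: <sc> marks a service call).
DOC = """<realEstate>
  <property ID="101" type="studio">
    <location>Rue de Clignancourt, Paris 18</location>
    <sc>getTransportInfo("../location")</sc>
    <transport>Nearest metro station: Barbes Buses: 68
      <sc>getRoute("68")</sc>
    </transport>
  </property>
</realEstate>"""


def embedded_calls(xml_text):
    """Return (parent tag, call text) for every <sc> element, in document order."""
    root = ET.fromstring(xml_text)
    # ElementTree has no parent pointers, so build a child -> parent map first.
    parents = {child: parent for parent in root.iter() for child in parent}
    return [(parents[sc].tag, (sc.text or "").strip()) for sc in root.iter("sc")]


calls = embedded_calls(DOC)
for parent_tag, call in calls:
    print(f"{parent_tag}: {call}")
```

Recording the parent element alongside each call shows where the result of a call would be placed if it were materialized in the document.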