The XML web: a first study


Although originally designed for large-scale electronic publishing, XML plays an increasingly important role in the exchange of data on the Web. In fact, XML is expected to become the lingua franca of the Web, eventually replacing HTML. Not surprisingly, there has been a great deal of interest in XML in both industry and academia. Nevertheless, no comprehensive study of the XML Web (i.e., the subset of the Web made up of XML documents only) or of its contents has been made to date. This paper is the first attempt at describing the XML Web and the documents it contains. Our results are drawn from a sample of a repository of the publicly available XML documents on the Web, consisting of about 200,000 documents. They show that, despite its short history, XML already permeates the Web, across both generic domains and geographic regions. Our findings about the contents of the XML Web also provide valuable input for the design of algorithms, tools, and systems that use XML in one form or another.

DOI: 10.1145/775152.775223

Cite this paper

@inproceedings{Mignet2003TheXW,
  title={The XML web: a first study},
  author={Laurent Mignet and Denilson Barbosa and Pierangelo Veltri},
  booktitle={WWW},
  year={2003}
}