Demand-based document dissemination to reduce traffic and balance load in distributed information systems
Research on replication techniques to reduce traffic and minimize the latency of information retrieval in a distributed system has concentrated on client-based caching, whereby recently or frequently accessed information is cached at a client (or at a proxy thereof) in anticipation of future accesses. We believe that such myopic solutions, focusing exclusively on a particular client or set of clients, are likely to have a limited impact. Instead, we offer a solution that allows the replication of information to be done on a global supply/demand basis. We propose a hierarchical demand-based replication strategy that optimally disseminates information from its producer to servers that are closer to its consumers. The level of dissemination depends on the relative popularity of documents, and on the expected reduction in traffic that results from such dissemination. We used extensive HTTP logs to validate an analytical model of server popularity and file access profiles. Using that model, we show that by disseminating the most popular documents on servers closer to clients, network traffic could be reduced considerably, while servers are load-balanced. We argue that this process could be generalized to provide for an automated server-based information dissemination protocol that will be more effective in reducing both network bandwidth and document retrieval times than client-based caching protocols.
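The idea of disseminating the most popular documents to a server closer to clients can be illustrated with a minimal sketch. This is not the analytical model validated in this work; it is a simple greedy selection under stated assumptions. The names `Document`, `select_for_dissemination`, and `traffic_saved_fraction`, the storage budget, and the sample request counts are all hypothetical. The sketch assumes that every request for a disseminated document is served by the closer server, so the traffic saved per byte of storage equals the document's request count.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    name: str
    size_bytes: int
    requests: int  # hypothetical request count observed in a log window


def select_for_dissemination(docs, budget_bytes):
    """Greedily pick the most requested documents that fit in the budget.

    Assumption: serving a document from the closer server saves
    size_bytes * requests bytes of traffic, so the saving per stored
    byte is simply the request count.
    """
    chosen = []
    remaining = budget_bytes
    for doc in sorted(docs, key=lambda d: d.requests, reverse=True):
        if doc.size_bytes <= remaining:
            chosen.append(doc)
            remaining -= doc.size_bytes
    return chosen


def traffic_saved_fraction(docs, chosen):
    """Fraction of total bytes requested that the closer server would serve."""
    total = sum(d.size_bytes * d.requests for d in docs)
    saved = sum(d.size_bytes * d.requests for d in chosen)
    return saved / total if total else 0.0


# Illustrative data only; not taken from any measured HTTP log.
docs = [
    Document("index.html", 10_000, 500),
    Document("logo.gif", 5_000, 450),
    Document("paper.ps", 200_000, 20),
    Document("notes.txt", 2_000, 5),
]
chosen = select_for_dissemination(docs, budget_bytes=20_000)
print([d.name for d in chosen], round(traffic_saved_fraction(docs, chosen), 3))
```

With this made-up data, a small storage budget covers the two most requested documents and a large share of the bytes requested, which is the kind of skew in popularity that the dissemination strategy relies on.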