We compare the performance of two database selection algorithms reported in the literature. Their performance is compared using a common testbed designed specifically for database selection techniques. The testbed is a decomposition of the TREC/TIPSTER data into 236 subcollections. The databases from our testbed were ranked using both the gGlOSS and CORI …
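For reference, CORI's collection ranking is commonly stated as a belief score per query term. The sketch below assumes that published form, with the usual default constants (50, 150, and default belief b = 0.4) and a simple mean over query terms. The three collections and their statistics are toy values, not the 236-subcollection testbed, and gGlOSS is not shown.

```python
import math
from dataclasses import dataclass, field


@dataclass
class CollectionStats:
    name: str
    cw: int                                 # total number of words in the collection
    df: dict = field(default_factory=dict)  # term -> documents in this collection containing it


def cori_rank(query, collections, b=0.4):
    """Rank collections by the mean CORI belief over the query terms (assumed form)."""
    n = len(collections)                          # |C|, number of collections
    avg_cw = sum(c.cw for c in collections) / n
    # cf: how many collections contain each query term at least once
    cf = {t: sum(1 for c in collections if c.df.get(t, 0) > 0) for t in query}
    scores = {}
    for c in collections:
        total = 0.0
        for t in query:
            df = c.df.get(t, 0)
            t_part = df / (df + 50 + 150 * c.cw / avg_cw)
            # Guard: a term found in no collection contributes only the default belief.
            i_part = math.log((n + 0.5) / cf[t]) / math.log(n + 1.0) if cf[t] else 0.0
            total += b + (1 - b) * t_part * i_part
        scores[c.name] = total / len(query)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy statistics for illustration only.
colls = [
    CollectionStats("ap", 1000, {"retrieval": 40, "database": 5}),
    CollectionStats("wsj", 1000, {"retrieval": 2}),
    CollectionStats("fr", 1000),
]
ranking = cori_rank(["retrieval", "database"], colls)
print(ranking)
```

A collection containing none of the query terms still receives the default belief b, so ranking is driven by how far each collection rises above that floor.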
We describe a testbed for database selection techniques and an experiment conducted using this testbed. The testbed is a decomposition of the TREC/TIPSTER data that allows analysis of the data along multiple dimensions, including collection-based and temporal-based analysis. We characterize the subcollections in this testbed in terms of number of …
Using the vector space information retrieval model, we show that updating term weights under document insertions is computationally expensive for weighting schemes that use collection statistics and normalization by document vector length. In the dynamic setting, we argue that strict adherence to such schemes is impractical and unnecessary as long as …
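To see why, consider a classic scheme that matches this description: weight = tf × log(N/df), normalized by the document vector's length. The choice of this particular scheme is an assumption, since the abstract is truncated before naming one. The sketch recomputes every vector before and after a single insertion.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Cosine-normalized tf x idf vectors, with idf = log(N / df) (assumed scheme)."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))  # documents containing each term
    vectors = []
    for d in docs:
        tf = Counter(d)
        raw = {t: tf[t] * math.log(n / df[t]) for t in tf}
        length = math.sqrt(sum(w * w for w in raw.values()))
        vectors.append({t: (w / length if length else 0.0) for t, w in raw.items()})
    return vectors


docs = [["apple", "banana"], ["apple", "cherry"], ["banana", "cherry", "date"]]
before = tfidf_vectors(docs)
# Insert one document that shares no terms with any existing document.
after = tfidf_vectors(docs + [["egg", "fig"]])
print(before[2]["date"], after[2]["date"])
```

The third document shares no terms with the inserted one, yet its normalized weights change because the new N shifts every idf. The first document's vector happens to survive only because both of its terms keep equal idf values, so normalization cancels the shift.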
We find that dissemination of collection-wide information (CWI) in a distributed collection of documents is needed to achieve retrieval effectiveness comparable to that of a centralized collection. Complete …
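Reading CWI as the collection size and document frequencies behind idf (an assumption, since the abstract is truncated before it says which statistics are exchanged), the sketch contrasts the idf a site computes from its own documents with the idf it computes once all sites share their sizes and document frequencies. It assumes idf = log(N/df); the two sites are toy data.

```python
import math
from collections import Counter


def doc_freq(docs):
    """term -> number of documents in `docs` containing it."""
    return Counter(t for d in docs for t in set(d))


def local_idf(site, term):
    # What a site computes from its own documents only (no CWI exchanged).
    df = doc_freq(site)[term]
    return math.log(len(site) / df) if df else 0.0


def global_idf(sites, term):
    # With CWI disseminated: sum every site's size and document frequency.
    n = sum(len(s) for s in sites)
    df = sum(doc_freq(s)[term] for s in sites)
    return math.log(n / df) if df else 0.0


sites = [
    [["ir", "db"], ["ir"]],
    [["db"], ["web"], ["web", "db"]],
]
print(local_idf(sites[0], "ir"), global_idf(sites, "ir"))
```

In isolation, the first site sees "ir" in every document and gives it zero weight; with shared statistics the weight matches what a single centralized collection holding all five documents would compute.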
In this paper we introduce the notion of content locality in distributed document collections. Content locality is the degree to which content-similar documents are colocated in a distributed collection. We propose two metrics for measurement of content locality, one based on topic signatures and the other based on collection statistics. We provide …
The proliferation of online information resources increases the importance of effective and efficient distributed searching. Distributed searching is cast in three parts: database selection, query processing, and results merging. In this paper we examine the effect of database selection on retrieval performance. We look at retrieval performance in …
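The three-part decomposition can be sketched as a pipeline. Everything here is a hypothetical illustration: `select_score` stands in for a database selection algorithm such as those compared above, and min-max score normalization is only one simple merging rule, not one the abstract prescribes.

```python
def distributed_search(query, databases, select_score, k, n):
    """Three-phase distributed search sketch (hypothetical).

    databases: name -> function(query) returning a list of (doc_id, score)
    select_score: function(name, query) -> estimated usefulness of a database
    """
    # 1. Database selection: keep the k databases with the highest estimates.
    chosen = sorted(databases, key=lambda name: select_score(name, query), reverse=True)[:k]
    merged = []
    for name in chosen:
        # 2. Query processing: run the query at each selected database.
        hits = databases[name](query)
        if not hits:
            continue
        # 3. Results merging: min-max normalize each database's scores (an
        #    assumed, deliberately simple rule) so they can be interleaved.
        hi = max(s for _, s in hits)
        lo = min(s for _, s in hits)
        for doc_id, s in hits:
            merged.append((doc_id, (s - lo) / (hi - lo) if hi > lo else 1.0))
    # Stable sort: ties keep the order of the more promising database.
    merged.sort(key=lambda hit: hit[1], reverse=True)
    return merged[:n]


databases = {
    "a": lambda q: [("a1", 3.0), ("a2", 1.0)],
    "b": lambda q: [("b1", 10.0), ("b2", 5.0), ("b3", 0.0)],
    "c": lambda q: [("c1", 9.0)],
}
estimates = {"a": 2, "b": 3, "c": 1}
results = distributed_search("q", databases, lambda name, q: estimates[name], k=2, n=3)
print(results)
```

Because database "c" is not selected, its document never reaches the merged list, which is the kind of effect selection has on retrieval performance.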
This paper describes the design and implementation of the Legion run-time library LRTL, focusing specifically on facilities that enable extensibility and configurability. These facilities include management of heterogeneous communication, an event-based mechanism for inter-component communication, and automated memory management. The paper provides several …
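An event-based mechanism for inter-component communication can be illustrated with a small event hub. This is a hypothetical Python sketch of the general pattern, not LRTL's classes or interface; the priority ordering and the stop-on-False rule are assumptions made for the example.

```python
from collections import defaultdict


class EventManager:
    """Hypothetical event hub (not LRTL's actual API)."""

    def __init__(self):
        self._handlers = defaultdict(list)  # event kind -> [(priority, handler)]

    def register(self, kind, handler, priority=0):
        self._handlers[kind].append((priority, handler))
        # Stable sort: higher priority first, registration order among equals.
        self._handlers[kind].sort(key=lambda entry: entry[0], reverse=True)

    def announce(self, kind, data=None):
        # Deliver the event to each handler in priority order; a handler
        # returning False stops further propagation.
        for _, handler in self._handlers[kind]:
            if handler(data) is False:
                break


log = []
events = EventManager()
events.register("message_received", lambda m: log.append(("decode", m)))
events.register("message_received", lambda m: log.append(("dispatch", m)))
# A component added later can interpose ahead of existing handlers.
events.register("message_received", lambda m: log.append(("audit", m)), priority=10)
events.announce("message_received", "m1")
print(log)
```

Components never call each other directly; extending behavior means registering another handler, which is one way such a mechanism supports extensibility.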
During a 90-day period in 1994, we measured the availability and connection latency of HTTP (Hypertext Transfer Protocol) information servers. These measurements were made from a site in the eastern United States. The list of servers included 189 servers from Europe and 324 servers from North America. Our measurements indicate that, on average, 5.0 percent …
We describe the conceptual architecture of a Personalized Information Environment, or "PIE". A PIE allows unified, highly customizable access to distributed information resources by providing users the tools to compose personalized collections from a palette of information resources. The architecture also provides for the efficient "exchange" of inter-resource …