We compare the performance of two database selection algorithms reported in the literature. Their performance is compared using a common testbed designed specifically for database selection techniques. The testbed is a decomposition of the TREC/TIPSTER data into 236 subcollections. The databases from our testbed were ranked using both the gGlOSS and CORI…
We describe a testbed for database selection techniques and an experiment conducted using this testbed. The testbed is a decomposition of the TREC/TIPSTER data that allows analysis of the data along multiple dimensions, including collection-based and temporal-based analysis. We characterize the subcollections in this testbed in terms of number of…
This paper describes an algorithm for calculating the biovolume of cells with simple shapes, such as bacteria, flagellates, and simple ciliates, from a 2-dimensional digital image. The method can be adapted to any image analysis system that allows access to the binary cell image (i.e., the pixels, or (x,y) points, composing the cell). The cell image is…
Using the vector space information retrieval model, we show that the update of term weights under document insertions is computationally expensive for weighting schemes that use collection statistics and normalization by document vector lengths. In the dynamic setting, we argue that strict adherence to such schemes is impractical and unnecessary as long as…
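The cost described above can be seen in a small sketch. It assumes one common tf-idf variant (idf = log(1 + N/df)) with cosine length normalization; the abstract does not say which weighting schemes the paper analyzes, and the function names and toy corpus here are hypothetical. Inserting one document changes N and some document frequencies, so the weights of documents that were never edited have to be recomputed.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return length-normalized tf-idf vectors for tokenized documents.

    Assumed scheme (not taken from the paper): weight = tf * log(1 + N / df),
    followed by division by the Euclidean length of the document vector.
    """
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        weights = {t: tf[t] * math.log(1 + n / df[t]) for t in tf}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        vectors.append({t: w / norm for t, w in weights.items()})
    return vectors

def changed_vectors(before, after):
    """Count pre-existing documents whose normalized vector changed."""
    return sum(
        1
        for old, new in zip(before, after)
        if any(not math.isclose(old[t], new[t]) for t in old)
    )

# Toy corpus (hypothetical data).
docs = [["database", "selection"], ["database", "query"], ["cell", "image"]]
before = tfidf_vectors(docs)
# Insert a single new document and recompute everything from scratch.
after = tfidf_vectors(docs + [["query", "merging"]])
n_changed = changed_vectors(before, after)
```

In this toy corpus the insertion alters the vectors of existing documents that share vocabulary with the rest of the collection, which is why an exact update touches far more than the new document.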
We find that dissemination of collection-wide information (CWI) in a distributed collection of documents is needed to achieve retrieval effectiveness comparable to that of a centralized collection. Complete…
In this paper we introduce the notion of content locality in distributed document collections. Content locality is the degree to which content-similar documents are colocated in a distributed collection. We propose two metrics for measurement of content locality, one based on topic signatures and the other based on collection statistics. We provide…
The proliferation of online information resources increases the importance of effective and efficient distributed searching. Distributed searching is cast in three parts: database selection, query processing, and results merging. In this paper we examine the effect of database selection on retrieval performance. We look at retrieval performance in…
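The three-part view of distributed searching can be sketched as a small pipeline. This is only an illustration under stated assumptions: scoring by raw query-term counts and merging by raw score are placeholders, not the methods evaluated in the paper, and all function names and data are hypothetical.

```python
def select_databases(query, databases, k):
    """Stage 1: keep the k databases holding the most query-term occurrences."""
    def score(name):
        return sum(doc.count(term) for doc in databases[name] for term in query)
    return sorted(databases, key=score, reverse=True)[:k]

def process_query(query, docs):
    """Stage 2: score documents in one database; return (score, index) hits."""
    scored = [(sum(doc.count(t) for t in query), i) for i, doc in enumerate(docs)]
    return [(s, i) for s, i in scored if s > 0]

def merge_results(per_db):
    """Stage 3: merge per-database hits into one list, highest score first."""
    merged = [(s, name, i) for name, hits in per_db.items() for s, i in hits]
    # Ties are broken by database name and document index for determinism.
    return sorted(merged, key=lambda h: (-h[0], h[1], h[2]))

def distributed_search(query, databases, k):
    """Run selection, per-database query processing, and merging in order."""
    selected = select_databases(query, databases, k)
    per_db = {name: process_query(query, databases[name]) for name in selected}
    return merge_results(per_db)

# Hypothetical databases of tokenized documents.
databases = {
    "news": [["database", "selection", "test"], ["stock", "market"]],
    "bio": [["cell", "image"], ["cell", "volume"]],
    "ir": [["database", "query"], ["query", "merging", "database"]],
}
hits = distributed_search(["database", "query"], databases, k=2)
```

Because only the selected databases are ever queried, a poor choice in the first stage cannot be repaired by the later stages. Merging by raw score also assumes scores are comparable across databases, which generally does not hold when each database uses its own statistics.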
Accurate measurement of the biomass and size distribution of picoplankton cells (0.2 to 2.0 microns) is paramount in characterizing their contribution to the oceanic food web and global biogeochemical cycling. Image-analyzed fluorescence microscopy, usually based on video camera technology, allows detailed measurements of individual cells to be taken. The…
This paper describes the design and implementation of the Legion run-time library (LRTL), focusing specifically on facilities that enable extensibility and configurability. These facilities include management of heterogeneous communication, an event-based mechanism for inter-component communication, and automated memory management. The paper provides several…