Learn More
We present a novel Locality-Sensitive Hashing scheme for the Approximate Nearest Neighbor Problem under <i>l</i><sub>p</sub> norm, based on <i>p</i>-stable distributions.Our scheme improves the running time of the earlier algorithm for the case of the <i>l</i><sub>p</sub> norm. It also yields the first known provably efficient approximate NN algorithm for(More)
In this overview paper we motivate the need for and research issues arising from a new model of data processing. In this model, data does not take the form of persistent relations, but rather arrives in multiple, continuous, rapid, time-varying <i>data streams.</i> In addition to reviewing past work relevant to data stream systems and current projects in(More)
STREAM is a general-purpose relational Data Stream Management System (DSMS). STREAM supports a declarative query language and flexible query execution plans. It is designed to cope with high data rates and large numbers of continuous queries through careful resource allocation and use, and by degrading gracefully to approximate answers as necessary. A(More)
Several approaches to collaborative filtering have been studied but seldom have studies been reported for large (several millionusers and items) and dynamic (the underlying item set is continually changing) settings. In this paper we describe our approach to collaborative filtering for generating personalized recommendations for users of Google News. We(More)
This paper describes our ongoing work developing the Stanford Stream Data Manager (STREAM), a system for executing continuous queries over multiple continuous data streams. The STREAM system supports a declarative query language, and it copes with high data rates and query workloads by providing approximate answers when resources are limited. This paper(More)
Association-rule mining has heretofore relied on the conditionof high support to do its work efficiently. In particular, the well-known a-priori algorithm is only effective when the only rules of interest are relationships that occur very frequently. However, there are a number of applications, such as data mining, identification of similar web documents,(More)
Systems for processing continuous monitoring queries over data streams must be adaptive because data streams are often bursty and data characteristics may vary over time. We focus on one particular type of adaptivity: the ability to gracefully degrade performance via "load shedding" (dropping unprocessed tuples to reduce system load) when the demands placed(More)
We introduce the problem of sampling from a moving window of recent items from a data stream and develop two algorithms for this problem. The first algorithm, "chain-sample", extends reservoir sampling to deal with the expiration of data elements from the sample. The expected memory usage of our algorithm is <i>O</i>(<i>k</i>) when maintaining a sample of(More)
In many applications involving continuous data streams, data arrival is bursty and data rate fluctuates over time. Systems that seek to give rapid or real-time query responses in such an environment must be prepared to deal gracefully with bursts in data arrival without compromising system performance. We discuss one strategy for processing bursty streams(More)