Sampling in Space Restricted Settings

Anup Bhattacharya, Davis Issac, Ragesh Jaiswal, Amit Kumar
Space-efficient algorithms play an important role in dealing with large amounts of data. In such settings, one would like to analyze the data using a small amount of “working space”. One of the key steps in many algorithms for analyzing large data is to maintain a random sample (or a small number of random samples) from the data points. In this paper, we consider two space-restricted settings: (i) the streaming model, where data arrives over time and one can use only a small amount of storage, and (ii) the…
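To make the idea of maintaining a random sample in the streaming model concrete, here is a minimal sketch of classic single-item reservoir sampling. It is an illustration of the general technique, not the algorithm proposed in this paper; the function name `reservoir_sample_one` and its parameters are assumptions chosen for the example.

```python
import random
from typing import Iterable, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample_one(stream: Iterable[T],
                         rng: Optional[random.Random] = None) -> Optional[T]:
    """Return one uniformly random item from a stream of unknown length.

    Only the current candidate and a counter are stored, so the working
    space does not grow with the stream. Returns None for an empty stream.
    """
    if rng is None:
        rng = random.Random()
    sample: Optional[T] = None
    for count, item in enumerate(stream, start=1):
        # Keep the new item with probability 1/count; by induction every
        # item seen so far is then the kept one with probability 1/count.
        if rng.randrange(count) == 0:
            sample = item
    return sample
```

A call such as `reservoir_sample_one(iter(range(10**6)))` makes one pass over the data and never holds more than one item in memory.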
3 Citations
Streaming PTAS for Constrained k-Means
A D^2-sampling based algorithm running in a single iteration allows us to design a 2-pass, logspace streaming algorithm for the list-$k$-means problem, and significantly improves the running time of the known algorithm.
Technical Report Column
A glossary of fundamental principles and practical applications of quantum mechanics, as well as some suggestions for future research, is provided.
Bulk-synchronous pseudo-streaming algorithms for many-core accelerators
The bulk-synchronous parallel (BSP) model is extended to support pseudo-streaming algorithms for accelerators, and the BSP cost function is generalized to these algorithms, so that it is possible to predict the running time for programs targeting many-core accelerators and to identify possible bottlenecks.


Succinct sampling from discrete distributions
The results improve upon the space requirement of the classic solution for a fundamental sampling problem and provide the strongest known separation between the systematic and non-systematic case for any data structure problem.
Sampling from a moving window over streaming data
This work introduces the problem of sampling from a moving window of recent items from a data stream and develops two algorithms: the first, "chain-sample", extends reservoir sampling to deal with the expiration of data elements from the sample, and the second, "priority-sample", works even when the number of elements in the window can vary dynamically over time.
Faster methods for random sampling
The main result of this paper is the design and analysis of Algorithm D, which does the sampling in O(n) time on average; roughly n uniform random variates are generated, and approximately n exponentiation operations are performed during the sampling.
Efficient Sampling Methods for Discrete Distributions
Efficient preprocessing algorithms that allow for asymptotically optimal querying are presented, and almost matching lower bounds for their complexity are proved.
Sampling streaming data with replacement
A with-replacement reservoir sampling algorithm of sub-linear time complexity is introduced and a thorough complexity analysis of several approaches to the with-replacement reservoir sampling problem is provided.
Random sampling with a reservoir
Theoretical and empirical results indicate that Algorithm Z outperforms current methods by a significant margin, and an efficient Pascal-like implementation is given that incorporates these modifications and that is suitable for general use.
A Simple D2-Sampling Based PTAS for k-Means and Other Clustering Problems
The power of D2-sampling is demonstrated by giving a simple randomized (1+ϵ)-approximation algorithm that uses the D2 to have nice properties with respect to the k-means clustering problem.
A Simple D2-Sampling Based PTAS for k-Means and other Clustering Problems
This paper studies the objective function d(x,C)^2, which denotes the distance between x and the closest center in C, which is one of the most prominent objective functions that have been studied with respect to clustering.
Reservoir-sampling algorithms of time complexity O(n(1 + log(N/n)))
Vitter's reservoir-sampling algorithm, Vitter's Z, is modified to give a more efficient algorithm, algorithm K, and two new algorithms, algorithm L and algorithm M, are proposed.
On the Alias Method for Generating Random Variables From a Discrete Distribution
The alias method of Walker is a clever, new, fast method for generating random variables from an arbitrary, specified discrete distribution. A simple probabilistic proof is given, in terms…
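
As a rough illustration of the alias method mentioned in the last entry, the sketch below builds an alias table with a common worklist-based construction and then draws from it in constant time. It is not taken from any of the listed papers; the names `build_alias_table` and `alias_draw` are assumptions made for this example.

```python
import random
from typing import List, Sequence, Tuple


def build_alias_table(probs: Sequence[float]) -> Tuple[List[float], List[int]]:
    """Preprocess non-negative weights into (accept, alias) tables."""
    n = len(probs)
    total = sum(probs)
    if n == 0 or total <= 0:
        raise ValueError("need at least one positive weight")
    # Scale so that the average column height is exactly 1.
    scaled = [p * n / total for p in probs]
    accept = [1.0] * n
    alias = list(range(n))
    small = [i for i, s in enumerate(scaled) if s < 1.0]
    large = [i for i, s in enumerate(scaled) if s >= 1.0]
    while small and large:
        s = small.pop()
        l = large.pop()
        # Column s keeps its own mass and is topped up from column l.
        accept[s] = scaled[s]
        alias[s] = l
        scaled[l] -= 1.0 - scaled[s]
        (small if scaled[l] < 1.0 else large).append(l)
    # Columns left over differ from 1 only by rounding; accept stays 1.0.
    return accept, alias


def alias_draw(accept: Sequence[float], alias: Sequence[int],
               rng: random.Random) -> int:
    """Draw one index: pick a column uniformly, then keep it or its alias."""
    i = rng.randrange(len(accept))
    return i if rng.random() < accept[i] else alias[i]
```

After the linear-time preprocessing, each draw costs one uniform column choice and one comparison, regardless of how many outcomes the distribution has.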