Design Tradeoffs for SSD Performance
TLDR
It is found that SSD performance and lifetime are highly workload-sensitive, and that complex systems problems that normally appear higher in the storage stack, or even in distributed systems, are relevant to device firmware.
Consistent hashing and random trees: distributed caching protocols for relieving hot spots on the World Wide Web
TLDR
A family of caching protocols for distributed networks that can be used to decrease or eliminate the occurrence of hot spots in the network, based on a special kind of hashing called consistent hashing.
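To make the consistent-hashing idea concrete, here is a minimal sketch, not the paper's construction: caches and keys are hashed onto the same ring, and each key goes to the first cache point at or after it. The names `ConsistentHashRing`, `ring_hash`, `add_cache`, and the `replicas` parameter (virtual points per cache) are hypothetical, MD5 is an arbitrary choice of hash, and the random-trees part of the paper is not shown.

```python
import bisect
import hashlib
from typing import Dict, Iterable, List


def ring_hash(value: str) -> int:
    """Map a string to a point on a ring of size 2**32 (MD5 is an arbitrary choice)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16) % (1 << 32)


class ConsistentHashRing:
    """Hypothetical ring: a key is served by the first cache point at or after
    (clockwise from) the key's own point. Each cache gets `replicas` virtual
    points so that its share of the ring is more even."""

    def __init__(self, caches: Iterable[str] = (), replicas: int = 50):
        self.replicas = replicas
        self.points: List[int] = []      # sorted ring positions
        self.owner: Dict[int, str] = {}  # ring position -> cache name
        for cache in caches:
            self.add_cache(cache)

    def add_cache(self, cache: str) -> None:
        for i in range(self.replicas):
            point = ring_hash(f"{cache}#{i}")
            if point not in self.owner:  # skip the rare collision with an existing point
                bisect.insort(self.points, point)
                self.owner[point] = cache

    def lookup(self, key: str) -> str:
        if not self.points:
            raise LookupError("no caches on the ring")
        idx = bisect.bisect_left(self.points, ring_hash(key))  # first point >= key
        return self.owner[self.points[idx % len(self.points)]]  # wrap around the ring


ring = ConsistentHashRing(["cache-a", "cache-b", "cache-c"])
keys = [f"/page/{i}" for i in range(1000)]
before = {k: ring.lookup(k) for k in keys}
ring.add_cache("cache-d")
moved = [k for k in keys if ring.lookup(k) != before[k]]
# Only keys now owned by the new cache change hands.
print(len(moved), all(ring.lookup(k) == "cache-d" for k in moved))
```

The property this illustrates is the one that relieves hot spots when caches join: adding `cache-d` moves only the keys that now fall on its points (roughly a quarter here), and every other key keeps its old cache.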
The smallest grammar problem
TLDR
This paper shows that every efficient algorithm for the smallest grammar problem has approximation ratio at least 8569/8568 unless P=NP, and bounds approximation ratios for several of the best known grammar-based compression algorithms, including LZ78, BISECTION, SEQUENTIAL, LONGEST MATCH, GREEDY, and RE-PAIR.
Spamming botnets: signatures and characteristics
TLDR
An in-depth analysis of the identified botnets revealed several interesting findings regarding the degree of email obfuscation, properties of botnet IP addresses, sending patterns, and their correlation with network scanning traffic.
Heuristics for Vector Bin Packing
TLDR
This work systematically studies variants of the First Fit Decreasing (FFD) algorithm for the Vector Bin Packing problem, and proposes new geometric heuristics that run nearly as fast as FFD for reasonable values of n (the number of items) and d (the number of dimensions).
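For reference, the baseline FFD heuristic for vector bin packing can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's variants or its geometric heuristics: the function name `ffd_vector` is hypothetical, and the scalar "size" used for sorting (the sum of capacity-normalized coordinates) is only one of several possible choices.

```python
from typing import List, Sequence, Tuple

EPS = 1e-9  # tolerance for floating-point capacity checks


def ffd_vector(items: Sequence[Tuple[float, ...]],
               capacity: Tuple[float, ...]) -> List[List[int]]:
    """First Fit Decreasing for d-dimensional vector bin packing.

    Items are sorted by the sum of their capacity-normalized coordinates
    (an assumed scalar 'size') and each is placed in the first bin that can
    still hold it in every dimension. Returns bins as lists of item indices.
    """
    d = len(capacity)
    for v in items:
        if len(v) != d or any(v[j] > capacity[j] + EPS for j in range(d)):
            raise ValueError(f"item {v} does not fit an empty bin")

    def size(i: int) -> float:
        return sum(items[i][j] / capacity[j] for j in range(d))

    order = sorted(range(len(items)), key=size, reverse=True)
    bins: List[List[int]] = []
    free: List[List[float]] = []  # remaining capacity per bin, per dimension
    for i in order:
        for b in range(len(bins)):
            if all(items[i][j] <= free[b][j] + EPS for j in range(d)):
                bins[b].append(i)
                for j in range(d):
                    free[b][j] -= items[i][j]
                break
        else:  # no open bin fits in every dimension: open a new one
            bins.append([i])
            free.append([capacity[j] - items[i][j] for j in range(d)])
    return bins


items = [(0.6, 0.2), (0.5, 0.5), (0.4, 0.7), (0.3, 0.3), (0.2, 0.1)]
print(ffd_vector(items, (1.0, 1.0)))  # → [[2, 0], [1, 3, 4]]
```

The inner first-fit scan is what makes plain FFD cost O(n · bins · d); the faster heuristics the paper proposes are not reproduced here.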
An Improved Construction for Counting Bloom Filters
TLDR
A simple hashing-based alternative to counting Bloom filters (CBFs), built on d-left hashing and called a d-left CBF (dlCBF), which offers the same functionality as a CBF but uses less space, generally saving a factor of two or more.
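A toy version of the d-left idea may help: each element gets one "true fingerprint", d permutations of it each yield a (bucket, remainder) pair in one of d subtables, and the remainder is stored with a counter in the least-loaded candidate bucket (leftmost on ties). This is a hedged sketch, not the paper's implementation: the class name `DLeftCBF`, its parameters, the use of SHA-256, and the affine permutations `f -> (a*f + b) mod M` with odd `a` (bijective because `M` is a power of two) are all assumptions chosen for illustration.

```python
import hashlib
from typing import Dict, List, Tuple


class DLeftCBF:
    """Toy d-left counting Bloom filter (hypothetical parameters)."""

    def __init__(self, d: int = 4, buckets: int = 64, cells: int = 8, r_bits: int = 12):
        if buckets & (buckets - 1):
            raise ValueError("buckets must be a power of two")
        self.d, self.cells, self.r_bits = d, cells, r_bits
        self.M = buckets << r_bits  # fingerprint space: bucket bits + remainder bits
        # Assumed permutations: odd multiplier + offset, one per subtable.
        self.perms = [(2654435761 + 2 * i, 40503 * (i + 1)) for i in range(d)]
        # tables[t][bucket] maps stored remainder -> counter (one dict entry per cell).
        self.tables: List[List[Dict[int, int]]] = [
            [dict() for _ in range(buckets)] for _ in range(d)]

    def _locations(self, item: str) -> List[Tuple[int, int]]:
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        f = int.from_bytes(digest[:8], "big") % self.M  # the "true fingerprint"
        locs = []
        for a, b in self.perms:
            p = (a * f + b) % self.M  # bijection on [0, M) since a is odd
            locs.append((p >> self.r_bits, p & ((1 << self.r_bits) - 1)))
        return locs

    def insert(self, item: str) -> None:
        locs = self._locations(item)
        for t, (bkt, rem) in enumerate(locs):  # fingerprint already stored: bump counter
            if rem in self.tables[t][bkt]:
                self.tables[t][bkt][rem] += 1
                return
        # Otherwise pick the least-loaded candidate bucket; min() keeps the leftmost on ties.
        t = min(range(self.d), key=lambda i: len(self.tables[i][locs[i][0]]))
        bkt, rem = locs[t]
        if len(self.tables[t][bkt]) >= self.cells:
            raise OverflowError("bucket full")
        self.tables[t][bkt][rem] = 1

    def contains(self, item: str) -> bool:
        return any(rem in self.tables[t][bkt]
                   for t, (bkt, rem) in enumerate(self._locations(item)))

    def delete(self, item: str) -> None:
        for t, (bkt, rem) in enumerate(self._locations(item)):
            cell = self.tables[t][bkt]
            if rem in cell:
                cell[rem] -= 1
                if cell[rem] == 0:
                    del cell[rem]
                return
        raise KeyError(item)


cbf = DLeftCBF()
for word in ["alpha", "beta", "gamma"]:
    cbf.insert(word)
cbf.delete("beta")
print(cbf.contains("alpha"), cbf.contains("beta"))
```

Because each permutation is a bijection, a remainder found in a given bucket of a given subtable can only come from one true fingerprint, which is what lets deletions decrement the right counter.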
Achieving anonymity via clustering
TLDR
This work gives the first set of algorithms for the anonymization problem whose performance is independent of the anonymity parameter k, and extends them to allow an ε fraction of points to remain unclustered, i.e., deleted from the anonymized publication.
Anonymizing Tables
TLDR
It is shown that the k-Anonymity problem is NP-hard even when the attribute values are ternary, and an O(k)-approximation algorithm is provided for the problem, which improves upon the previous best-known O(k log k)-approximation.
Entropy based nearest neighbor search in high dimensions
TLDR
The problem of finding the approximate nearest neighbor of a query point in high-dimensional space is studied, focusing on Euclidean space, and it is shown that the c-approximate nearest neighbor can be computed in Õ(n^ρ) time and near-linear space, where ρ ≈ 2.06/c as c becomes large.
Learning Polynomials with Neural Networks
TLDR
This paper shows that for a randomly initialized neural network with sufficiently many hidden units, generic gradient descent learns any low-degree polynomial, and that if complex-valued weights are used, there are no "robust local minima".
...
...