Corpus ID: 220055683

STORM: Foundations of End-to-End Empirical Risk Minimization on the Edge

Authors: Benjamin Coleman, Gaurav Gupta, John Chen, Anshumali Shrivastava
Empirical risk minimization is perhaps the most influential idea in statistical learning, with applications to nearly all scientific and technical domains in the form of regression and classification models. To analyze massive streaming datasets in distributed computing environments, practitioners increasingly prefer to deploy regression models at the edge rather than in the cloud. By keeping data on edge devices, we minimize the energy, communication, and data security risks associated with the…


A One-Pass Distributed and Private Sketch for Kernel Sums with Applications to Machine Learning at Scale

This work proposes a general-purpose private sketch, or small summary of the dataset, that supports machine learning tasks such as regression, classification, density estimation, and more; it is ideal for large-scale distributed settings because it is simple to implement, mergeable, and can be created with a one-pass streaming algorithm.

Fast Rotation Kernel Density Estimation over Data Streams

A novel Rotation Kernel is proposed, based on a Rotation Hash method that is much faster to compute; the approach compresses high-dimensional data streams into a small array of integer counters and achieves memory-efficient kernel density estimation over data streams.

Asymptotics for Sketching in Least Squares Regression

The limits of the accuracy loss (for estimation and test error) incurred by popular sketching methods are found, and a separation between different methods is shown: SRHT is better than Gaussian projections.
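
The core idea of sketch-and-solve least squares can be illustrated with a plain Gaussian sketch (a simpler cousin of the SRHT analyzed in that paper); the problem sizes and seed below are arbitrary, for illustration only:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy overdetermined least-squares problem: n samples, d features.
n, d, m = 2000, 10, 200
A = rng.standard_normal((n, d))
x_true = rng.standard_normal(d)
b = A @ x_true + 0.1 * rng.standard_normal(n)

# Exact solution of min ||Ax - b||.
x_exact, *_ = np.linalg.lstsq(A, b, rcond=None)

# Gaussian sketch: S has m << n rows; solve the much smaller m x d problem.
S = rng.standard_normal((m, n)) / np.sqrt(m)
x_sketch, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)

# With m well above d, the sketched solution is close to the exact one.
rel_err = np.linalg.norm(x_sketch - x_exact) / np.linalg.norm(x_exact)
print(rel_err)
```

Structured sketches such as the SRHT trade the dense matrix multiply `S @ A` for fast transforms, which is where the speed and accuracy comparisons in the paper come in.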

Convexity, Classification, and Risk Bounds

A general quantitative relationship between the risk as assessed using the 0–1 loss and the risk as assessed using any nonnegative surrogate loss function is provided, and it is shown that this relationship gives nontrivial upper bounds on excess risk under the weakest possible condition on the loss function.

Sub-linear Memory Sketches for Near Neighbor Search on Streaming Data

This work presents the first sublinear memory sketch that can be queried to find the nearest neighbors in a dataset, and its sketch, which consists entirely of short integer arrays, has a variety of attractive features in practice.

Aggregation and Degradation in JetStream: Streaming Analytics in the Wide Area

JetStream is presented, a system that allows real-time analysis of large, widely-distributed changing data sets, and its adaptive control mechanisms are responsive enough to keep end-to-end latency within a few seconds, even when available bandwidth drops by a factor of two.

Privacy via the Johnson-Lindenstrauss Transform

This work shows that distance computation with privacy is an achievable goal by projecting each user's representation into a random, lower-dimensional space via a sparse Johnson-Lindenstrauss transform and then adding Gaussian noise to each entry of the lower-dimensional representation.

Arrays of (locality-sensitive) Count Estimators (ACE): High-Speed Anomaly Detection via Cache Lookups

This paper proposes the ACE (Arrays of (locality-sensitive) Count Estimators) algorithm, which can be 60x faster than the ELKI package [DBLP:conf/ssd/AchtertBKSZ09], which has the fastest implementation of the unsupervised anomaly detection algorithms.

Online Row Sampling

This work presents an extremely simple algorithm that approximates $A$ up to multiplicative error $\epsilon$ and additive error $\delta$ using $O(d \log d \log(\epsilon\|A\|_2/\delta)/\epsilon^2)$ online samples, with memory overhead proportional to the cost of storing the spectral approximation.

Numerical linear algebra in the streaming model

Near-optimal space bounds are given in the streaming model for linear algebra problems that include estimation of matrix products, linear regression, low-rank approximation, and approximation of matrix rank; results for turnstile updates are proved.

More Data Can Hurt for Linear Regression: Sample-wise Double Descent

A surprising phenomenon in overparameterized linear regression, where the dimension exceeds the number of samples, is described: there is a regime where the test risk of the estimator found by gradient descent increases with additional samples, due to an unconventional type of bias-variance tradeoff in the overparameterized regime.

Similarity Search in High Dimensions via Hashing

Experimental results indicate that the novel hashing-based scheme for approximate similarity search scales well even for a relatively large number of dimensions, and provide evidence that the method improves running time over other methods for searching in high-dimensional spaces based on hierarchical tree decomposition.
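
The hashing idea can be sketched with random-hyperplane LSH (SimHash-style, for cosine similarity) rather than the exact scheme from the paper; all sizes and parameters below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
d, n_bits = 100, 16

# One hash table keyed by the sign pattern of random hyperplane
# projections: nearby vectors collide in the same bucket with
# high probability.
H = rng.standard_normal((n_bits, d))

def lsh_key(x):
    """Bucket key: the sign bit of each of the n_bits projections."""
    return tuple((H @ x > 0).astype(int))

# Index a small dataset by bucket.
data = rng.standard_normal((500, d))
table = {}
for i, x in enumerate(data):
    table.setdefault(lsh_key(x), []).append(i)

# A query only examines candidates in its own bucket, instead of
# scanning all 500 points; near-duplicates of an indexed point land
# in the same bucket with high probability.
q = data[0]
candidates = table.get(lsh_key(q), [])
print(0 in candidates)  # True: point 0 was indexed under the same key
```

In practice multiple tables with independent hyperplanes are used to boost recall, at the cost of examining more candidate buckets per query.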