# Numerically stable, scalable formulas for parallel and online computation of higher-order multivariate central moments with arbitrary weights

@article{Pbay2016NumericallySS, title={Numerically stable, scalable formulas for parallel and online computation of higher-order multivariate central moments with arbitrary weights}, author={P. P{\'e}bay and Timothy B. Terriberry and H. Kolla and Janine Bennett}, journal={Computational Statistics}, year={2016}, volume={31}, pages={1305-1325} }

Formulas for incremental or parallel computation of second order central moments have long been known, and recent extensions of these formulas to univariate and multivariate moments of arbitrary order have been developed. Such formulas are of key importance in scenarios where incremental results are required and in parallel and distributed systems where communication costs are high. We survey these recent results, and improve them with arbitrary-order, numerically stable one-pass formulas which…
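As a rough illustration of the long-known second-order case mentioned in the abstract, the sketch below merges two partial summaries (total weight, mean, and weighted sum of squared deviations) without revisiting the data. It is not the paper's arbitrary-order or multivariate formulation; the names `Moments`, `merge`, `push`, and `summarize` are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass


@dataclass
class Moments:
    """Partial summary (illustrative): total weight W, mean, and M2 = sum of w * (x - mean)^2."""
    weight: float = 0.0
    mean: float = 0.0
    m2: float = 0.0

    def merge(self, other: "Moments") -> "Moments":
        """Combine two partial summaries using the pairwise update formula."""
        total = self.weight + other.weight
        if total == 0.0:
            return Moments()
        delta = other.mean - self.mean
        # Update the mean by a correction term rather than re-summing the raw data.
        mean = self.mean + delta * other.weight / total
        # Second-order central moment: both partial M2 values plus a cross term.
        m2 = self.m2 + other.m2 + delta * delta * self.weight * other.weight / total
        return Moments(total, mean, m2)

    def push(self, x: float, w: float = 1.0) -> "Moments":
        """Online update: merge with a summary holding one weighted observation."""
        return self.merge(Moments(w, x, 0.0))

    def variance(self) -> float:
        """Weighted population variance M2 / W (assumes W > 0)."""
        return self.m2 / self.weight


def summarize(values, weights=None) -> Moments:
    """One pass over the data; weights default to 1 (assumption for this sketch)."""
    acc = Moments()
    for i, x in enumerate(values):
        acc = acc.push(x, 1.0 if weights is None else weights[i])
    return acc
```

In a parallel setting, each worker would call `summarize` on its own chunk and the results would be combined with `merge`, so only three numbers per chunk need to be communicated. The online update is the special case where the second summary holds a single observation.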

#### 13 Citations

An Empirical Study of Moment Estimators for Quantile Approximation

- Computer Science
- ACM Trans. Database Syst.
- 2021

This work empirically evaluates lightweight moment estimators for the single-pass quantile approximation problem, including maximum entropy methods and orthogonal series with Fourier, Cosine, Legendre, Chebyshev and Hermite basis functions, and provides an algorithm for GPU-accelerated quantile approximation based on parallel tree reduction.

Sequential Estimation of Nonparametric Correlation using Hermite Series Estimators.

- Computer Science
- 2020

A new Hermite series based sequential estimator for the Spearman's rank correlation coefficient is described and an exponentially weighted estimator is introduced for the non-stationary setting, which allows the local nonparametric correlation of a bivariate data stream to be tracked.

Algorithm for error-free determination of the variance of all contiguous subsequences and fixed-length contiguous subsequences for a sequence of industrial measurement data

- Computer Science
- 2021

The article presents an algorithm for fast and error-free determination of statistics such as the arithmetic mean and variance of all contiguous subsequences and fixed-length contiguous subsequences…

Calculation of second order statistics of uncertain linear systems applying reduced order models

- Computer Science, Mathematics
- Reliab. Eng. Syst. Saf.
- 2019

Numerical examples comprising stochastic finite element models suggest that the proposed approach can produce estimates of the second order statistics with reduced variability.

Multilevel modeling for data streams with dependent observations

- Computer Science
- 2017

This dissertation introduces online learning, a method to update the result of an analysis while the data are entering, without revisiting the previous data points, and develops an online-learning algorithm that updates the multilevel model, while new data enter and without passing over all the data repeatedly.

A Novel Shard-Based Approach for Asynchronous Many-Task Models for In Situ Analysis

- Computer Science
- ISAV@SC
- 2017

The goal of this article is to describe the SPMD-Legion methodology, and compare the data aggregation technique deployed herein to the approach taken within the authors' previous work.

Anomaly detection in scientific data using joint statistical moments

- Computer Science, Mathematics
- J. Comput. Phys.
- 2019

An anomaly detection method for multi-variate scientific data based on analysis of high-order joint moments is proposed and an algorithm to identify the occurrence of a spatial and/or temporal anomalous event in scientific phenomena is developed.

Online estimation of individual-level effects using streaming shrinkage factors

- Computer Science
- Comput. Stat. Data Anal.
- 2019

Five computationally-efficient estimation methods which do not revise “old” data but do account for the nested data structure are developed and evaluated and differ in accuracy between the novel shrinkage factors and the existing methods.

Online Anomaly Detection Leveraging Stream-Based Clustering and Real-Time Telemetry

- Computer Science
- IEEE Transactions on Network and Service Management
- 2021

This work implements an anomaly detection engine that leverages DenStream, an unsupervised clustering technique, and applies it to features collected from a large-scale testbed comprising tens of routers traversed by up to 3 Terabit/s worth of real application traffic; the results testify that DenStream achieves detection results on par with RRCF, the best performing algorithm.

HFM: Hierarchical Feature Moment Extraction for Multi-Omic Data Visualization

- Computer Science
- 2019 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)
- 2019

This work integrates a parallel out-of-core feature extraction algorithm with a disk-based hierarchical data store that provides several orders of magnitude speed-up for common analysis and visualization tasks, and details the open-source Cython/Python based implementation as well as the prototype web-based visualization tool.

#### References

Showing 1-10 of 44 references

Numerically stable, single-pass, parallel statistics algorithms

- Computer Science
- 2009 IEEE International Conference on Cluster Computing and Workshops
- 2009

This paper derives a series of formulas that allow for single-pass, yet numerically robust, pairwise parallel and incremental updates of both arbitrary-order centered statistical moments and co-moments and builds an open source parallel statistics framework that performs principal component analysis (PCA) in addition to computing descriptive, correlative, and multi-correlative statistics.

Computing Contingency Statistics in Parallel: Design Trade-Offs and Limiting Cases

- Computer Science
- 2010 IEEE International Conference on Cluster Computing
- 2010

This paper presents the design trade-offs which were made to implement the computation of contingency tables in parallel and observes optimal speed-up and scalability when the contingency statistics are used in their appropriate context, namely, when the data input is not quasi-diffuse.

Computing contingency statistics in parallel.

- Computer Science
- 2010

The design trade-offs which were made to implement the computation of contingency tables in parallel are presented, and optimal speed-up and scalability are observed when the contingency statistics are used in their appropriate context, namely, when the data input is not quasi-diffuse.

Updating formulae and a pairwise algorithm for computing sample variances

- Mathematics
- 1979

A general formula is presented for computing the sample variance for a sample of size m + n given the means and variances for two subsamples of sizes m and n. This formula is used in the construction…

Design and Performance of a Scalable, Parallel Statistics Toolkit

- Computer Science
- 2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and Phd Forum
- 2011

This paper surveys a collection of parallel implementations of statistics algorithms developed as part of a common framework over the last 3 years and employs a design pattern specifically targeted for distributed-memory parallelism, where architectural advances in large-scale high-performance computing have been focused.

Accurate Sum and Dot Product

- Mathematics, Computer Science
- SIAM J. Sci. Comput.
- 2005

Algorithms for summation and dot product of floating-point numbers are presented which are fast in terms of measured computing time. We show that the computed results are as accurate as if computed…

How to Ensure a Faithful Polynomial Evaluation with the Compensated Horner Algorithm

- Computer Science
- 18th IEEE Symposium on Computer Arithmetic (ARITH '07)
- 2007

This work addresses how to compute a faithfully rounded result, that is, one of the two floating point neighbors of the exact evaluation, and proposes an a priori sufficient condition on the condition number to ensure that the compensated evaluation is faithfully rounded.

Recursive estimation of fourth-order cumulants with application to identification

- Mathematics, Computer Science
- Signal Process.
- 1998

A recursive formula for estimating the fourth-order cumulants of a real- or complex-valued, zero mean, stationary scalar stochastic process is developed by using the ergodicity assumption.

Note on a Method for Calculating Corrected Sums of Squares and Products

- Mathematics
- 1962

In many problems the "corrected sum of squares" of a set of values must be calculated i.e. the sum of squares of the deviations of the values about their mean. The most usual way is to calculate the…

State of the Art in Parallel Computing with R

- Computer Science
- 2009

An overview of techniques for parallel computing with R on computer clusters, on multi-core systems, and in grid computing is presented, comparing sixteen different packages on their state of development, the parallel technology used, as well as on usability, acceptance, and performance.