Numerically stable, scalable formulas for parallel and online computation of higher-order multivariate central moments with arbitrary weights

@article{Pbay2016NumericallySS,
  title={Numerically stable, scalable formulas for parallel and online computation of higher-order multivariate central moments with arbitrary weights},
  author={P. P{\'e}bay and Timothy B. Terriberry and H. Kolla and Janine Bennett},
  journal={Computational Statistics},
  year={2016},
  volume={31},
  pages={1305--1325}
}
Formulas for incremental or parallel computation of second order central moments have long been known, and recent extensions of these formulas to univariate and multivariate moments of arbitrary order have been developed. Such formulas are of key importance in scenarios where incremental results are required and in parallel and distributed systems where communication costs are high. We survey these recent results, and improve them with arbitrary-order, numerically stable one-pass formulas which…
An Empirical Study of Moment Estimators for Quantile Approximation
TLDR
This work empirically evaluates lightweight moment estimators for the single-pass quantile approximation problem, including maximum entropy methods and orthogonal series with Fourier, Cosine, Legendre, Chebyshev and Hermite basis functions, and provides an algorithm for GPU-accelerated quantile approximation based on parallel tree reduction.
Sequential Estimation of Nonparametric Correlation using Hermite Series Estimators.
TLDR
A new Hermite series based sequential estimator for Spearman's rank correlation coefficient is described, and an exponentially weighted estimator is introduced for the non-stationary setting, which allows the local nonparametric correlation of a bivariate data stream to be tracked.
Algorithm for error-free determination of the variance of all contiguous subsequences and fixed-length contiguous subsequences for a sequence of industrial measurement data
The article presents an algorithm for fast and error-free determination of statistics such as the arithmetic mean and variance of all contiguous subsequences and fixed-length contiguous subsequences…
Calculation of second order statistics of uncertain linear systems applying reduced order models
TLDR
Numerical examples comprising stochastic finite element models suggest that the proposed approach can produce estimates of the second order statistics with reduced variability.
Multilevel modeling for data streams with dependent observations
TLDR
This dissertation introduces online learning, a method to update the result of an analysis as the data arrive, without revisiting previous data points, and develops an online-learning algorithm that updates the multilevel model as new data enter, without passing over all the data repeatedly.
A Novel Shard-Based Approach for Asynchronous Many-Task Models for In Situ Analysis
TLDR
The goal of this article is to describe the SPMD-Legion methodology, and compare the data aggregation technique deployed herein to the approach taken within the authors' previous work.
Anomaly detection in scientific data using joint statistical moments
TLDR
An anomaly detection method for multivariate scientific data based on analysis of high-order joint moments is proposed, and an algorithm to identify the occurrence of a spatial and/or temporal anomalous event in scientific phenomena is developed.
Online estimation of individual-level effects using streaming shrinkage factors
TLDR
Five computationally efficient estimation methods which do not revise “old” data but do account for the nested data structure are developed and evaluated; accuracy differs between the novel shrinkage factors and the existing methods.
Online Anomaly Detection Leveraging Stream-Based Clustering and Real-Time Telemetry
TLDR
This work implements an anomaly detection engine that leverages DenStream, an unsupervised clustering technique, and applies it to features collected from a large-scale testbed comprising tens of routers traversed by up to 3 Terabit/s of real application traffic; the results indicate that DenStream achieves detection results on par with RRCF, the best performing algorithm.
HFM: Hierarchical Feature Moment Extraction for Multi-Omic Data Visualization
  • T. Becker, D. Shin
  • Computer Science
  • 2019 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)
  • 2019
TLDR
This work integrates a parallel out-of-core feature extraction algorithm with a disk-based hierarchical data store that provides several orders of magnitude speed-up for common analysis and visualization tasks, and details the open-source Cython/Python based implementation as well as the prototype web-based visualization tool.

References

SHOWING 1-10 OF 44 REFERENCES
Numerically stable, single-pass, parallel statistics algorithms
TLDR
This paper derives a series of formulas that allow for single-pass, yet numerically robust, pairwise parallel and incremental updates of both arbitrary-order centered statistical moments and co-moments and builds an open source parallel statistics framework that performs principal component analysis (PCA) in addition to computing descriptive, correlative, and multi-correlative statistics.
Computing Contingency Statistics in Parallel: Design Trade-Offs and Limiting Cases
TLDR
This paper presents the design trade-offs which were made to implement the computation of contingency tables in parallel and observes optimal speed-up and scalability when the contingency statistics are used in their appropriate context, namely, when the data input is not quasi-diffuse.
Computing contingency statistics in parallel.
TLDR
The design trade-offs which were made to implement the computation of contingency tables in parallel are presented, and optimal speed-up and scalability are observed when the contingency statistics are used in their appropriate context, namely, when the data input is not quasi-diffuse.
Updating formulae and a pairwise algorithm for computing sample variances
A general formula is presented for computing the sample variance for a sample of size m + n given the means and variances for two subsamples of sizes m and n. This formula is used in the construction…
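As an illustration of the kind of combination described in this summary, the following is a minimal Python sketch that merges the summaries of two subsamples of sizes m and n into the summary of the combined sample. It stores the sum of squared deviations from the mean (here called `m2`) rather than the variance itself, which is a common way to write the pairwise update; the function names `summarize` and `merge` are hypothetical and are not taken from the cited paper.

```python
import statistics


def summarize(xs):
    """Return (count, mean, sum of squared deviations) for a non-empty list.

    Hypothetical helper for this sketch; it makes two passes over xs.
    """
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs)
    return n, mean, m2


def merge(n_a, mean_a, m2_a, n_b, mean_b, m2_b):
    """Combine two subsample summaries without revisiting the raw data."""
    n = n_a + n_b
    if n == 0:
        return 0, 0.0, 0.0
    delta = mean_b - mean_a
    # Shift the mean of A toward the mean of B in proportion to B's share.
    mean = mean_a + delta * n_b / n
    # Within-group deviations plus a between-group correction term.
    m2 = m2_a + m2_b + delta * delta * n_a * n_b / n
    return n, mean, m2


a = [1.0, 2.0, 3.0, 4.0]
b = [10.0, 20.0, 30.0]
n, mean, m2 = merge(*summarize(a), *summarize(b))
sample_variance = m2 / (n - 1)  # unbiased sample variance of a + b
```

Because `merge` only needs three numbers per subsample, it can be applied pairwise in a reduction tree, which is what makes updates of this form attractive when communication is expensive.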
Design and Performance of a Scalable, Parallel Statistics Toolkit
TLDR
This paper surveys a collection of parallel implementations of statistics algorithms developed as part of a common framework over the last 3 years and employs a design pattern specifically targeted for distributed-memory parallelism, where architectural advances in large-scale high-performance computing have been focused.
Accurate Sum and Dot Product
Algorithms for summation and dot product of floating-point numbers are presented which are fast in terms of measured computing time. We show that the computed results are as accurate as if computed…
How to Ensure a Faithful Polynomial Evaluation with the Compensated Horner Algorithm
TLDR
This work addresses how to compute a faithfully rounded result, that is, one of the two floating point neighbors of the exact evaluation, and proposes an a priori sufficient condition on the condition number to ensure that the compensated evaluation is faithfully rounded.
Recursive estimation of fourth-order cumulants with application to identification
TLDR
A recursive formula for estimating the fourth-order cumulants of a real- or complex-valued, zero mean, stationary scalar stochastic process is developed by using the ergodicity assumption.
Note on a Method for Calculating Corrected Sums of Squares and Products
In many problems the "corrected sum of squares" of a set of values must be calculated, i.e. the sum of squares of the deviations of the values about their mean. The most usual way is to calculate the…
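To make the quantity in this summary concrete, here is a minimal Python sketch of a one-pass recurrence for the corrected sum of squares, in the widely used incremental form that updates a running mean and a running sum of squared deviations one value at a time. The summary above is truncated, so this is an assumption about the general style of method rather than a transcription of the note; the function name `corrected_sum_of_squares` is hypothetical.

```python
def corrected_sum_of_squares(values):
    """Return (count, mean, sum of squared deviations about the mean).

    Single pass: each value updates the running mean and the running
    corrected sum of squares, so the data never need to be stored.
    """
    n = 0
    mean = 0.0
    s = 0.0
    for x in values:
        n += 1
        delta = x - mean          # deviation from the previous mean
        mean += delta / n         # updated running mean
        s += delta * (x - mean)   # uses both the old and the new mean
    return n, mean, s


count, mean, s = corrected_sum_of_squares([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
```

This form avoids subtracting two large, nearly equal sums, which is the main source of cancellation error in the textbook "sum of squares minus n times the squared mean" calculation.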
State of the Art in Parallel Computing with R
TLDR
An overview of techniques for parallel computing with R on computer clusters, on multi-core systems, and in grid computing is presented, comparing sixteen different packages on their state of development, the parallel technology used, as well as on usability, acceptance, and performance.