# Numerically stable parallel computation of (co-)variance

@article{Schubert2018NumericallySP, title={Numerically stable parallel computation of (co-)variance}, author={Erich Schubert and M. Gertz}, journal={Proceedings of the 30th International Conference on Scientific and Statistical Database Management}, year={2018} }

With the advent of big data, we see an increasing interest in computing correlations in huge data sets with both many instances and many variables. Essential descriptive statistics such as the variance, standard deviation, covariance, and correlation can suffer from a numerical instability known as "catastrophic cancellation" that can lead to problems when naively computing these statistics with a popular textbook equation. While this instability has been discussed in the literature already 50…
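To make the cancellation problem concrete, here is a minimal Python sketch. It is not taken from the paper; the function names (`textbook_variance`, `welford`, `merge`) and the test data are illustrative assumptions. It contrasts the textbook sum-of-squares formula with a single-pass update in the style of Welford (1962) and a pairwise merge of partial results, a formula commonly attributed to Chan et al. The merge is the kind of building block that allows partitions to be processed in parallel.

```python
def textbook_variance(xs):
    """Sample variance via (sum(x^2) - sum(x)^2 / n) / (n - 1).

    Both terms are huge and nearly equal when the mean is large
    relative to the spread, so their difference loses most digits.
    """
    n = len(xs)
    s = 0.0
    sq = 0.0
    for x in xs:  # explicit loop: plain left-to-right float addition
        s += x
        sq += x * x
    return (sq - s * s / n) / (n - 1)


def welford(xs):
    """Single-pass update; returns (n, mean, M2), M2 = sum of squared deviations."""
    n, mean, m2 = 0, 0.0, 0.0
    for x in xs:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)  # uses the already updated mean
    return n, mean, m2


def merge(a, b):
    """Combine two (n, mean, M2) summaries computed on disjoint partitions."""
    na, mean_a, m2_a = a
    nb, mean_b, m2_b = b
    n = na + nb
    if n == 0:
        return 0, 0.0, 0.0
    delta = mean_b - mean_a
    mean = mean_a + delta * nb / n
    m2 = m2_a + m2_b + delta * delta * na * nb / n
    return n, mean, m2


if __name__ == "__main__":
    # Hypothetical data: small spread around a large offset.
    # The sample variance of the offsets (4, 7, 13, 16) is 30.
    data = [1e9 + x for x in (4.0, 7.0, 13.0, 16.0)]
    print("textbook:", textbook_variance(data))
    n, mean, m2 = welford(data)
    print("welford: ", m2 / (n - 1))
    n, mean, m2 = merge(welford(data[:2]), welford(data[2:]))
    print("merged:  ", m2 / (n - 1))
```

In double precision the textbook formula is expected to be far off on this input, while the single-pass and merged variants stay close to 30. The merged summary depends only on the per-partition triples, so partitions can be summarized independently and combined afterwards.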

#### 9 Citations

ELKI: A large open-source library for data analysis - ELKI Release 0.7.5 "Heidelberg"

- Computer Science, Mathematics
- ArXiv
- 2019

Fast and Eager k-Medoids Clustering: O(k) Runtime Improvement of the PAM, CLARA, and CLARANS Algorithms

- Computer Science, Mathematics
- 2020

Estimation of Gaussian mixture models via tensor moments with application to online learning

- Mathematics, Computer Science
- Pattern Recognit. Lett.
- 2020

Faster k-Medoids Clustering: Improving the PAM, CLARA, and CLARANS Algorithms

- Computer Science, Mathematics
- SISAP
- 2019

Operon C++: an efficient genetic programming framework for symbolic regression

- Computer Science
- GECCO Companion
- 2020

#### References

Note on a Method for Calculating Corrected Sums of Squares and Products

- Mathematics
- 1962

Stably updating mean and standard deviation of data

- Mathematics, Computer Science
- CACM
- 1975

Letters to the editor: Dealing with Neely's algorithms

- Computer Science
- CACM
- 1968

Comparison of several algorithms for computation of means, standard deviations and correlation coefficients

- Computer Science
- CACM
- 1966

Some Results Relevant to Choice of Sum and Sum-of-Product Algorithms

- Computer Science
- 1971

Updating mean and variance estimates: an improved method

- Mathematics, Computer Science
- CACM
- 1979

Instruction tables: Lists of instruction latencies, throughputs and micro-operation breakdowns for Intel, AMD and VIA CPUs

- Technical Report. Copenhagen University College of Engineering
- 2017