Numerically stable parallel computation of (co-)variance
@inproceedings{Schubert2018NumericallySP, title={Numerically stable parallel computation of (co-)variance}, author={Erich Schubert and M. Gertz}, booktitle={Proceedings of the 30th International Conference on Scientific and Statistical Database Management}, year={2018} }
With the advent of big data, we see an increasing interest in computing correlations in huge data sets with both many instances and many variables. Essential descriptive statistics such as the variance, standard deviation, covariance, and correlation can suffer from a numerical instability known as "catastrophic cancellation" that can lead to problems when naively computing these statistics with a popular textbook equation. While this instability has been discussed in the literature already 50…
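The cancellation problem described in the abstract can be illustrated with a small sketch. It assumes that the "popular textbook equation" is the familiar E[X²] − E[X]² form, and it contrasts that with a two-pass computation, a Welford-style incremental update, and a standard pairwise merge rule for combining partial results from separate data partitions. All function names here are hypothetical, and this is not necessarily the formulation the paper itself proposes.

```python
def naive_variance(xs):
    """Population variance via E[X^2] - E[X]^2 (prone to cancellation)."""
    n = len(xs)
    total = sum(xs)
    total_sq = sum(x * x for x in xs)
    return total_sq / n - (total / n) ** 2


def two_pass_variance(xs):
    """Population variance from deviations around a precomputed mean."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / n


def update(state, x):
    """Welford-style update of (count, mean, sum of squared deviations)."""
    n, mean, m2 = state
    n += 1
    delta = x - mean
    mean += delta / n
    m2 += delta * (x - mean)  # uses the old and the new mean
    return (n, mean, m2)


def summarize(xs):
    """Fold one partition of the data into a (count, mean, m2) summary."""
    state = (0, 0.0, 0.0)
    for x in xs:
        state = update(state, x)
    return state


def merge(a, b):
    """Combine the summaries of two disjoint partitions."""
    na, mean_a, m2_a = a
    nb, mean_b, m2_b = b
    n = na + nb
    if n == 0:
        return (0, 0.0, 0.0)
    delta = mean_b - mean_a
    mean = mean_a + delta * nb / n
    m2 = m2_a + m2_b + delta * delta * na * nb / n
    return (n, mean, m2)


if __name__ == "__main__":
    # Small spread on top of a large offset: the squares are far larger
    # than the variance, so the naive difference loses the signal.
    data = [1e9 + v for v in (4.0, 7.0, 13.0, 16.0)]
    n, mean, m2 = merge(summarize(data[:2]), summarize(data[2:]))
    print("naive   :", naive_variance(data))
    print("two-pass:", two_pass_variance(data))
    print("merged  :", m2 / n)
```

Because `merge` only needs the three-number summaries, each partition can be summarized independently (for example on separate workers) and the results combined afterwards, which is what makes this family of update rules attractive for parallel settings.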
9 Citations
ELKI: A large open-source library for data analysis - ELKI Release 0.7.5 "Heidelberg"
- Computer Science, Mathematics
- ArXiv
- 2019
- 24
Fast and Eager k-Medoids Clustering: O(k) Runtime Improvement of the PAM, CLARA, and CLARANS Algorithms.
- Computer Science, Mathematics
- 2020
Estimation of Gaussian mixture models via tensor moments with application to online learning
- Mathematics, Computer Science
- Pattern Recognit. Lett.
- 2020
Faster k-Medoids Clustering: Improving the PAM, CLARA, and CLARANS Algorithms
- Computer Science, Mathematics
- SISAP
- 2019
- 39
Operon C++: an efficient genetic programming framework for symbolic regression
- Computer Science
- GECCO Companion
- 2020
References
Note on a Method for Calculating Corrected Sums of Squares and Products
- Mathematics
- 1962
- 424
- Highly Influential
Stably updating mean and standard deviation of data
- Mathematics, Computer Science
- CACM
- 1975
- 22
- Highly Influential
Letters to the editor: Dealing with Neely's algorithms
- Computer Science
- CACM
- 1968
- 9
- Highly Influential
Comparison of several algorithms for computation of means, standard deviations and correlation coefficients
- Computer Science
- CACM
- 1966
- 30
- Highly Influential
Some Results Relevant to Choice of Sum and Sum-of-Product Algorithms
- Computer Science
- 1971
- 41
- Highly Influential
Updating mean and variance estimates: an improved method
- Mathematics, Computer Science
- CACM
- 1979
- 106
- Highly Influential
Instruction tables: Lists of instruction latencies, throughputs and micro-operation breakdowns for Intel, AMD and VIA CPUs
- Technical Report. Copenhagen University College of Engineering
- 2017