Numerically stable parallel computation of (co-)variance

@inproceedings{Schubert2018NumericallySP,
  title={Numerically stable parallel computation of (co-)variance},
  author={Erich Schubert and M. Gertz},
  booktitle={Proceedings of the 30th International Conference on Scientific and Statistical Database Management},
  year={2018}
}
  • Erich Schubert, M. Gertz
  • Published 2018
  • Computer Science
  • Proceedings of the 30th International Conference on Scientific and Statistical Database Management
  • With the advent of big data, we see an increasing interest in computing correlations in huge data sets with both many instances and many variables. Essential descriptive statistics such as the variance, standard deviation, covariance, and correlation can suffer from a numerical instability known as "catastrophic cancellation" that can lead to problems when naively computing these statistics with a popular textbook equation. While this instability has been discussed in the literature already 50…
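
The instability described in the abstract can be illustrated with a minimal sketch, assuming population variance and ordinary Python floats (IEEE 754 doubles). The function names are illustrative and are not taken from the paper. The sketch contrasts the textbook formula E[X^2] - E[X]^2 with a Welford-style single-pass update and a pairwise merge of partial aggregates, which is one common way to combine results computed in parallel. It is not claimed to be the authors' exact method.

```python
def naive_variance(xs):
    """Textbook formula E[X^2] - E[X]^2 (population variance).

    Subtracts two large, nearly equal numbers, so it is prone to
    catastrophic cancellation when the mean is large relative to the spread.
    """
    n = len(xs)
    s = sum(xs)
    ss = sum(x * x for x in xs)
    return ss / n - (s / n) ** 2


def stable_stats(xs):
    """Welford-style single pass.

    Returns (n, mean, m2), where m2 is the sum of squared deviations
    from the running mean.
    """
    n, mean, m2 = 0, 0.0, 0.0
    for x in xs:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    return n, mean, m2


def merge_stats(a, b):
    """Combine two partial aggregates (n, mean, m2) into one.

    This is what lets partitions be processed independently and merged later.
    """
    na, mean_a, m2_a = a
    nb, mean_b, m2_b = b
    n = na + nb
    if n == 0:
        return 0, 0.0, 0.0
    delta = mean_b - mean_a
    mean = mean_a + delta * nb / n
    m2 = m2_a + m2_b + delta * delta * na * nb / n
    return n, mean, m2


def variance(stats):
    """Population variance from an aggregate (n, mean, m2)."""
    n, _, m2 = stats
    return m2 / n


# Hypothetical data: small spread around a very large mean.
data = [1e9 + x for x in (4.0, 7.0, 13.0, 16.0)]

print("naive :", naive_variance(data))
print("stable:", variance(stable_stats(data)))
print("merged:", variance(merge_stats(stable_stats(data[:2]),
                                      stable_stats(data[2:]))))
```

The exact population variance of this data is 22.5. The stable and merged computations recover it, while the naive formula cannot: the squared values are near 1e18, where adjacent doubles are 128 apart, so the difference of the two terms loses the true result entirely.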
    9 Citations
    BETULA: Numerically Stable CF-Trees for BIRCH Clustering
    • 1
    Improving the PAM, CLARA, and CLARANS Algorithms
    Fast and Eager k-Medoids Clustering: O(k) Runtime Improvement of the PAM, CLARA, and CLARANS Algorithms.
    Faster k-Medoids Clustering: Improving the PAM, CLARA, and CLARANS Algorithms
    • 39
    Operon C++: an efficient genetic programming framework for symbolic regression
    Mapping platforms into a new open science model for machine learning

    References

    Note on a Method for Calculating Corrected Sums of Squares and Products
    • 424
    • Highly Influential
    Stably updating mean and standard deviation of data
    • R. Hanson
    • Mathematics, Computer Science
    • CACM
    • 1975
    • 22
    • Highly Influential
    Letters to the editor: Dealing with Neely's algorithms
    • 9
    • Highly Influential
    Comparison of several algorithms for computation of means, standard deviations and correlation coefficients
    • 30
    • Highly Influential
    Some Results Relevant to Choice of Sum and Sum-of-Product Algorithms
    • 41
    • Highly Influential
    Updating mean and variance estimates: an improved method
    • 106
    • Highly Influential
    Instruction tables: Lists of instruction latencies, throughputs and micro-operation breakdowns for Intel, AMD and VIA CPUs
    • Technical Report. Copenhagen University College of Engineering
    • 2017