How not to lie with statistics: the correct way to summarize benchmark results

@article{Fleming1986HowNT,
  title={How not to lie with statistics: the correct way to summarize benchmark results},
  author={Philip J. Fleming and John J. Wallace},
  journal={Commun. ACM},
  year={1986},
  volume={29},
  pages={218--221}
}
Using the arithmetic mean to summarize normalized benchmark results leads to mistaken conclusions that can be avoided by using the preferred method: the geometric mean. 
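
The effect described in the abstract can be illustrated with a small sketch. The timings below are hypothetical values chosen for illustration, not data from the paper, and the helper names (`normalize`, `best_machine`) are assumptions of this example. With these numbers, the arithmetic mean of normalized times picks a different winner depending on which machine serves as the reference, while the geometric mean gives the same ranking either way.

```python
import math

def arithmetic_mean(xs):
    return sum(xs) / len(xs)

def geometric_mean(xs):
    # exp of the mean of logs; equivalent to the n-th root of the product
    return math.exp(sum(math.log(x) for x in xs) / len(xs))

# Hypothetical run times in seconds for two benchmarks on two machines.
times = {
    "A": [10.0, 100.0],
    "B": [20.0, 40.0],
}

def normalize(times, ref):
    """Divide each machine's times by the reference machine's times."""
    return {m: [t / r for t, r in zip(ts, times[ref])]
            for m, ts in times.items()}

def best_machine(times, ref, mean):
    """Return the machine with the lowest mean normalized time."""
    scores = {m: mean(v) for m, v in normalize(times, ref).items()}
    return min(scores, key=scores.get)

for ref in ("A", "B"):
    print("reference", ref,
          "| arithmetic:", best_machine(times, ref, arithmetic_mean),
          "| geometric:", best_machine(times, ref, geometric_mean))
```

In this sketch the arithmetic mean always favors whichever machine is used as the reference, which is the kind of mistaken conclusion the abstract warns about.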

The geometric mean?
The sample geometric mean (SGM), introduced by Cauchy in 1821, is a measure of central tendency with many applications in the natural and social sciences, including environmental monitoring, scientom...
How to assess and report the performance of a stochastic algorithm on a benchmark problem: mean or best result on a number of runs?
This short note analyzes and refutes the main argument made in favor of the claim that reporting the best result obtained by a stochastic algorithm over a number of runs is more meaningful than reporting some central statistic.
The harmonic or geometric mean: does it really matter?
It is concluded that for the SPEC CPU2000 benchmark suite, the choice of averaging has very little influence on the relative standing of different machines, and the decision to purchase one system rather than another should not be influenced by the type of averaging used.
Issues in Benchmark Metric Selection
The case of the TPC-D metric, which used the much-debated geometric mean for the single-stream test, confirms that the "real" measure for a decision-support benchmark is the arithmetic mean.
Characterizing computer performance with a single number
The controversy surrounding single number performance reduction is examined and solutions are suggested through a comparison of measures.
Fast Sampling of Perfectly Uniform Satisfying Assignments
An algorithm for perfectly uniform sampling of satisfying assignments, based on the exact model counter sharpSAT and reservoir sampling, is presented, which is faster than the state of the art by 10 to over 100,000 times.
A compumetrical approach to summarize benchmark results
  • Magid Igbaria, Milton Silver
  • Computer Science
  • Proceedings of the 5th Jerusalem Conference on Information Technology, 1990. 'Next Decade in Information Technology'
  • 1990
The authors suggest a metric-based approach that emphasizes the necessity for analyzing computer performance variables and shows how to normalize the performance variables to a known machine and...
The precision of the arithmetic mean, geometric mean and percentiles for citation data: An experimental simulation modelling approach
  • M. Thelwall
  • Mathematics, Computer Science
  • J. Informetrics
  • 2016
The results show that the geometric mean citation count is the most precise, closely followed by the percentage of a country's articles in the top 50% most cited articles for a field, year and document type.
Assessing Probabilistic Inference by Comparing the Generalized Mean of the Model and Source Probabilities
An approach to the assessment of probabilistic inference is described which quantifies the performance on the probability scale by plotting the reported model probabilities versus the histogram calculated source probabilities.
Performance variation across benchmark suites
The performance ratio between two systems tends to vary across different benchmarks. Here we study this variation as a "signature" or "fingerprint" of the systems under consideration. This...

Authors' Present Addresses: Philip J. Fleming, AT&T Information Systems, 1100 East Warrenville Road, Naperville, IL. The Foxboro Company, Foxboro, MA 02035; electronic mail: foxvax5!jjw
Performance of Systems: measurement techniques, performance attributes. General Terms: Measurement, Performance. Additional Key Words and Phrases: benchmarking, geometric mean. Received 5/65