How not to lie with statistics: the correct way to summarize benchmark results

@article{Fleming1986HowNT,
  title={How not to lie with statistics: the correct way to summarize benchmark results},
  author={Philip J. Fleming and John J. Wallace},
  journal={Commun. ACM},
  year={1986},
  volume={29},
  pages={218-221}
}
Using the arithmetic mean to summarize normalized benchmark results leads to mistaken conclusions that can be avoided by using the preferred method: the geometric mean. 
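
The abstract's claim can be illustrated with a small sketch. The timings below are hypothetical numbers chosen for illustration, not data from the paper, and the helper names (`normalize`, `faster_machine`) are assumptions of this example. With the arithmetic mean of normalized run times, the apparent winner depends on which machine is used as the reference; with the geometric mean, the ranking stays the same.

```python
import math

def arithmetic_mean(xs):
    return sum(xs) / len(xs)

def geometric_mean(xs):
    # exp of the mean of logs; equals the n-th root of the product
    return math.exp(sum(math.log(x) for x in xs) / len(xs))

# Hypothetical run times in seconds for two benchmarks (lower is better).
times = {"A": [10.0, 40.0], "B": [20.0, 25.0]}

def normalize(times, ref):
    # Divide every machine's times by the reference machine's times.
    return {m: [t / r for t, r in zip(ts, times[ref])]
            for m, ts in times.items()}

def faster_machine(times, ref, mean):
    # The machine with the smallest summarized normalized time "wins".
    norm = normalize(times, ref)
    return min(norm, key=lambda m: mean(norm[m]))

for ref in times:
    print("reference", ref,
          "| arithmetic mean picks", faster_machine(times, ref, arithmetic_mean),
          "| geometric mean picks", faster_machine(times, ref, geometric_mean))
```

In this sketch the arithmetic mean picks A when normalizing to A and B when normalizing to B, while the geometric mean picks A in both cases.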


How to assess and report the performance of a stochastic algorithm on a benchmark problem: mean or best result on a number of runs?
TLDR
This short note analyzes and refutes the main argument made in favor of the claim that reporting the best result obtained by a stochastic algorithm over a number of runs is more meaningful than reporting some central statistic.
How Not to Lie with Statistics: Avoiding Common Mistakes in Quantitative Political Science
This article identifies a set of serious theoretical mistakes appearing with troublingly high frequency throughout the quantitative political science literature. These mistakes are all based on
Issues in Benchmark Metric Selection
TLDR
The case of the TPC-D metric, which used the much debated geometric mean for the single-stream test, confirms that the "real" measure for a decision-support benchmark is the arithmetic mean.
Characterizing computer performance with a single number
The controversy surrounding single number performance reduction is examined and solutions are suggested through a comparison of measures.
Fast Sampling of Perfectly Uniform Satisfying Assignments
TLDR
An algorithm for perfectly uniform sampling of satisfying assignments, based on the exact model counter sharpSAT and reservoir sampling, is presented, which is faster than the state of the art by 10 to over 100,000 times.
A compumetrical approach to summarize benchmark results
  • M. Igbaria, Milton Silver
  • Computer Science
    Proceedings of the 5th Jerusalem Conference on Information Technology, 1990. 'Next Decade in Information Technology'
  • 1990
The authors suggest a metric-based approach that emphasizes the necessity for analyzing computer performance variables and shows how to normalize the performance variables to a known machine and
Performance variation across benchmark suites
The performance ratio between two systems tends to vary across different benchmarks. Here we study this variation as a "signature" or "fingerprint" of the systems under consideration. This
War of the benchmark means: time for a truce
TLDR
The geometric mean (GM) predicts the mean relative performance of programs, not of workloads, and is the back-transformed average of a lognormal distribution, as can be seen by the mathematical identity below.

References

Re-evaluation of the RISC I
TLDR
This paper aims to evaluate the Reduced Instruction Set Computer, a relatively new concept in computer architecture, more completely by removing extraneous factors and re-evaluating the RISC I.
A VLSI RISC
TLDR
The hypothesis is that by reducing the instruction set one can design a suitable VLSI architecture that uses scarce resources more effectively than a CISC, and expects this approach to reduce design time, design errors, and the execution time of individual instructions.
Re-evaluation of RISC I. Comput. Archit. News 12, 1 (Mar. 1984)
Authors' Present Addresses: Philip J. Fleming, AT&T Information Systems, 1100 East Warrenville Road, Naperville, IL
Re-evaluation of RISC I
  • Comput. Archit. News
  • 1984
[Performance of Systems]: measurement techniques, performance attributes. General Terms: Measurement, Performance. Additional Key Words and Phrases: benchmarking, geometric mean. Received 5/65
A comprehensive textbook on functional equations
  • 1966
6-21. The landmark paper formally introducing the RISC approach to computer architecture
  • Computer
  • 1982
The Foxboro Company, Foxboro, MA 02035; Electronic mail: foxvax5!jjw
Functional Equations
  • A comprehensive textbook on functional equations
  • 1966