The (black) art of runtime evaluation: Are we comparing algorithms or implementations?

@article{Kriegel2016TheA,
  title={The (black) art of runtime evaluation: Are we comparing algorithms or implementations?},
  author={Hans-Peter Kriegel and Erich Schubert and Arthur Zimek},
  journal={Knowledge and Information Systems},
  year={2016},
  volume={52},
  pages={341--378}
}
Any paper proposing a new algorithm should come with an evaluation of efficiency and scalability (particularly when we are designing methods for “big data”). However, there are several (more or less serious) pitfalls in such evaluations. We would like to point the attention of the community to these pitfalls. We substantiate our points with extensive experiments, using clustering and outlier detection methods with and without index acceleration. We discuss what we can learn from evaluations… 

DBSCAN Revisited, Revisited

TLDR
In new experiments, it is shown that the new SIGMOD 2015 methods do not appear to offer practical benefits if the DBSCAN parameters are well chosen and thus they are primarily of theoretical interest.

Realization of Random Forest for Real-Time Evaluation through Tree Framing

TLDR
This paper introduces a method that optimizes the execution of Decision Trees (DT), offers a probabilistic view of decision tree execution, and presents a theoretically well-founded memory layout that maximizes locality during execution in both cases.

The Role of Local Intrinsic Dimensionality in Benchmarking Nearest Neighbor Search

TLDR
Different visualization concepts are introduced that give a more fine-grained overview of the inner workings of nearest neighbor search principles: results on a single dataset predict results on all other datasets well.

Bringing Light Into the Dark: A Large-scale Evaluation of Knowledge Graph Embedding Models Under a Unified Framework

TLDR
This paper re-implements and evaluates 21 models in the PyKEEN software package and performs a large-scale benchmark on four datasets, providing evidence that several architectures can obtain results competitive with the state of the art when configured carefully.

Efficient Algorithms For Fair Clustering with a New Fairness Notion

TLDR
A new notion of fairness, called τ-ratio fairness, is proposed that strictly generalizes the Balance property and enables a fine-grained efficiency vs. fairness trade-off.

Numerically stable parallel computation of (co-)variance

TLDR
This paper studies a popular incremental technique originally proposed by Welford, extends it to weighted covariance and correlation, and showcases applications ranging from the classic computation of variance to advanced applications such as stock market analysis with exponentially weighted moving models and Gaussian mixture modeling for cluster analysis, all of which benefit from this approach.
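
The incremental technique attributed to Welford in the summary above can be illustrated with a minimal Python sketch. It shows only the classic unweighted single-pass update for mean and sample variance, not the paper's weighted or parallel extensions. The function name `welford_variance` is a hypothetical choice for this example.

```python
def welford_variance(values):
    """Return (mean, sample variance) of `values` in a single pass.

    Classic unweighted Welford update; illustrative sketch only.
    """
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in values:
        count += 1
        delta = x - mean            # deviation from the old mean
        mean += delta / count       # update the running mean
        m2 += delta * (x - mean)    # uses deviation from the new mean
    if count < 2:
        return mean, float("nan")   # sample variance undefined for n < 2
    return mean, m2 / (count - 1)


if __name__ == "__main__":
    # Values with a large common offset, where the naive
    # "sum of squares minus squared sum" formula tends to lose precision.
    data = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]
    print(welford_variance(data))
```

Because each step accumulates deviations from the running mean rather than raw squares, the update avoids subtracting two large, nearly equal quantities.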

Redundancies in Data and their Effect on the Evaluation of Recommendation Systems: A Case Study on the Amazon Reviews Datasets

TLDR
A collection of datasets crawled from Amazon, “Amazon reviews”, is popular in the evaluation of recommendation systems; however, it is observed that these datasets contain redundancies, and that the impact of these redundancies depends on the complexity of the methods.

Benchmarking Nearest Neighbor Search: Influence of Local Intrinsic Dimensionality and Result Diversity in Real-World Datasets

TLDR
Different visualization concepts are introduced that give a more fine-grained overview of the inner workings of nearest neighbor search principles: results on a single dataset predict results on all other datasets well.

Statistically Rigorous Testing of Clustering Implementations

TLDR
This work conducts statistical hypothesis testing on the outcome of differential clustering to reveal problematic outcomes and indicates that there are statistically significant differences in clustering outcomes in a variety of scenarios where users might not expect clustering outcome variation.

Similarity Search and Applications

TLDR
This paper analyzes how the strategy for searching through an index tree, also called the scheduling policy, can influence costs, and characterizes the policies’ behavior through an analytical cost model in which parameterized local distance distributions play a major role.
...

References

Frequent subgraph miners: runtimes don't say everything

TLDR
This paper presents an additional experimental comparison of several graph miners that differs from a previous study in two respects: (1) original implementations are compared; (2) these implementations are compared on a larger set of measures than runtimes, providing further insight into the benefits of the algorithms.

Making k-means Even Faster

TLDR
This paper proposes a new acceleration for exact k-means that gives the same answer, but is much faster in practice, and uses one novel lower bound for point-center distances.

Local outlier detection reconsidered: a generalized view on locality with applications to spatial, video, and network outlier detection

TLDR
A formalized method of analysis is provided to allow for a theoretical comparison and generalization of many existing methods and improves understanding of the shared properties and of the differences of outlier detection models.

An Experimental Analysis of Iterated Spatial Joins in Main Memory

TLDR
Surprisingly, it is found that when queries and updates can be batched, repeatedly re-computing the join result from scratch outperforms using a moving object index in all but the most extreme cases.

A Quantitative Comparison of the Subgraph Miners MoFa, gSpan, FFSM, and Gaston

TLDR
This paper re-implements the subgraph miners MoFa, gSpan, FFSM, and Gaston within a common code base, with the same level of programming expertise and optimization effort.

A fast APRIORI implementation

TLDR
It is shown that the effect of implementation can be more important than the selection of the algorithm, and an implementation of APRIORI is described that outperforms all implementations known to us.

STR: a simple and efficient algorithm for R-tree packing

Presents the results from an extensive comparison study of three R-tree packing algorithms: the Hilbert and nearest-X packing algorithms, and an algorithm which is very simple to implement, called

ELKI: A Software System for Evaluation of Subspace Clustering Algorithms

TLDR
A software framework implementing many prominent algorithms and, thus, allowing for a fair and thorough evaluation of newly proposed algorithms is proposed for the prolific field of subspace clustering.

Spatial Joins in Main Memory: Implementation Matters!

TLDR
This study demonstrates that in main memory, where no time-consuming I/O can mask variations in implementation, implementation details are very important; it also offers a concrete illustration of how difficult it is to draw conclusions about the data structures and algorithms studied from empirical running-time findings in main-memory settings.
...