Evaluation-as-a-Service for the Computational Sciences

@article{Hopfgartner2018EvaluationasaServiceFT,
  title={Evaluation-as-a-Service for the Computational Sciences},
  author={Frank Hopfgartner and Allan Hanbury and Henning M{\"u}ller and Ivan Eggel and Krisztian Balog and Torben Brodt and Gordon V. Cormack and Jimmy J. Lin and Jayashree Kalpathy-Cramer and N. Kando and Makoto P. Kato and Anastasia Krithara and Tim Gollub and Martin Potthast and Evelyne Viegas and Simon Mercer},
  journal={Journal of Data and Information Quality (JDIQ)},
  year={2018},
  volume={10},
  pages={1--32}
}
Evaluation in empirical computer science is essential to show progress and to assess the technologies developed. Several research domains, such as information retrieval, have long relied on systematic evaluation to measure progress: here, the Cranfield paradigm of creating shared test collections, defining search tasks, and collecting ground truth for these tasks has persisted to this day. In recent years, however, several new challenges have emerged that do not fit this paradigm very well: extremely…
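As a rough illustration of the Cranfield paradigm the abstract refers to, the sketch below scores a system "run" against shared relevance judgements (qrels) with precision at 10. The file names and the TREC-style line formats are assumptions for illustration, not artefacts of the paper itself.

```python
# Minimal Cranfield-style evaluation sketch: score a retrieval run against
# shared relevance judgements (qrels). Paths and formats are assumptions
# following common TREC conventions.
from collections import defaultdict


def load_qrels(path):
    """qrels lines: <topic> 0 <doc_id> <relevance>"""
    qrels = defaultdict(dict)
    with open(path) as f:
        for line in f:
            topic, _, doc_id, rel = line.split()
            qrels[topic][doc_id] = int(rel)
    return qrels


def load_run(path):
    """run lines: <topic> Q0 <doc_id> <rank> <score> <tag>"""
    run = defaultdict(list)
    with open(path) as f:
        for line in f:
            topic, _, doc_id, rank, score, _ = line.split()
            run[topic].append((int(rank), doc_id))
    # Sort each topic's documents by rank and keep the doc ids only.
    return {t: [d for _, d in sorted(docs)] for t, docs in run.items()}


def precision_at_k(qrels, run, k=10):
    """Mean P@k over the topics that have relevance judgements."""
    scores = []
    for topic, ranking in run.items():
        if topic not in qrels:
            continue
        hits = sum(1 for doc in ranking[:k] if qrels[topic].get(doc, 0) > 0)
        scores.append(hits / k)
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    # "qrels.txt" and "run.txt" are placeholder paths for a shared test collection.
    qrels = load_qrels("qrels.txt")
    run = load_run("run.txt")
    print(f"P@10 = {precision_at_k(qrels, run, k=10):.4f}")
```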


repro_eval: A Python Interface to Reproducibility Measures of System-Oriented IR Experiments

This work introduces repro_eval, a tool for reactive reproducibility studies of system-oriented Information Retrieval (IR) experiments, and develops an easily extensible interface to stimulate common practices when conducting a reproducibility study of system-oriented IR experiments.
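To make the idea of a reproducibility measure concrete, the following sketch computes the root mean square error between the per-topic scores of an original run and a reproduced run, one of the kinds of measures such a tool covers. This is an illustrative sketch, not the repro_eval API itself; the dictionary inputs and names are assumptions.

```python
# Illustrative reproducibility measure (not the repro_eval interface): RMSE
# between per-topic effectiveness scores of an original and a reproduced run.
import math


def rmse(orig_scores, repro_scores):
    """RMSE over the topics shared by both runs; lower values indicate a
    closer reproduction of the original per-topic results."""
    topics = sorted(set(orig_scores) & set(repro_scores))
    if not topics:
        raise ValueError("no overlapping topics between the two runs")
    return math.sqrt(
        sum((orig_scores[t] - repro_scores[t]) ** 2 for t in topics) / len(topics)
    )


if __name__ == "__main__":
    # Hypothetical per-topic average precision scores for the two systems.
    original = {"301": 0.42, "302": 0.35, "303": 0.51}
    reproduced = {"301": 0.40, "302": 0.37, "303": 0.49}
    print(f"RMSE = {rmse(original, reproduced):.4f}")
```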