The Dagstuhl Perspectives Workshop on Performance Modeling and Prediction
@article{Ferro2018TheDP,
  title   = {The Dagstuhl Perspectives Workshop on Performance Modeling and Prediction},
  author  = {N. Ferro and Norbert Fuhr and Gregory Grefenstette and Joseph A. Konstan and Pablo Castells and Elizabeth M. Daly and Thierry Declerck and Michael D. Ekstrand and Werner Geyer and Julio Gonzalo and Tsvi Kuflik and Krister Lind{\'e}n and Bernardo Magnini and Jian-Yun Nie and R. Perego and Bracha Shapira and Ian Soboroff and Nava Tintarev and Karin M. Verspoor and Martijn C. Willemsen and Justin Zobel},
  journal = {SIGIR Forum},
  year    = {2018},
  volume  = {52},
  pages   = {91-101}
}
This paper reports the findings of the Dagstuhl Perspectives Workshop 17442 on performance modeling and prediction in the domains of Information Retrieval, Natural Language Processing, and Recommender Systems. We present a framework for further research, which identifies five major problem areas: understanding measures, performance analysis, making underlying assumptions explicit, identifying application features determining performance, and the development of prediction models describing the…
16 Citations
Report on GLARE 2018: 1st Workshop on Generalization in Information Retrieval
- Computer Science · SIGIR Forum
- 2019
This is a report on the first edition of the International Workshop on Generalization in Information Retrieval (GLARE 2018), co-located with the 27th ACM International Conference on Information and…
Using Collection Shards to Study Retrieval Performance Effect Sizes
- Computer Science, Business · ACM Trans. Inf. Syst.
- 2019
This work uses the general linear mixed model framework to present a model that encompasses the experimental factors of system, topic, and shard, together with their interaction effects, and finds that the topic*shard interaction is a large effect across almost all datasets.
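As a rough illustration of the kind of crossed-factor analysis described above, the sketch below fits a fixed-effects ANOVA over a long-format table of per-(system, topic, shard) scores. The file name, column names, and the use of ordinary least squares in place of the paper's full mixed model are assumptions for illustration only.

```python
# Minimal sketch (not the paper's code): estimate system, topic, shard and
# topic:shard effects from a long-format table of effectiveness scores.
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Hypothetical input: one row per (system, topic, shard) with a score column.
df = pd.read_csv("scores.csv")  # assumed columns: system, topic, shard, score

# Fixed-effects ANOVA approximation of the crossed design; the paper's GLMM
# additionally treats topics and shards as random effects.
model = ols("score ~ C(system) + C(topic) + C(shard) + C(topic):C(shard)",
            data=df).fit()
anova = sm.stats.anova_lm(model, typ=2)
print(anova)  # the sum_sq column indicates the relative size of each effect
```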
Towards Unified Metrics for Accuracy and Diversity for Recommender Systems
- Computer Science · RecSys
- 2021
This work proposes a novel adaptation of a unified metric, derived from one commonly used for search system evaluation, to Recommender Systems, and shows that the metric respects the desired theoretical constraints and behaves as expected when performing offline evaluation.
The Information Retrieval Group at the University of Duisburg-Essen
- Psychology · Datenbank-Spektrum
- 2018
This document describes the IR research group at the University of Duisburg-Essen, which works on quantitative models of interactive retrieval, social media analysis, multilingual argument retrieval…
Memorization vs. Generalization: Quantifying Data Leakage in NLP Performance Evaluation
- Computer Science · EACL
- 2021
This work identifies leakage of training data into test data in several publicly available datasets used to evaluate NLP tasks, including named entity recognition and relation extraction, and studies it to assess the impact of that leakage on the model's ability to memorize versus generalize.
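One simple way to make the notion of leakage concrete is to measure how many test instances also occur verbatim in the training data. The sketch below is a minimal, assumed procedure (exact match on normalized sentences); the paper's own analysis is considerably more nuanced.

```python
# Minimal sketch (assumed procedure): quantify exact-overlap leakage between
# a training set and a test set of labeled sentences.
def normalize(text: str) -> str:
    return " ".join(text.lower().split())

def leakage_rate(train_sentences, test_sentences) -> float:
    """Fraction of test sentences that also appear verbatim in the training data."""
    train_set = {normalize(s) for s in train_sentences}
    leaked = sum(1 for s in test_sentences if normalize(s) in train_set)
    return leaked / len(test_sentences) if test_sentences else 0.0

# Example usage with toy data.
train = ["Barack Obama visited Paris .", "Acme Corp. acquired Beta Ltd."]
test = ["Barack Obama visited Paris .", "A new company was founded in Berlin."]
print(leakage_rate(train, test))  # 0.5
```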
Tasks as needs: reframing the paradigm of clinical natural language processing research for real-world decision support
- Medicine · J. Am. Medical Informatics Assoc.
- 2022
Electronic medical records are increasingly used to store patient information in hospitals and other clinical settings. There has been a corresponding proliferation of clinical natural…
CLEF 2019: Overview of the Replicability and Reproducibility Tasks
- Computer Science
- 2019
The aim of CENTRE is to run both a replicability and reproducibility challenge across all the major IR evaluation campaigns and to provide the IR community with a venue where previous research results can be explored and discussed.
CENTRE@CLEF2019: Overview of the Replicability and Reproducibility Tasks
- Computer Science · CLEF
- 2019
The aim of CENTRE is to run both a replicability and reproducibility challenge across all the major IR evaluation campaigns and to provide the IR community with a venue where previous research results can be explored and discussed.
Reproducibility and Validity in CLEF
- Computer Science · Information Retrieval Evaluation in a Changing World
- 2019
It is shown that CLEF has not only produced test collections that can be re-used by other researchers, but also undertaken various efforts in enabling reproducibility.
Overview of CENTRE@CLEF 2019: Sequel in the Systematic Reproducibility Realm
- Computer Science · CLEF
- 2019
The aim of CENTRE is to run both a replicability and reproducibility challenge across all the major IR evaluation campaigns and to provide the IR community with a venue where previous research results can be explored and discussed.
References
Showing 10 of 19 references
Blind Men and Elephants: Six Approaches to TREC data
- Computer Science · Information Retrieval
- 2004
The paper reviews six recent efforts to better understand performance measurements on information retrieval (IR) systems within the framework of the Text REtrieval Conferences (TREC): analysis of…
Increasing Reproducibility in IR: Findings from the Dagstuhl Seminar on "Reproducibility of Data-Oriented Experiments in e-Science"
- Computer Science · SIGIR Forum
- 2016
This paper discusses, summarizes, and adapts the main findings of the Dagstuhl seminar to the context of IR evaluation, both system-oriented and user-oriented, in order to raise awareness in the community and stimulate the field towards an increased reproducibility of its experiments.
Reproducibility Challenges in Information Retrieval Evaluation
- Computer Science · ACM J. Data Inf. Qual.
- 2017
Experimental evaluation relies on the Cranfield paradigm, which makes use of experimental collections, consisting of documents, sampled from a real domain of interest; topics, representing real user information needs in that domain; and relevance judgements, determining which documents are relevant to which topics.
On per-topic variance in IR evaluation
- Computer Science · SIGIR '12
- 2012
This work explores the notion, put forward by Cormack & Lynam and by Robertson, that a document collection used for Cranfield-style experiments should be considered a sample from some larger population of documents, by simulating other samples drawn from that larger population.
Toward an anatomy of IR system component performances
- Computer Science · J. Assoc. Inf. Sci. Technol.
- 2018
A methodology based on the General Linear Mixed Model (GLMM) and analysis of variance (ANOVA) is proposed to develop statistical models able to isolate system variance and component effects as well as their interaction, by relying on a grid of points containing all the combinations of the analyzed components.
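To make the idea of isolating component effects concrete, the sketch below decomposes scores over a grid of component combinations into a grand mean and per-component main effects. The component names (stoplist, stemmer, model), file name, and columns are hypothetical; the paper's GLMM/ANOVA methodology goes further and also models interactions and variance components.

```python
# Minimal sketch (illustrative, not the paper's GLMM): decompose effectiveness
# over a grid of component combinations into a grand mean and additive main
# effects per component, assuming a balanced grid of runs.
import pandas as pd

# Hypothetical grid: one row per (stoplist, stemmer, model, topic) with a score.
df = pd.read_csv("grid_scores.csv")  # assumed columns: stoplist, stemmer, model, topic, score

grand_mean = df["score"].mean()
for component in ["stoplist", "stemmer", "model"]:
    # Main effect of each level = its marginal mean minus the grand mean.
    effects = df.groupby(component)["score"].mean() - grand_mean
    print(f"{component} main effects:\n{effects}\n")

# Interaction effects and variance components require the full ANOVA / GLMM
# machinery described in the paper (e.g., mixed models with random factors).
```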
A Statistical Analysis of the TREC-3 Data
- Psychology · TREC
- 1994
A statistical analysis of the TREC-3 data shows that performance differences across queries are greater than performance differences across participants' runs. Generally, groups of runs which do not…
Using Replicates in Information Retrieval Evaluation
- Computer Science · ACM Trans. Inf. Syst.
- 2017
A method for more accurately estimating the main effect of the system in a typical test-collection-based evaluation of information retrieval systems, increasing the sensitivity of system comparisons while remaining robust against small changes in the number of partitions used.
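A minimal sketch of the underlying idea, under assumed data: treating per-shard scores as replicates gives several paired observations per system instead of a single whole-collection score, which can then feed a paired comparison. The partitioning scheme and the paired t-test here are illustrative, not the paper's exact procedure.

```python
# Minimal sketch (assumed setup): compare two systems using shard-level
# replicate scores rather than one aggregate score per system.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_shards = 10

# Hypothetical per-shard effectiveness scores for two systems.
system_a = rng.normal(loc=0.32, scale=0.05, size=n_shards)
system_b = rng.normal(loc=0.30, scale=0.05, size=n_shards)

# Paired comparison over the shard replicates.
t_stat, p_value = stats.ttest_rel(system_a, system_b)
print(f"mean difference = {np.mean(system_a - system_b):.4f}, p = {p_value:.3f}")
```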
Rank-biased precision for measurement of retrieval effectiveness
- Computer Science · TOIS
- 2008
A new effectiveness metric, rank-biased precision, is introduced that is derived from a simple model of user behavior, is robust if answer rankings are extended to greater depths, and allows accurate quantification of experimental uncertainty, even when only partial relevance judgments are available.
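Rank-biased precision has a compact closed form, RBP = (1 - p) * sum_i r_i * p^(i-1), where r_i in [0, 1] is the relevance of the document at rank i and p models user persistence. A minimal implementation follows; the default persistence value is chosen arbitrarily for illustration.

```python
# Rank-biased precision (Moffat & Zobel, 2008): RBP = (1 - p) * sum_i r_i * p**(i - 1),
# where r_i in [0, 1] is the relevance at rank i and p is the persistence parameter.
def rbp(relevances, p=0.8):
    return (1.0 - p) * sum(r * p ** i for i, r in enumerate(relevances))

# Example: a ranking with relevant documents at ranks 1 and 3.
print(rbp([1, 0, 1, 0, 0], p=0.8))  # ~0.328
```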
Are IR Evaluation Measures on an Interval Scale?
- Computer Science · ICTIR
- 2017
In this paper, we formally investigate whether, or not, IR evaluation measures are on an interval scale, which is needed to safely compute the basic statistics, such as mean and variance, we daily…
Expected reciprocal rank for graded relevance
- Computer Science · CIKM
- 2009
This work presents a new editorial metric for graded relevance, Expected Reciprocal Rank (ERR), which overcomes the independence assumption of position-based metrics by implicitly discounting documents that are shown below very relevant documents.
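The metric itself is short enough to state: each grade g is mapped to a stopping probability R = (2^g - 1) / 2^(g_max), and ERR = sum_r (1/r) * R_r * prod_{i<r} (1 - R_i). A minimal implementation, assuming a 0-4 grade scale for the example:

```python
# Expected reciprocal rank (Chapelle et al., 2009) for graded relevance.
# R_i = (2**g_i - 1) / 2**g_max maps a grade to a stopping probability, and
# ERR = sum_r (1/r) * R_r * prod_{i<r} (1 - R_i).
def err(grades, g_max=4):
    score, p_continue = 0.0, 1.0
    for rank, g in enumerate(grades, start=1):
        r_stop = (2 ** g - 1) / 2 ** g_max
        score += p_continue * r_stop / rank
        p_continue *= 1.0 - r_stop
    return score

# Example: grades on a 0-4 scale for the top three results.
print(err([4, 0, 2], g_max=4))  # the highly relevant first result dominates
```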