
Information Retrieval Meta-Evaluation: Challenges and Opportunities in the Music Domain

@inproceedings{Urbano2011InformationRM,
  title={Information Retrieval Meta-Evaluation: Challenges and Opportunities in the Music Domain},
  author={Juli{\'a}n Urbano},
  booktitle={ISMIR},
  year={2011}
}
The Music Information Retrieval field has acknowledged the need for rigorous scientific evaluations for some time now. Several efforts have set out to develop and provide the necessary infrastructure, technology and methodologies to carry out these evaluations, out of which the annual Music Information Retrieval Evaluation eXchange emerged. The community as a whole has gained enormously from this evaluation forum, but very little attention has been paid to reliability and correctness issues…

Citations

Towards minimal test collections for evaluation of audio music similarity and retrieval

This paper shows a first approach towards the application of Minimal Test Collection algorithms to the evaluation of the Audio Music Similarity and Retrieval task, run by the annual MIREX evaluation campaign.

The need for music information retrieval with user-centered and multimodal strategies

It is argued that Music-IR approaches with multimodal and user-centered strategies are necessary to serve real-life usage patterns and maintain and improve accessibility of digital music data.

How Significant is Statistically Significant? The case of Audio Music Similarity and Retrieval

It is shown that indicators of statistical significance are ultimately of secondary importance, and that researchers who want to predict the real-world implications of formal evaluations should properly report practical significance (i.e., the size of the effect) rather than merely statistical significance in the evaluation results.
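To make the contrast concrete, here is a minimal simulation sketch, not taken from the paper: with enough queries, a paired t-test can flag a practically negligible difference between two hypothetical systems as statistically significant. All distributions and constants below are illustrative assumptions.

```python
# Minimal sketch (illustrative assumptions throughout): a paired t-test
# over many queries declares a tiny difference significant, while the
# effect size shows it is practically negligible.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_queries = 1000                                    # many queries -> high statistical power
sys_a = rng.normal(0.50, 0.10, n_queries)           # per-query effectiveness of system A
sys_b = sys_a + rng.normal(0.005, 0.05, n_queries)  # tiny, noisy improvement for system B

t_stat, p_value = stats.ttest_rel(sys_b, sys_a)     # paired t-test across queries
diff = sys_b - sys_a
cohens_d = diff.mean() / diff.std(ddof=1)           # effect size for paired data

print(f"p-value   = {p_value:.4f}")   # likely < 0.05: statistically significant
print(f"Cohen's d = {cohens_d:.3f}")  # likely ~0.1: practically negligible
```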

Minimal test collections for low-cost evaluation of Audio Music Similarity and Retrieval systems

This paper shows the application of Minimal Test Collections to the evaluation of the Audio Music Similarity and Retrieval task, run by the annual MIREX evaluation campaign, and presents a method to rank systems without making any annotations.

Music Information Retrieval: An Inspirational Guide to Transfer from Related Disciplines

This contribution starts with concrete examples of methodology transfer between speech and music processing, organized around the building blocks of pattern recognition (preprocessing, feature extraction, and classification/decoding), and then takes a higher-level viewpoint to describe sources of mutual inspiration drawn from text and image information retrieval.

The State of the Art Ten Years After a State of the Art: Future Research in Music Information Retrieval

A case study of all published research using the most-used benchmark dataset in music genre recognition (MGR) during the past decade shows that none of the evaluations in these many works is valid for drawing conclusions about recognizing genre, i.e., about whether a system uses criteria relevant to recognizing genre.

Audio Music Similarity and Retrieval: Evaluation Power and Stability

The reliability of the results in the evaluation of Audio Music Similarity and Retrieval systems is analyzed, and it is concluded that experimenters can be very confident that if a significant difference is found between systems, the difference is indeed real.

Overview of EIREX 2012: Social Media

This overview paper summarizes the results of the EIREX 2012 track, focusing on the creation of the test collection and the analysis to assess its reliability.

Overview of EIREX 2011: Crowdsourcing

This overview paper summarizes the results of the EIREX 2011 track, focusing on the creation of the test collection and the analysis to assess its reliability.

Overview of EIREX 2010: Computing

This overview paper summarizes the results of the EIREX 2010 track, focusing on the creation of the test collection and the analysis to assess its reliability.

References

Showing 1-10 of 86 references

The Music Information Retrieval Evaluation eXchange: Some Observations and Insights

This chapter outlines some of the major highlights of the past four years of MIREX evaluations, including its organizing principles, the selection of evaluation metrics, and the evolution of evaluation tasks.

Whither Music IR Evaluation Infrastructure: Lessons to be Learned from TREC

The processes used in the Text REtrieval Conference (TREC) evaluations to create information retrieval evaluation infrastructure are described, assessments of how appropriate the evaluation methodology is for TREC tasks are reviewed, and suggestions are made regarding the development of an MIR/MDL evaluation framework based on TREC experience.

The Scientific Evaluation of Music Information Retrieval Systems: Foundations and Future

This article provides an overview of the current scientific problem facing MIR research and reports upon the findings of the Music Information Retrieval (MIR)/Music Digital Library (MDL) Evaluation Frameworks Project.

Improving the Generation of Ground Truths Based on Partially Ordered Lists

It is shown that it is not possible to ensure that the lists are completely consistent; a measure of consistency based on Average Dynamic Recall is developed, and several alternatives for arranging the lists prove to be more consistent than the original method.

Cumulated gain-based evaluation of IR techniques

This article proposes several novel measures that compute the cumulative gain the user obtains by examining the retrieval result up to a given ranked position, and test results indicate that the proposed measures credit IR methods for their ability to retrieve highly relevant documents and allow testing of statistical significance of effectiveness differences.
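As a concrete illustration of cumulated-gain measures, the following is a minimal sketch of discounted cumulated gain (DCG) and its normalized form (nDCG). The log2(i + 1) discount is one common instantiation rather than necessarily the article's exact formulation, and the gain values are made up.

```python
# Minimal sketch of (n)DCG: graded gains accumulated down a ranked list,
# discounted by rank. Discount variant and gain values are assumptions.
import math

def dcg(gains):
    """Discounted cumulated gain of graded relevance values in rank order."""
    return sum(g / math.log2(i + 1) for i, g in enumerate(gains, start=1))

ranked_gains = [3, 2, 3, 0, 1, 2]            # graded relevance of retrieved items
ideal_gains = sorted(ranked_gains, reverse=True)

ndcg = dcg(ranked_gains) / dcg(ideal_gains)  # normalize by the ideal ordering
print(round(ndcg, 3))                        # 1.0 only for a perfect ranking
```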

On information retrieval metrics designed for evaluation with incomplete relevance assessments

This article compares the robustness of IR metrics to incomplete relevance assessments, using four different sets of graded-relevance test collections with submitted runs—the TREC 2003 and 2004 robust track data and the NTCIR-6 Japanese and Chinese IR data from the crosslingual task.

Audio Music Similarity and Retrieval: Evaluation Power and Stability

The reliability of the results in the evaluation of Audio Music Similarity and Retrieval systems is analyzed, and it is concluded that experimenters can be very confident that if a significant difference is found between systems, the difference is indeed real.

The Philosophy of Information Retrieval Evaluation

The fundamental assumptions and appropriate uses of the Cranfield paradigm, especially as they apply in the context of the evaluation conferences, are reviewed.

Information Retrieval System Evaluation

This module introduces evaluation in information retrieval, focusing on the standard measurement of system effectiveness through relevance judgments, and requires the Lucidworks software for the exercises.

Ranking retrieval systems without relevance judgments

This paper proposes a new evaluation methodology that replaces human relevance judgments with a randomly selected mapping of documents to topics, referred to as pseudo-relevance judgments, and reports its initial results; a schematic sketch of the idea appears after the reference list.
...
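To illustrate the pseudo-relevance-judgment idea from the last reference above, here is a minimal sketch under assumed data structures (pools and runs keyed by topic). The sampling rate, cutoff and precision@k scoring are illustrative choices, not the paper's exact procedure.

```python
# Minimal sketch of ranking systems without human judgments: sample
# "pseudo-relevant" documents at random from each topic's pool, then
# score every run against those pseudo-qrels. All structures assumed.
import random

def pseudo_qrels(pools, rate=0.1, seed=0):
    """pools: {topic: set of doc ids pooled across all runs}."""
    rng = random.Random(seed)
    return {t: set(rng.sample(sorted(docs), max(1, int(rate * len(docs)))))
            for t, docs in pools.items()}

def mean_precision_at_k(run, qrels, k=10):
    """run: {topic: list of doc ids in rank order}."""
    precisions = [len(set(docs[:k]) & qrels[t]) / k for t, docs in run.items()]
    return sum(precisions) / len(precisions)

# Hypothetical usage: rank runs by their score against the pseudo-qrels.
# qrels = pseudo_qrels(pools)
# ranking = sorted(runs, key=lambda r: mean_precision_at_k(runs[r], qrels), reverse=True)
```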