A new rank correlation coefficient for information retrieval

Emine Yilmaz, Javed A. Aslam, Stephen E. Robertson. SIGIR '08.
In the field of information retrieval, one is often faced with the problem of computing the correlation between two ranked lists. The most commonly used statistic that quantifies this correlation is Kendall's τ. In the information retrieval community, discrepancies among items with high rankings are often more important than those among items with low rankings. The Kendall's τ statistic, however, does not make such distinctions and equally penalizes errors both at high and low…
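
The contrast drawn above can be seen on a toy example. The Python sketch below (function names and rankings are illustrative, not taken from the paper) computes Kendall's τ for untied rankings as (concordant − discordant) / (n(n−1)/2), alongside a head-weighted coefficient written in the form in which the AP correlation τ_AP is usually stated: τ_AP = (2/(N−1)) Σ_{i=2..N} C(i)/(i−1) − 1, where C(i) counts the items ranked above position i in the observed list that are also ranked above that item in the reference list. Treat it as a sketch under those assumptions, not a reference implementation.

```python
from itertools import combinations


def kendall_tau(truth, observed):
    """Kendall's tau between two untied rankings of the same items.

    Each argument lists item ids from rank 1 (best) downward.
    """
    pos_t = {item: r for r, item in enumerate(truth)}
    pos_o = {item: r for r, item in enumerate(observed)}
    concordant = discordant = 0
    for a, b in combinations(truth, 2):
        # A pair is concordant if both lists order a and b the same way.
        if (pos_t[a] - pos_t[b]) * (pos_o[a] - pos_o[b]) > 0:
            concordant += 1
        else:
            discordant += 1
    n = len(truth)
    return (concordant - discordant) / (n * (n - 1) / 2)


def ap_correlation(truth, observed):
    """Head-weighted correlation in the usual tau_AP form: for each item at
    rank i >= 2 in `observed`, the fraction of the items above it that are
    also above it in `truth`, averaged and rescaled to [-1, 1]."""
    pos_t = {item: r for r, item in enumerate(truth)}
    n = len(observed)
    total = 0.0
    for i in range(1, n):  # 0-based index i is 1-based rank i + 1
        item = observed[i]
        correct = sum(1 for above in observed[:i] if pos_t[above] < pos_t[item])
        total += correct / i  # i items sit above this one
    return 2.0 * total / (n - 1) - 1.0


truth = list("ABCDEF")
top_swap = list("BACDEF")     # error among the two highest-ranked items
bottom_swap = list("ABCDFE")  # error among the two lowest-ranked items

print(kendall_tau(truth, top_swap), kendall_tau(truth, bottom_swap))  # both about 0.87
print(ap_correlation(truth, top_swap), ap_correlation(truth, bottom_swap))  # about 0.6 vs about 0.92
```

Each observed list contains exactly one swapped pair, so Kendall's τ gives both the same score (13/15), while the head-weighted coefficient drops much further when the swap involves the top two items. Neither function handles ties.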

Weighted Rank Correlation in Information Retrieval Evaluation
A family of rank correlation coefficients for IR is introduced that weights the correlation according to the rank of the items, using the notion of gain previously employed in retrieval effectiveness measurement.
On rank correlation and the distance between rankings
This work introduces an alternative measure of distance between rankings that corrects this by explicitly accounting for correlations between systems over a sample of topics, and moreover has a probabilistic interpretation for use in a test of statistical significance.
A Weighted Correlation Index for Rankings with Ties
This work proposes a natural extension of Kendall's definition of correlation that takes weights into account in the presence of ties, and demonstrates the usefulness of the weighted correlation measure on experimental data from social networks and web graphs.
A Head-Weighted Gap-Sensitive Correlation Coefficient
A new measure, τGAP, is introduced that combines features of both τAP and ρ; system comparisons from the TREC 5 Ad Hoc track illustrate the differences in emphasis it achieves.
Score Aggregation Techniques in Retrieval Experimentation
Using past TREC runs, it is shown that an adjusted geometric mean provides more consistent system rankings than the arithmetic mean when a significant fraction of the individual topic scores are close to zero, and that score standardization achieves the same outcome in a more consistent manner.
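
The difference between the two aggregates can be illustrated with a short sketch. It assumes the adjusted geometric mean takes the form exp(mean(log(x + ε))) − ε, with ε = 0.01 chosen purely for illustration; the paper's exact adjustment and constant may differ. The two systems and their per-topic scores are hypothetical.

```python
import math
from statistics import fmean


def arithmetic_mean(scores):
    """Plain mean of per-topic scores."""
    return fmean(scores)


def adjusted_geometric_mean(scores, eps=0.01):
    """Geometric mean of (score + eps), shifted back down by eps.

    The eps offset is an assumed, illustrative choice; it keeps log()
    defined when a topic score is exactly zero.
    """
    return math.exp(fmean(math.log(s + eps) for s in scores)) - eps


# Hypothetical per-topic scores for two systems over two topics.
steady = [0.5, 0.5]
erratic = [0.99, 0.01]  # one topic score close to zero

print(arithmetic_mean(steady), arithmetic_mean(erratic))  # both about 0.5
print(adjusted_geometric_mean(steady), adjusted_geometric_mean(erratic))  # about 0.5 vs about 0.13
```

The arithmetic mean cannot separate the two systems, whereas the geometric form is pulled down sharply by the near-zero topic, which is the sensitivity the summary above refers to.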
When Rank Order Isn't Enough: New Statistical-Significance-Aware Correlation Measures
This work proposes two statistical-significance-aware rank correlation measures, one of which is a head-weighted version of the other, and shows that use of these measures can lead to different experimental conclusions regarding reliability of alternative low-cost evaluation methods.
Generalized distances between rankings
This work extends Spearman's footrule and Kendall's tau to versions with position and element weights, and shows that a variant of the Diaconis-Graham inequality still holds: the two generalized measures remain within a constant factor of each other for all permutations.
Toward Estimating the Rank Correlation between the Test Collection Results and the True System Performance
The Kendall τ and AP rank correlation coefficients have become mainstream in Information Retrieval research for comparing the rankings of systems produced by two different evaluation conditions, such…
A mutual information-based framework for the analysis of information retrieval systems
A probabilistic framework for evaluation is proposed which is used to develop new information-theoretic evaluation metrics that are powerful and generalizable, enabling evaluations heretofore not possible.
Toward Rank Correlation as a Measure of Confidence in Information Retrieval Experiment Results
This thesis estimates the accuracy of test collections by estimating the rank correlation between the observed and true mean scores of systems, providing a better sense of confidence in system evaluation results by accounting for the inherent variability in sampling topics.


On rank correlation in information retrieval evaluation
The paper then focuses on rank correlation between lists of web pages ordered by PageRank, applying its general observations on these test statistics, and provides an interpretation of PageRank behaviour.
Problems with Kendall's tau
Results are presented showing that basing decisions on thresholds for Kendall's Tau rank correlation coefficient is not as reliable as has been assumed.
Methods for ranking information retrieval systems without relevance judgments
The experimental results show that the proposed methods are effective, and in many cases more effective than Soboroff et al.'s method.
Ranking retrieval systems without relevance judgments
Initial results are presented for a new evaluation methodology that replaces human relevance judgments with a randomly selected mapping of documents to topics, referred to as pseudo-relevance judgments.
Incremental test collections
An algorithm is presented that selects which documents to judge and decides when to stop, so that a high degree of confidence in the evaluation result can be reached with very little judging effort.
1. In psychological work the problem of comparing two different rankings of the same set of individuals may be divided into two types. In the first type the individuals have a given order A which is
Estimating average precision with incomplete and imperfect judgments
This work proposes three evaluation measures that approximate average precision even when relevance judgments are incomplete and that are more robust to incomplete or imperfect relevance judgments than bpref; the resulting estimates of average precision are simple and accurate.
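
For reference, the quantity such measures approximate is ordinary (non-interpolated) average precision under complete judgments: the sum of precision at the rank of each retrieved relevant document, divided by the total number of relevant documents. The minimal Python sketch below uses hypothetical document ids and does not implement any of the paper's three estimators.

```python
def average_precision(ranking, relevant):
    """Non-interpolated AP of one ranked list against a complete set of
    relevant document ids. Relevant documents never retrieved contribute 0."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for k, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / k  # precision at rank k
    return precision_sum / len(relevant)


# Hypothetical run: relevant documents appear at ranks 1 and 3.
print(average_precision(["d1", "d2", "d3", "d4"], {"d1", "d3"}))  # (1/1 + 2/3) / 2, about 0.833
```

Because the denominator is the full count of relevant documents, this computation needs a complete relevance set; that requirement is what makes AP sensitive to incomplete judgments in the first place.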
Evaluating strategies for similarity search on the web
A technique is presented for automatically evaluating strategies using Web hierarchies, such as the Open Directory, in place of user feedback, and the methodology is applied to a mix of document representation strategies, including the use of text, anchor text, and links.
A unified model for metasearch, pooling, and system evaluation
A unified model is presented which simultaneously solves the problems of fusing the ranked lists of documents in order to obtain a high-quality combined list (metasearch); generating document collections likely to contain large fractions of relevant documents (pooling); and accurately evaluating the underlying retrieval systems with small numbers of relevance judgments (efficient system assessment).