Estimation of Fair Ranking Metrics with Incomplete Judgments

Ömer Kirnap, Fernando Diaz, Asia J. Biega, Michael D. Ekstrand, Ben Carterette, Emine Yilmaz
Proceedings of the Web Conference 2021
There is increasing attention to evaluating the fairness of search system ranking decisions. Fair ranking metrics often consider the membership of items in particular groups, typically identified using protected attributes such as gender or ethnicity. To date, these metrics typically assume the availability and completeness of protected attribute labels of items. However, the protected attributes of individuals are rarely present, limiting the application of fair ranking metrics in large-scale systems. In…
1 Citation

Investigating and Mitigating Biases in Crowdsourced Data
This workshop aims to foster discussion on ongoing research around biases in crowdsourced data and to identify future research directions to detect, quantify and mitigate biases before, during and after the labelling process such that both task requesters and crowd workers can benefit.


FARE: Diagnostics for Fair Ranking using Pairwise Error Metrics
This work designs a fair auditing mechanism which captures group treatment throughout the entire ranking, generating in-depth yet nuanced diagnostics, and demonstrates the efficacy of the error metrics using real-world scenarios, exposing trade-offs among fairness criteria and providing guidance in the selection of fair-ranking algorithms.
Measuring Fairness in Ranked Outputs
A data generation procedure is developed that allows systematic control of the degree of unfairness in the output. The proposed fairness measures for ranked outputs are applied to several real datasets, and the results show potential for improving the fairness of ranked outputs while maintaining accuracy.
Ranking with Fairness Constraints
This work studies a variant of the traditional ranking problem in which the objective satisfies properties that appear in common ranking metrics such as Discounted Cumulative Gain, Spearman's rho or Bradley-Terry.
Fairness of Exposure in Rankings
This work proposes a conceptual and computational framework that allows the formulation of fairness constraints on rankings in terms of exposure allocation, and develops efficient algorithms for finding rankings that maximize the utility for the user while provably satisfying a specifiable notion of fairness.
Equity of Attention: Amortizing Individual Fairness in Rankings
The challenge of achieving amortized individual fairness subject to constraints on ranking quality is formulated as an online optimization problem and solved as an integer linear program, and it is demonstrated that the method can improve individual fairness while retaining high ranking quality.
FA*IR: A Fair Top-k Ranking Algorithm
This work defines and solves the Fair Top-k Ranking problem, and presents an efficient algorithm, which is the first algorithm grounded in statistical tests that can mitigate biases in the representation of an under-represented group along a ranked list.
Fairness in Recommendation Ranking through Pairwise Comparisons
This paper offers a set of novel metrics for evaluating algorithmic fairness concerns in recommender systems and shows how measuring fairness based on pairwise comparisons from randomized experiments provides a tractable means to reason about fairness in rankings from recommender systems.
Overview of the TREC 2019 Fair Ranking Track
An overview of the TREC Fair Ranking track is presented, including the task definition, descriptions of the data and the annotation process, as well as a comparison of the performance of submitted systems.
Rank-biased precision for measurement of retrieval effectiveness
A new effectiveness metric, rank-biased precision, is introduced that is derived from a simple model of user behavior, is robust if answer rankings are extended to greater depths, and allows accurate quantification of experimental uncertainty, even when only partial relevance judgments are available.
A statistical method for system evaluation using incomplete judgments
This work considers the problem of large-scale retrieval evaluation, and proposes a statistical method for evaluating retrieval systems using incomplete judgments based on random sampling, which produces unbiased estimates of the standard measures themselves.