Relevance assessment: are judges exchangeable and does it matter

Peter Bailey, Nick Craswell, Ian Soboroff, Paul Thomas, Arjen P. de Vries, Emine Yılmaz
We investigate to what extent people making relevance judgements for a reusable IR test collection are exchangeable. We consider three classes of judge: "gold standard" judges, who originated the topics and are experts in a particular information-seeking task; "silver standard" judges, who are task experts but did not create topics; and "bronze standard" judges, who neither defined topics nor are experts in the task. Analysis shows low levels of agreement in relevance…
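The excerpt above does not say which agreement statistic the study used. As a hypothetical illustration only, two common ways to quantify agreement between two judges who labelled the same documents are Cohen's kappa (observed agreement corrected for chance) and the overlap of the sets each judge marked relevant. The sketch below assumes binary relevance labels (1 = relevant, 0 = not relevant) aligned by document; the function names and the toy "gold" and "bronze" labels are made up for illustration and are not taken from the paper.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two judges labelling the same documents.

    labels_a, labels_b: equally long sequences of category labels,
    where position i refers to the same document for both judges.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty, equally long label sequences")
    n = len(labels_a)
    # Observed agreement: fraction of documents given the same label.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: sum over labels of the product of each judge's
    # marginal frequency for that label.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        # Both judges used the same single label everywhere; kappa is
        # undefined, so treat it as perfect agreement.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


def relevant_overlap(labels_a, labels_b):
    """Intersection over union of the documents each judge marked relevant (1)."""
    rel_a = {i for i, x in enumerate(labels_a) if x == 1}
    rel_b = {i for i, x in enumerate(labels_b) if x == 1}
    union = rel_a | rel_b
    return len(rel_a & rel_b) / len(union) if union else 1.0


# Hypothetical labels from a "gold" and a "bronze" judge for eight documents.
gold = [1, 1, 0, 0, 1, 0, 0, 0]
bronze = [1, 0, 0, 1, 1, 0, 0, 0]
print(round(cohens_kappa(gold, bronze), 3))  # prints 0.467
print(relevant_overlap(gold, bronze))        # prints 0.5
```

Here the two judges agree on 6 of 8 documents (75%), but because both mark most documents non-relevant, much of that agreement is expected by chance, which is why kappa (about 0.47) is noticeably lower than raw agreement.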
Highly influential: this paper has highly influenced 11 other papers.
Highly cited: this paper has 176 citations.

Citations: Semantic Scholar lists 113 extracted citing publications and, based on the available data, estimates that this publication has 176 citations in total.


Publications referenced by this paper.
Showing 1-10 of 10 references

The effect of variations in relevance assessments in comparative experimental tests of index languages

  • C. W. Cleverdon
  • Technical Report ASLIB part 2, Cranfield…
  • 1970
Highly Influential
6 Excerpts

Nonparametric Statistics for the Behavioral Sciences

  • S. Sigel, N. J. Castellan
  • McGraw-Hill
  • 1988
Highly Influential
4 Excerpts