Overview of the TREC 2020 Deep Learning Track
The Deep Learning Track is a new track for TREC 2019, with the goal of studying ad hoc ranking in a large data regime. It is the first track with large human-labeled training sets, introducing two
A simple and efficient sampling method for estimating AP and NDCG
We consider the problem of large scale retrieval evaluation. Recently two methods based on random sampling were proposed as a solution to the extensive effort required to judge tens of thousands of
Estimating average precision with incomplete and imperfect judgments
This work proposes three evaluation measures that approximate average precision even when the relevance judgments are incomplete. The measures are more robust to incomplete or imperfect relevance judgments than bpref, and the work also proposes estimates of average precision that are simple and accurate.
A statistical method for system evaluation using incomplete judgments
This work considers the problem of large-scale retrieval evaluation, and proposes a statistical method for evaluating retrieval systems using incomplete judgments based on random sampling, which produces unbiased estimates of the standard measures themselves.
Extending average precision to graded relevance judgments
This work proposes a new measure of retrieval effectiveness, Graded Average Precision (GAP), and shows that GAP can reliably be used as an objective metric in learning to rank: optimizing for GAP using SoftRank and LambdaRank leads to better-performing ranking functions than those constructed by algorithms tuned to optimize for AP or NDCG, even when AP or NDCG is the test metric.
A new rank correlation coefficient for information retrieval
A new rank correlation coefficient, AP correlation (τap), is proposed. It is based on average precision, has a probabilistic interpretation, gives more weight to errors at high rankings, and has mathematical properties that make it easy to interpret.
Relevance assessment: are judges exchangeable and does it matter
It appears that test collections are not completely robust to changes of judge when these judges vary widely in task and topic expertise, and both system scores and system rankings are subject to consistent but small differences across the three assessment sets.
Inferring and using location metadata to personalize web search
This paper shows how to infer a more general location relevance, one that uses not only physical location but also a more general notion of locations of interest for Web pages, and shows how location information can be incorporated into Web search ranking.
Self-Attentive Hawkes Process
SAHP employs self-attention to summarise the influence of history events and compute the probability of the next event. It is more interpretable than RNN-based counterparts because the learnt attention weights reveal the contribution of one event type to the occurrence of another.
Dynamic Clustering of Streaming Short Documents
A new dynamic clustering topic model, DCT, is proposed. It tracks the time-varying distributions of topics over documents and of words over topics, and it overcomes the difficulty of handling short text by assigning a single topic to each short document.