Rank-biased precision for measurement of retrieval effectiveness
TLDR
A new effectiveness metric, rank-biased precision, is introduced. It is derived from a simple model of user behavior, is robust if answer rankings are extended to greater depths, and allows accurate quantification of experimental uncertainty, even when only partial relevance judgments are available.
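The metric can be illustrated with a short sketch. It assumes the usual RBP definition, RBP = (1 - p) * sum over ranks i of r_i * p^(i-1), where p is the user's persistence. Judged documents add to a base score. Unjudged documents and all ranks past the evaluated depth d add to a residual: the most the score could still rise. The function name `rbp`, the list-of-relevance input format, and the default p = 0.8 are illustrative choices, not taken from the paper.

```python
from typing import Optional, Sequence, Tuple


def rbp(rels: Sequence[Optional[float]], p: float = 0.8) -> Tuple[float, float]:
    """Rank-biased precision with a residual (illustrative sketch).

    rels[i] is the relevance (0.0 to 1.0) of the document at rank i + 1,
    or None when that document is unjudged. Returns (base, residual).
    """
    if not 0.0 < p < 1.0:
        raise ValueError("persistence p must lie strictly between 0 and 1")
    base = 0.0
    residual = 0.0
    weight = 1.0 - p  # weight of rank 1; rank i gets (1 - p) * p**(i - 1)
    for r in rels:
        if r is None:
            residual += weight  # unjudged: could contribute up to its full weight
        else:
            base += r * weight
        weight *= p
    # All ranks beyond the evaluated depth d together carry weight p**d.
    residual += p ** len(rels)
    return base, residual
```

For example, `rbp([1, 0, 1], p=0.5)` gives a base of 0.625 and a residual of 0.125. The true score therefore lies somewhere in [0.625, 0.75], whatever lies below rank 3.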
Bandage: interactive visualization of de novo genome assemblies
TLDR
Bandage (a Bioinformatics Application for Navigating De novo Assembly Graphs Easily) is a tool for visualizing assembly graphs together with their connections. It makes possible kinds of analysis of de novo assemblies that cannot be done by examining contigs alone.
Inverted files for text search engines
TLDR
This tutorial introduces the key techniques in the area of text indexing, describing both a core implementation and how the core can be enhanced through a range of extensions.
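The core of an inverted file can be sketched in a few lines. The sketch maps each term to the sorted list of documents containing it. It then compresses a list by storing the gaps between consecutive document numbers (d-gaps) in a variable-byte code. Variable-byte coding is only one common choice among the integer codes used for this purpose. The function names, the whitespace tokenizer, and the byte layout (high bit marking the last byte of each number) are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List


def build_index(docs: List[str]) -> Dict[str, List[int]]:
    """Map each term to the increasing list of document numbers that contain it."""
    index: Dict[str, List[int]] = defaultdict(list)
    for doc_id, text in enumerate(docs):
        for term in set(text.lower().split()):
            index[term].append(doc_id)  # doc_ids arrive in increasing order
    return dict(index)


def vbyte_encode_gaps(postings: List[int]) -> bytes:
    """Encode d-gaps as 7-bit groups, most significant first; high bit ends a number."""
    out = bytearray()
    prev = -1  # so the first gap is doc_id + 1, which is always >= 1
    for d in postings:
        gap = d - prev
        prev = d
        groups = []
        while True:
            groups.append(gap & 0x7F)
            gap >>= 7
            if gap == 0:
                break
        groups.reverse()
        out.extend(groups[:-1])
        out.append(groups[-1] | 0x80)  # flag the final byte of this number
    return bytes(out)


def vbyte_decode_gaps(data: bytes) -> List[int]:
    """Invert vbyte_encode_gaps, turning gaps back into document numbers."""
    postings: List[int] = []
    n, prev = 0, -1
    for b in data:
        n = (n << 7) | (b & 0x7F)
        if b & 0x80:  # last byte of the current gap
            prev += n
            postings.append(prev)
            n = 0
    return postings
```

Small gaps, which are typical for frequent terms, need a single byte each. This is why storing gaps rather than raw document numbers pays off.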
SRST2: Rapid genomic surveillance for public health and hospital microbiology labs
TLDR
This work presents SRST2, a read-mapping-based tool for rapid detection of genes, alleles and multi-locus sequence types (MLST) from whole-genome sequencing (WGS) data. It is highly accurate and outperforms assembly-based methods in both gene detection and allele assignment.
A similarity measure for indefinite rankings
TLDR
A new measure of similarity between incomplete rankings, rank-biased overlap (RBO), is proposed. It is based on a simple probabilistic user model and is extended to handle tied ranks and rankings of different lengths.
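A minimal sketch of the extrapolated form of RBO follows. It assumes two rankings of the same length k with no ties and no repeated items, so the tie and uneven-length extensions mentioned above are not covered. With X_d the number of items shared by the two top-d prefixes, the sketch computes (1 - p) * sum over d of p^(d-1) * X_d / d for d = 1..k. It then adds (X_k / k) * p^k, which assumes the agreement seen at depth k continues beyond it. The name `rbo_ext` and the default p = 0.9 are illustrative.

```python
from typing import Hashable, Sequence, Set


def rbo_ext(s: Sequence[Hashable], t: Sequence[Hashable], p: float = 0.9) -> float:
    """Extrapolated rank-biased overlap for equal-length rankings without ties."""
    if len(s) != len(t) or not s:
        raise ValueError("this sketch handles only non-empty, equal-length rankings")
    k = len(s)
    seen_s: Set[Hashable] = set()
    seen_t: Set[Hashable] = set()
    overlap = 0  # X_d: size of the intersection of the two top-d prefixes
    total = 0.0
    for d in range(1, k + 1):
        a, b = s[d - 1], t[d - 1]
        # Update X_d incrementally from the two items newly added at depth d.
        if a == b:
            overlap += 1
        else:
            overlap += (a in seen_t) + (b in seen_s)
        seen_s.add(a)
        seen_t.add(b)
        total += (1 - p) * p ** (d - 1) * overlap / d
    # Extrapolate: assume agreement at depth k persists for all deeper ranks.
    return total + (overlap / k) * p ** k
```

Identical rankings score 1 and disjoint rankings score 0. Swapping the top two items of a two-item ranking with p = 0.5 gives 0.5, because the rankings disagree at depth 1 and agree at depth 2.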
How reliable are the results of large-scale information retrieval experiments?
  • J. Zobel
  • Computer Science
  • SIGIR '98
  • 1 August 1998
TLDR
A detailed empirical investigation of the TREC results shows that the measured relative performance of systems appears to be reliable, but that recall is overestimated: it is likely that many relevant documents have not been found.
Self-indexing inverted files for fast text retrieval
TLDR
This work shows that the CPU component of query response time for conjunctive Boolean queries and for informal ranked queries can be greatly reduced, at little cost in storage, by including an internal index in each compressed inverted list.
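The effect of an internal index can be illustrated with a simplified sketch. The long inverted list is divided into blocks, and the first document number of each block is kept as a skip entry. To intersect a short list with the long one, the sketch jumps over whole blocks using the skip entries and scans only the single block that could hold each candidate. In a compressed list this means the skipped blocks never need to be decoded. For clarity, this sketch leaves the postings uncompressed. The fixed block size and the function name are assumptions, not the paper's parameters.

```python
from typing import List


def intersect_with_skips(short: List[int], long: List[int], block: int) -> List[int]:
    """Conjunctive intersection of two increasing postings lists using skip entries."""
    if block < 1:
        raise ValueError("block size must be at least 1")
    skips = long[::block]  # internal index: first doc number in each block
    result: List[int] = []
    b = 0  # index of the block currently under consideration
    for doc in short:
        # Advance past every block whose successor still starts at or before doc.
        while b + 1 < len(skips) and skips[b + 1] <= doc:
            b += 1
        # Only block b can contain doc, so scan just that block.
        start = b * block
        for x in long[start:start + block]:
            if x == doc:
                result.append(doc)
                break
            if x > doc:
                break
    return result
```

Because candidates from the short list arrive in increasing order, the block pointer only moves forward. The work is therefore governed mostly by the length of the short list and the number of blocks, not by the full length of the long list.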
Information retrieval system evaluation: effort, sensitivity, and reliability
TLDR
The t-test is found to be highly reliable (more so than the sign or Wilcoxon test) and far more reliable than simply showing a large percentage difference in effectiveness measures between IR systems.
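The two kinds of test compared above can be sketched for a pair of systems scored on the same topics. The paired t statistic uses the per-topic score differences. The exact two-sided sign test uses only which system wins on each topic and discards ties. Turning the t statistic into a p-value needs the t distribution with n - 1 degrees of freedom, for example through SciPy's `scipy.stats.ttest_rel`. This stdlib-only sketch stops at the statistic. The function names are illustrative.

```python
import math
import statistics
from typing import Sequence


def paired_t_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """t = mean(d) / (sd(d) / sqrt(n)) over per-topic differences d_i = a_i - b_i."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length score lists with at least 2 topics")
    diffs = [x - y for x, y in zip(a, b)]
    sd = statistics.stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("differences have zero variance")
    return statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))


def sign_test_p(a: Sequence[float], b: Sequence[float]) -> float:
    """Exact two-sided sign test on per-topic wins and losses (ties discarded)."""
    wins = sum(x > y for x, y in zip(a, b))
    losses = sum(x < y for x, y in zip(a, b))
    n = wins + losses
    if n == 0:
        return 1.0
    k = min(wins, losses)
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

The sign test ignores how large each difference is, while the t-test uses the magnitudes. This difference is one way to read the finding that the t-test is the more reliable of the two.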
Methods for Identifying Versioned and Plagiarized Documents
TLDR
The identity measure and the best fingerprinting technique are both able to accurately identify coderivative documents. However, fingerprinting parameters must be chosen carefully, and even then the identity measure is clearly superior.
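Fingerprinting in general can be illustrated with a sketch. Each document is cut into overlapping word n-grams, and each chunk is hashed. A reproducible subset of hashes is kept (here, those divisible by a modulus), and two documents are compared by the overlap of their kept hashes. The chunk length `n` and the selection `modulus` are the kind of fingerprinting parameters that the result above says must be chosen carefully. Their defaults here are hypothetical. The identity measure itself is not shown, and the Jaccard-style `resemblance` score is an assumed comparison function, not the paper's.

```python
import hashlib
from typing import Set


def fingerprint(text: str, n: int = 4, modulus: int = 4) -> Set[int]:
    """Hash every n-word chunk and keep those whose hash is divisible by modulus."""
    words = text.lower().split()
    kept: Set[int] = set()
    for i in range(len(words) - n + 1):
        chunk = " ".join(words[i:i + n])
        digest = hashlib.blake2b(chunk.encode("utf-8"), digest_size=8).digest()
        h = int.from_bytes(digest, "big")
        if h % modulus == 0:  # value-based selection: same chunk, same decision
            kept.add(h)
    return kept


def resemblance(a: Set[int], b: Set[int]) -> float:
    """Fraction of kept hashes the two fingerprints share (Jaccard coefficient)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)
```

Selecting by hash value rather than by position keeps the decision the same for a given chunk in every document. As a result, shared passages tend to leave shared hashes even when surrounding text differs.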
Inverted files versus signature files for text indexing
TLDR
A detailed comparison of inverted files and signature files for text indexing shows that inverted files are distinctly superior to signature files, and that a synthetic text database can give a realistic indication of the behavior of an actual text database.
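For contrast with the inverted-file sketches, a signature file can be sketched with superimposed coding. Each term sets k bits in a fixed-width bit string, and each document's signature is the OR of its terms' bit patterns. A document is a candidate for a query if it has every query bit set. Candidates can include false matches ("false drops"), so each one must be checked against the document text, and every document's signature is examined for every query. The width, k, the hashing scheme, and the function names are all assumptions for illustration.

```python
import hashlib
from typing import List


def term_bits(term: str, width: int, k: int) -> int:
    """Set k pseudo-random bits (some may coincide) for a term in a width-bit signature."""
    bits = 0
    for j in range(k):
        digest = hashlib.blake2b(f"{j}:{term}".encode("utf-8"), digest_size=8).digest()
        bits |= 1 << (int.from_bytes(digest, "big") % width)
    return bits


def doc_signature(text: str, width: int = 64, k: int = 3) -> int:
    """Superimpose (OR together) the bit patterns of every distinct term."""
    sig = 0
    for term in set(text.lower().split()):
        sig |= term_bits(term, width, k)
    return sig


def candidate_docs(query: str, signatures: List[int], width: int = 64, k: int = 3) -> List[int]:
    """Documents whose signature contains all query bits (may include false drops)."""
    q = 0
    for term in query.lower().split():
        q |= term_bits(term, width, k)
    return [i for i, s in enumerate(signatures) if (s & q) == q]


def conjunctive_search(query: str, docs: List[str], signatures: List[int],
                       width: int = 64, k: int = 3) -> List[int]:
    """Filter candidates by checking the text, removing any false drops."""
    terms = set(query.lower().split())
    return [i for i in candidate_docs(query, signatures, width, k)
            if terms <= set(docs[i].lower().split())]
```

The signature check never misses a true match, but it can let non-matches through. The verification step and the scan over all signatures are costs that an inverted file avoids.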