Topic Detection and Tracking Pilot Study Final Report
Topic Detection and Tracking (TDT) is a DARPA-sponsored initiative to investigate the state of the art in finding and following new events in a stream of broadcast news stories. The TDT problem
Topic detection and tracking: event-based information organization
TLDR
This collection of technical papers from leading researchers in the field not only provides several chapters devoted to the research program and its evaluation paradigm, but also presents the most current research results and describes some of the remaining open challenges.
A comparison of statistical significance tests for information retrieval evaluation
TLDR
It is discovered that there is little practical difference between the randomization, bootstrap, and t tests, and that the Wilcoxon and sign tests should be discontinued for measuring the significance of a difference between means.
UMass at TREC 2004: Novelty and HARD
TLDR
The primary findings for passage retrieval are that document retrieval methods performed better than passage retrieval methods on the passage evaluation metric of binary preference at 12,000 characters, and that clarification forms improved passage retrieval for every retrieval method explored.
Retrieval and novelty detection at the sentence level
TLDR
This study investigates the more difficult two-part task defined by the TREC 2002 novelty track: given a topic and a group of documents relevant to that topic, 1) find the relevant sentences from the documents, and 2) find the novel sentences from the collection of relevant sentences.
On-line new event detection and tracking
We define and describe the related problems of new event detection and event tracking within a stream of broadcast news stories. We focus on a strict on-line setting, i.e., the system must make
Automatic Query Expansion Using SMART: TREC 3
TLDR
This work continues the work in TREC 3, performing runs in the routing, ad-hoc, and foreign language environments, with a major focus on massive query expansion, adding from 300 to 530 terms to each query.
Text classification and named entities for new event detection
TLDR
This paper shows how performance on New Event Detection (NED) can be improved by the use of text classification techniques as well as by using named entities in a new way, and explores modifications to the document representation in a vector space-based NED system.
Introduction to topic detection and tracking
TLDR
This chapter defines the basic concepts of TDT, provides historical context for them, and gives an overview of the technical approaches that have been used and that have succeeded in evaluation tasks and workshops.
HARD Track Overview in TREC 2003: High Accuracy Retrieval from Documents
Abstract: The High Accuracy Retrieval from Documents (HARD) track explores methods for improving the accuracy of document retrieval systems. It does so by considering three questions. Can additional