Stephen M. Harding

The Topic Detection and Tracking (TDT) research community investigates information retrieval methods for organizing a constantly arriving stream of news articles by the events that they discuss. Our best system for the open evaluations of TDT has used an approach that turned out to be problematic when the cluster detection technology was deployed in a …
The major responsibilities of the National Marine Fisheries Service (NMFS) are to monitor and assess the abundance and geographic distribution of fishery resources, to understand and predict fluctuations in the quantity and distribution of these resources, and to establish levels for their optimum use. NMFS is also charged with the development and …
The retrieval of OCR-degraded text using n-gram formulations within a probabilistic retrieval system is examined in this paper. Direct retrieval of documents using n-gram databases of 2- and 3-grams or 2-, 3-, 4- and 5-grams resulted in improved retrieval performance over standard (word-based) queries on the same data when a level of 10 percent degradation or …
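As a rough illustration of why n-gram indexing can tolerate OCR errors, the sketch below splits words into character 2- and 3-grams and compares a clean word with a corrupted one. It is not the probabilistic retrieval system described in the abstract: the function names `char_ngrams` and `dice_overlap`, the Dice-coefficient scoring, and the sample OCR error are all assumptions made for this example.

```python
from collections import Counter


def char_ngrams(text, sizes=(2, 3)):
    """Count character n-grams taken within each whitespace-separated word."""
    grams = Counter()
    for word in text.lower().split():
        for n in sizes:
            for i in range(len(word) - n + 1):
                grams[word[i:i + n]] += 1
    return grams


def dice_overlap(a, b, sizes=(2, 3)):
    """Dice coefficient over the n-gram multisets of two strings (illustrative scoring only)."""
    ga, gb = char_ngrams(a, sizes), char_ngrams(b, sizes)
    shared = sum((ga & gb).values())  # multiset intersection
    total = sum(ga.values()) + sum(gb.values())
    return 2 * shared / total if total else 0.0


clean = "retrieval"
degraded = "retrieva1"  # hypothetical OCR error: 'l' misread as '1'

# A word-based match fails outright, while most n-grams still agree.
word_match = clean == degraded
ngram_score = dice_overlap(clean, degraded)
```

Here the exact word comparison is false, yet the two strings still share most of their n-grams, which is the property an n-gram index can exploit on degraded text.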
Optical Character Recognition (OCR) is a critical part of many text-based applications. Although some commercial systems use the output from OCR devices to index documents without editing, there is very little quantitative data on the impact of OCR errors on the accuracy of a text retrieval system. Because of the difficulty of constructing test collections to …