Andries Kruger

Users looking for documents within specific categories may have a difficult time locating valuable documents using general purpose search engines. We present an automated method for learning query modifications that can dramatically improve precision for locating pages within specified categories using web search engines. We also present a classification…
The web has greatly improved the accessibility of scientific information; however, the role of the web in formal scientific publishing has been debated. Some argue that the lack of persistence of web resources means that they should not be cited in scientific research. We analyze references to web resources in computer science publications, finding that the…
We analyze the persistence of information on the web, looking at the percentage of invalid URLs contained in academic articles within the CiteSeer database. The number of URLs contained in the papers has increased from an average of 0.06 in 1993 to 1.6 in 1999. We found that a significant percentage of URLs are now invalid, ranging from 23% for 1999…
We present DEADLINER, a search engine that catalogs conference and workshop announcements, and ultimately will monitor and extract a wide range of academic convocation material from the web. The system currently extracts speakers, locations, dates, paper submission (and other) deadlines, topics, program committees, abstracts, and affiliations. A user or…
Increases in the frequency, duration and intensity of heat waves are frequently evoked in climate change predictions. However, there is no universal definition of a heat wave. Recent, intense hot weather events have caused mass mortalities of birds, bats and even humans, making the definition and prediction of heat wave events that have the potential to…
Corpus-based translation research emerged in the late 1990s as a new area of research in the discipline of translation studies. It is informed by a specific area of linguistics known as corpus linguistics which involves the analysis of large corpora of authentic running text by means of computer software. Within linguistics, this methodology has…
Researchers have long desired immediate access to all scientific knowledge. Although there are still major hurdles to overcome, the Internet has brought this goal closer to reality. Scientists use the Internet to communicate their findings to a broader audience than ever before, and formal references to information on the Web are increasingly common.
We present a methodology for rapid implementation of specialized search engines. To catalog data, these search engines interpret and classify the content of web material to identify different representations of common domain-related elements. While designers can typically develop multiple partial solutions for interpreting the data, acceptable relevance…