David S. Day

As with several other veteran MUC participants, MITRE's Alembic system has undergone a major transformation in the past two years. The genesis of this transformation occurred during a dinner conversation at the last MUC conference, MUC-5. At that time, several of us reluctantly admitted that our major impediment towards improved performance was reliance(More)
Historically, tailoring language processing systems to specific domains and languages for which they were not originally built has required a great deal of effort. Recent advances in corpus-based manual and automatic training methods have shown promise in reducing the time and cost of this porting process. These developments have focused even greater(More)
In this paper we present a statistical profile of the Named Entity task, a specific information extraction task for which corpora in several languages are available. Using the results of the statistical analysis, we propose an algorithm for lower bound estimation for Named Entity corpora and discuss the significance of the cross-lingual comparisons(More)
Tomorrow's question answering systems will need to have the ability to process information about beliefs, opinions, and evaluations: the perspective of an agent. Answers to many simple factual questions, even yes/no questions, are affected by the perspective of the information source. For example, a questioner asking question (1) might be(More)
In order to support a range of textual annotation tasks, we have developed a new annotation tool called Callisto. To promote task-specific specialization of the interface and associated constraint checking, Callisto provides a facility for the independent development, compilation and installation of task module plug-ins (in the form of Java Archive jar(More)
We present a novel approach to parsing phrase grammars based on Eric Brill's notion of rule sequences. The basic framework we describe has somewhat less power than a finite-state machine, and yet achieves high accuracy on standard phrase parsing tasks. The rule language is simple, which makes it easy to write rules. Further, this simplicity enables the(More)
MiTAP (MITRE Text and Audio Processing) is a prototype system available for monitoring infectious disease outbreaks and other global events. MiTAP focuses on providing timely, multi-lingual, global information access to medical experts and individuals involved in humanitarian assistance and relief work. Multiple information sources in multiple languages are(More)
For several years, chunking has been an integral part of MITRE's approach to information extraction. Our work exploits chunking in two principal ways. First, as part of our extraction system (Alembic) (Aberdeen et al., 1995), the chunker delineates descriptor phrases for entity extraction. Second, as part of our ongoing research in parsing, chunks provide(More)
Alembic is a comprehensive information extraction system that has been applied to a range of tasks. These include the now-standard components of the formal MUC evaluations: name tagging (NE in MUC-6), name normalization (WE), and template generation (ST). The system has also been exploited to help segment and index broadcast video and was used for early(More)