A prerequisite for all higher-level information extraction tasks is the identification of unknown names in text. Today, when large corpora can consist of billions of words, it is of utmost importance to develop accurate techniques for the automatic detection, extraction and categorization of named entities in these corpora. Although named entity recognition ...
Divide-and-Conquer (DAC) and Separate-and-Conquer (SAC) are two strategies for rule induction that have been used extensively. When searching for rules, DAC is maximally conservative w.r.t. decisions made during search for previous rules. This results in a very efficient strategy, which however suffers from difficulties in effectively inducing disjunctive concepts ...
The paper describes a set of experiments involving the application of three state-of-the-art part-of-speech taggers to Ethiopian Amharic, using three different tagsets. The taggers showed worse performance than previously reported results for English, in particular having problems with unknown words. The best results were obtained using a Maximum Entropy ...
Dramatic improvements in sensor and image acquisition technology have created a demand for automated tools that can aid in the analysis of large image databases. We describe the development of JARtool, a trainable software system that learns to recognize volcanoes in a large data set of Venusian imagery. A machine learning approach is used because it is ...
Documents can be assigned keywords by frequency analysis of the terms found in the document text, which arguably is the primary source of knowledge about the document itself. By including a hierarchically organised domain-specific thesaurus as a second knowledge source, the quality of such keywords was improved considerably, as measured by match to previously ...
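To make the first step concrete, frequency-based keyword assignment can be sketched as below. This is a minimal illustration only, not the system the abstract describes: the stop list, the tokenizer, and the function name `top_keywords` are all assumptions, and the thesaurus-based second knowledge source is not modelled.

```python
import re
from collections import Counter

# Hypothetical stop list, for illustration only.
STOP_WORDS = {"the", "a", "of", "and", "in", "is", "to"}

def top_keywords(text, k=3):
    """Return the k most frequent non-stop-word terms in text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t not in STOP_WORDS)
    # Sort by descending count, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]
```

A thesaurus could then be used to re-weight or merge these candidate terms, but how the paper does so is not stated in the visible part of the abstract.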
Lars Asker (Jet Propulsion Laboratory, M/S 525-3660, Pasadena, California 91109-8099) and Richard Maclin (Department of Computer Science, University of Minnesota, Duluth, Minnesota 55812-2496). Abstract: An ensemble is a classifier created by combining the predictions of multiple component classifiers. We present a new method for combining classifiers into an ensemble based ...
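The general idea of an ensemble, combining the predictions of several component classifiers, can be illustrated with plain majority voting. This is a generic baseline chosen for illustration and is not the new combination method the abstract announces, which is cut off in the text above. The names `majority_vote` and `ensemble_predict` are hypothetical, and classifiers are assumed to be simple callables that return a label.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the most common label; ties go to the earliest such label."""
    if not predictions:
        raise ValueError("need at least one prediction")
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label

def ensemble_predict(classifiers, x):
    """Ask every component classifier for a label and combine by vote."""
    return majority_vote([clf(x) for clf in classifiers])
```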
This paper presents work on a method to detect names of proteins in running text. Our system, Yapex, uses a combination of lexical and syntactic knowledge, heuristic filters and a local dynamic dictionary. The syntactic information given by a general-purpose off-the-shelf parser supports the correct identification of the boundaries of protein names, and the local ...
As machine learning has graduated from toy problems to "real world" applications, users are finding that "real world" problems require them to perform aspects of problem solving that are not currently addressed by much of the machine learning literature. Specifically, users are finding that the tasks of selecting a set of features to define a problem and ...
We present two approaches to the Amharic–English bilingual track in CLEF 2004. Both experiments use a dictionary-based approach to translate the Amharic queries into English bags-of-words, but while one approach removes non-content-bearing words from the Amharic queries based on their IDF value, the other uses a list of English stop words to perform the ...
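The IDF-based filtering mentioned for the first approach can be sketched as follows. This is a minimal illustration under stated assumptions: the IDF formula log(N / df), the threshold value, the choice to keep terms unseen in the collection, and the function names `idf_scores` and `drop_low_idf` are all hypothetical and are not taken from the paper.

```python
import math

def idf_scores(documents):
    """Compute idf(t) = log(N / df(t)) over a list of tokenized documents."""
    n = len(documents)
    df = {}
    for doc in documents:
        for term in set(doc):  # count each term once per document
            df[term] = df.get(term, 0) + 1
    return {term: math.log(n / count) for term, count in df.items()}

def drop_low_idf(query, idf, threshold):
    """Keep query terms whose IDF is at least threshold.

    Terms unseen in the collection are kept (treated as maximally rare);
    this is an assumption made for the sketch.
    """
    return [t for t in query if idf.get(t, float("inf")) >= threshold]
```

Words that occur in nearly every document get an IDF close to zero, so a small positive threshold removes them without needing a hand-built stop list.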