A prerequisite for all higher-level information extraction tasks is the identification of unknown names in text. Today, when large corpora can consist of billions of words, it is of utmost importance to develop accurate techniques for the automatic detection, extraction and categorization of named entities in these corpora. Although named entity recognition…
This paper presents work on a method to detect names of proteins in running text. Our system, Yapex, uses a combination of lexical and syntactic knowledge, heuristic filters and a local dynamic dictionary. The syntactic information given by a general-purpose off-the-shelf parser supports the correct identification of the boundaries of protein names, and the local…
Dramatic improvements in sensor and image acquisition technology have created a demand for automated tools that can aid in the analysis of large image databases. We describe the development of JARtool, a trainable software system that learns to recognize volcanoes in a large data set of Venusian imagery. A machine learning approach is used because it is…
Documents can be assigned keywords by frequency analysis of the terms found in the document text, which arguably is the primary source of knowledge about the document itself. By including a hierarchically organised domain-specific thesaurus as a second knowledge source, the quality of such keywords was improved considerably, as measured by match to previously…
An ensemble is a classifier created by combining the predictions of multiple component classifiers. We present a new method for combining classifiers into an ensemble based on a simple estimation of each classifier's competence. The classifiers are grouped into an ordered list where each classifier has a corresponding threshold. To classify an example, the first…
As machine learning has graduated from toy problems to "real world" applications, users are finding that "real world" problems require them to perform aspects of problem solving that are not currently addressed by much of the machine learning literature. Specifically, users are finding that the tasks of selecting a set of features to define a problem and…
The paper describes a set of experiments involving the application of three state-of-the-art part-of-speech taggers to Ethiopian Amharic, using three different tagsets. The taggers showed worse performance than previously reported results for English, in particular having problems with unknown words. The best results were obtained using a Maximum Entropy…
Högdalenverket is a combined heating and power station located in Stockholm, Sweden. At Högdalenverket, waste from Stockholm households is burned to produce heat and power for the Stockholm area. Högdalenverket has been instructed by the Swedish National Environment Protection Board to reduce its emissions of nitrogen oxides (NOx). One way to…