Data Set Used
Nineteen teams presented results for the Gene Mention Task at the BioCreative II Workshop. In this task, participants designed systems to identify substrings in sentences corresponding to gene name mentions. A variety of methods were used, and results varied, with a highest achieved F1 score of 0.8721. Here we present brief descriptions of all…
In this paper I present the EMILE 3.0 algorithm. It can learn shallow context-free grammars efficiently. It does so under circumstances that, from a perspective of complexity, come reasonably close to the conditions under which human beings learn a language. A language is shallow in its descriptive length if all the relevant constructions we need to know to…
In this paper we describe an efficient and scalable implementation for grammar induction based on the EMILE approach ([2], [3], [4], [5], [6]). The current EMILE 4.1 implementation ([11]) is one of the first efficient grammar induction algorithms that work on free text. Although EMILE 4.1 is far from perfect, it enables researchers to do empirical grammar induction…
This introductory paper to the special issue on Data Mining Lessons Learned presents lessons from data mining applications, including experience from science, business, and knowledge management in a collaborative data mining setting.
Large-scale scientific applications require extensive support from middleware and frameworks that provide the capabilities for distributed execution in the Grid environment. One example of such a framework is a Grid-enabled workflow management system. In this paper we present the WS-VLAM workflow management system, describe its current…
This paper describes how binary associations in databases of items can be organised and clustered. Two similarity measures are presented that can be used to generate a weighted graph of associations. Each measure focuses on different kinds of regularities in the database. By calculating a Minimum Spanning Tree on the graph of associations, the most…
We point out a potential weakness in the application of the celebrated minimum description length (MDL) principle for model selection. Specifically, it is shown that (although the index of the model class which actually minimizes a two-part code has many desirable properties) a model which has a shorter two-part code-length than another is not necessarily…