Peter Reutemann

More than twelve years have elapsed since the first public release of WEKA. In that time, the software has been rewritten entirely from scratch, evolved substantially and now accompanies a text on data mining [35]. These days, WEKA enjoys widespread acceptance in both academia and business, has an active community, and has been downloaded more than 1.4 …
WEKA is a popular machine learning workbench with a development life of nearly two decades. This article provides an overview of the factors that we believe to be important to its success. Rather than focussing on the software’s functionality, we review aspects of project management and historical development decisions that likely had an impact on the …
The development of data-mining applications such as text classification and molecular profiling has shown the need for machine learning algorithms that can benefit from both labeled and unlabeled data, where the unlabeled examples often greatly outnumber the labeled examples. In this paper we present a two-stage classifier that improves its predictive …
Multi-label classification has rapidly attracted interest in the machine learning literature, and there are now a large number and considerable variety of methods for this type of learning. We present Meka: an open-source Java framework based on the well-known Weka library. Meka provides interfaces to facilitate practical application, and a wealth of …
• uses SQL aggregate functions such as SUM, MIN, MAX, and AVG, plus computed standard deviation, quartiles, and range, to capture relational information
• for each value of a nominal column, a new attribute is introduced containing the number of occurrences
• pairs of attributes (one of which is nominal) are used as GROUP BY conditions for additional aggregations
• determines …
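The aggregation steps listed above can be illustrated with a minimal sketch in plain Python. This is not the RELAGGS implementation: the function name `propositionalize`, the column names, and the sample rows are all hypothetical, quartiles are left out for brevity, and the GROUP BY step is reduced to a per-value SUM. It only shows how many child rows per key can be collapsed into one flat row of aggregate attributes.

```python
from collections import Counter, defaultdict
from statistics import mean, pstdev


def propositionalize(rows, key, numeric_col, nominal_col):
    """Collapse many child rows per key into one flat feature row per key.

    Hypothetical helper for illustration only; not part of any library.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)

    # Every nominal value seen anywhere gets its own count attribute.
    nominal_values = sorted({row[nominal_col] for row in rows})

    table = {}
    for k, members in groups.items():
        values = [m[numeric_col] for m in members]
        # SQL-style aggregates plus population standard deviation and range.
        features = {
            f"{numeric_col}_sum": sum(values),
            f"{numeric_col}_min": min(values),
            f"{numeric_col}_max": max(values),
            f"{numeric_col}_avg": mean(values),
            f"{numeric_col}_std": pstdev(values),
            f"{numeric_col}_range": max(values) - min(values),
        }
        # One attribute per nominal value, holding its number of occurrences.
        counts = Counter(m[nominal_col] for m in members)
        for v in nominal_values:
            features[f"{nominal_col}={v}_count"] = counts.get(v, 0)
        # Nominal column used as a GROUP BY condition for a further aggregate.
        for v in nominal_values:
            subset = [m[numeric_col] for m in members if m[nominal_col] == v]
            features[f"{numeric_col}_sum|{nominal_col}={v}"] = sum(subset)
        table[k] = features
    return table


# Hypothetical one-to-many data: several orders per customer.
orders = [
    {"customer": 1, "amount": 10.0, "category": "book"},
    {"customer": 1, "amount": 30.0, "category": "toy"},
    {"customer": 2, "amount": 5.0, "category": "book"},
]
flat = propositionalize(orders, "customer", "amount", "category")
for customer, features in sorted(flat.items()):
    print(customer, features)
```

Each customer ends up as a single row of numeric attributes, which is the shape a propositional learner expects.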
Databases predominantly employ the relational model for data storage. To use this data in a propositional learner, a propositionalization step has to take place. Similarly, the data has to be transformed to be amenable to a multi-instance learner. The Proper Toolbox contains an extended version of RELAGGS, the Multi-Instance Learning Kit MILK, and can also …
The process of extracting information from a dataset and transforming it into an understandable structure for further use is called data mining. A number of important data mining techniques, such as preprocessing, classification, and clustering, can be performed using the WEKA tool. In medical diagnosis, the role of data mining approaches is increasing. …
Data mining means finding useful information in a large data warehouse; the process aims at uncovering old records and identifying novel patterns in the data. Data mining is used for classification and prediction. Many techniques and algorithms are available for mining data. Of these techniques, the decision tree is the simplest. …
Domains like text classification can easily supply large amounts of unlabeled data, but labeling itself is expensive. Semi-supervised learning tries to exploit this abundance of unlabeled training data to improve classification. Unfortunately, most of the theoretically well-founded algorithms that have been described in recent years are cubic or worse in the …