Publications
The WEKA data mining software: an update
TLDR
This paper provides an introduction to the WEKA workbench, reviews the history of the project, and, in light of the recent 3.6 stable release, briefly discusses what has been added since the last stable version (Weka 3.4) released in 2003.
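For readers who script the workbench rather than use the GUI, here is a minimal sketch of the Weka 3.6-era API, assuming a Weka jar on the classpath and a placeholder data.arff file; the snippet is illustrative, not code from the paper.

```java
import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class WekaCrossValidation {
    public static void main(String[] args) throws Exception {
        // Load a dataset in ARFF format ("data.arff" is a placeholder path).
        Instances data = new DataSource("data.arff").getDataSet();
        data.setClassIndex(data.numAttributes() - 1);  // last attribute is the class

        // 10-fold cross-validate a C4.5-style decision tree (J48).
        J48 tree = new J48();
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(tree, data, 10, new Random(1));
        System.out.println(eval.toSummaryString());
    }
}
```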
Classifier chains for multi-label classification
TLDR
This paper presents a novel classifier chains method that can model label correlations while maintaining acceptable computational complexity, and illustrates the competitiveness of the chaining method against related and state-of-the-art methods, both in terms of predictive performance and time complexity.
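The chaining idea is compact enough to sketch directly: each binary model sees the original features extended with the labels earlier in the chain, and predictions are fed forward at test time. The BinaryLearner interface below is a hypothetical stand-in for any base classifier; a reference implementation of the method is distributed with the MEKA extension of WEKA.

```java
import java.util.Arrays;

// Hypothetical minimal binary-learner interface, used only for this sketch.
interface BinaryLearner {
    void fit(double[][] x, int[] y);
    int predict(double[] x);
}

// Sketch of the chaining idea: label j is predicted from the original
// features plus the values of labels 0..j-1.
class ClassifierChain {
    private final BinaryLearner[] chain;

    ClassifierChain(BinaryLearner[] chain) { this.chain = chain; }

    void fit(double[][] x, int[][] y) {                // y[i][j] in {0, 1}
        for (int j = 0; j < chain.length; j++) {
            double[][] xAug = new double[x.length][];
            int[] yj = new int[x.length];
            for (int i = 0; i < x.length; i++) {
                // Augment the features with the true values of earlier labels.
                xAug[i] = Arrays.copyOf(x[i], x[i].length + j);
                for (int k = 0; k < j; k++) xAug[i][x[i].length + k] = y[i][k];
                yj[i] = y[i][j];
            }
            chain[j].fit(xAug, yj);
        }
    }

    int[] predict(double[] x) {
        int[] yHat = new int[chain.length];
        double[] xAug = Arrays.copyOf(x, x.length + chain.length);
        for (int j = 0; j < chain.length; j++) {
            yHat[j] = chain[j].predict(Arrays.copyOf(xAug, x.length + j));
            xAug[x.length + j] = yHat[j];              // feed the prediction forward
        }
        return yHat;
    }
}
```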
MOA: Massive Online Analysis
TLDR
MOA includes a collection of offline and online methods as well as tools for evaluation; it implements boosting, bagging, and Hoeffding Trees, all with and without Naive Bayes classifiers at the leaves.
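The interleaved test-then-train (prequential) loop from the MOA API tutorial gives a feel for how these methods are used. The sketch below assumes a recent MOA release, where nextInstance() returns an example wrapper and instances come from the SAMOA instance classes; older versions used weka.core.Instance directly.

```java
import com.yahoo.labs.samoa.instances.Instance;
import moa.classifiers.Classifier;
import moa.classifiers.trees.HoeffdingTree;
import moa.streams.generators.RandomRBFGenerator;

public class PrequentialSketch {
    public static void main(String[] args) {
        // Synthetic stream generator bundled with MOA.
        RandomRBFGenerator stream = new RandomRBFGenerator();
        stream.prepareForUse();

        Classifier learner = new HoeffdingTree();
        learner.setModelContext(stream.getHeader());
        learner.prepareForUse();

        int correct = 0, seen = 0;
        while (seen < 100_000 && stream.hasMoreInstances()) {
            Instance inst = stream.nextInstance().getData();   // test, then train
            if (learner.correctlyClassifies(inst)) correct++;
            learner.trainOnInstance(inst);
            seen++;
        }
        System.out.printf("prequential accuracy: %.2f%%%n", 100.0 * correct / seen);
    }
}
```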
Benchmarking Attribute Selection Techniques for Discrete Class Data Mining
TLDR
This paper presents a benchmark comparison of several attribute selection methods for supervised classification, cross-validating the attribute rankings with respect to a classification learner to identify the best attributes.
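In WEKA terms, each of the compared techniques plugs into the same AttributeSelection harness. A minimal sketch with one of the evaluators (information gain plus a ranking search) follows; data.arff is a placeholder path and the snippet is illustrative, not the benchmark code itself.

```java
import weka.attributeSelection.AttributeSelection;
import weka.attributeSelection.InfoGainAttributeEval;
import weka.attributeSelection.Ranker;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class AttributeRanking {
    public static void main(String[] args) throws Exception {
        Instances data = new DataSource("data.arff").getDataSet();  // placeholder path
        data.setClassIndex(data.numAttributes() - 1);

        // Rank attributes by information gain; the benchmark swaps in other
        // evaluators (ReliefF, CFS, wrappers, ...) through the same harness.
        AttributeSelection selector = new AttributeSelection();
        selector.setEvaluator(new InfoGainAttributeEval());
        selector.setSearch(new Ranker());
        selector.SelectAttributes(data);
        System.out.println(java.util.Arrays.toString(selector.selectedAttributes()));
    }
}
```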
New ensemble methods for evolving data streams
TLDR
This paper proposes a new experimental data stream framework for studying concept drift, along with two new variants of Bagging: ADWIN Bagging and Adaptive-Size Hoeffding Tree (ASHT) Bagging.
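Both variants ship as MOA meta-learners, so, assuming the class names used in recent MOA releases, they drop into the prequential loop sketched after the MOA entry above simply by swapping the learner.

```java
import moa.classifiers.Classifier;
import moa.classifiers.meta.OzaBagASHT;
import moa.classifiers.meta.OzaBagAdwin;

public class DriftAwareBagging {
    // Drop-in replacements for the HoeffdingTree learner in the prequential
    // sketch above; both are MOA meta-learners from this paper.
    static Classifier adwinBagging() { return new OzaBagAdwin(); }  // ADWIN Bagging
    static Classifier ashtBagging()  { return new OzaBagASHT(); }   // ASHT Bagging
}
```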
Data mining in bioinformatics using Weka
The Weka machine learning workbench provides a general-purpose environment for automatic classification, regression, clustering and feature selection, which are common data mining problems in bioinformatics research.
Active Learning With Drifting Streaming Data
TLDR
This paper presents a theoretically supported framework for active learning from drifting data streams and develops three active learning strategies for streaming data that explicitly handle concept drift, based on uncertainty, dynamic allocation of labeling efforts over time, and randomization of the search space.
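A simplified sketch of the first ingredient, uncertainty sampling under a labeling budget, is shown below using MOA-style classifier calls. The class, its parameters, and the fixed threshold are illustrative only; the paper's variable-threshold and randomized strategies adapt the threshold over time.

```java
import com.yahoo.labs.samoa.instances.Instance;
import moa.classifiers.Classifier;

// Sketch of the uncertainty-plus-budget idea: request a label only when the
// current model is unsure and the labeling budget has not been spent.
public class UncertaintyActiveLearner {
    private final Classifier learner;
    private final double budget;      // target fraction of labeled instances
    private final double threshold;   // query when the top posterior falls below this
    private long seen = 0, labeled = 0;

    public UncertaintyActiveLearner(Classifier learner, double budget, double threshold) {
        this.learner = learner;
        this.budget = budget;
        this.threshold = threshold;
    }

    /** Returns true if the instance's label was requested and used for training. */
    public boolean processInstance(Instance inst) {
        seen++;
        double[] votes = learner.getVotesForInstance(inst);
        double max = 0, sum = 0;
        for (double v : votes) { max = Math.max(max, v); sum += v; }
        double confidence = sum > 0 ? max / sum : 0;   // normalized top posterior
        boolean withinBudget = labeled < budget * seen;
        if (withinBudget && confidence < threshold) {
            learner.trainOnInstance(inst);  // in a simulation the true label rides on inst
            labeled++;
            return true;
        }
        return false;
    }
}
```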
Adaptive random forests for evolving data stream classification
TLDR
This work presents the adaptive random forest (ARF) algorithm, which includes an effective resampling method and adaptive operators that can cope with different types of concept drifts without complex optimizations for different data sets.
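The sketch below illustrates, with hypothetical interfaces rather than ARF's actual classes, the per-tree drift handling the summary alludes to: each ensemble member has its own detector, a warning starts a background tree, and a confirmed drift swaps it in.

```java
// Hypothetical detector interface fed with the member tree's prequential errors.
interface DriftDetector {
    void add(boolean error);
    boolean warning();
    boolean drift();
}

// Hypothetical stand-in for ARF's random-subspace Hoeffding tree.
interface HoeffdingTreeLike {
    int predict(double[] x);
    void update(double[] x, int y, double weight);
    HoeffdingTreeLike newBlankCopy();
}

class ArfMember {
    HoeffdingTreeLike tree;
    HoeffdingTreeLike backgroundTree;   // trained in parallel after a warning
    final DriftDetector detector;

    ArfMember(HoeffdingTreeLike tree, DriftDetector detector) {
        this.tree = tree;
        this.detector = detector;
    }

    void train(double[] x, int y, double weight) {   // weight drawn as in online bagging
        detector.add(tree.predict(x) != y);
        if (detector.warning() && backgroundTree == null) {
            backgroundTree = tree.newBlankCopy();
        }
        if (backgroundTree != null) backgroundTree.update(x, y, weight);
        if (detector.drift()) {                      // replace the drifted member
            tree = (backgroundTree != null) ? backgroundTree : tree.newBlankCopy();
            backgroundTree = null;
        }
        tree.update(x, y, weight);
    }
}
```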
Leveraging Bagging for Evolving Data Streams
TLDR
A new variant of bagging is proposed, called leveraging bagging, which combines the simplicity of bagging with adding more randomization to the input and output of the classifiers.
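The input-randomization half of the idea is easy to sketch: where online bagging weights each arriving example for each ensemble member by Poisson(1), leveraging bagging draws the weight from Poisson(lambda) with lambda > 1 (6 in the paper). The OnlineLearner interface below is a hypothetical stand-in, and the output randomization via error-correcting output codes is omitted.

```java
import java.util.Random;

public class LeveragingBagSketch {
    // Hypothetical stand-in for a Hoeffding tree with ADWIN-based resetting.
    interface OnlineLearner { void update(double[] x, int y, int weight); }

    private final OnlineLearner[] ensemble;
    private final double lambda;
    private final Random rng = new Random(1);

    public LeveragingBagSketch(OnlineLearner[] ensemble, double lambda) {
        this.ensemble = ensemble;
        this.lambda = lambda;
    }

    /** Knuth's method for sampling Poisson(lambda); fine for small lambda. */
    private int poisson(double lambda) {
        double l = Math.exp(-lambda), p = 1.0;
        int k = 0;
        do { k++; p *= rng.nextDouble(); } while (p > l);
        return k - 1;
    }

    public void trainOnInstance(double[] x, int y) {
        for (OnlineLearner member : ensemble) {
            int w = poisson(lambda);   // with lambda = 6, most weights are well above 0
            if (w > 0) member.update(x, y, w);
        }
    }
}
```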
Multinomial Naive Bayes for Text Categorization Revisited
TLDR
It is shown how the performance of multinomial naive Bayes can be improved using locally weighted learning, and that support vector machines are still the method of choice if the aim is to maximize accuracy.
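A WEKA realization of the multinomial naive Bayes baseline would pair a bag-of-words filter with NaiveBayesMultinomial, as sketched below; reuters.arff is a placeholder ARFF with one string attribute for the document text plus a nominal class, and the locally weighted variant can be approximated by wrapping the same base learner in weka.classifiers.lazy.LWL.

```java
import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.bayes.NaiveBayesMultinomial;
import weka.classifiers.meta.FilteredClassifier;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.unsupervised.attribute.StringToWordVector;

public class TextMnbSketch {
    public static void main(String[] args) throws Exception {
        Instances docs = new DataSource("reuters.arff").getDataSet();  // placeholder path
        docs.setClassIndex(docs.numAttributes() - 1);

        // Word counts (not just word presence) feeding multinomial naive Bayes.
        StringToWordVector bow = new StringToWordVector();
        bow.setOutputWordCounts(true);

        FilteredClassifier mnb = new FilteredClassifier();
        mnb.setFilter(bow);
        mnb.setClassifier(new NaiveBayesMultinomial());

        Evaluation eval = new Evaluation(docs);
        eval.crossValidateModel(mnb, docs, 10, new Random(1));
        System.out.println(eval.toSummaryString());
    }
}
```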
...