The WEKA data mining software: an update
This paper provides an introduction to the WEKA workbench, reviews the history of the project, and, in light of the recent 3.6 stable release, briefly discusses what has been added since the last stable version (Weka 3.4) released in 2003.
Data mining: practical machine learning tools and techniques, 3rd Edition
This highly anticipated third edition of the most acclaimed work on data mining and machine learning will teach you everything you need to know about preparing inputs, interpreting outputs, evaluating results, and the algorithmic methods at the heart of successful data mining.
Data mining: practical machine learning tools and techniques with Java implementations
This presentation discusses the design and implementation of machine learning algorithms in Java, as well as some of the techniques used to develop and implement these algorithms.
Data mining - practical machine learning tools and techniques, Second Edition
  • I. Witten, Eibe Frank
  • Computer Science
  • The Morgan Kaufmann series in data management…
  • 22 June 2005
This book describes a body of practical techniques that can extract useful information from raw data and shows how they work.
Learning to link with wikipedia
This paper explains how machine learning can be used to identify significant terms within unstructured text and enrich it with links to the appropriate Wikipedia articles. The resulting system performs very well, with recall and precision of almost 75%.
Managing Gigabytes: Compressing and Indexing Documents and Images
This book provides a guide to the MG system and its applications, along with a comparison to the NZDL reference index.
Generating Accurate Rule Sets Without Global Optimization
This paper presents an algorithm for inferring rules by repeatedly generating partial decision trees, thus combining the two major paradigms for rule generation: creating rules from decision trees and the separate-and-conquer rule-learning technique.
KEA: practical automatic keyphrase extraction
This paper uses a large test corpus to evaluate Kea’s effectiveness in terms of how many author-assigned keyphrases are correctly identified, and describes the system, which is simple, robust, and publicly available.
Data Compression Using Adaptive Coding and Partial String Matching
This paper describes how the conflict can be resolved with partial string matching, and reports experimental results which show that mixed-case English text can be coded in as little as 2.2 bits/character with no prior knowledge of the source.
Arithmetic coding for data compression
The state of the art in data compression is arithmetic coding, not the better-known Huffman method. Arithmetic coding gives greater compression, is faster for adaptive models, and clearly separates…