The WEKA data mining software: an update
- M. Hall, Eibe Frank, G. Holmes, B. Pfahringer, P. Reutemann, I. Witten
- Computer Science, SKDD
- 16 November 2009
This paper provides an introduction to the WEKA workbench, reviews the history of the project, and, in light of the recent 3.6 stable release, briefly discusses what has been added since the last stable version (Weka 3.4) released in 2003.
Data mining: practical machine learning tools and techniques, 3rd Edition
This highly anticipated third edition of the most acclaimed work on data mining and machine learning will teach you everything you need to know about preparing inputs, interpreting outputs, evaluating results, and the algorithmic methods at the heart of successful data mining.
Data mining: practical machine learning tools and techniques with Java implementations
This presentation discusses the design and implementation of machine learning algorithms in Java, as well as some of the techniques used to develop and implement these algorithms.
Learning to link with wikipedia
- David N. Milne, I. Witten
- Computer Science, International Conference on Information and…
- 26 October 2008
This paper explains how machine learning can be used to identify significant terms within unstructured text and enrich it with links to the appropriate Wikipedia articles. The resulting approach performs very well, with recall and precision of almost 75%.
Data mining - practical machine learning tools and techniques, Second Edition
This book describes a body of practical techniques that can extract useful information from raw data and shows how they work.
Generating Accurate Rule Sets Without Global Optimization
This paper presents an algorithm for inferring rules by repeatedly generating partial decision trees, thus combining the two major paradigms for rule generation—creating rules from decision trees and the separate-and-conquer rule-learning technique.
Managing Gigabytes: Compressing and Indexing Documents and Images
A guide to the MG system and its applications, as well as a comparison to the NZDL reference index, are provided.
Data Compression Using Adaptive Coding and Partial String Matching
This paper describes how the conflict between using high-order Markov models and forming them quickly from the start of a message can be resolved with partial string matching, and reports experimental results which show that mixed-case English text can be coded in as little as 2.2 bits/character with no prior knowledge of the source.
KEA: practical automatic keyphrase extraction
- I. Witten, G. Paynter, Eibe Frank, C. Gutwin, C. Nevill-Manning
- Computer Science, Digital library
- 4 February 1999
This paper uses a large test corpus to evaluate Kea’s effectiveness in terms of how many author-assigned keyphrases are correctly identified, and describes the system, which is simple, robust, and publicly available.
Arithmetic coding for data compression
The state of the art in data compression is arithmetic coding, not the better-known Huffman method. Arithmetic coding gives greater compression, is faster for adaptive models, and clearly separates…