We describe the results of extensive experiments using optimized rule-based induction methods on large document collections. The goal of these methods is to automatically discover classification patterns that can be used for general document categorization or personalized filtering of free text. Previous reports indicate that human-engineered rule-based…
With the advent of centralized data warehouses, where data might be stored as electronic documents or as text fields in databases, text mining has increased in importance and economic value. One important goal in text mining is automatic classification of electronic documents. Computer programs scan text in a document and apply a model that assigns the…
Classification methods from statistical pattern recognition, neural nets, and machine learning were applied to four real-world data sets. Each of these data sets has been previously analyzed and reported in the statistical, medical, or machine learning literature. The data sets are characterized by statistical uncertainty; there is no completely accurate…
We describe a machine learning method for predicting the value of a real-valued function, given the values of multiple input variables. The method induces solutions from samples in the form of ordered disjunctive normal form (DNF) decision rules. A central objective of the method and representation is the induction of compact, easily interpretable…
As one of the most comprehensive machine learning texts around, this book does justice to the field's incredible richness, but without losing sight of the unifying principles. Peter Flach's clear, example-based approach begins by discussing how a spam filter works, which gives an immediate introduction to machine learning in action, with a minimum of…
A lightweight rule induction method is described that generates compact Disjunctive Normal Form (DNF) rules. Each class has an equal number of unweighted rules. A new example is classified by applying all rules and assigning the example to the class with the most satisfied rules. The induction method attempts to minimize the training error with no…
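The voting step described in this abstract can be illustrated with a minimal sketch. It covers only classification with already-induced rules, not the induction method itself. The rule representation (a DNF rule as a list of conjunctions of predicates), the feature names `x` and `y`, the class labels, the thresholds, and the tie-breaking behavior are all assumptions made for illustration; none of them come from the paper.

```python
from typing import Callable, Dict, List

Example = Dict[str, float]
Predicate = Callable[[Example], bool]
Conjunction = List[Predicate]   # all predicates must hold
Rule = List[Conjunction]        # DNF: at least one conjunction must hold


def rule_satisfied(rule: Rule, example: Example) -> bool:
    """A DNF rule is satisfied if any of its conjunctions is fully true."""
    return any(all(pred(example) for pred in conj) for conj in rule)


def classify(rules_by_class: Dict[str, List[Rule]], example: Example) -> str:
    """Apply every rule and return the class with the most satisfied rules.

    Rules are unweighted, so each satisfied rule counts as one vote.
    Ties go to the class listed first (an assumption; the abstract
    does not say how ties are resolved).
    """
    votes = {
        label: sum(rule_satisfied(rule, example) for rule in rules)
        for label, rules in rules_by_class.items()
    }
    return max(votes, key=votes.get)


# Hypothetical rule set: two unweighted rules per class.
RULES: Dict[str, List[Rule]] = {
    "pos": [
        [[lambda e: e["x"] > 0.5]],
        [[lambda e: e["y"] > 0.5], [lambda e: e["x"] > 0.9]],
    ],
    "neg": [
        [[lambda e: e["x"] <= 0.5]],
        [[lambda e: e["y"] <= 0.5]],
    ],
}

print(classify(RULES, {"x": 0.8, "y": 0.7}))
print(classify(RULES, {"x": 0.2, "y": 0.1}))
```

Because every rule carries the same weight and each class has the same number of rules, no class starts with an advantage in the vote count.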
We consider the automated identification of transmembrane domains in membrane protein sequences. 324 proteins (containing 1585 segments) were examined, representing every protein in the PIR database having the transmembrane domain feature annotation. Machine learning techniques were used to evaluate the efficacy of alternative hydrophobicity measures and…