Sholom M. Weiss

We describe the results of extensive experiments using optimized rule-based induction methods on large document collections. The goal of these methods is to discover automatically classification patterns that can be used for general document categorization or personalized filtering of free text. Previous reports indicate that human-engineered rule-based ...
Classification methods from statistical pattern recognition, neural nets, and machine learning were applied to four real-world data sets. Each of these data sets has been previously analyzed and reported in the statistical, medical, or machine learning literature. The data sets are characterized by statistical uncertainty; there is no completely accurate ...
With the advent of centralized data warehouses, where data might be stored as electronic documents or as text fields in databases, text mining has increased in importance and economic value. One important goal in text mining is automatic classification of electronic documents. Computer programs scan text in a document and apply a model that assigns the ...
We describe the results of extensive machine learning experiments on large collections of Reuters’ English and German newswires. The goal of these experiments was to automatically discover classification patterns that can be used for assignment of topics to the individual newswires. Our results with the English newswire collection show a very large gain in ...
We describe a machine learning method for predicting the value of a real-valued function, given the values of multiple input variables. The method induces solutions from samples in the form of ordered disjunctive normal form (DNF) decision rules. A central objective of the method and representation is the induction of compact, easily interpretable ...
This paper describes the use of decision tree and rule induction in data mining applications. Of methods for classification and regression that have been developed in the fields of pattern recognition, statistics, and machine learning, these are of particular interest for data mining since they utilize symbolic and interpretable representations. Symbolic ...
A lightweight rule induction method is described that generates compact Disjunctive Normal Form (DNF) rules. Each class has an equal number of unweighted rules. A new example is classified by applying all rules and assigning the example to the class with the most satisfied rules. The induction method attempts to minimize the training error with no pruning. An ...
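
The voting step described in the abstract above (apply every rule, assign the class with the most satisfied rules) can be illustrated with a minimal Python sketch. It covers classification only, not rule induction. The feature names, thresholds, class labels, and the tie-breaking choice are all hypothetical and are not taken from the paper.

```python
from typing import Callable, Dict, List

# Hypothetical representation: an example is a dict of numeric features,
# a conjunction is a list of predicates that must all hold, and a DNF rule
# is a list of conjunctions of which at least one must hold.
Example = Dict[str, float]
Condition = Callable[[Example], bool]
Conjunction = List[Condition]
DnfRule = List[Conjunction]


def rule_satisfied(rule: DnfRule, example: Example) -> bool:
    """A DNF rule fires if any of its conjunctions is fully satisfied."""
    return any(all(cond(example) for cond in conj) for conj in rule)


def classify(rules_by_class: Dict[str, List[DnfRule]], example: Example) -> str:
    """Count satisfied (unweighted) rules per class and return the top class.

    Assumption: ties go to the class listed first; the abstract does not
    say how ties are resolved.
    """
    votes = {
        label: sum(rule_satisfied(rule, example) for rule in rules)
        for label, rules in rules_by_class.items()
    }
    return max(votes, key=votes.get)


# Toy rule set with an equal number of rules (two) per class.
RULES_BY_CLASS: Dict[str, List[DnfRule]] = {
    "spam": [
        # (links > 3) OR (caps > 0.5 AND length < 100)
        [[lambda e: e["links"] > 3],
         [lambda e: e["caps"] > 0.5, lambda e: e["length"] < 100]],
        # (exclaim > 2)
        [[lambda e: e["exclaim"] > 2]],
    ],
    "ham": [
        # (links <= 3 AND caps <= 0.5)
        [[lambda e: e["links"] <= 3, lambda e: e["caps"] <= 0.5]],
        # (length >= 100)
        [[lambda e: e["length"] >= 100]],
    ],
}

noisy = {"links": 5, "caps": 0.1, "length": 50, "exclaim": 3}
plain = {"links": 1, "caps": 0.2, "length": 200, "exclaim": 0}
noisy_label = classify(RULES_BY_CLASS, noisy)
plain_label = classify(RULES_BY_CLASS, plain)
```

Because every rule carries the same weight, the prediction reduces to a per-class count, so each rule can be inspected on its own.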