Much recent attention, both experimental and theoretical, has been focussed on classification algorithms which produce voted combinations of classifiers. Recent theoretical work has shown that the impressive generalization performance of algorithms like AdaBoost can be attributed to the classifier having large margins on the training data. We present an …
This study compares five well-known association rule algorithms using three real-world datasets and an artificial dataset. The experimental results confirm the performance improvements previously claimed by the algorithms' authors on the artificial data, but some of these gains do not carry over to the real datasets, indicating overfitting of the algorithms to the IBM …
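The abstract does not name the five algorithms it compares. As a hedged illustration of what association rule miners compute, here is a minimal level-wise (Apriori-style) sketch: it finds all itemsets whose support count meets a threshold, then derives rules whose confidence meets a second threshold. The function names `apriori` and `association_rules`, the toy basket data, and both thresholds are hypothetical choices for this sketch, not taken from the study.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Tuple

Itemset = FrozenSet[str]


def apriori(transactions: Iterable[Iterable[str]], min_support: int) -> Dict[Itemset, int]:
    """Return {itemset: support_count} for every itemset contained in at
    least min_support transactions (an absolute count, not a fraction)."""
    txns = [frozenset(t) for t in transactions]
    # Level 1: count single items.
    counts: Dict[Itemset, int] = {}
    for t in txns:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: n for s, n in counts.items() if n >= min_support}
    result = dict(frequent)
    k = 2
    while frequent:
        # Candidate generation: join frequent (k-1)-itemsets, keeping only
        # candidates whose (k-1)-subsets are all frequent (downward closure).
        candidates = set()
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in frequent for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
        # One pass over the data to count candidate support.
        counts = {c: 0 for c in candidates}
        for t in txns:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        frequent = {s: n for s, n in counts.items() if n >= min_support}
        result.update(frequent)
        k += 1
    return result


def association_rules(freq: Dict[Itemset, int],
                      min_conf: float) -> List[Tuple[Itemset, Itemset, float]]:
    """Emit rules lhs -> rhs with confidence support(lhs | rhs) / support(lhs)."""
    rules = []
    for itemset, count in freq.items():
        if len(itemset) < 2:
            continue
        for r in range(1, len(itemset)):
            for lhs_items in combinations(itemset, r):
                lhs = frozenset(lhs_items)
                conf = count / freq[lhs]  # every subset of a frequent set is in freq
                if conf >= min_conf:
                    rules.append((lhs, itemset - lhs, conf))
    return rules


# Hypothetical toy market-basket data.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
freq = apriori(baskets, min_support=3)
strong = association_rules(freq, min_conf=0.8)
print(strong)  # [(frozenset({'beer'}), frozenset({'diapers'}), 1.0)]
```

The candidate-pruning step is what distinguishes Apriori from brute-force enumeration; on real data the number of candidates and passes over the data is where algorithms of this family differ in performance.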
Recent theoretical results have shown that the generalization performance of thresholded convex combinations of base classifiers is greatly improved if the underlying convex combination has large margins on the training data (correct examples are classified well away from the decision boundary). Neural network algorithms and AdaBoost have been shown to …
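To make "margin of a convex combination" concrete, a minimal sketch follows, assuming base classifiers with outputs in {-1, +1} and non-negative weights normalized to sum to 1. The margin of example i is y_i f(x_i), where f is the weighted vote; the thresholded classifier sign(f) is correct exactly when the margin is positive. `margin_error` reports the fraction of examples with margin at most θ, the empirical quantity that margin-based bounds are stated in terms of. All names and the toy data are hypothetical.

```python
from typing import List, Sequence


def convex_margins(base_preds: Sequence[Sequence[int]],
                   weights: Sequence[float],
                   labels: Sequence[int]) -> List[float]:
    """Return y_i * f(x_i) for every example, where f(x) = sum_t alpha_t * h_t(x).

    base_preds[t][i] is h_t(x_i) in {-1, +1}; weights are non-negative and
    are normalized here so the alphas sum to 1 (a convex combination).
    """
    total = sum(weights)
    if total <= 0 or any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative with a positive sum")
    alphas = [w / total for w in weights]
    margins = []
    for i, y in enumerate(labels):
        f_i = sum(a * preds[i] for a, preds in zip(alphas, base_preds))
        margins.append(y * f_i)  # > 0: correct; larger: farther from boundary
    return margins


def margin_error(margins: Sequence[float], theta: float) -> float:
    """Fraction of examples with margin <= theta. With theta = 0 this counts
    training mistakes (and ties) of the thresholded classifier sign(f)."""
    return sum(m <= theta for m in margins) / len(margins)


# Hypothetical toy data: 3 base classifiers, 4 training examples.
base_preds = [
    [+1, +1, -1, +1],
    [+1, -1, -1, +1],
    [+1, +1, +1, -1],
]
weights = [5.0, 3.0, 2.0]  # normalized to alphas 0.5, 0.3, 0.2
labels = [+1, +1, -1, +1]

margins = convex_margins(base_preds, weights, labels)
print([round(m, 2) for m in margins])  # [1.0, 0.4, 0.6, 0.6]
print(margin_error(margins, 0.0), margin_error(margins, 0.5))  # 0.0 0.25
```

Here every training example is classified correctly (zero error at θ = 0), yet one example sits within 0.5 of the boundary; the margin view distinguishes these cases where plain training error does not.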
We describe KDD-Cup 2000, the yearly competition in data mining. For the first time the Cup included insight problems in addition to prediction problems, thus posing new challenges in both the knowledge discovery and the evaluation criteria, and highlighting the need to "peel the onion" and drill deeper into the reasons for the initial patterns found. We …
The architecture of Blue Martini Software's e-commerce suite has supported data collection, transformation, and data mining since its inception. With clickstreams being collected at the application-server layer, high-level events being logged, and data automatically transformed into a data warehouse using meta-data, common problems plaguing data mining …
We show that the e-commerce domain can provide all the right ingredients for successful data mining and claim that it is a killer domain for data mining. We describe an integrated architecture, based on our experience at Blue Martini Software, for supporting this integration. The architecture can dramatically reduce the pre-processing, cleaning, and data …
Recent theoretical results for pattern classification with thresholded real-valued functions (such as support vector machines, sigmoid networks, and boosting) give bounds on misclassification probability that do not depend on the size of the classifier, and hence can be considerably smaller than the bounds that follow from the VC theory. In this paper, we …