The naive Bayesian classifier provides a simple and effective approach to classifier learning, but its attribute independence assumption is often violated in the real world. A number of approaches have sought to alleviate this problem. A Bayesian tree learning algorithm builds a decision tree, and generates a local naive Bayesian classifier at each leaf.
We describe KDD-Cup 2000, the yearly competition in data mining. For the first time the Cup included insight problems in addition to prediction problems, thus posing new challenges in both the knowledge discovery and the evaluation criteria, and highlighting the need to “peel the onion” and drill deeper into the reasons for the initial patterns found. We …
This paper explores two techniques for boosting cost-sensitive trees. The two techniques differ in whether the misclassification cost information is utilized during training. We demonstrate that each of these techniques is good at different aspects of cost-sensitive classifications. We also show that both techniques provide a means to overcome the weaknesses of …
The architecture of Blue Martini Software's e-commerce suite has supported data collection, data transformation, and data mining since its inception. With clickstreams being collected at the application-server layer, high-level events being logged, and data automatically transformed into a data warehouse using meta-data, common problems plaguing data mining …
We show that the e-commerce domain can provide all the right ingredients for successful data mining and claim that it is a killer domain for data mining. We describe an integrated architecture, based on our experience at Blue Martini Software, for supporting this integration. The architecture can dramatically reduce the pre-processing, cleaning, and data …
The top web search result is crucial for user satisfaction with the web search experience. We argue that the importance of the relevance at the top position necessitates special handling of the top web search result for some queries. We propose an effective approach of leveraging millions of past user interactions with a web search engine to automatically …
Most constructive induction researchers focus only on new boolean attributes. This paper reports a new constructive induction algorithm, called XofN, that constructs new nominal attributes in the form of X-of-N representations. An X-of-N is a set containing one or more attribute-value pairs. For a given instance, its value corresponds to the number of its at …
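The X-of-N representation described in this abstract can be illustrated with a short sketch. Because the abstract is cut off mid-sentence, the sketch assumes that the value of an X-of-N for an instance is the number of attribute-value pairs in the set that hold for that instance. The function name `x_of_n_value` and the example attributes are hypothetical and are not taken from the paper.

```python
from typing import Mapping, Set, Tuple

# An attribute-value pair such as ("colour", "red").
AttrValue = Tuple[str, str]


def x_of_n_value(pairs: Set[AttrValue], instance: Mapping[str, str]) -> int:
    """Count the attribute-value pairs in `pairs` that hold for `instance`.

    Assumption: this counting rule is how the truncated abstract continues.
    An attribute missing from the instance never matches.
    """
    return sum(1 for attr, value in pairs if instance.get(attr) == value)


# Hypothetical X-of-N over three attribute-value pairs.
pairs = {("colour", "red"), ("shape", "round"), ("size", "large")}
instance = {"colour": "red", "shape": "square", "size": "large"}

# "colour" and "size" match, "shape" does not.
print(x_of_n_value(pairs, instance))  # prints 2
```

Under this assumption the constructed attribute is nominal with N + 1 possible values (0 through N), which would distinguish it from a boolean constructed attribute.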
In this article, we report our efforts in mining the information encoded as clickthrough data in the server logs to evaluate and monitor the relevance ranking quality of a commercial web search engine. We describe a metric called pSkip that aims to quantify the ranking quality by estimating the probability of users encountering non-relevant results that …