Jerzy Stefanowski

Learn More
Search results clustering problem is defined as an automatic, on-line grouping of similar documents in a search results list returned from a search engine. In this paper we present Lingo—a novel algorithm for clustering search results, which emphasizes cluster description quality. We describe methods used in the algorithm: algebraic transformations of the(More)
Data stream mining has been receiving increased attention due to its presence in a wide range of applications, such as sensor networks, banking, and telecommunication. One of the most important challenges in learning from data streams is reacting to concept drift, i.e., unforeseen changes of the stream's underlying data distribution. Several classification(More)
Consideration of preference-orders requires the use of an extended rough set model called Dominance-based Rough Set Approach (DRSA). The rough approximations defined within DRSA are based on consistency in the sense of dominance principle. It requires that objects having not-worse evaluation with respect to a set of considered criteria than a referent(More)
The paper discusses problems of constructing classifiers from imbalanced data. Re-sampling approaches that change the original class distribution are often used to improve performance of classifiers for the minority class. We describe a new approach to selective pre-processing of imbalanced data which combines local over-sampling of the minority class with(More)
In this paper we discuss problems of inducing classifiers from imbalanced data and improving recognition of minority class using focused resampling techniques. We are particularly interested in SMOTE over-sampling method that generates new synthetic examples from the minority class between the closest neighbours from this class. However, SMOTE could also(More)
Classification datasets often have an unequal class distribution among their examples. This problem is known as imbalanced classification. The Synthetic Minority Over-sampling Technique (SMOTE) is one of the most well-know data pre-processing methods to cope with it and tobalance thedifferentnumberof examples of eachclass.However, as recentworks claim,(More)
In this paper we studied re-sampling methods for learning classifiers from imbalanced data. We carried out a series of experiments on artificial data sets to explore the impact of noisy and borderline examples from the minority class on the classifier performance. Results showed that if data was sufficiently disturbed by these factors, then the focused(More)