For many KDD applications, such as detecting criminal activities in E-commerce, finding the rare instances, or outliers, can be more interesting than finding the common patterns. Existing work in outlier detection regards being an outlier as a binary property. In this paper, we contend that for many scenarios, it is more meaningful to assign to each …
Cluster analysis is a primary method for database mining. It is either used as a stand-alone tool to get insight into the distribution of a data set, e.g. to focus further analysis and data processing, or as a preprocessing step for other algorithms operating on the detected clusters. Almost all of the well-known clustering algorithms require input …
In order to access information from a variety of heterogeneous information sources, one has to be able to translate queries and data from one data model into another. This functionality is provided by so-called (source) wrappers [4,8] which convert queries into one or more commands/queries understandable by the underlying source and …
A broad class of algorithms for knowledge discovery in databases (KDD) relies heavily on similarity queries, i.e. range queries or nearest neighbor queries, in multidimensional feature spaces. Many KDD algorithms perform a similarity query for each point stored in the database. This approach causes serious performance degenerations if the considered data …
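The two kinds of similarity query named above can be illustrated with a brute-force sketch. This is not the method of the paper (the abstract is cut off before it is described); the function names, the Euclidean distance, and the linear scan are all assumptions chosen for illustration.

```python
import math

# Hypothetical helpers: a linear scan with Euclidean distance, used only to
# illustrate what a range query and a nearest neighbor query return.

def range_query(points, q, eps):
    """Return all points within distance eps of the query point q."""
    return [p for p in points if math.dist(p, q) <= eps]

def knn_query(points, q, k):
    """Return the k points closest to the query point q, nearest first."""
    return sorted(points, key=lambda p: math.dist(p, q))[:k]

def range_query_per_point(points, eps):
    """Issue one range query per stored point, as many KDD algorithms do.

    With a linear scan this costs O(n^2) distance computations in total.
    """
    return {i: range_query(points, p, eps) for i, p in enumerate(points)}

points = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
neighbors = range_query(points, (0.0, 0.0), 1.0)
nearest = knn_query(points, (0.9, 0.0), 2)
```

Running one such query per stored point is the access pattern the abstract refers to; the quadratic cost of the naive version shows why it becomes a bottleneck on large data.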
In this paper, we investigate how to scale hierarchical clustering methods (such as OPTICS) to extremely large databases by utilizing data compression methods (such as BIRCH or random sampling). We propose a three step procedure: 1) compress the data into suitable representative objects; 2) apply the hierarchical clustering algorithm only to these objects; …
For many KDD applications finding the outliers, i.e. the rare events, is more interesting and useful than finding the common cases, e.g. detecting criminal activities in E-commerce. Being an outlier, however, is not just a binary property. Instead, it is a property that applies to a certain degree to each object in a data set, depending on how 'isolated' …
One way to scale up clustering algorithms is to squash the data by some intelligent compression technique and cluster only the compressed data records. Such compressed data records can be produced, e.g., by the BIRCH algorithm. Typically they consist of the sufficient statistics of the form (N, X, X²) where N is the number of points, X is the (vector-)sum, …
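A minimal sketch of such sufficient statistics follows. The source sentence is cut off before defining the third component, so treating X² as the scalar sum of squared norms (as in BIRCH-style clustering features) is an assumption, as are all class and function names.

```python
from dataclasses import dataclass
from typing import List, Sequence

@dataclass
class SufficientStats:
    n: int                    # N: number of points summarized
    linear_sum: List[float]   # X: per-dimension (vector) sum of the points
    square_sum: float         # X^2: ASSUMED to be the sum of squared norms

def summarize(points: Sequence[Sequence[float]]) -> SufficientStats:
    """Compress a non-empty set of points into (N, X, X^2)."""
    dim = len(points[0])
    linear = [sum(p[d] for p in points) for d in range(dim)]
    square = sum(x * x for p in points for x in p)
    return SufficientStats(len(points), linear, square)

def merge(a: SufficientStats, b: SufficientStats) -> SufficientStats:
    """Combine two summaries; each component is simply additive."""
    linear = [x + y for x, y in zip(a.linear_sum, b.linear_sum)]
    return SufficientStats(a.n + b.n, linear, a.square_sum + b.square_sum)

def centroid(s: SufficientStats) -> List[float]:
    """Recover the mean of the summarized points from N and X alone."""
    return [x / s.n for x in s.linear_sum]

stats = summarize([(1.0, 2.0), (3.0, 4.0)])
```

Because every component is a plain sum, two summaries can be merged without revisiting the original points, which is what makes this representation convenient for compressing large data sets.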