• Publications
A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise
TLDR
DBSCAN is presented: a new clustering algorithm that relies on a density-based notion of clusters, is designed to discover clusters of arbitrary shape, requires only one input parameter, and supports the user in determining an appropriate value for it.
SCAN: a structural clustering algorithm for networks
TLDR
SCAN (Structural Clustering Algorithm for Networks) is proposed, a novel algorithm that detects clusters, hubs and outliers in networks and clusters vertices based on a structural similarity measure.
Density-Based Clustering in Spatial Databases: The Algorithm GDBSCAN and Its Applications
TLDR
The generalized algorithm GDBSCAN can cluster point objects as well as spatially extended objects according to both their spatial and their nonspatial attributes, and four applications using 2D points (astronomy), 3D points (biology), 5D points and 2D polygons are presented, demonstrating the applicability of GDBSCAN to real-world problems.
Frequent term-based text clustering
TLDR
Two algorithms for frequent term-based text clustering are presented: FTC, which creates flat clusterings, and HFTC, which creates hierarchical clusterings. Both obtain clusterings of comparable quality significantly more efficiently than state-of-the-art text clustering algorithms.
Incremental Clustering for Mining in a Data Warehousing Environment
TLDR
It can be proven that the incremental algorithm yields the same result as DBSCAN, which is applicable to any database containing data from a metric space, e.g., to a spatial database or to a WWW-log database.
DBSCAN Revisited, Revisited: Why and How You Should (Still) Use DBSCAN
TLDR
In new experiments, it is shown that the new SIGMOD 2015 methods do not appear to offer practical benefits if the DBSCAN parameters are well chosen and thus they are primarily of theoretical interest.
A distribution-based clustering algorithm for mining in large spatial databases
TLDR
The new clustering algorithm DBCLASD (Distribution-Based Clustering of LArge Spatial Databases) is introduced to discover clusters of this type and is very attractive when considering its nonparametric nature and its good quality for clusters of arbitrary shape.
Representative Sampling for Text Classification Using Support Vector Machines
TLDR
A straightforward active learning heuristic, representative sampling, is described, which explores the clustering structure of 'uncertain' documents and identifies the representative samples on which to query the user's opinion, for the purpose of speeding up the convergence of Support Vector Machine (SVM) classifiers.
A Fast Parallel Clustering Algorithm for Large Spatial Databases
TLDR
The dR*-tree is introduced, a distributed spatial index structure in which the data is spread among multiple computers and the indexes of the data are replicated on every computer in the ‘shared-nothing’ architecture with multiple computers interconnected through a network.
Knowledge Discovery in Large Spatial Databases: Focusing Techniques for Efficient Class Identification
TLDR
This paper addresses the task of class identification in spatial databases using clustering techniques and a well-known spatial access method, the R*-tree, and presents several strategies for focusing: selecting representatives from a spatial database, focusing on the relevant clusters and retrieving all objects of a given cluster.