• Corpus ID: 244527396

Two step clustering for data reduction combining DBSCAN and k-means clustering

@article{Kremers2021TwoSC,
  title={Two step clustering for data reduction combining DBSCAN and k-means clustering},
  author={Bart J. J. Kremers and Aaron J. Ho and Jonathan Citrin and K. van de Plassche},
  journal={ArXiv},
  year={2021},
  volume={abs/2111.12559}
}
A novel combination of two widely used clustering algorithms is proposed here for the detection and reduction of high data density regions. The Density Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm is used for the detection of high data density regions and the k-means algorithm for reduction. The proposed algorithm iterates while successively decrementing the DBSCAN search radius, allowing for an adaptive reduction factor based on the effective data density. The…
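The two-step idea in the abstract can be sketched roughly as follows. This is a minimal standard-library illustration, not the authors' implementation: the function names, the `radii` schedule, the `keep_fraction` parameter, and the choice to replace each dense cluster by its k-means centroids are all assumptions made for the example.

```python
import math
import random


def dbscan(points, eps, min_pts):
    """Brute-force DBSCAN: returns a cluster id per point, -1 for noise."""
    n = len(points)
    neighbours = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_pts:
            labels[i] = -1  # provisionally noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_pts:
                queue.extend(neighbours[j])  # j is a core point: keep expanding
    return labels


def kmeans(points, k, rng, iters=20):
    """Plain Lloyd iterations; returns exactly k centroids."""
    centroids = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            groups[nearest].append(p)
        centroids = [
            tuple(sum(x) / len(g) for x in zip(*g)) if g else centroids[c]
            for c, g in enumerate(groups)
        ]
    return centroids


def two_step_reduce(points, radii, min_pts, keep_fraction, seed=0):
    """Assumed scheme: for each (decreasing) search radius, DBSCAN finds
    dense regions and k-means replaces each one by fewer centroids.
    Points labelled as noise are passed through unchanged."""
    rng = random.Random(seed)
    points = list(points)
    for eps in radii:
        labels = dbscan(points, eps, min_pts)
        reduced = [p for p, lab in zip(points, labels) if lab == -1]
        for c in set(labels) - {-1}:
            members = [p for p, lab in zip(points, labels) if lab == c]
            k = max(1, round(keep_fraction * len(members)))
            reduced.extend(kmeans(members, k, rng))
        points = reduced
    return points
```

Because sparse points are labelled as noise and kept as they are, only dense regions are thinned, and regions that remain dense at a smaller radius are thinned again on later passes. In practice a library implementation of DBSCAN and k-means would replace the brute-force versions shown here.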

References

AA-DBSCAN: an approximate adaptive DBSCAN for finding clusters with varying densities

Clustering is a typical data mining technique that partitions a dataset into multiple subsets of similar objects according to similarity metrics. In particular, density-based algorithms can find

OPTICS: ordering points to identify the clustering structure

A new algorithm is introduced for cluster analysis that does not produce a clustering of a data set explicitly, but instead creates an augmented ordering of the database representing its density-based clustering structure.

ADBSCAN: Adaptive Density-Based Spatial Clustering of Applications with Noise for Identifying Clusters with Varying Densities

An adaptive DBSCAN is proposed that works well for identifying clusters with varying densities, a setting in which standard DBSCAN demonstrates reduced performance.

GMDBSCAN: Multi-Density DBSCAN Cluster Based on Grid

This work proposes the GMDBSCAN algorithm, which is based on a spatial index and grid technique; an experimental evaluation shows that it is effective and efficient.

Neural network surrogate of QuaLiKiz using JET experimental data to populate training space

Within integrated tokamak plasma modeling, turbulent transport codes are typically the computational bottleneck limiting their routine use outside of post-discharge analysis. Neural network (NN)

Fast modeling of turbulent transport in fusion plasmas using neural networks

An ultrafast neural network model, QLKNN, which predicts core tokamak transport heat and particle fluxes based on a database of 300 million flux calculations of the quasilinear gyrokinetic transport model QuaLiKiz is presented.

Tractable flux-driven temperature, density, and rotation profile evolution with the quasilinear gyrokinetic transport model QuaLiKiz

Quasilinear turbulent transport models are a successful tool for prediction of core tokamak plasma profiles in many regimes. Their success hinges on the reproduction of local nonlinear gyrokinetic