Estimation of the number of clusters on d-dimensional sphere

@article{Fujita2021EstimationOT,
  title={Estimation of the number of clusters on d-dimensional sphere},
  author={Kazuhisa Fujita},
  journal={ArXiv},
  year={2021},
  volume={abs/2011.07530}
}
Spherical data are distributed on a sphere. Such data appear in various fields, including meteorology, biology, and natural language processing. However, methods for analyzing spherical data are not yet sufficiently developed. One important issue is estimating the number of clusters in spherical data. To address this issue, I propose a new method called Spherical X-means (SX-means) that can estimate the number of clusters on a d-dimensional sphere. The SX-means is the model-based…

References

PG-means: learning the number of clusters in data
TLDR
A novel algorithm called PG-means is presented that learns the number of clusters in a classical Gaussian mixture model. It is robust and efficient, and provides a much more stable estimate of the number of clusters than existing methods.
Estimating the number of clusters using diversity
TLDR
It is shown that the difference between the global diversity of clusters and the sum of the local diversities of each cluster's members can be used as an effective indicator of the optimality of the number of clusters, where diversity is measured by Rao's quadratic entropy.
Clustering on the Unit Hypersphere using von Mises-Fisher Distributions
TLDR
A generative mixture-model approach to clustering directional data is proposed, based on the von Mises-Fisher distribution, which arises naturally for data distributed on the unit hypersphere. Two variants of the Expectation Maximization framework are derived and analyzed for estimating the mean and concentration parameters of this mixture.
A Clustering Method for Data in Cylindrical Coordinates
We propose a new clustering method for data in cylindrical coordinates based on the k-means. The goal of the k-means family is to maximize an optimization function, which requires a similarity. Thus,…
Generative model-based clustering of directional data
TLDR
Modeling text data by vMF distributions lends theoretical validity to the use of cosine similarity, which has been widely used by the information retrieval community. Results indicate that this approach yields superior clusterings, especially for difficult clustering tasks in high-dimensional spaces.
X-means: Extending K-means with Efficient Estimation of the Number of Clusters
TLDR
A new algorithm is introduced that efficiently searches the space of cluster locations and number of clusters to optimize the Bayesian Information Criterion (BIC) or the Akaike Information Criterion (AIC) measure.
Learning the k in k-means
TLDR
An improved algorithm for learning k while clustering is presented, based on a statistical test for the hypothesis that a subset of data follows a Gaussian distribution. It works well, and better than a recent method based on the BIC penalty for model complexity.
Some methods for classification and analysis of multivariate observations
The main purpose of this paper is to describe a process for partitioning an N-dimensional population into k sets on the basis of a sample. The process, which is called 'k-means,' appears to give…
Parameter estimation for von Mises–Fisher distributions
TLDR
An iterative algorithm using fixed points to obtain the maximum likelihood estimate (m.l.e.) for κ is proposed, and it is proved that there is a unique local maximum for δ, i.e. the level of precision of the von Mises–Fisher distribution.
A Quantitative Discriminant Method of Elbow Point for the Optimal Number of Clusters in Clustering Algorithm
  • Congming Shi, Bingtao Wei, Shoulin Wei, Wen Wang, Hai Liu, Jialei Liu
  • Computer Science
  • 2020
TLDR
A new elbow point discriminant method is proposed that computes a statistical metric for estimating the optimal number of clusters when clustering a dataset. Experimental results demonstrate that the optimal cluster number estimated by the proposed method is better than that of the widely used Silhouette method.