The volume of geo-spatial data has exploded with the spread of GPS devices. This data contains certain user patterns. To mine these patterns efficiently, a grid-growing clustering algorithm is introduced. The proposed algorithm makes use of a grid structure, and a novel clustering operation is presented, which considers a…
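The abstract above is truncated before it describes the clustering operation, so the following is only a generic sketch of grid-based clustering, not the grid-growing algorithm of the paper. It assumes 2-D points, a hypothetical `cell_size` parameter, and the simplest possible rule: occupied grid cells that touch (8-neighbourhood) belong to the same cluster.

```python
from collections import deque


def grid_cluster(points, cell_size):
    """Label 2-D points so that points in touching occupied cells share a label.

    Illustrative assumption: a cluster is a connected group of non-empty
    grid cells. This is NOT the paper's clustering operation.
    """
    # Map each occupied cell to the indices of the points it contains.
    cells = {}
    for idx, (x, y) in enumerate(points):
        key = (int(x // cell_size), int(y // cell_size))
        cells.setdefault(key, []).append(idx)

    labels = [-1] * len(points)
    seen = set()
    cluster_id = 0
    for start in cells:
        if start in seen:
            continue
        # Breadth-first "growth" from a seed cell over occupied neighbours.
        seen.add(start)
        queue = deque([start])
        while queue:
            cx, cy = queue.popleft()
            for idx in cells[(cx, cy)]:
                labels[idx] = cluster_id
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    neighbour = (cx + dx, cy + dy)
                    if neighbour in cells and neighbour not in seen:
                        seen.add(neighbour)
                        queue.append(neighbour)
        cluster_id += 1
    return labels


points = [(0.1, 0.1), (0.9, 0.8), (1.2, 1.1), (10.0, 10.0), (10.5, 10.4)]
labels = grid_cluster(points, cell_size=1.0)
```

Working on cells rather than on point pairs means the cost depends mainly on the number of occupied cells, which is the usual motivation for grid structures on large location data sets.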
Different clustering algorithms produce different results on the same data set because most clustering algorithms are sensitive to their input parameters and to the structure of the data. Cluster validity, the evaluation of the results of clustering algorithms, is one of the problems in cluster analysis. In this paper, we build up a framework for…
The main challenge of cluster analysis is that the number of clusters or the number of model parameters is seldom known and must therefore be determined before clustering. The Bayesian information criterion (BIC) often serves as a statistical criterion for model selection, and it can also be used in solving model-based clustering problems, in particular for…
The Bayesian Information Criterion (BIC) is a promising method for detecting the number of clusters. It is often used in model-based clustering, in which a decisive first local maximum is taken as the number of clusters. In this paper, we reformulate the BIC for partitioning-based clustering algorithms and propose a new knee-point finding method based on it.…
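To make the idea of scoring a hard partition with BIC concrete, here is a minimal sketch. It is not the reformulation proposed in the paper, which the truncated abstract does not show. It assumes 1-D data, one Gaussian per cluster with a single shared variance estimated as SSW / n, and the "larger is better" form BIC = log L - (p / 2) ln n with p = 2k free parameters (k - 1 weights, k means, 1 variance). The function name `partition_bic` is hypothetical.

```python
import math


def partition_bic(points, labels):
    """BIC of a hard partition of 1-D points (larger is better).

    Assumptions (illustrative only): spherical Gaussians with one shared
    variance; the within-cluster sum of squares must be greater than zero.
    """
    n = len(points)
    clusters = {}
    for x, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(x)
    k = len(clusters)

    ssw = 0.0          # within-cluster sum of squares
    mixing_term = 0.0  # sum over clusters of n_i * ln(n_i / n)
    for members in clusters.values():
        centroid = sum(members) / len(members)
        ssw += sum((x - centroid) ** 2 for x in members)
        mixing_term += len(members) * math.log(len(members) / n)

    variance = ssw / n  # maximum-likelihood estimate of the shared variance
    log_likelihood = (mixing_term
                      - 0.5 * n * math.log(2 * math.pi * variance)
                      - 0.5 * n)
    num_params = 2 * k  # (k - 1) weights + k means + 1 variance
    return log_likelihood - 0.5 * num_params * math.log(n)


data = [0.0, 0.1, 0.2, 10.0, 10.1, 10.2]
bic_one = partition_bic(data, [0, 0, 0, 0, 0, 0])
bic_two = partition_bic(data, [0, 0, 0, 1, 1, 1])
```

For these two well-separated groups the two-cluster partition scores higher than the single-cluster one. In practice the score would be computed for each candidate number of clusters and the resulting curve inspected for its maximum or knee.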
Article history: received 5 September 2012; received in revised form 2 May 2014; accepted 11 July 2014; available online 17 July 2014. Determining the number of clusters is an important part of cluster validity that has been widely studied in cluster analysis. Sum-of-squares based indices show promising properties in terms of determining the number of clusters.…
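The abstract above does not say which sum-of-squares based indices are studied, so as an illustration only, the sketch below computes the within-cluster (SSW) and between-cluster (SSB) sums of squares for 1-D data and combines them into the well-known Calinski-Harabasz index. The function names are hypothetical, and the index requires at least two clusters and a non-zero SSW.

```python
def sum_of_squares(points, labels):
    """Return (SSW, SSB) for 1-D points under a hard partition."""
    n = len(points)
    overall_mean = sum(points) / n
    clusters = {}
    for x, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(x)

    ssw = 0.0  # spread of points around their own cluster centroid
    ssb = 0.0  # spread of centroids around the overall mean, size-weighted
    for members in clusters.values():
        centroid = sum(members) / len(members)
        ssw += sum((x - centroid) ** 2 for x in members)
        ssb += len(members) * (centroid - overall_mean) ** 2
    return ssw, ssb


def calinski_harabasz(points, labels):
    """Calinski-Harabasz index: (SSB / (k - 1)) / (SSW / (n - k))."""
    n = len(points)
    k = len(set(labels))
    ssw, ssb = sum_of_squares(points, labels)
    return (ssb / (k - 1)) / (ssw / (n - k))


data = [0.0, 0.1, 0.2, 10.0, 10.1, 10.2]
good = calinski_harabasz(data, [0, 0, 0, 1, 1, 1])
poor = calinski_harabasz(data, [0, 0, 1, 1, 1, 1])
```

SSW and SSB always add up to the total sum of squares, so an index of this kind rewards partitions that shift as much of the total as possible into the between-cluster term. Here the natural split gets a much higher score than the lopsided one.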
ISSN 0167-8655, 2012 Elsevier B.V. http://dx.doi.org/10.1016/j.patrec.2012.06.017. Corresponding author: Q. Zhao (tel.: +358 132517962; e-mail: zhao@cs.joensuu.fi). The expectation maximization (EM) algorithm is a popular way to estimate the parameters of Gaussian mixture models. Unfortunately, its performance depends heavily on the…
Information Retrieval (IR) has been studied extensively over the past three decades. Its purpose is to assist users in locating the information they are looking for. Information retrieval is currently applied in a variety of domains, from database systems to web search engines. Its main idea is to…
The expectation-maximization (EM) algorithm is a popular tool for estimating model parameters, especially for mixture models. As the EM algorithm is a hill-climbing approach, problems such as local maxima, plateaus, and ridges may appear. In the case of mixture models, these problems involve the initialization of the algorithm and the structure of the data set. We…
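The dependence on initialization can be seen in a small experiment. The sketch below is a plain textbook EM for a 1-D Gaussian mixture, not the method proposed in the paper. The function name `em_gmm_1d`, the fixed iteration count, the unit starting variances, and the variance floor of 1e-6 are all illustrative assumptions.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_gmm_1d(data, initial_means, iterations=50):
    """Fit a 1-D Gaussian mixture with standard EM updates."""
    k = len(initial_means)
    n = len(data)
    means = list(initial_means)
    variances = [1.0] * k    # assumed starting variances
    weights = [1.0 / k] * k  # assumed equal starting weights
    for _ in range(iterations):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in data:
            dens = [w * normal_pdf(x, m, v)
                    for w, m, v in zip(weights, means, variances)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, 1e-6)  # floor avoids a collapsing component
    return weights, means, variances


data = [1.0, 1.2, 0.8, 1.1, 0.9, 5.0, 5.2, 4.8, 5.1, 4.9]
_, good_means, _ = em_gmm_1d(data, initial_means=[0.0, 6.0])
_, stuck_means, _ = em_gmm_1d(data, initial_means=[3.0, 3.0])
```

Started from means on either side of the data, the two components settle near the two groups. Started from two identical means, both components receive identical responsibilities in every iteration and never separate, which is one simple way a poor initialization can trap the algorithm.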