Malay K. Pakhira

The k-means algorithm is one of the most widely used clustering algorithms and has been applied in many fields of science and technology. One of its major problems is that it may produce empty clusters, depending on the initial center vectors. For static execution of k-means, this problem is considered insignificant and can be solved…
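To make the empty-cluster problem concrete, here is a minimal Lloyd-style k-means sketch in pure Python that records every cluster that ends up with no assigned points. Keeping an empty cluster's previous center is an assumption made for illustration. It is one simple policy, not necessarily the remedy proposed in the paper, and the function names `assign` and `kmeans` are hypothetical.

```python
import math

def assign(points, centers):
    """Index of the nearest center (Euclidean distance) for every point."""
    return [min(range(len(centers)), key=lambda c: math.dist(p, centers[c])) for p in points]

def kmeans(points, centers, max_iter=100):
    """Lloyd-style k-means that also reports which clusters became empty."""
    centers = [tuple(c) for c in centers]
    empty = set()
    for _ in range(max_iter):
        labels = assign(points, centers)
        new_centers = []
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if not members:
                # No point chose this center: record it and keep the old center
                # (an illustrative policy assumed here, not from the paper).
                empty.add(c)
                new_centers.append(centers[c])
            else:
                dim = len(members[0])
                new_centers.append(
                    tuple(sum(m[d] for m in members) / len(members) for d in range(dim))
                )
        if new_centers == centers:  # converged: centers no longer move
            break
        centers = new_centers
    return centers, labels, sorted(empty)

# A poorly placed initial center at (100, 100) attracts no points.
centers, labels, empty = kmeans(
    [(0, 0), (0, 1), (10, 10), (10, 11)], [(0, 0), (10, 10), (100, 100)]
)
print(labels, empty)  # [0, 0, 1, 1] [2]
```

With this initialisation the third cluster stays empty for the whole run, which is exactly the dependence on initial center vectors that the abstract describes.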
A number of algorithms and strategies, along with their variations, are currently used to solve complex optimization problems. Genetic algorithms (GAs) are among the best strategies for such problems, largely due to their inherent parallel search capability. Other methods found useful in diverse application areas are simulated annealing, evolution…
Determining the number of clusters present in a data set is an important problem in clustering. There exist very few techniques that can solve this problem satisfactorily. Some of these techniques rely on user-supplied information, while others use cluster validity indices that are expensive in terms of computation time. This paper proposes an…
In this article, a distributed clustering technique suitable for large data sets is presented. The algorithm is a modified version of the widely used k-means algorithm, adapted to run in a distributed environment. For large input sizes, the running time complexity of the k-means algorithm is very…
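A common way to distribute k-means is for each node to compute per-cluster coordinate sums and point counts over its local partition, so that only these small statistics, not the raw points, are sent to a coordinator that recomputes the centers. The following is a generic sketch of that idea, not necessarily the modification described in the article. Nodes are simulated as Python lists, and the names `local_stats`, `merge_stats` and `distributed_kmeans` are hypothetical.

```python
import math

def local_stats(partition, centers):
    """Run on one node: per-cluster coordinate sums and counts for local points."""
    k, dim = len(centers), len(centers[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p in partition:
        c = min(range(k), key=lambda j: math.dist(p, centers[j]))
        counts[c] += 1
        for d in range(dim):
            sums[c][d] += p[d]
    return sums, counts

def merge_stats(stats, centers):
    """Run on the coordinator: combine node statistics into new global centers."""
    k, dim = len(centers), len(centers[0])
    total_sums = [[0.0] * dim for _ in range(k)]
    total_counts = [0] * k
    for sums, counts in stats:
        for c in range(k):
            total_counts[c] += counts[c]
            for d in range(dim):
                total_sums[c][d] += sums[c][d]
    new_centers = []
    for c in range(k):
        if total_counts[c] == 0:
            new_centers.append(tuple(centers[c]))  # empty cluster: keep old center
        else:
            new_centers.append(tuple(s / total_counts[c] for s in total_sums[c]))
    return new_centers

def distributed_kmeans(partitions, centers, max_iter=100):
    centers = [tuple(map(float, c)) for c in centers]
    for _ in range(max_iter):
        # In a real deployment each local_stats call would run on a separate node.
        stats = [local_stats(part, centers) for part in partitions]
        new_centers = merge_stats(stats, centers)
        if new_centers == centers:
            break
        centers = new_centers
    return centers

print(distributed_kmeans([[(0, 0), (10, 10)], [(0, 1), (10, 11)]], [(0, 0), (10, 10)]))
```

Because sums and counts merge exactly, each iteration produces the same centers that serial k-means would compute on the union of the partitions, while per-iteration traffic is proportional to the number of clusters rather than the number of points.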
Determining the number of clusters present in a data set is an important problem in clustering. There exist very few techniques that can solve this problem satisfactorily. Most of these techniques are expensive in terms of computation time. This paper proposes an alternative solution to this problem that makes use of the concepts of genetic…
Determining the number of clusters present in a data set is an important problem in clustering. There exist very few techniques that can solve this problem satisfactorily. Most of these techniques are expensive in terms of computation time. Recently, VAT (Visual Assessment of Tendency for clustering) images of data sets have been used for this purpose along with GA…
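For readers unfamiliar with VAT, the sketch below shows the standard VAT reordering: start from an object involved in the largest dissimilarity, then repeatedly append the unvisited object closest to any visited one (a Prim-style ordering). After reordering, clusters tend to show up as blocks of small values along the diagonal of the image. This is only the image-construction step, under the assumption of a symmetric dissimilarity matrix; how the paper combines VAT images with a GA is not shown, and the names `vat_order` and `vat_image` are hypothetical.

```python
def vat_order(R):
    """VAT ordering of a symmetric n x n dissimilarity matrix R (list of lists)."""
    n = len(R)
    # Start from an object involved in the largest dissimilarity.
    start, _ = max(
        ((a, b) for a in range(n) for b in range(n)),
        key=lambda ab: R[ab[0]][ab[1]],
    )
    order = [start]
    remaining = set(range(n)) - {start}
    while remaining:
        # Prim-style step: add the unvisited object nearest to any visited object.
        _, nxt = min(
            ((a, b) for a in order for b in sorted(remaining)),
            key=lambda ab: R[ab[0]][ab[1]],
        )
        order.append(nxt)
        remaining.discard(nxt)
    return order

def vat_image(R):
    """Reordered dissimilarity matrix whose diagonal blocks suggest clusters."""
    order = vat_order(R)
    image = [[R[a][b] for b in order] for a in order]
    return image, order

# Four 1-D points forming two obvious groups: {0, 1} and {10, 11}.
xs = [0.0, 10.0, 1.0, 11.0]
R = [[abs(x - y) for y in xs] for x in xs]
image, order = vat_image(R)
print(order)  # [0, 2, 1, 3]
for row in image:
    print(row)
```

In the printed image, rows 1 to 2 and rows 3 to 4 form two blocks of small values on the diagonal, which a viewer (or an automated procedure) would read as two clusters.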
The k-means algorithm is one of the most popular clustering algorithms in use today. The high running time complexity of serial k-means limits its applicability to very large databases. On the other hand, existing parallel k-means algorithms require large data transfers, incurring high communication complexity. Transfer of actual data from…
The CLARA algorithm is one of the popular clustering algorithms in use today. It works on a randomly selected subset of the original data and produces near-accurate results at a faster rate than other clustering algorithms. CLARA is mainly used in data mining applications. We have used this algorithm for color image segmentation. The…
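The CLARA idea of clustering a random subset can be sketched as follows: draw several random samples, run a k-medoids search (PAM) on each sample, score each resulting medoid set against the full data set, and keep the best. The PAM step here is simplified (naive initialisation plus greedy medoid swaps instead of the full BUILD/SWAP procedure), and the default sample size of 40 + 2k is the value commonly cited for CLARA; both are assumptions. The names `total_cost`, `pam` and `clara` are hypothetical, and the image-segmentation use from the abstract (e.g. clustering pixel colors) is not shown.

```python
import math
import random
from itertools import product

def total_cost(points, medoids):
    """Sum of distances from each point to its nearest medoid."""
    return sum(min(math.dist(p, m) for m in medoids) for p in points)

def pam(sample, k):
    """Simplified PAM: greedy medoid swaps within the sample until no swap helps."""
    medoids = list(sample[:k])  # naive initialisation (assumption; PAM uses BUILD)
    best = total_cost(sample, medoids)
    improved = True
    while improved:
        improved = False
        for i, cand in product(range(k), sample):
            if cand in medoids:
                continue
            trial = medoids[:i] + [cand] + medoids[i + 1:]
            cost = total_cost(sample, trial)
            if cost < best:
                medoids, best, improved = trial, cost, True
    return medoids

def clara(points, k, n_samples=5, sample_size=None, seed=0):
    """Run PAM on random samples; keep the medoids that are best on the full data."""
    rng = random.Random(seed)
    sample_size = sample_size or min(len(points), 40 + 2 * k)
    best_medoids, best_cost = None, math.inf
    for _ in range(n_samples):
        sample = rng.sample(points, sample_size)
        medoids = pam(sample, k)
        cost = total_cost(points, medoids)  # evaluated on the whole data set
        if cost < best_cost:
            best_medoids, best_cost = medoids, cost
    return best_medoids, best_cost

pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
medoids, cost = clara(pts, k=2)
print(sorted(medoids), cost)  # [(0, 0), (10, 10)] 4.0
```

Only the small samples go through the expensive medoid search; the full data set is touched once per sample to score the candidate medoids, which is where CLARA's speed advantage on large inputs comes from.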
Determining the number of clusters present in a data set automatically is a very important problem. Conventional clustering techniques assume a certain number of clusters and then try to find the cluster structure associated with that number. For very large and complex data sets, it is not easy to guess this number of clusters. There exists…