Social animals or insects in nature often exhibit a form of emergent collective behavior known as flocking. In this paper, we present a novel flocking-based approach for document clustering analysis. Our flocking clustering algorithm uses stochastic and heuristic principles discovered from observing bird flocks or fish schools. Unlike…
Fast and high-quality document clustering algorithms play an important role in effectively navigating, summarizing, and organizing information. Recent studies have shown that partitional clustering algorithms are more suitable for clustering large datasets. However, the K-means algorithm, the most commonly used partitional clustering algorithm, can only…
In this paper, we propose a new term weighting scheme called Term Frequency – Inverse Corpus Frequency (TF-ICF). It does not require term frequency information from other documents within the document collection and thus enables us to generate the document vectors of N streaming documents in linear time. In the context of a machine learning application,…
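The idea of weighting terms against a fixed corpus rather than the incoming collection can be sketched as follows. This is a minimal illustration, not the authors' formulation: the truncated abstract does not give the weighting formula, so the `log(1 + tf) * log((N + 1) / (df + 1))` form, the whitespace tokenizer, and the names `build_icf` and `tf_icf_vector` are all assumptions made for the example.

```python
import math
from collections import Counter


def build_icf(reference_corpus):
    """Precompute inverse corpus frequencies from a fixed reference corpus.

    Assumed form: icf(t) = log((N + 1) / (df(t) + 1)), where N is the number
    of reference documents and df(t) the number that contain term t.
    """
    n_docs = len(reference_corpus)
    doc_freq = Counter()
    for doc in reference_corpus:
        # Count each term once per reference document.
        doc_freq.update(set(doc.lower().split()))
    icf = {term: math.log((n_docs + 1) / (df + 1)) for term, df in doc_freq.items()}
    default_icf = math.log(n_docs + 1)  # weight for terms unseen in the reference corpus
    return icf, default_icf


def tf_icf_vector(document, icf, default_icf):
    """Weight one document using only its own term counts and the fixed ICF table."""
    counts = Counter(document.lower().split())
    return {
        term: math.log(1 + tf) * icf.get(term, default_icf)
        for term, tf in counts.items()
    }


if __name__ == "__main__":
    reference = ["the cat sat", "the dog ran", "the bird flew"]
    icf, default_icf = build_icf(reference)
    # Each streaming document is vectorized independently of the others.
    for doc in ["the cat cat zebra", "the dog sat"]:
        print(tf_icf_vector(doc, icf, default_icf))
```

Because the ICF table is built once and never updated, each streaming document is processed in time proportional to its own length, which is consistent with the linear-time claim for N documents.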
We intend to demonstrate that if the business model cannot adjust to new technology by a) recognizing its limitations, b) recognizing the ability of the organization to control it, and c) adjusting its deadlines to take advantage of the methodology's potential, it is unlikely that an investment in the technology will result in real productivity benefits. As software…
Analyzing and clustering large-scale data sets is a complex problem. One explored method of solving this problem borrows from nature, imitating the flocking behavior of birds. One limitation of this method of data clustering is its O(n²) complexity. As the number of data points and feature dimensions grows, it becomes increasingly difficult to generate results in…
In the real world, we frequently have to deal with searching for and tracking an optimal solution in a dynamic environment. This demands that the algorithm not only find the optimal solution but also track the trajectory of the solution in a dynamic environment. Particle Swarm Optimization (PSO) is a population-based stochastic optimization technique, which…
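For readers unfamiliar with PSO, the following is a minimal sketch of the standard global-best variant on a static objective. It does not implement the dynamic-environment tracking this abstract is concerned with; the function name `pso`, the parameter values (`w`, `c1`, `c2`), and the `sphere` test function are illustrative assumptions, not taken from the paper.

```python
import random


def pso(objective, dim, n_particles=20, iters=200,
        w=0.7, c1=1.5, c2=1.5, bounds=(-5.0, 5.0), seed=0):
    """Minimize `objective` with a basic global-best particle swarm."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                 # best position seen by each particle
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # best position seen by the swarm

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move and clamp to the search bounds.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def sphere(x):
    """Simple convex test function with its minimum of 0 at the origin."""
    return sum(v * v for v in x)


if __name__ == "__main__":
    best, best_val = pso(sphere, dim=2)
    print(best, best_val)
```

In a dynamic setting the stored personal and global bests can become stale when the objective changes, which is one reason plain PSO of this kind needs modification to track a moving optimum.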
Latent Semantic Analysis (LSA) can be used to reduce the dimensions of large Term-Document datasets using Singular Value Decomposition. However, with the ever-expanding size of data sets, current implementations are not fast enough to quickly and easily compute the results on a standard PC. The Graphics Processing Unit (GPU) can solve some highly parallel…
Document clustering plays an important role in data mining systems. Recently, a flocking-based document clustering algorithm has been proposed to solve the problem through simulation resembling the flocking behavior of birds in nature. This method is superior to other clustering algorithms, including k-means, in the sense that the outcome is not sensitive…
Radiologists disagree with each other over the characteristics and features of what constitutes a normal mammogram and the terminology to use in the associated radiology report. Recently, the focus has been on classifying abnormal or suspicious reports, but even this process needs further layers of clustering and gradation, so that individual lesions can be…
We describe a method for indexing and retrieving high-resolution image regions in large geospatial data libraries. An automated feature extraction method is used that generates a unique and specific structural description of each segment of a tessellated input image file. These tessellated regions are then merged into similar groups and indexed to provide…