Fast and high-quality document clustering algorithms play an important role in effectively navigating, summarizing, and organizing information. Recent studies have shown that partitional clustering algorithms are more suitable for clustering large datasets. However, the K-means algorithm, the most commonly used partitional clustering algorithm, can only …
Social animals or insects in nature often exhibit a form of emergent collective behavior known as flocking. In this paper, we present a novel flocking-based approach for document clustering analysis. Our flocking clustering algorithm uses stochastic and heuristic principles discovered from observing bird flocks or fish schools. Unlike …
In this paper, we propose a new term weighting scheme called Term Frequency – Inverse Corpus Frequency (TF-ICF). It does not require term frequency information from other documents within the document collection and thus, it enables us to generate the document vectors of N streaming documents in linear time. In the context of a machine learning application, …
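The idea of weighting a document without consulting the rest of the collection can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a log-scaled term frequency multiplied by an IDF-style factor taken from a fixed reference corpus, and the names `REF_DOCS`, `REF_DOC_FREQ`, and `tf_icf_vector`, along with the corpus counts, are hypothetical.

```python
import math
from collections import Counter

# Hypothetical statistics from a fixed reference corpus (illustrative values only):
# total number of reference documents and, per term, how many of them contain it.
REF_DOCS = 1000
REF_DOC_FREQ = {"the": 990, "cluster": 40, "flock": 5}


def tf_icf_vector(tokens, ref_doc_freq=REF_DOC_FREQ, ref_docs=REF_DOCS):
    """Weight one document's terms using only its own counts and the
    precomputed reference-corpus table (assumed formula, see lead-in)."""
    counts = Counter(tokens)
    weights = {}
    for term, tf in counts.items():
        # Terms missing from the reference table are treated as having count 0.
        icf = math.log((ref_docs + 1) / (ref_doc_freq.get(term, 0) + 1))
        weights[term] = math.log(1 + tf) * icf
    return weights


vector = tf_icf_vector(["the", "flock", "flock", "cluster"])
for term, weight in sorted(vector.items(), key=lambda kv: -kv[1]):
    print(f"{term}: {weight:.3f}")
```

Because each document is processed with a single pass over its own tokens and constant-time table lookups, N streaming documents can be vectorized one at a time without revisiting earlier ones.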
In the real world, we have to frequently deal with searching for and tracking an optimal solution in a dynamic environment. This demands that the algorithm not only find the optimal solution but also track the trajectory of the solution in a dynamic environment. Particle Swarm Optimization (PSO) is a population-based stochastic optimization technique, which …
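For readers unfamiliar with PSO, the following is a minimal sketch of the standard global-best variant on a static objective. It is a textbook formulation, not the dynamic-environment method this abstract goes on to describe; the function name `pso_minimize` and the parameter values (`w`, `c1`, `c2`, swarm size, bounds) are assumptions chosen for illustration.

```python
import random


def pso_minimize(objective, dim, n_particles=20, iters=100,
                 w=0.7, c1=1.5, c2=1.5, bounds=(-5.0, 5.0), seed=0):
    """Textbook global-best PSO: each particle is pulled toward its own best
    position (pbest) and the swarm's best position (gbest)."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia term + cognitive pull (pbest) + social pull (gbest).
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def shifted_sphere(x):
    # Simple test objective with its minimum at (1, 1, ..., 1).
    return sum((xi - 1.0) ** 2 for xi in x)


best, best_val = pso_minimize(shifted_sphere, dim=2)
print(best, best_val)
```

In this static sketch the stored best values never go stale. In a dynamic environment they can, which is why tracking a moving optimum needs more than this basic loop.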
We intend to demonstrate that if the business model cannot adjust to new technology by a) recognizing its limitations, b) recognizing the organization's ability to control it, and c) adjusting its deadlines to take advantage of the methodology's potential, it is unlikely that an investment in the technology will result in real productivity benefits. As software …
Analyzing and clustering large-scale data sets is a complex problem. One explored method of solving this problem borrows from nature, imitating the flocking behavior of birds. One limitation of this method of data clustering is its O(n^2) complexity. As the number of data points and feature dimensions grows, it becomes increasingly difficult to generate results in …
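The quadratic cost mentioned above typically comes from each agent having to examine every other agent to find its neighbors. The sketch below illustrates that brute-force step only; it is not the paper's algorithm, and the function name `neighbors_within` and the sample points are hypothetical.

```python
import math


def neighbors_within(points, radius):
    """For each point, list the indices of the other points within `radius`.

    Brute force: every unordered pair is compared once, so the number of
    distance evaluations is n * (n - 1) / 2, i.e. O(n^2)."""
    n = len(points)
    result = [[] for _ in range(n)]
    comparisons = 0
    for i in range(n):
        for j in range(i + 1, n):
            comparisons += 1
            if math.dist(points[i], points[j]) <= radius:
                result[i].append(j)
                result[j].append(i)
    return result, comparisons


points = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0)]
neighbors, comparisons = neighbors_within(points, radius=1.5)
print(neighbors)     # neighbor index lists, one per point
print(comparisons)   # pairwise distance evaluations performed
```

Each distance evaluation also scales with the number of feature dimensions, so the total work grows with both the number of points and their dimensionality.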
We describe a method for indexing and retrieving high-resolution image regions in large geospatial data libraries. An automated feature extraction method is used that generates a unique and specific structural description of each segment of a tessellated input image file. These tessellated regions are then merged into similar groups and indexed to provide …
How to organize and classify large amounts of heterogeneous information accessible over the Internet is a major problem faced by industry, government, and military organizations. XML is clearly a potential solution to this problem [1,2]; however, a significant challenge is how to automatically convert information currently expressed in a standard HTML …
Latent Semantic Analysis (LSA) can be used to reduce the dimensions of large Term-Document datasets using Singular Value Decomposition. However, with the ever-expanding size of data sets, current implementations are not fast enough to quickly and easily compute the results on a standard PC. The Graphics Processing Unit (GPU) can solve some highly parallel …
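The dimension-reduction step in LSA can be sketched as a truncated SVD of the term-document matrix. The example below is a CPU-only illustration using NumPy, not the GPU implementation the abstract refers to; the function name `lsa_reduce` and the toy matrix are assumptions.

```python
import numpy as np


def lsa_reduce(term_doc, k):
    """Truncated SVD of a term-document matrix (rows = terms, columns = documents).

    Returns the k leading left singular vectors, singular values, and right
    singular vectors, plus each document's coordinates in the k-dim latent space."""
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]
    # Scale each latent dimension by its singular value; one row per document.
    doc_coords = (s_k[:, None] * Vt_k).T
    return U_k, s_k, Vt_k, doc_coords


# Toy 4-term x 3-document count matrix (illustrative values only).
A = np.array([[2.0, 0.0, 1.0],
              [0.0, 3.0, 0.0],
              [1.0, 0.0, 2.0],
              [0.0, 1.0, 0.0]])

U_k, s_k, Vt_k, doc_coords = lsa_reduce(A, k=2)
print(doc_coords.shape)  # (number of documents, k)
```

The full decomposition computed here is the expensive part for large matrices, which is the bottleneck the abstract describes.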