Feature Relevance in Ward’s Hierarchical Clustering Using the Lp Norm

@article{Amorim2015FeatureRI,
  title={Feature Relevance in Ward’s Hierarchical Clustering Using the Lp Norm},
  author={Renato Cordeiro de Amorim},
  journal={Journal of Classification},
  year={2015},
  volume={32},
  pages={46--62}
}
  • R. C. D. Amorim
  • Published 1 April 2015
  • Mathematics, Computer Science
  • Journal of Classification
In this paper we introduce a new hierarchical clustering algorithm called Wardp. Unlike the original Ward, Wardp generates feature weights, which can be seen as feature rescaling factors thanks to the use of the Lp norm. The feature weights are cluster dependent, allowing a feature to have different degrees of relevance at different clusters. We validate our method by performing experiments on a total of 75 real-world and synthetic datasets, with and without added features made of uniformly…
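The idea of cluster-dependent feature weights can be illustrated with a minimal sketch. It is not the paper's algorithm: the abstract does not give the weight formula, so the inverse-dispersion rule below (familiar from Minkowski-weighted K-Means style methods) is an assumption, the cluster mean stands in for an Lp-optimal centre, and `cluster_feature_weights` is a hypothetical helper name.

```python
import numpy as np

def cluster_feature_weights(X, labels, p=2.0, eps=1e-9):
    """Hypothetical helper: one weight vector per cluster, each summing to 1."""
    if p <= 1.0:
        raise ValueError("this sketch requires p > 1")
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    weights = {}
    for k in np.unique(labels):
        members = X[labels == k]
        # Simplification (assumption): use the mean, not an Lp-optimal centre.
        center = members.mean(axis=0)
        # Per-feature dispersion under the Lp norm; eps avoids division by zero.
        D = (np.abs(members - center) ** p).sum(axis=0) + eps
        # Assumed rule: w_v = 1 / sum_u (D_v / D_u)^(1 / (p - 1)).
        ratios = (D[:, None] / D[None, :]) ** (1.0 / (p - 1.0))
        weights[k.item()] = 1.0 / ratios.sum(axis=1)
    return weights

# Feature 0 is tight in cluster 0; feature 1 is tight in cluster 1.
X = [[0.0, 0.0], [0.1, 5.0], [-0.1, -5.0],
     [10.0, 10.0], [15.0, 10.1], [5.0, 9.9]]
labels = [0, 0, 0, 1, 1, 1]
w = cluster_feature_weights(X, labels, p=2.0)
```

Under this assumed rule, a feature with low within-cluster dispersion receives a high weight in that cluster only, so the same feature can matter in one cluster and be nearly ignored in another.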
A-Wardpβ: Effective hierarchical clustering using the Minkowski metric and a fast k-means initialisation
TLDR
An anomalous pattern initialisation method for hierarchical clustering algorithms, called A-Ward, capable of substantially reducing the time they take to converge, and a variant of Ward more capable of dealing with noise in data sets, are introduced.
A Clustering-Based Approach to Reduce Feature Redundancy
TLDR
This paper introduces an unsupervised feature selection method that can be used in the data pre-processing step to reduce the number of redundant features in a data set and finds that this method selects features that produce better cluster recovery, without the need for an extra user-defined parameter.
Feature weighting methods: A review
TLDR
A global taxonomy for Feature Weighting methods is proposed by focusing on: the learning approach (supervised or unsupervised), the methodology used to calculate the weights, and the feedback obtained from the ML algorithm when estimating the weights.
A Hybrid Clustering Approach for Bag-of-Words Image Categorization
  • Hui Huang, Y. Ma
  • Computer Science
  • Mathematical Problems in Engineering
  • 2019
TLDR
A hybrid clustering approach that combines improved hierarchical clustering with a K-means algorithm that outperforms the conventional BoW model in terms of categorization and demonstrates the feasibility and effectiveness of the approach.
Ultrametric Fitting by Gradient Descent
TLDR
The proposed framework sheds new light on the way to design a new generation of hierarchical clustering methods by leveraging the simple, yet effective, idea of replacing the ultrametric constraint with a min-max operation injected directly into the cost function.
rCOSA: A Software Package for Clustering Objects on Subsets of Attributes
TLDR
rCOSA is a software package interfaced to the R language that extends the original COSA software by adding functions for hierarchical clustering methods, least squares multidimensional scaling, partitional clustering, and data visualization.
2D–EM clustering approach for high-dimensional data through folding feature vectors
TLDR
The design of the 2D–EM algorithm enables it to handle a diverse set of challenging biomedical datasets and cluster them with higher accuracy than established methods, building confidence in the method's ability to uncover novel disease subtypes in new datasets.
A novel heuristic algorithm to solve penalized regression-based clustering model
TLDR
A novel heuristic algorithm is proposed to solve the reformulated model of PRClust, which needs only n(n − 1)/2 scalar slack variables, far fewer than those of DC-CD and DC-ADMM, and updates them using a simple equation in each iteration of the algorithm.
An improved frequency based agglomerative clustering algorithm for detecting distinct clusters on two dimensional dataset
TLDR
Experimental results show that the DAAC is suitable for instinctively identifying the K distinct clusters over the different two dimensional datasets with higher intra thickness and lesser intra separation than existing techniques.
A brief survey of unsupervised agglomerative hierarchical clustering schemes
The unsupervised hierarchical clustering process is a mathematical model or exploratory tool that aims to provide the easiest way to categorize the distinct groups over the large volume of real time…

References

SHOWING 1-10 OF 52 REFERENCES
Minkowski metric, feature weighting and anomalous cluster initializing in K-Means clustering
TLDR
The Minkowski metric based method is experimentally validated on datasets from the UCI Machine Learning Repository and generated sets of Gaussian clusters, and appears to be competitive in comparison with other K-Means based feature weighting algorithms.
Feature Selection as a Preprocessing Step for Hierarchical Clustering
TLDR
Analysis of the particular benefits that feature selection may provide in hierarchical clustering tasks and the power of feature selection methods applied as a preprocessing step under the proposed dimensions suggest that feature selection as preprocessing only provides limited improvements in the performance task.
Optimal Variable Weighting for Ultrametric and Additive Trees and K-means Partitioning: Methods and Software
TLDR
A new computer program, OVW, which is available to researchers as freeware, implements improved algorithms for optimal variable weighting for ultrametric and additive tree clustering, and includes a new algorithm for optimal variable weighting for K-means partitioning.
Hierarchical Clustering via Joint Between-Within Distances: Extending Ward's Minimum Variance Method
TLDR
A hierarchical clustering method that minimizes a joint between-within measure of distance between clusters, by defining a cluster distance and objective function in terms of Euclidean distance, or any power of Euclidean distance in the interval (0,2).
A preliminary study of optimal variable weighting in k-means clustering
Recently, algorithms for optimally weighting variables in non-hierarchical and hierarchical clustering methods have been proposed. Preliminary Monte Carlo research has shown that at least one of…
Unsupervised Feature Selection Using Feature Similarity
TLDR
An unsupervised feature selection algorithm suitable for data sets, large in both dimension and size, based on measuring similarity between features whereby redundancy therein is removed, which does not need any search and is fast.
Automated variable weighting in k-means type clustering
TLDR
A new step is introduced to the k-means clustering process to iteratively update variable weights based on the current partition of data, a formula for weight calculation is proposed, and the convergence theorem of the new clustering process is given.
Weighting Features for Partition around Medoids Using the Minkowski Metric
TLDR
This paper shows that MW-PAM, particularly when initialized with the Build algorithm (also using the Minkowski metric), is superior to other medoid-based algorithms in terms of both accuracy and identification of irrelevant features.
Ward’s Hierarchical Agglomerative Clustering Method: Which Algorithms Implement Ward’s Criterion?
TLDR
The survey work and case studies will be useful for all those involved in developing software for data analysis using Ward’s hierarchical clustering method.
Some methods for classification and analysis of multivariate observations
The main purpose of this paper is to describe a process for partitioning an N-dimensional population into k sets on the basis of a sample. The process, which is called 'k-means,' appears to give…