A Fast Quartet tree heuristic for hierarchical clustering

@article{Cilibrasi2011AFQ,
  title={A Fast Quartet tree heuristic for hierarchical clustering},
  author={Rudi Cilibrasi and Paul M. B. Vit{\'a}nyi},
  journal={Pattern Recognit.},
  year={2011},
  volume={44},
  pages={662--677}
}
The Minimum Quartet Tree Cost problem is to construct an optimal weight tree from the $3\binom{n}{4}$ weighted quartet topologies on n objects, where optimality means that the summed weight of the embedded quartet topologies is optimal (so it can be the case that the optimal tree embeds all quartets as nonoptimal topologies). We present a Monte Carlo heuristic, based on randomized hill-climbing, for approximating the optimal weight tree, given the quartet topology weights. The method repeatedly… 
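To illustrate where the count of quartet topologies comes from, here is a minimal Python sketch. It is not taken from the paper: the function name `quartet_topologies` and the five example labels are assumptions made for illustration. Each set of four objects can be split into two pairs in exactly three ways, so n objects give 3 times "n choose 4" topologies.

```python
from itertools import combinations
from math import comb


def quartet_topologies(labels):
    """Yield every quartet topology as a pair of pairs.

    The tuple ((a, b), (c, d)) stands for the topology ab|cd, in which
    a and b sit on one side of the internal edge and c and d on the other.
    (Hypothetical helper, named here for illustration only.)
    """
    for a, b, c, d in combinations(labels, 4):
        # A set of four objects has exactly three pairings.
        yield ((a, b), (c, d))
        yield ((a, c), (b, d))
        yield ((a, d), (b, c))


# Example labels are an assumption; any hashable, orderable labels work.
labels = ["a", "b", "c", "d", "e"]
topologies = list(quartet_topologies(labels))

# The number of topologies matches 3 * C(n, 4).
assert len(topologies) == 3 * comb(len(labels), 4)
```

A weight would then be attached to each of these topologies; how the weights are obtained and how a candidate tree is scored against them is described in the paper itself and is not reproduced here.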
Improved metaheuristics for the quartet method of hierarchical clustering
TLDR
This basic greedy heuristic consistently improves the performance of popular quartet clustering algorithms in the literature, namely a reduced variable neighbourhood search and a simulated annealing metaheuristic, producing novel efficient solution approaches to the MQTC problem.
An exact algorithm for the minimum quartet tree cost problem
TLDR
The aim of this paper is to present a first exact solution approach for the minimum quartet tree cost problem, which can be used as a benchmark for validating the performance of any heuristic proposed for the MQTC problem.
Improved Variable Neighbourhood Search Heuristic for Quartet Clustering
TLDR
The solution approach substantially improves the performance of a Reduced Variable Neighborhood Search for the MQTC problem and proposes a basic greedy heuristic that is characterized by a very high speed and some interesting implementation details.
On the minimum quartet tree cost problem
TLDR
This paper provides the details and formulation of this novel, challenging problem, together with the preliminaries of an exact algorithm under current development that may be useful for improving the MQTC heuristics to date into more efficient hybrid approaches.
Fast Whole-Genome Phylogeny of the COVID-19 Virus SARS-CoV-2 by Compression
TLDR
The compression method is simpler and possibly faster than any other whole-genome method, which makes it the ideal tool to explore phylogeny and address the question of whether pangolins are involved in the SARS-CoV-2 virus.
Pythagorean Fuzzy Clustering Analysis: A Hierarchical Clustering Algorithm with the Ratio Index‐Based Ranking Methods
  • Xiaolu Zhang
  • Mathematics, Computer Science
    Int. J. Intell. Syst.
  • 2018
TLDR
A general type of distance measure for Pythagorean fuzzy numbers (PFNs) and a novel ratio index‐based ranking method of PFNs that can address the clustering problems in which the weights of criteria are not given precisely in advance and are expressed by PFNs and IVPFNs are proposed.
Recent Experiences in Parameter-Free Data Mining
TLDR
The results are very promising and show that one can obtain an (almost) perfect clustering for all the problems studied, and with respect to the standard compressors bzip2, ppmd, and zlib.
Content driven clustering algorithm combining density and distance functions
TLDR
The algorithm presented in this contribution outperforms the other well-known algorithms with which it is compared on the majority of the datasets used.
Combining attribute content and label information for categorical data ensemble clustering
TLDR
A new ensemble clustering framework for categorical data is proposed, in which the information matrix considers label information and original data information together and is instantiated into the ALM matrix, which takes account of not only the distribution of attribute content in each ensemble member, but also the relationship among ensemble members based on the distribution.
Compression-Based Similarity
  • P. Vitányi
  • Mathematics, Computer Science
    2011 First International Conference on Data Compression, Communications and Processing
  • 2011
TLDR
This work considers pair-wise distances for literal objects consisting of finite binary files, taken to contain all of their meaning, like genomes or books, and derives a similarity or relative semantics between names for objects.

References

Integer linear programming as a tool for constructing trees from quartet data
TLDR
This work proposes to use inner products of rate-matrix diagonals calculated for pairs of taxa and presents the trees resulting from applying this approach to two data sets of up to 36 mitochondrial sequences of mammals including an outgroup.
Constructing Phylogenies from Quartets: Elucidation of Eutherian Superordinal Relationships
TLDR
Two new approaches for constructing phylogenetic trees are presented, based on geometric ideas and dynamic programming, and it is guaranteed to find the optimal tree (with respect to the given quartets).
A Polynomial Time Approximation Scheme for Inferring Evolutionary Trees from Quartet Topologies and Its Application
TLDR
This paper presents a polynomial time approximation scheme (PTAS) for recombining the inferred quartet topologies optimally and a new technique, called quartet cleaning, that detects and corrects errors in the set Q with performance guarantees.
Heuristic Approaches for the Quartet Method of Hierarchical Clustering
TLDR
It is shown that the Reduced Variable Neighborhood Search heuristic is the most effective approach to the problem, obtaining high-quality solutions in short computational running times.
Quartet Puzzling: A Quartet Maximum-Likelihood Method for Reconstructing Tree Topologies
TLDR
A versatile method, quartet puzzling, is introduced to reconstruct the topology (branching pattern) of a phylogenetic tree based on DNA or amino acid sequence data and outperforms neighbor joining in some cases with high transition/transversion bias.
Short Quartet Puzzling: A New Quartet-Based Phylogeny Reconstruction Algorithm
TLDR
This study presents Short Quartet Puzzling, a new quartet-based phylogeny reconstruction algorithm, and demonstrates the improved topological accuracy of the new method over maximum parsimony and neighbor joining, disproving the conjecture of Ranwez and Gascuel.
Quartet methods for phylogeny reconstruction from gene orders
TLDR
The quartet-based method can handle more genomes than the base version of GRAPPA, thus enabling the number of levels of recursion in DCM-GRAPPA to be reduced, but is more sensitive to the rate of evolution, with error rates rapidly increasing when saturation is approached.
Quartet Cleaning: Improved Algorithms and Simulations
TLDR
In this paper, two efficient algorithms for correcting bounded numbers of quartet errors are presented, and these "quartet cleaning" algorithms are shown to be optimal in that no algorithm can correct more quartet errors.
Performance of Supertree Methods on Various Data Set Decompositions
TLDR
This study shows that the techniques used for dividing the data set into subproblems as well as those used for merging them into a single solution influence the quality of the supertree construction strongly, especially on the more challenging data sets.
Quartet Supertrees
We introduce two supertree methods that produce unrooted supertrees from unrooted input trees. The methods assemble supertrees from a weighted quartet (four-taxon) tree representation of the input