# A Fast Quartet tree heuristic for hierarchical clustering

@article{Cilibrasi2011AFQ, title={A Fast Quartet tree heuristic for hierarchical clustering}, author={Rudi Cilibrasi and Paul M. B. Vit{\'a}nyi}, journal={Pattern Recognit.}, year={2011}, volume={44}, pages={662-677} }

The Minimum Quartet Tree Cost problem is to construct an optimal weight tree from the $3\binom{n}{4}$ weighted quartet topologies on n objects, where optimality means that the summed weight of the embedded quartet topologies is optimal (so it can be the case that the optimal tree embeds all quartets as nonoptimal topologies). We present a Monte Carlo heuristic, based on randomized hill-climbing, for approximating the optimal weight tree, given the quartet topology weights. The method repeatedly…
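To make the counting and the cost function in the abstract concrete, here is a minimal Python sketch. It is not the authors' implementation. The helper names (`quartet_topologies`, `hop_distances`, `tree_cost`), the five-leaf example tree, and the uniform weights are all assumptions chosen for illustration. The sketch enumerates the three pairings of every 4-subset of objects and sums the weights of the pairings that a given unrooted ternary tree embeds, using the fact that in a tree the embedded pairing ab|cd has the smallest value of d(a,b) + d(c,d).

```python
from collections import deque
from itertools import combinations
from math import comb


def quartet_topologies(labels):
    """Yield each topology ((a, b), (c, d)), read as ab|cd, for every 4-subset."""
    for a, b, c, d in combinations(labels, 4):
        yield ((a, b), (c, d))
        yield ((a, c), (b, d))
        yield ((a, d), (b, c))


def hop_distances(adj, source):
    """Breadth-first search: number of edges from source to every node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def tree_cost(adj, leaves, weight):
    """Sum the weights of the quartet topologies embedded in the tree."""
    dist = {leaf: hop_distances(adj, leaf) for leaf in leaves}

    def pair_sum(topology):
        (p, q), (r, s) = topology
        return dist[p][q] + dist[r][s]

    total = 0.0
    for a, b, c, d in combinations(leaves, 4):
        options = [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]
        # In a fully resolved tree, the embedded pairing has the
        # strictly smallest sum of within-pair path lengths.
        total += weight[min(options, key=pair_sum)]
    return total


# Hypothetical example: five leaves a..e, internal nodes x, y, z of degree 3.
leaves = ["a", "b", "c", "d", "e"]
adj = {
    "a": ["x"], "b": ["x"], "c": ["y"], "d": ["z"], "e": ["z"],
    "x": ["a", "b", "y"], "y": ["x", "c", "z"], "z": ["y", "d", "e"],
}

topologies = list(quartet_topologies(leaves))
print(len(topologies), 3 * comb(len(leaves), 4))  # 15 15

# Assumed uniform weights: every topology costs 1.0.
weight = {t: 1.0 for t in topologies}
print(tree_cost(adj, leaves, weight))  # 5.0, one embedded topology per quartet
```

A heuristic search such as the hill-climbing mentioned in the abstract would call a cost function of this kind on each candidate tree; how candidates are generated is not shown here.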

## 33 Citations

Improved metaheuristics for the quartet method of hierarchical clustering

- Computer Science, Mathematics
- J. Glob. Optim.
- 2020

This basic greedy heuristic consistently improves the performance of popular quartet clustering algorithms in the literature, namely a reduced variable neighbourhood search and a simulated annealing metaheuristic, producing novel efficient solution approaches to the MQTC problem.

An exact algorithm for the minimum quartet tree cost problem

- Computer Science
- 4OR
- 2019

The aim of this paper is to present a first exact solution approach for the minimum quartet tree cost problem, which can be used as a benchmark for validating the performance of any heuristic proposed for the MQTC problem.

Improved Variable Neighbourhood Search Heuristic for Quartet Clustering

- Computer Science
- ICVNS
- 2018

The solution approach substantially improves the performance of a Reduced Variable Neighborhood Search for the MQTC problem and proposes a basic greedy heuristic that is characterized by a very high speed and some interesting implementation details.

On the minimum quartet tree cost problem

- Computer Science, Mathematics
- ArXiv
- 2018

Details and formulation of this novel challenging problem, and the preliminaries of an exact algorithm under current development which may be useful to improve the MQTC heuristics to date into more efficient hybrid approaches are provided.

Fast Whole-Genome Phylogeny of the COVID-19 Virus SARS-CoV-2 by Compression

- Biology
- bioRxiv: the preprint server for biology
- 2020

The compression method is simpler and possibly faster than any other whole genome method, which makes it the ideal tool to explore phylogeny and treat the question whether Pangolins are involved in the SARS-CoV-2 virus.

Pythagorean Fuzzy Clustering Analysis: A Hierarchical Clustering Algorithm with the Ratio Index‐Based Ranking Methods

- Mathematics, Computer Science
- Int. J. Intell. Syst.
- 2018

A general type of distance measure for Pythagorean fuzzy numbers (PFNs) and a novel ratio index‐based ranking method of PFNs that can address the clustering problems in which the weights of criteria are not given precisely in advance and are expressed by PFNs and IVPFNs are proposed.

Recent Experiences in Parameter-Free Data Mining

- Computer Science
- ISCIS
- 2010

The results are very promising and show that one can obtain an (almost) perfect clustering for all the problems studied, and with respect to the standard compressors bzlip, ppmd, and zlib.

Content driven clustering algorithm combining density and distance functions

- Computer Science
- Pattern Recognit.
- 2019

The algorithm presented in this contribution outperforms the other well-known algorithms with which it is compared on the majority of the datasets used.

Combining attribute content and label information for categorical data ensemble clustering

- Computer Science
- Appl. Math. Comput.
- 2020

A new ensemble clustering framework for categorical data is proposed, in which the information matrix considers label information and original data information together and is instantiated into the ALM matrix, which takes account of not only the distribution of attribute content in each ensemble member, but also the relationship among ensemble members based on the distribution.

Compression-Based Similarity

- Mathematics, Computer Science
- 2011 First International Conference on Data Compression, Communications and Processing
- 2011

This work considers pair-wise distances for literal objects consisting of finite binary files, taken to contain all of their meaning, like genomes or books, and derives a similarity or relative semantics between names for objects.

## References

Integer linear programming as a tool for constructing trees from quartet data

- Mathematics, Computer Science
- Comput. Biol. Chem.
- 2005

This work proposes to use inner products of rate-matrix diagonals calculated for pairs of taxa and presents the trees resulting from applying this approach to two data sets of up to 36 mitochondrial sequences of mammals including an outgroup.

Constructing Phylogenies from Quartets: Elucidation of Eutherian Superordinal Relationships

- Medicine, Mathematics
- J. Comput. Biol.
- 1998

Two new approaches for constructing phylogenetic trees are presented, based on geometric ideas and dynamic programming, and it is guaranteed to find the optimal tree (with respect to the given quartets).

A Polynomial Time Approximation Scheme for Inferring Evolutionary Trees from Quartet Topologies and Its Application

- Mathematics, Computer Science
- SIAM J. Comput.
- 2000

This paper presents a polynomial time approximation scheme (PTAS) for recombining the inferred quartet topologies optimally and a new technique, called quartet cleaning, that detects and corrects errors in the set Q with performance guarantees.

Heuristic Approaches for the Quartet Method of Hierarchical Clustering

- Computer Science
- IEEE Transactions on Knowledge and Data Engineering
- 2010

It is shown that the Reduced Variable Neighborhood Search heuristic is the most effective approach to the problem, obtaining high-quality solutions in short computational running times.

Quartet Puzzling: A Quartet Maximum-Likelihood Method for Reconstructing Tree Topologies

- Biology
- 1996

A versatile method, quartet puzzling, is introduced to reconstruct the topology (branching pattern) of a phylogenetic tree based on DNA or amino acid sequence data and outperforms neighbor joining in some cases with high transition/transversion bias.

Short Quartet Puzzling: A New Quartet-Based Phylogeny Reconstruction Algorithm

- Biology, Computer Science
- J. Comput. Biol.
- 2008

This study presents Short Quartet Puzzling, a new quartet-based phylogeny reconstruction algorithm, and demonstrates the improved topological accuracy of the new method over maximum parsimony and neighbor joining, disproving the conjecture of Ranwez and Gascuel.

Quartet methods for phylogeny reconstruction from gene orders

- Biology
- 2005

The quartet-based method can handle more genomes than the base version of GRAPPA, thus enabling the number of levels of recursion in DCM-GRAPPA to be reduced, but is more sensitive to the rate of evolution, with error rates rapidly increasing when saturation is approached.

Quartet Cleaning: Improved Algorithms and Simulations

- Computer Science
- ESA
- 1999

In this paper, two efficient algorithms for correcting bounded numbers of quartet errors are presented, and these "quartet cleaning" algorithms are shown to be optimal in that no algorithm can correct more quartet errors.

Performance of Supertree Methods on Various Data Set Decompositions

- Computer Science
- 2004

This study shows that the techniques used for dividing the data set into subproblems as well as those used for merging them into a single solution influence the quality of the supertree construction strongly, especially on the more challenging data sets.

Quartet Supertrees

- 2004

We introduce two supertree methods that produce unrooted supertrees from unrooted input trees. The methods assemble supertrees from a weighted quartet (four-taxon) tree representation of the input…