Reflections on kernelizing and computing unrooted agreement forests

@article{Wersch2022ReflectionsOK,
  title={Reflections on kernelizing and computing unrooted agreement forests},
  author={Rim van Wersch and Steven M. Kelk and Simone Linz and Georgios Stamoulis},
  journal={Annals of Operations Research},
  year={2022},
  volume={309},
  pages={425--451}
}
Phylogenetic trees are leaf-labelled trees used to model the evolution of species. Here we explore the practical impact of kernelization (i.e. data reduction) on the NP-hard problem of computing the TBR distance between two unrooted binary phylogenetic trees. This problem is better known in the literature as the maximum agreement forest problem, where the goal is to partition the two trees into a minimum number of common, non-overlapping subtrees. We have implemented two well-known reduction… 
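As an illustration of what a kernelization rule does, here is a minimal Python sketch of the subtree reduction rule, one of the classic TBR reduction rules. It is not the implementation evaluated in the paper, and every name in it (`cherries`, `collapse_cherry`, `subtree_reduce`) is hypothetical. It assumes each unrooted binary tree is stored as an adjacency dictionary, with leaves as string taxon labels and internal vertices as integers. The rule is realized by repeatedly collapsing cherries common to both trees. Under our reading, this has the same effect as replacing each maximal common pendant subtree by a single leaf.

```python
from typing import Dict, FrozenSet, Set, Union

# Hypothetical representation: leaves are str taxon labels, internal
# vertices are int ids, and each tree maps a vertex to its neighbours.
Node = Union[str, int]
Tree = Dict[Node, Set[Node]]


def is_leaf(v: Node) -> bool:
    return isinstance(v, str)


def cherries(tree: Tree) -> Set[FrozenSet[Node]]:
    """All pairs of leaves that hang off the same internal vertex."""
    found: Set[FrozenSet[Node]] = set()
    for v, nbrs in tree.items():
        if not is_leaf(v):
            leaves = [u for u in nbrs if is_leaf(u)]
            if len(leaves) == 2:
                found.add(frozenset(leaves))
    return found


def collapse_cherry(tree: Tree, a: str, b: str, label: str) -> None:
    """Replace the cherry {a, b} by one new leaf called `label`, in place."""
    (p,) = tree[a]          # internal vertex shared by a and b
    del tree[a], tree[b]
    tree[p] -= {a, b}
    (q,) = tree[p]          # p's only remaining neighbour
    del tree[p]             # turn p into a leaf named `label` attached to q
    tree[label] = {q}
    tree[q].discard(p)
    tree[q].add(label)


def subtree_reduce(t1: Tree, t2: Tree) -> None:
    """Collapse cherries common to both trees until none remain.

    We stop at 3 leaves, since there is only one unrooted binary tree
    on 3 taxa (the star), so the TBR distance there is 0.
    """
    while sum(map(is_leaf, t1)) > 3:
        common = cherries(t1) & cherries(t2)
        if not common:
            break
        a, b = sorted(next(iter(common)))
        label = f"({a},{b})"    # same new label in both trees
        collapse_cherry(t1, a, b, label)
        collapse_cherry(t2, a, b, label)
```

For example, two six-taxon caterpillars that differ only in the positions of d and e share the pendant subtree on {a, b, c}. The sketch collapses that subtree to one leaf and leaves two four-taxon trees that still differ.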
Deep kernelization for the Tree Bisection and Reconnect (TBR) distance in phylogenetics
TLDR
A kernel of size 9k − 8 is described for the NP-hard problem of computing the Tree Bisection and Reconnect distance k between two unrooted binary phylogenetic trees, obtained by extending the existing portfolio of reduction rules with three novel reduction rules.
Convex characters, algorithms and matchings
TLDR
This work shows how combining the enumeration of convex characters with existing parameterised algorithms can speed up exponential-time algorithms for the maximum agreement forest problem in phylogenetics, and revisits the quantity g2(T), defined as the number of convex characters on T in which each state appears on at least 2 taxa.
Sharp Upper and Lower Bounds on a Restricted Class of Convex Characters
TLDR
It is shown that for every k ≥ 3 topological neutrality no longer holds; tree topologies achieving the maximum and minimum values of gk are described, and the corresponding expressions and exponential bounds are determined.

References

Showing references 1-10 of 43.
A tight kernel for computing the tree bisection and reconnection distance between two phylogenetic trees
TLDR
This work reanalyses Allen and Steel's kernelization algorithm and proves that the reduced instances will in fact have at most 15k − 9 taxa. It also introduces and uses “unrooted generators”, which are analogues of rooted structures that have appeared earlier in the phylogenetic networks literature.
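To see what these linear bounds mean in practice, the following worked arithmetic is ours and is not taken from either paper. A bound of 15k − 9 says that an exhaustively reduced instance with TBR distance at most k has at most 15k − 9 taxa, so any reduced instance with more taxa can be rejected immediately. For k = 10 the comparison with the 9k − 8 kernel mentioned above is:

```latex
\begin{aligned}
15k - 9 &= 15 \cdot 10 - 9 = 141,\\
9k - 8  &= 9 \cdot 10 - 8 = 82,\\
(15k - 9) - (9k - 8) &= 6k - 1 = 59.
\end{aligned}
```

The gap 6k − 1 grows linearly in k, so the tighter kernel matters most for larger distances.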
Reduction rules for the maximum parsimony distance on phylogenetic trees
On the fixed parameter tractability of agreement-based phylogenetic distances
TLDR
New analyses are presented showing that the “cluster reduction” rule (already defined for the hybridization number and the rSPR distance, and introduced here for the TBR distance) can transform any algorithm for computing these three important measures of dissimilarity between phylogenetic trees into an O(f(k)·n)-time one.
SPR Distance Computation for Unrooted Trees
TLDR
The method is a heuristic version of a fixed-parameter tractable (FPT) approach, and its running time behaves similarly to that of FPT algorithms. It was able to quickly compute dSPR for the majority of the trees that were part of a study of lateral gene transfer (LGT) in 144 prokaryotic genomes.
Computing Maximum Agreement Forests without Cluster Partitioning is Folly
TLDR
The experiments show that cluster partitioning leads to substantial performance improvements for kernelization-based M(A)AF algorithms. They also show that the kernel is often much smaller in practice than the theoretical worst case, though not small enough to fully explain the good performance of these algorithms.
Subtree Transfer Operations and Their Induced Metrics on Evolutionary Trees
TLDR
It is shown that the problem of computing the minimum number of TBR operations required to transform one tree into another can be reduced to a problem whose size is a function only of the distance between the trees, thereby establishing that the problem is fixed-parameter tractable.
Supertrees Based on the Subtree Prune-and-Regraft Distance
TLDR
This work successfully constructed an SPR supertree from a phylogenomic dataset of 40,631 gene trees that covered 244 genomes representing several major bacterial phyla, and allowed direct inference of highways of gene transfer between bacterial classes and genera.
New Reduction Rules for the Tree Bisection and Reconnection Distance
TLDR
Five new reduction rules are proposed and shown to be the first reduction rules that strictly enhance the reductive power of the subtree and chain reduction rules.
Extremal Distances for Subtree Transfer Operations in Binary Trees
TLDR
It is shown that for a pair of leaf-labelled binary trees with n leaves, the maximum number of subtree transfer moves required to transform one into the other is n − Θ(√n), extending a result of Ding, Grünewald, and Humphries.
Calculating the Unrooted Subtree Prune-and-Regraft Distance
TLDR
A “progressive A*” search algorithm is developed that uses multiple heuristics, including the TBR and replug distances, to compute the unrooted SPR distance exactly. The algorithm is nearly two orders of magnitude faster than previous methods on small trees.
...