Heuristic algorithms for the Maximum Colorful Subtree problem

Kai Dührkop, Marie Lataretu, W. Timothy J. White, Sebastian Böcker
In metabolomics, small molecules are structurally elucidated using tandem mass spectrometry (MS/MS); this gives rise to the computational Maximum Colorful Subtree problem, which is NP-hard. Unfortunately, data from a single metabolite requires us to solve hundreds or thousands of instances of this problem, and in a single Liquid Chromatography MS/MS run, hundreds or thousands of metabolites are measured. Here, we comprehensively evaluate the performance of several heuristic algorithms for the…


Bayesian networks for mass spectrometric metabolite identification via molecular fingerprints
A scoring that takes into account dependencies between molecular properties using machine learning and improves identification rates of CSI:FingerID by 2.85 percentage points is presented.
Computational methods for small molecule identification
SIRIUS, a tool for the structural elucidation of small molecules with tandem mass spectrometry, is developed and it is demonstrated that the machine learning model outperforms all other methods for this task, including its predecessor FingerID.
De Novo Molecular Formula Annotation and Structure Elucidation Using SIRIUS 4.
How to leverage the full potential of SIRIUS 4, how to incorporate it into your own workflow, and how it adds value to the analysis of mass spectrometry data beyond spectral library search are described.
The Maximum Colorful Arborescence problem parameterized by the structure of its color hierarchy graph
An O*(3^{nhs}) time algorithm is provided for solving MCA, where nhs is the number of vertices of indegree at least two in H, thereby improving the O*(3^{|C|}) algorithm from [Böcker et al. 2008].
On the Maximum Colorful Arborescence Problem and Color Hierarchy Graph Structure
There exists an O*(3^{n_H}) time algorithm for solving MCA, where n_H is the number of vertices of indegree at least two in H(G), thereby improving the O*(3^{|C|}) algorithm from Böcker et al.
SIRIUS 4: Turning tandem mass spectra into metabolite structure information
Mass Spectrometry is one of the two predominant experimental techniques in metabolomics and related fields, but structural elucidation remains highly challenging. A BLAST-like computational tool for
Computational Methods and Data Analysis for Metabolomics
The workflow of metabolomics is explained in the sequence of data processing, quality control, metabolite annotation, statistical analysis, pathway analysis, and multi-omics integration.


Finding Maximum Colorful Subtrees in Practice
New heuristics and an exact algorithm for the Maximum Colorful Subtree problem are introduced and evaluated against existing algorithms on real-world and artificial datasets; they can help determine molecular formulas based on fragmentation trees.
Algorithmic Aspects of the Maximum Colorful Arborescence Problem
This paper introduces a more precise model of Maximum Colorful Arborescence (MCA) and studies it extensively in terms of algorithmic complexity, showing that exploiting the implied color hierarchy of the input graph can lead to polynomial algorithms.
Searching molecular structure databases with tandem mass spectra using CSI:FingerID
This work presents CSI:FingerID, which combines fragmentation tree computation and machine learning for searching molecular structure databases using tandem MS data of small molecules, and is shown to improve on the competing methods for computational metabolite identification by a considerable margin.
Speedy Colorful Subtrees
Fragmentation trees are a technique for identifying molecular formulas and deriving some chemical properties of metabolites—small organic molecules—solely from mass spectral data by finding exact solutions to the NP-hard Maximum Colorful Subtree problem.
Competitive fragmentation modeling of ESI-MS/MS spectra for putative metabolite identification
It is shown that CFM can be used in both an MS/MS spectrum prediction task (i.e., predicting the mass spectrum from a chemical structure) and a putative metabolite identification task (ranking possible structures for a target MS/MS spectrum).
Automatic chemical structure annotation of an LC-MS(n) based metabolic profile from green tea.
A novel method to automatically process and annotate the LC-MS(n) data sets on the basis of candidate molecules from chemical databases, such as PubChem or the Human Metabolite Database, illustrating the potential to support systematic and untargeted metabolite identification.
Complexity issues in vertex-colored graph pattern matching
Metabolite identification through multiple kernel learning on fragmentation trees
This work combines fragmentation tree computations with kernel-based machine learning to predict molecular fingerprints and identify molecular structures, and introduces a family of kernels capturing the similarity of fragmentation trees, and combines these kernels using recently proposed multiple kernel learning approaches.
Improved metabolite identification with MIDAS and MAGMa through MS/MS spectral dataset-driven parameter optimization
It is demonstrated that large MS/MS metabolite spectral libraries can be used not only to validate and compare, but also to improve the methods, confirming MIDAS and MAGMa as the state-of-the-art freely available tools for metabolite identification.
Motif Search in Graphs: Application to Metabolic Networks
This work introduces a new definition of motif in the context of metabolic networks, proposing an alternative definition based on the functional nature of the components that form the motif, which is called a reaction motif.