A Set-Covering Approach with Column Generation for Parsimony Haplotyping


We introduce an exact algorithm, based on Integer Linear Programming (ILP), for the parsimony haplotyping problem (PHP). Given molecular data in the form of a set of genotypes, the PHP asks for a smallest set of haplotypes that explains them. Our approach is based on a Set Covering formulation of the problem, solved by branch and bound with both column and row generation. Existing ILP methods for the PHP suffer from the large size of the solution space when the genotypes are long and have many heterozygous sites. Our approach, by contrast, relies on an effective implicit representation of the solution space and solves both real-data and simulated instances that are very hard for other ILP formulations.
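
To make the problem statement concrete, the following is a minimal brute-force sketch of the PHP on a toy instance. It is not the set-covering, column-and-row-generation algorithm of the paper, only an illustration of what "a smallest set of haplotypes that explains a set of genotypes" means. The encoding is an assumption made for this example: each genotype is a string over `0`, `1`, `2`, where `0` and `1` mark homozygous sites and `2` marks a heterozygous site. The function names `expansions` and `parsimony_haplotyping` are hypothetical.

```python
from itertools import combinations, product


def expansions(genotype):
    """Return all unordered haplotype pairs that explain `genotype`.

    Assumed encoding: '0'/'1' = homozygous site, '2' = heterozygous site.
    At a heterozygous site the two haplotypes must differ.
    """
    het = [i for i, s in enumerate(genotype) if s == "2"]
    pairs = set()
    for bits in product("01", repeat=len(het)):
        h1 = list(genotype)
        h2 = list(genotype)
        for i, b in zip(het, bits):
            h1[i] = b
            h2[i] = "1" if b == "0" else "0"
        # frozenset makes the pair unordered (and collapses h1 == h2).
        pairs.add(frozenset(("".join(h1), "".join(h2))))
    return pairs


def parsimony_haplotyping(genotypes):
    """Brute-force search for a smallest explaining haplotype set.

    Exponential in the number of heterozygous sites; toy instances only.
    """
    options = [expansions(g) for g in genotypes]
    candidates = sorted({h for opts in options for pair in opts for h in pair})
    # Try candidate subsets in order of increasing size.
    for k in range(1, len(candidates) + 1):
        for subset in combinations(candidates, k):
            chosen = set(subset)
            # Every genotype needs at least one explaining pair inside `chosen`.
            if all(any(pair <= chosen for pair in opts) for opts in options):
                return chosen
    return set()


if __name__ == "__main__":
    print(sorted(parsimony_haplotyping(["12", "21", "22"])))
```

On this toy instance, genotype `12` forces haplotypes `10` and `11`, genotype `21` forces `01` and `11`, and `22` is then explained by the pair `01`, `10`, so three haplotypes suffice. The number of explaining pairs doubles with each additional heterozygous site, which is the growth of the solution space that motivates an implicit representation.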

DOI: 10.1287/ijoc.1080.0285

