A High-Performance Toolkit for Fast Exact Algorithms


Ongoing work is described in which fast graph algorithms are combined with parallel, grid and reconfigurable technologies. This synergistic strategy can help solve problem instances too large or too difficult for standard techniques. Target problems need only be amenable to reduction and decomposition.

∗This research is supported in part by the National Science Foundation under grants EIA–9972889 and CCR–0075792, by the Office of Naval Research under grant N00014–01–1–0608, by the Department of Energy under contract DE–AC05–00OR22725, and by the Tennessee Center for Information Technology Research under award E01–0178–081.

Overview

We describe work in progress that combines novel algorithmic methods with powerful platforms and supporting infrastructure. We employ these emerging tools and technologies to launch systematic attacks on problems of significance. Preliminary results show considerable promise, often reducing runtimes from days to seconds, and bringing us ever closer to solving problems previously viewed as hopelessly out of reach.

Exemplar

For brevity, we restrict our attention to the clique problem. Clique is probably one of the best-known NP-complete problems, with relevance in a variety of applications. In bioinformatics, for example, clique has utility in toolchains for phylogeny, microarray analysis and SELDI (surface-enhanced laser desorption/ionization). Here researchers seek to discover a large clique of highly correlated protein sequences, DNA samples or biomarkers. We solve this problem by asking instead for a small vertex cover: an n-vertex graph G has a clique of size at least n − k if and only if the complement of G has a vertex cover of size at most k.

Fast Exact Algorithms

Our vertex cover algorithms exploit reduction and decomposition. During reduction, we condense an arbitrarily difficult instance into its combinatorial core. It has long been known [2] that any vertex whose degree exceeds k must belong to every vertex cover of size at most k. Placing such vertices in the cover and deleting them (decrementing k for each) leaves, if a cover of size at most k exists, a graph with at most k² edges.
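The complement reduction and the high-degree rule of [2] can be sketched in a few lines of Python. This is a minimal illustration, assuming graphs are given as edge lists on vertices 0..n−1; the function names `complement` and `high_degree_kernel` are hypothetical, and the sketch shows only this classical rule, not the authors' fine-tuned suite of reductions.

```python
from itertools import combinations


def complement(n, edges):
    """Edge set of the complement of a simple graph on vertices 0..n-1."""
    present = {frozenset(e) for e in edges}
    return {frozenset(p) for p in combinations(range(n), 2)
            if frozenset(p) not in present}


def high_degree_kernel(edges, k):
    """Classical high-degree rule for vertex cover (hypothetical helper name).

    A vertex of degree > k must be in every cover of size <= k, so it is put
    in the cover and deleted, and k drops by one. Returns
    (remaining_edges, remaining_k, forced_vertices), or None when no cover of
    size <= k can exist.
    """
    edges = {frozenset(e) for e in edges}
    forced = set()
    while True:
        degree = {}
        for e in edges:
            for v in e:
                degree[v] = degree.get(v, 0) + 1
        high = next((v for v, d in degree.items() if d > k), None)
        if high is None:
            break
        forced.add(high)
        edges = {e for e in edges if high not in e}
        k -= 1
        if k < 0:
            return None
    # Every remaining vertex has degree <= k, so k cover vertices reach at most k*k edges.
    if len(edges) > k * k:
        return None
    return edges, k, forced


# Clique {0, 1, 2, 3} plus a pendant vertex 4: G has a clique of size n - k = 4
# with k = 1, so the complement of G should have a vertex cover of size <= 1.
g = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (0, 4)]
print(high_degree_kernel(complement(5, g), 1))  # (set(), 0, {4})
```

Here the rule alone solves the instance: vertex 4 is forced into the cover, and removing it from G leaves the clique {0, 1, 2, 3}.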
More complex techniques rely on linear programming relaxation [8, 13]. We have fine-tuned and implemented these and a number of more recent ideas [1], culminating in a suite of polynomial-time routines that yield cores of at most 2k vertices. When reduction is complete, the core is ready for decomposition. Decomposition is challenging because the solution space that must be searched typically holds an exponential number of candidates. We therefore use a tree to structure the search for a satisfying cover. Each internal node of the tree represents a choice. At the root, for example, one might select an arbitrary vertex v; the left (right) subtree then denotes the set of all solutions in which v is (is not) in the cover. For fixed-parameter tractable problems such as vertex cover, reduction and decomposition are often termed kernelization and branching, respectively. Reduction and decomposition work equally well for many problems that are not fixed-parameter tractable; see, for example, the work reported in [14] for the hitting set problem.

Resources

We complement the algorithmic engine described above with parallel machines, gridware and, when needed, hardware acceleration. Parallelization works well with decomposition. The spawning of processes is structured by the tree used to explore the core's search space. Once spawned, however, these tasks are left to run in a virtually unstructured manner; neither barrier synchronization nor MPI-like tools are required. Suppose, for example, that 32 processors are available. Decomposition will use the first 5 levels of its tree (with 5 ≪ k) to split the input into 2⁵ = 32 subgraphs, one for each processor. Each processor then examines its subgraph in parallel using the search tree technique. Almost any architectural model will do. We have run initial experiments on several different platforms, including networks of workstations (NOWs), symmetric multiprocessors (SMPs), and near-random confederations of motley machines.
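The search tree and its use for splitting work across processors can be illustrated with a small Python sketch. It assumes edges stored as frozensets and branches on a highest-degree vertex, a heuristic chosen here only for illustration; `branch`, `solve` and `split` are hypothetical names, and the authors' codes are not claimed to work this way. `split` expands the first levels of the tree, as in the 32-processor example, into independent subproblems.

```python
def branch(edges, k):
    """Expand one internal node of the search tree by choosing a vertex v.

    Left child: v is in the cover (delete v, budget k - 1).
    Right child: v is not in the cover, so all of N(v) must be (delete N(v),
    budget k - |N(v)|). Children with a negative budget are pruned.
    """
    degree = {}
    for e in edges:
        for x in e:
            degree[x] = degree.get(x, 0) + 1
    v = max(degree, key=degree.get)  # illustrative heuristic: highest-degree vertex
    nbrs = {x for e in edges if v in e for x in e if x != v}
    children = [
        ({e for e in edges if v not in e}, k - 1),
        ({e for e in edges if not (e & nbrs)}, k - len(nbrs)),
    ]
    return [(es, kk) for es, kk in children if kk >= 0]


def solve(edges, k):
    """Depth-first search of the tree: is there a cover of size <= k?"""
    if not edges:
        return True
    if k <= 0:
        return False
    return any(solve(es, kk) for es, kk in branch(edges, k))


def split(edges, k, depth):
    """Expand the first `depth` levels of the tree into at most 2**depth
    independent subproblems, one per processor."""
    frontier = [(edges, k)]
    for _ in range(depth):
        nxt = []
        for es, kk in frontier:
            if not es or kk <= 0:
                nxt.append((es, kk))  # already decided: keep as a trivial subproblem
            else:
                nxt.extend(branch(es, kk))
        frontier = nxt
    return frontier


# A 5-cycle needs 3 vertices to cover its edges.
cycle = {frozenset((i, (i + 1) % 5)) for i in range(5)}
subproblems = split(cycle, 3, depth=5)  # at most 2**5 = 32 subproblems
print(any(solve(es, kk) for es, kk in subproblems))  # True
```

Because each subproblem carries its own residual graph and budget, a processor can search its share without communicating with the others, which is why no barrier synchronization is needed.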
We have also tried assorted grid middleware, including NetSolve [6], Condor [11] and Globus [7]. Our best results, however, have generally been obtained with minimal intervention, in the extreme case by launching bare secure shell (SSH) sessions. For recalcitrant subproblems, we aim to gain additional acceleration through the use of reconfigurable hardware [9]. We have recently brought online a cluster of 12 Unix CAD workstations and eight Pilchard boards developed by Philip Leong's research group [10]. We can access these directly or through the program description file mechanism of NetSolve. Each board fits into the DIMM slot of a Linux box, which greatly reduces FPGA-CPU I/O latency. We are currently prototyping, synthesizing and testing VHDL versions of our codes.
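The minimal-intervention style of launching tasks can be mimicked on a single machine. In the sketch below, which is our own illustration rather than the authors' setup, threads from Python's `concurrent.futures` stand in for independent processes launched over SSH, and the hypothetical `first_success` helper consumes results in completion order and stops at the first positive answer.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def first_success(tasks, workers=4):
    """Run independent zero-argument tasks; return True as soon as any returns True.

    There is no barrier: results are read in completion order. Tasks that have
    not started are cancelled; tasks already running are left to finish.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(t) for t in tasks]
        for fut in as_completed(futures):
            if fut.result():
                for f in futures:
                    f.cancel()  # has no effect on tasks already running
                return True
    return False


print(first_success([lambda: False, lambda: True, lambda: False]))  # True
```

Threads keep the example self-contained, but CPU-bound Python code gains little from them because of the global interpreter lock; real speedups would need separate processes or separate machines, as in the experiments described above.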

@inproceedings{AbuKhzam2007AHT,
  title={A High-Performance Toolkit for Fast Exact Algorithms},
  author={Faisal N. Abu-Khzam and Michael A. Langston and Pushkar Shanbhag},
  year={2007}
}