Estimating the overlap between dependent computations for automatic parallelization

Paul Bone, Zoltán Somogyi and Peter Schachte. Theory and Practice of Logic Programming, pages 575–591.
Abstract: Researchers working on the automatic parallelization of programs have long known that too much parallelism can be even worse for performance than too little, because spawning a task to be run on another CPU incurs overheads. Autoparallelizing compilers have therefore long tried to use granularity analysis to ensure that they only spawn off computations whose cost will probably exceed the spawn-off cost by a comfortable margin. However, this is not enough to yield good results, because…
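The granularity-control idea the abstract describes can be sketched outside Mercury. The following Python sketch (the cost model, threshold values, and function names are illustrative assumptions, not values or code from the paper) spawns a task on another worker only when its estimated cost comfortably exceeds the spawn-off cost:

```python
import concurrent.futures

# Hypothetical cost model: these numbers are illustrative assumptions.
SPAWN_COST = 1_000   # estimated overhead of spawning a task (abstract cost units)
MARGIN = 10          # spawn only if the task is this many times costlier

def run_maybe_parallel(task, estimated_cost, executor):
    """Spawn `task` on another worker only when its estimated cost
    exceeds the spawn-off cost by a comfortable margin; otherwise
    run it in place to avoid paying the spawn overhead."""
    if estimated_cost > SPAWN_COST * MARGIN:
        return executor.submit(task)           # parallel: worth the overhead
    future = concurrent.futures.Future()       # sequential: run in place
    future.set_result(task())
    return future
```

Either way the caller gets a future back, so the decision to parallelize stays invisible at the call site, which mirrors how an autoparallelizing compiler makes this choice without changing the program's meaning.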
Controlling Loops in Parallel Mercury Code
A novel program transformation is presented that greatly increases the speedups the authors can get from parallel Mercury programs, and also allows recursive calls inside parallel conjunctions to take advantage of tail recursion optimization.
Automatic Parallelism in Mercury
This work concentrates on building a profiler-feedback automatic parallelization system for Mercury that creates programs with very good parallel performance with as little help from the programmer as possible.
AND Parallelism for ILP: The APIS System
This work proposes the APIS (And ParallelISm for ILP) system, which uses results from Logic Programming AND-parallelism and defines a new type of redundancy (coverage-equivalent redundancy) that enables the pruning of significant parts of the search space.
Profiling parallel Mercury programs with ThreadScope
A tool is proposed for profiling the parallel execution of Mercury programs: an adaptation and extension of the ThreadScope profiler, which was first built to help programmers visualize the execution of parallel Haskell programs.
Parallel Algorithms for Multirelational Data Mining: Application to Life Science Problems
This chapter presents a survey of parallel approaches to running Inductive Logic Programming (ILP), a flavor of multirelational algorithm, analyzes different scheduling approaches for those implementations, and describes two applications where the proposed approaches may be very useful.
Feedback directed implicit parallelism
An automated way is presented of using spare CPU resources within a shared-memory multiprocessor or multi-core machine to squeeze extra performance out of the threads of an already-parallel program, or out of a program that has not yet been parallelized.
A Methodology for Granularity-Based Control of Parallelism in Logic Programs
This paper describes a methodology whereby the granularity of parallel tasks is efficiently estimated and used to limit parallelism so that the effect of such overheads is controlled.
Minimizing the overheads of dependent AND-parallelism
This work presents a program transformation for implementing dependent AND-parallelism in logic programming languages that uses mode information to add synchronization code only to the variable accesses that actually need it.
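The synchronization idea behind this transformation can be illustrated with a small sketch. The Python below is hypothetical (the paper's transformation targets Mercury and uses mode information, which this sketch does not model): a variable shared between parallel conjuncts becomes a single-assignment future, so the producer conjunct binds it once and the consumer blocks only at the access that actually needs the value:

```python
import threading

class SharedVar:
    """A single-assignment, future-like variable, loosely modelling a
    synchronized shared variable in dependent AND-parallelism.
    (Illustrative sketch, not the paper's actual transformation.)"""
    def __init__(self):
        self._event = threading.Event()
        self._value = None

    def put(self, value):
        # The producer conjunct binds the variable exactly once.
        self._value = value
        self._event.set()

    def get(self):
        # The consumer conjunct blocks until the variable is bound.
        self._event.wait()
        return self._value

# Two "conjuncts" running in parallel; only X, which both touch,
# carries synchronization code.
def producer(x):
    x.put(3 * 7)

def consumer(x):
    return x.get() + 1

x = SharedVar()
t = threading.Thread(target=producer, args=(x,))
t.start()
result = consumer(x)   # waits here only if the producer has not bound X yet
t.join()
```

The point of using mode information is visible in miniature: unshared variables need no `SharedVar` wrapper at all, so the synchronization cost is paid only where a dependency actually exists.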
Implicit parallelism with ordered transactions
The programming model of IPOT and an online tool that recommends boundaries of ordered transactions by observing a sequential execution are described and it is demonstrated that the method is effective in identifying opportunities for fine-grain parallelization.
Distance: A New Metric for Controlling Granularity for Parallel Execution
It is argued in this paper that the estimation of task complexity, on its own, is not an ideal metric for improving the performance of parallel programs through granularity control, and a new metric for measuring granularity is presented, based on a notion of distance.
Automatic Compile-time Parallelization of Prolog Programs for Dependent And-Parallelism
This paper presents a static analysis technique based on abstract interpretation that detects (fruitful) dependent and-parallelism in Prolog programs and implements a prototype compiler that incorporates these ideas.
Annotation Algorithms for Unrestricted Independent And-Parallelism in Logic Programs
Two new algorithms are presented that perform automatic parallelization via source-to-source transformations, using as targets new parallel execution primitives that are simpler and more flexible than the well-known &/2 parallel operator.
Non-strict independence-based program parallelization using sharing and freeness information
Runtime support for multicore Haskell
This work quantitatively explores some of the complex design tradeoffs that make such implementations hard to build, and describes just such an implementation.