Optimal Shuffle Code with Permutation Instructions
@article{Buchwald2015OptimalSC,
  title={Optimal Shuffle Code with Permutation Instructions},
  author={Sebastian Buchwald and Manuel Mohr and Ignaz Rutter},
  journal={ArXiv},
  year={2015},
  volume={abs/1504.07073}
}
During compilation of a program, register allocation is the task of mapping program variables to machine registers. During register allocation, the compiler may introduce shuffle code, consisting of copy and swap operations, that transfers data between the registers. Three common sources of shuffle code are conflicting register mappings at joins in the control flow of the program, e.g., due to if-statements or loops; the calling convention for procedures, which often dictates that input…
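To make the notion of shuffle code concrete, here is a minimal sketch of one simple way to turn a parallel register transfer into a sequence of copy and swap operations. It is not the algorithm from the paper (which also considers permutation instructions and proves optimality); it is the plain strategy of copying into registers that nobody still reads, then resolving the remaining cycles with swaps. The function names `shuffle_code` and `run_ops`, the tuple encoding of operations, and the register names are all hypothetical choices made for this illustration.

```python
def shuffle_code(transfer):
    """Sequentialize a parallel register transfer.

    `transfer` maps each destination register to the register whose
    old value it must receive. Returns a list of ("copy", dst, src)
    and ("swap", a, b) operations. Illustrative only, not optimal
    in the sense studied by the paper.
    """
    # Transfers of the form r <- r need no code.
    pending = {d: s for d, s in transfer.items() if d != s}
    ops = []

    # Count how many pending transfers still read each register.
    readers = {}
    for s in pending.values():
        readers[s] = readers.get(s, 0) + 1

    # Phase 1: a destination that no pending transfer reads can be
    # overwritten safely with a copy.
    ready = [d for d in pending if readers.get(d, 0) == 0]
    while ready:
        d = ready.pop()
        s = pending.pop(d)
        ops.append(("copy", d, s))
        readers[s] -= 1
        if readers[s] == 0 and s in pending:
            ready.append(s)

    # Phase 2: the remaining transfers form disjoint cycles; a cycle
    # of length k is resolved with k - 1 swaps.
    while pending:
        start = next(iter(pending))
        d = start
        while pending[d] != start:
            s = pending.pop(d)
            ops.append(("swap", d, s))
            d = s
        pending.pop(d)
    return ops


def run_ops(ops, regs):
    """Apply the operations to a copy of the register file `regs`."""
    regs = dict(regs)
    for op, a, b in ops:
        if op == "copy":
            regs[a] = regs[b]
        else:  # swap
            regs[a], regs[b] = regs[b], regs[a]
    return regs
```

For example, calling `shuffle_code({"r1": "r2", "r2": "r3", "r3": "r1", "r4": "r1"})` and feeding the result to `run_ops` lets one check that every destination ends up holding the old value of its source. The copy phase runs first so that values needed outside a cycle are saved before the swaps move them.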
One Citation
Resource-aware Programming in a High-level Language - Improved performance with manageable effort on clustered MPSoCs
- Computer Science, Political Science
- 2018
Until 2001, Moore's and Dennard's laws meant a doubling of execution time every 18 months through improved CPUs.
Today, concurrency is the dominant means of accelerating…
References
Register allocation for programs in SSA form
- Computer Science
- CC
- 2006
A novel register allocation architecture for programs in SSA form is presented that simplifies register allocation significantly, and heuristic methods for spilling and coalescing are compared to an optimal method based on integer linear programming.
On the Complexity of Register Coalescing
- Computer Science
- International Symposium on Code Generation and Optimization (CGO'07)
- 2007
This paper is devoted to the complexity of the coalescing phase, in particular in light of recent developments on SSA form. It almost completely classifies the NP-completeness of these problems and also discusses the structure of the interference graph.
Copy coalescing by graph recoloring
- Computer Science
- PLDI '08
- 2008
A coalescing technique designed for, but not limited to, SSA-form register allocation that improves upon two long-standing inconveniences of graph-coloring register allocation by exploiting the fact that a valid coloring can be easily obtained by an SSA-based register allocator.
A Fast Cutting-Plane Algorithm for Optimal Coalescing
- Computer Science
- CC
- 2007
This work provides the first optimal solutions for a benchmark called "Optimal Coalescing Challenge", i.e., the ILP model outperforms previous approaches and is used to assess the quality of well-known heuristics.
Live-range unsplitting for faster optimal coalescing
- Computer Science
- LCTES '09
- 2009
This paper presents some theoretical properties that give rise to an algorithm for reducing interference graphs that preserves the optimality of coalescing and provides all the optimal solutions of the optimal coalescing challenge, including the three instances that were previously unsolved.
Sorting of Permutations by Cost-Constrained Transpositions
- Computer Science
- IEEE Transactions on Information Theory
- 2012
The algorithms in this paper represent a combination of Viterbi-type algorithms and graph-search techniques for minimizing the cost of individual transpositions, and dynamic programming algorithms for finding minimum-cost decompositions of cycles.
Sorting by reversals is difficult
- Mathematics
- RECOMB '97
- 1997
We prove that the problem of sorting a permutation by the minimum number of reversals is NP-hard, thus answering a major question on the complexity of a problem which has widely been studied in the…
Permutation Group Algorithms
- Mathematics, Computer Science
- 2003
This paper presents an overview of black-box groups, a library of nearly linear time algorithms, and large-base groups, which are examples of permutation groups used for generating strong generating sets.
Hardware acceleration for programs in SSA form
- Computer Science
- 2013 International Conference on Compilers, Architecture and Synthesis for Embedded Systems (CASES)
- 2013
This paper proposes a processor architecture extension to provide register file permutations by which the shuffle code can be implemented more efficiently, and finds that using this extension, the number of executed instructions is reduced by up to 5.1 % while the compilation time is unaffected.
… $e_k$ be the edges of the cycle $K$. First, observe that $G - e_i$ is a tree for $i = 1, \dots, k$. Hence, we can compute each table $T_{G - e_i}$ …