A general constraint-centric scheduling framework for spatial architectures

@inproceedings{Nowatzki2013AGC,
  title={A general constraint-centric scheduling framework for spatial architectures},
  author={Tony Nowatzki and Michael Sartin-Tarm and Lorenzo De Carli and Karthikeyan Sankaralingam and Cristian Estan and Behnam Robatmili},
  booktitle={Proceedings of the 34th ACM SIGPLAN Conference on Programming Language Design and Implementation},
  year={2013}
}
Specialized execution using spatial architectures provides energy-efficient computation, but requires effective algorithms for spatially scheduling the computation. Generally, this has been solved with architecture-specific heuristics, an approach that suffers from poor compiler/architect productivity and a lack of insight into optimality, and that inhibits migration of techniques between architectures. Our goal is to develop a scheduling framework usable for all spatial architectures. To this end, we…
Figure: Constraint-centric scheduling guide
TLDR
This paper describes the generalized spatial scheduling framework, formulated with Integer Linear Programming, and summarizes results on the application to three real architectures, demonstrating the technique's practicality and competitiveness with existing schedulers.
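The framework's core idea, as the abstract and TLDR describe, is to express spatial scheduling declaratively and hand it to an ILP solver. As a rough illustration of that style, and not the paper's actual formulation, the sketch below places a four-node dataflow graph onto a 2x2 grid of processing elements while minimizing total Manhattan routing distance; the graph, grid, and cost model are invented toy inputs, and PuLP is assumed only as a convenient open-source ILP front end.

# Toy placement ILP, loosely in the spirit of constraint-centric spatial scheduling.
# The dataflow graph, 2x2 PE grid, and Manhattan-distance cost are illustrative only.
import pulp

nodes = ["a", "b", "c", "d"]
edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]
pes = [(x, y) for x in range(2) for y in range(2)]          # 2x2 grid of PEs

def dist(p, q):                                              # Manhattan distance between PEs
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

prob = pulp.LpProblem("toy_spatial_schedule", pulp.LpMinimize)

# place[n][p] = 1 iff node n is mapped onto PE p
place = pulp.LpVariable.dicts("place", (nodes, range(len(pes))), cat="Binary")
# For each edge and PE pair, a linearization variable: are both endpoints at (p, q)?
pair = pulp.LpVariable.dicts("pair",
                             (range(len(edges)), range(len(pes)), range(len(pes))),
                             cat="Binary")

# Each node occupies exactly one PE; each PE holds at most one node.
for n in nodes:
    prob += pulp.lpSum(place[n][p] for p in range(len(pes))) == 1
for p in range(len(pes)):
    prob += pulp.lpSum(place[n][p] for n in nodes) <= 1

# Standard linearization: pair[e][p][q] >= place[u][p] + place[v][q] - 1
for e, (u, v) in enumerate(edges):
    for p in range(len(pes)):
        for q in range(len(pes)):
            prob += pair[e][p][q] >= place[u][p] + place[v][q] - 1

# Objective: total routing distance over all dataflow edges.
prob += pulp.lpSum(dist(pes[p], pes[q]) * pair[e][p][q]
                   for e in range(len(edges))
                   for p in range(len(pes))
                   for q in range(len(pes)))

prob.solve(pulp.PULP_CBC_CMD(msg=False))
for n in nodes:
    loc = next(p for p in range(len(pes)) if pulp.value(place[n][p]) > 0.5)
    print(n, "->", pes[loc])

Real formulations of this kind also model routing resources, timing, and utilization, which is precisely what makes a declarative, solver-based framework attractive across architectures.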
Citations

DynaSpAM: Dynamic spatial architecture mapping using Out of Order instruction schedules
TLDR
The insight behind DynaSpAM is that today's powerful OOO processors do for themselves most of the work necessary to produce a highly optimized mapping for a spatial architecture, including aggressively speculating control and memory dependences, and scheduling instructions using a large window.
Towards Higher Performance and Robust Compilation for CGRA Modulo Scheduling
TLDR
This article decomposes the CGRA modulo scheduling (MS) problem into temporal and spatial mapping problems and reorganizes the processes inside these two problems to provide a comprehensive and systematic mapping flow that includes a powerful buffer allocation algorithm and efficient algorithms for solving interconnection and computational constraints.
CoSA: Scheduling by Constrained Optimization for Spatial Accelerators
TLDR
CoSA leverages the regularities in DNN operators and hardware to formulate the DNN scheduling space into a mixed-integer programming (MIP) problem with algorithmic and architectural constraints, which can be solved to automatically generate a highly efficient schedule in one shot.
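One detail worth spelling out: tile sizes multiply, so buffer-capacity constraints are nonlinear unless encoded carefully. The toy sketch below is not CoSA's formulation; it only illustrates one common linearization assumed here to be representative, where binary variables choose one tile factor per loop and the product constraint becomes a sum in the log2 domain. The loop names, candidate factors, and capacity are made up.

# Toy log-domain tiling MIP, illustrating (not reproducing) the kind of
# linearized capacity constraint used by constrained-optimization schedulers.
import math
import pulp

loops = {"M": [1, 2, 4, 8], "N": [1, 2, 4, 8], "K": [1, 2, 4]}   # candidate tile factors
buffer_capacity = 64                                              # elements (made up)

prob = pulp.LpProblem("toy_tiling", pulp.LpMaximize)
pick = pulp.LpVariable.dicts("pick", (list(loops), range(4)), cat="Binary")

# Exactly one factor chosen per loop; unused candidate slots are forced to zero.
for l, factors in loops.items():
    prob += pulp.lpSum(pick[l][i] for i in range(len(factors))) == 1
    for i in range(len(factors), 4):
        prob += pick[l][i] == 0

# log2(tile_M * tile_N * tile_K) <= log2(capacity): a product constraint made linear.
log_tile = pulp.lpSum(math.log2(f) * pick[l][i]
                      for l, factors in loops.items()
                      for i, f in enumerate(factors))
prob += log_tile <= math.log2(buffer_capacity)

# Objective: maximize on-chip tile volume (same optimum in the log domain).
prob += log_tile

prob.solve(pulp.PULP_CBC_CMD(msg=False))
for l, factors in loops.items():
    chosen = next(f for i, f in enumerate(factors) if pulp.value(pick[l][i]) > 0.5)
    print("tile", l, "=", chosen)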
GenMap: A Genetic Algorithmic Approach for Optimizing Spatial Mapping of Coarse-Grained Reconfigurable Architectures
TLDR
GenMap, an application mapping framework that uses genetic-algorithm-based multiobjective optimization, is proposed so that users can set optimization criteria as needed; it also provides aggressive power optimization using a dynamic power model and a leakage-minimization technique.
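As a generic sketch of the genetic-algorithm approach summarized above, and not GenMap's chromosome encoding, operators, or power models, the mutation-only evolutionary loop below searches node-to-PE placements for low total routing distance; the graph and grid mirror the earlier toy ILP example and are equally invented.

# Toy evolutionary search over node->PE placements; fitness is total routing distance.
# This illustrates the general search strategy, not GenMap's actual encoding or objectives.
import random

nodes = ["a", "b", "c", "d"]
edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]
pes = [(x, y) for x in range(2) for y in range(2)]

def fitness(mapping):                      # lower is better
    return sum(abs(mapping[u][0] - mapping[v][0]) + abs(mapping[u][1] - mapping[v][1])
               for u, v in edges)

def random_mapping():
    locs = random.sample(pes, len(nodes))  # one PE per node, no sharing
    return dict(zip(nodes, locs))

def mutate(mapping):                       # swap the PEs of two nodes
    child = dict(mapping)
    a, b = random.sample(nodes, 2)
    child[a], child[b] = child[b], child[a]
    return child

population = [random_mapping() for _ in range(20)]
for _ in range(50):                        # generations
    population.sort(key=fitness)
    survivors = population[:10]            # truncation selection
    population = survivors + [mutate(random.choice(survivors)) for _ in range(10)]

best = min(population, key=fitness)
print("best mapping:", best, "cost:", fitness(best))

A full GA would typically add crossover and a multiobjective fitness (e.g. latency, power, leakage); the loop above is deliberately minimal.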
Hybrid optimization/heuristic instruction scheduling for programmable accelerator codesign
TLDR
This work explores the codesign of scheduling algorithms with a challenging-to-schedule programmable accelerator, and shows that the accelerator's area can be reduced by 35% by trimming its scheduling-friendly structures, using a scheduling algorithm that is 5× faster than the state-of-the-art optimization-based scheduler.
CGRA Modulo Scheduling for Achieving Better Performance and Increased Efficiency
  • Siva Sankara Phani.T
  • Computer Science
    Turkish Journal of Computer and Mathematics Education (TURCOMAT)
  • 2021
TLDR
A fast and stable algorithm for spatial mapping with a retransmission and rearrangement mechanism is presented, addressing the temporal mapping problem with a powerful buffer algorithm and efficient handling of interconnection and computational constraints.
Generic Connectivity-Based CGRA Mapping via Integer Linear Programming
TLDR
This paper proposes to derive connectivity information from an otherwise generic device model, and use this to create simpler ILPs, which are combined in an iterative schedule and retain most of the exactness of a fully-generic ILP approach.
Scaling distributed cache hierarchies through computation and data co-scheduling
TLDR
Novel monitoring hardware is developed that enables fine-grained space allocation on large caches, along with data movement support that allows frequent full-chip reconfigurations; the resulting scheme outperforms state-of-the-art NUCA schemes under different thread scheduling policies.
Optimizing the Efficiency of Data Transfer in Dataflow Architectures
  • Yujing Feng, Taoran Xiang, Zhimin Tang
  • Computer Science
    2018 IEEE 20th International Conference on High Performance Computing and Communications; IEEE 16th International Conference on Smart City; IEEE 4th International Conference on Data Science and Systems (HPCC/SmartCity/DSS)
  • 2018
TLDR
A cost-efficient hardware mechanism is proposed to dynamically detect imbalances across directions in each router and adaptively reallocate resources in the bottleneck router; evaluation results suggest the approach effectively improves the efficiency of data transfer in dataflow processors.

References

Showing 1-10 of 56 references
A spatial path scheduling algorithm for EDGE architectures
TLDR
This paper describes a compiler scheduling algorithm called spatial path scheduling that factors in previously fixed locations - called anchor points - for each placement and augments this basic algorithm with three heuristics: local and global ALU and network link contention modeling, and dependence chain path reservation.
Edge-centric modulo scheduling for coarse-grained reconfigurable architectures
TLDR
Experiments on a wide variety of compute-intensive loops from the multimedia domain show that EMS improves throughput by 25% over traditional iterative modulo scheduling, and achieves 98% of the throughput of simulated annealing techniques at a fraction of the compilation time.
A Decomposition-based Constraint Optimization Approach for Statically Scheduling Task Graphs with Communication Delays to Multiprocessors
TLDR
A decomposition strategy is presented that speeds up constraint optimization for a representative multiprocessor scheduling problem and iteratively learns constraints to prune the solution space in the manner of Benders decomposition.
Tartan: evaluating spatial computation for whole program execution
TLDR
The initial investigation reveals that a hierarchical RF architecture, designed around a scalable interconnect, is instrumental in harnessing the benefits of spatial computation and can provide an order of magnitude improvement in energy-delay compared to an aggressive superscalar core on single-threaded workloads.
Instruction scheduling for a tiled dataflow architecture
TLDR
This paper develops a parameterizable instruction scheduler that more effectively optimizes this trade-off and determines the contention-latency sweet spot that generates the best instruction schedule for each application.
Efficient formulation for optimal modulo schedulers
TLDR
A more efficient formulation of the modulo scheduling space is presented that significantly decreases the execution time of solvers based on integer linear programs; the results indicate that significantly larger loops can be scheduled under realistic machine constraints.
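For readers unfamiliar with ILP-based modulo scheduling, a bare-bones sketch follows; it is not this reference's formulation, which is far more refined. With the initiation interval II fixed, binary variables x[op][t] choose each operation's issue cycle, dependences impose latency gaps, and each modulo slot admits only as many operations as there are function units. The three-operation loop body and two-unit resource model are invented, and PuLP is again assumed as the solver front end.

# Minimal modulo-scheduling ILP for a fixed initiation interval (II).
# The 3-op loop body, latencies, and two shared function units are illustrative only.
import pulp

II = 2
T = 6                                    # scheduling horizon in cycles
ops = ["load", "mul", "store"]
deps = [("load", "mul", 2), ("mul", "store", 1)]   # (src, dst, latency), same iteration
fu_limit = 2                             # two identical FUs per modulo slot

prob = pulp.LpProblem("toy_modulo_schedule", pulp.LpMinimize)
x = pulp.LpVariable.dicts("x", (ops, range(T)), cat="Binary")

# Each op starts in exactly one cycle; start[o] = sum_t t * x[o][t].
start = {}
for o in ops:
    prob += pulp.lpSum(x[o][t] for t in range(T)) == 1
    start[o] = pulp.lpSum(t * x[o][t] for t in range(T))

# Dependence constraints: consumer starts at least `lat` cycles after producer.
for src, dst, lat in deps:
    prob += start[dst] >= start[src] + lat

# Resource constraint: at most fu_limit ops in each modulo slot s = t mod II.
for s in range(II):
    prob += pulp.lpSum(x[o][t] for o in ops for t in range(T) if t % II == s) <= fu_limit

# Objective: minimize the schedule length (start time of the last op here).
prob += start["store"]

prob.solve(pulp.PULP_CBC_CMD(msg=False))
for o in ops:
    cycle = next(t for t in range(T) if pulp.value(x[o][t]) > 0.5)
    print(o, "issues at cycle", cycle, "(slot", cycle % II, ")")

In practice the solver is invoked for increasing values of II until a feasible schedule exists; efficiency work such as this reference focuses on shrinking the formulation so each of those solves stays tractable.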
Synthesis Algorithm for Application-Specific Homogeneous Processor Networks
TLDR
A novel framework, similar to technology mapping in the logic synthesis domain, is employed, and a set of efficient algorithms is developed to generate multiprocessor architectures with application-specific optimized latency; the result is latency-optimal for directed acyclic task graphs.
Orchestrating the execution of stream programs on multicore platforms
TLDR
A compiler technique for planning and orchestrating the execution of streaming applications on multicore platforms is presented, along with a generalized code generation template for mapping the software pipeline onto the Cell architecture.
Some efficient solutions to the affine scheduling problem. I. One-dimensional time
  • P. Feautrier
  • Computer Science
    International Journal of Parallel Programming
  • 1992
TLDR
This paper deals with the problem of finding closed form schedules as affine or piecewise affine functions of the iteration vector and presents an algorithm which reduces the scheduling problem to a parametric linear program of small size, which can be readily solved by an efficient algorithm.
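A heavily simplified instance of that idea, and not Feautrier's Farkas-lemma construction, can be written directly as a small LP: below, affine schedules theta1(i) = a1*i + b1 and theta2(i) = a2*i + b2 are sought for two statements with a pair of uniform dependences, each required to advance time by at least one step. The dependence pattern and objective are illustrative choices, and the coefficient-wise enforcement is valid only because i ranges over the non-negative integers.

# Toy affine-scheduling LP: find theta1(i) = a1*i + b1 and theta2(i) = a2*i + b2
# for statements S1(i), S2(i) with dependences S1(i) -> S2(i) and S2(i) -> S1(i+1),
# such that every dependence target runs at least one step after its source.
import pulp

prob = pulp.LpProblem("toy_affine_schedule", pulp.LpMinimize)
a1 = pulp.LpVariable("a1", lowBound=0)
b1 = pulp.LpVariable("b1", lowBound=0)
a2 = pulp.LpVariable("a2", lowBound=0)
b2 = pulp.LpVariable("b2", lowBound=0)

# S1(i) -> S2(i):   a2*i + b2 >= a1*i + b1 + 1  for all i >= 0,
# enforced coefficient-wise (slope and intercept separately).
prob += a2 >= a1
prob += b2 >= b1 + 1

# S2(i) -> S1(i+1): a1*(i+1) + b1 >= a2*i + b2 + 1  for all i >= 0.
prob += a1 >= a2
prob += a1 + b1 >= b2 + 1

# Objective: keep the schedule as "flat" as possible (a crude latency proxy).
prob += a1 + a2 + b1 + b2

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print("theta1(i) = %g*i + %g" % (pulp.value(a1), pulp.value(b1)))
print("theta2(i) = %g*i + %g" % (pulp.value(a2), pulp.value(b2)))

The solution recovers the expected alternating schedule (theta1(i) = 2i, theta2(i) = 2i + 1); Feautrier's contribution is doing this in general, with parametric iteration domains, via a small parametric linear program.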
Space-time scheduling of instruction-level parallelism on a raw machine
Increasing demand for both greater parallelism and faster clocks dictate that future generation architectures will need to decentralize their resources and eliminate primitives that require single…