Balancing Programmability and Silicon Efficiency of Heterogeneous Multicore Architectures
Modern embedded Systems-on-a-Chip deploy multiple programmable cores to meet increasing performance requirements of video, graphics, and modem applications. However, software implementations of task scheduling and inter-task synchronization often limit performance improvements of multicores. Remarkably, several demanding video applications (e.g. H.264 video decoding) rely on task dependency graphs that can be constructed from a simple dependency pattern. Based on such a pattern, our novel hardware task scheduler can quickly create, order, synchronize and map tasks to cores. We found that our hardware task scheduler speeds up a Quad HD H.264 video decoding by 1.17 times compared to a chip multi-processor with a state-of-the-art hardware task queues. Moreover, our hardware task scheduler allows decreasing the number of cores needed to meet the real-time performance requirements for the H.264 decoder and, consequently, reduces the silicon area of the multicore by up to 12.5%.