Kyuseung Han

Learn More
Predication is an essential technique to accelerate kernels with control flow on CGRAs. While state-based full predication (SFP) can remove wasteful power consumption on issuing/decoding instructions from conventional full predication, generating code for SFP is challenging for general CGRAs, especially when there are multiple conditionals to be handled due(More)
Coarse-grained reconfigurable architecture typically has an array of processing elements which are controlled by a centralized unit. This makes it difficult to execute programs having control divergence among PEs without predication. However, conventional predication techniques have a negative impact on both performance and power consumption due to longer(More)
Coarse-grained reconfigurable array is a very attractive architecture from the viewpoint of performance and flexibility. However, because the performance improvement is achieved by exploiting parallelism, the architecture is typically poor at handling control flow, which is sequential in nature. There have been many attempts to overcome this problem by(More)
It has been one of the most fundamental challenges in architecture design to achieve high performance with low power while maintaining flexibility. Parallel architectures such as coarse-grained reconfigurable architecture, where multiple PEs are tightly coupled with each other, can be a viable solution to the problem. However, the PEs are typically(More)
This paper demonstrates a chip implementation of coarse-grained reconfigurable architecture named FloRA. Two-dimensional array of integer processing elements in the FloRA is configured in run-time to perform integer functions as well as floating-point functions. FloRA is implemented in Dongbu HiTek 130nm process and evaluate by running applications(More)
Coarse-grained reconfigurable architectures have drawn increasing attention due to their merits in performance and flexibility. Typically, they have many processing elements in the form of an array, which is suitable for implementing spatial redundancy used for fault-tolerant systems design. This paper presents a purely software-level approach to(More)
In this paper, we propose a coarse-grained reconfigurable architecture, which supports both integer type application domain and floating-point type application domain. Our coarse-grained reconfigurable architecture has an 8x8 array of integer processing elements to execute 64 integer operations or 32 floating-point operations in parallel. In order to show(More)
Coarse-Grained Reconfigurable Architectures (CGRAs) are suitable for accelerating data-intensive applications in embedded systems due to high performance and power efficiency. However, as application programs become complex having more control flows in them, it becomes harder to accelerate such programs on CGRAs. Previous researches on this issue have(More)
Binary acceleration of a kernel on an accelerator may have a data duplication problem. Some data in an address range may be copied into the local memory of the accelerator incurring data copy overhead as well as a coherence problem. Configurable Range Memory (CRM) is a memory shared by the host processor and the accelerator, which can specify its own(More)