GPUs are becoming an attractive alternative for numerical computation in research. In this paper, we propose a new parallel processing environment for matrix multiplication that uses both CPUs and GPUs. With our method, the execution time of matrix multiplication can be decreased to 40.1%, compared with the faster of the CPU-only or GPU-only …
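The idea of sharing one matrix multiplication between two kinds of processors can be illustrated with a minimal sketch. It is not the environment proposed in the paper: both workers here are ordinary Python threads standing in for a CPU and a GPU, and the names `matmul_rows`, `hybrid_matmul`, and the `gpu_share` parameter are hypothetical, introduced only for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def matmul_rows(a_rows, b):
    """Multiply a block of rows of A by the full matrix B."""
    cols = list(zip(*b))  # columns of B, computed once per block
    return [[sum(x * y for x, y in zip(row, col)) for col in cols]
            for row in a_rows]


def hybrid_matmul(a, b, gpu_share=0.5):
    """Split the rows of A between two workers and run both concurrently.

    gpu_share is a hypothetical tuning knob: the fraction of rows given
    to the worker that stands in for the GPU.
    """
    split = round(len(a) * gpu_share)
    with ThreadPoolExecutor(max_workers=2) as pool:
        gpu_part = pool.submit(matmul_rows, a[:split], b)  # "GPU" stand-in
        cpu_part = pool.submit(matmul_rows, a[split:], b)  # "CPU" stand-in
        # Row blocks are concatenated in their original order.
        return gpu_part.result() + cpu_part.result()
```

In a real hybrid setup, the share would presumably be chosen so that both devices finish at about the same time; here it is simply passed in by the caller.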
We present an alternative algorithm for fully decentralized resource discovery in Grid computing, which enables the sharing, selection, and aggregation of a wide variety of geographically distributed computational resources. Our algorithm is based on a simple unicast request transmission that can be easily implemented. The addition of a reservation algorithm …
With the rapid advances in semiconductor process technology and microarchitecture, the speed gap between the clock cycle time of processor cores and that of memory systems has increased significantly. To address this problem, the memory system should be managed efficiently. In general, since a thread with fewer requests has a large capability to improve the total …
This paper evaluates the effect of an auto-tuning facility with the user's knowledge for numerical software. We proposed a new software architecture framework, named FIBER, to generalize auto-tuning facilities and obtain highly accurate estimated parameters. The FIBER framework also provides a loop-unrolling function and an algorithm selection function to …
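As a rough illustration of what selecting among loop-unrolled variants can look like, the sketch below times two implementations of the same summation on a sample input and keeps the faster one. It does not reproduce FIBER or its interface: `sum_plain`, `sum_unrolled4`, and `autotune` are hypothetical names, and the selection rule (lowest measured wall-clock time) is an assumption made for this example.

```python
import time


def sum_plain(xs):
    """Baseline: one addition per loop iteration."""
    total = 0
    for x in xs:
        total += x
    return total


def sum_unrolled4(xs):
    """Same sum with the loop body unrolled four times."""
    total = 0
    n = len(xs) - len(xs) % 4  # largest multiple of 4 not exceeding len(xs)
    for i in range(0, n, 4):
        total += xs[i] + xs[i + 1] + xs[i + 2] + xs[i + 3]
    for x in xs[n:]:  # leftover elements
        total += x
    return total


def autotune(variants, sample, repeats=3):
    """Return the name of the variant with the lowest measured run time."""
    best_name, best_time = None, float("inf")
    for name, fn in variants.items():
        start = time.perf_counter()
        for _ in range(repeats):
            fn(sample)
        elapsed = time.perf_counter() - start
        if elapsed < best_time:
            best_name, best_time = name, elapsed
    return best_name
```

A caller might run `autotune({"plain": sum_plain, "unrolled4": sum_unrolled4}, list(range(100000)))` once at install time and reuse the winning variant afterwards. Which variant wins depends on the machine and interpreter.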
Many-core processors with thousands of cores on a chip are expected to become a reality. We developed an infrastructure that accelerates the research and development of such many-core processors. This paper describes three main elements provided by our infrastructure. The first element is the definition of a simple many-core processor architecture called M-Core. The …
Network-on-Chip (NoC) has become the de facto on-chip communication architecture for many-core systems. This paper proposes novel methods for emulating large-scale NoC designs on a single FPGA. Since FPGAs offer a highly parallel platform, FPGA-based emulation can be much faster than the software-based approach. However, emulating NoC designs with up to …