Ichiro Kuroda

This paper describes a new hardware/software co-verification method for System-on-a-Chip designs, based on the integration of a C/C++ simulator and an inexpensive FPGA emulator. Communication between the simulator and emulator occurs via a flexible interface based on shared communication registers. This method enables easy debugging, rich portability, and high ...
This paper presents extended instructions for accelerating Advanced Encryption Standard (AES) cryptography in embedded processors, along with an efficient implementation of these instructions. These AES instructions generate four elements in single-instruction, multiple-data format from each input of an AES state. The instruction count for 128-bit key AES ...
Reed–Solomon (RS) coders are used for error-control coding in many applications such as digital audio, digital TV, software radio, CD players, and wireless and satellite communications. Traditionally, RS coders have been implemented using dedicated hardware. This paper considers software-based implementation of RS codecs. A hardware–software codesign ...
This paper presents an implementation of a fast two-dimensional inverse Discrete Cosine Transform (IDCT) with multimedia instructions for a software MPEG2 decoder. IDCT algorithms for sparse blocks, which eliminate the calculation for zero coefficients, are realized by using multimedia instructions. To reduce the cycle count for IDCT, an adaptive control ...
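The idea of skipping zero coefficients can be illustrated with a minimal sketch. This is not the paper's implementation: it is a plain one-dimensional, orthonormal IDCT in Python with no multimedia instructions, and the function name `idct_1d_sparse` is a hypothetical chosen for illustration. It only shows why sparse blocks are cheaper: each zero coefficient removes one full pass over the output samples.

```python
import math

def idct_1d_sparse(coeffs):
    """Orthonormal 1-D inverse DCT (DCT-III) that skips zero coefficients.

    Illustrative only: a real decoder would apply a fast 2-D IDCT,
    typically with SIMD instructions, rather than this direct form.
    """
    n = len(coeffs)
    out = [0.0] * n
    for k, c in enumerate(coeffs):
        if c == 0:
            # A zero coefficient contributes nothing, so its basis
            # function is never evaluated.
            continue
        scale = math.sqrt((1 if k == 0 else 2) / n)
        for i in range(n):
            out[i] += scale * c * math.cos((2 * i + 1) * k * math.pi / (2 * n))
    return out
```

For a block that contains only a DC coefficient, the inner loop runs once and every output sample takes the same value, which is the case where a sparse-block IDCT saves the most work.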
Recently, microprocessors with enhanced SIMD instructions have become increasingly popular. However, because automatic extraction of parallelism from conventional sequential C programs is still difficult, no effective compiler that generates efficient code sequences using these SIMD instructions has yet been developed. This paper first ...
We have developed a new-generation, general-purpose digital signal processor (DSP) core with low power dissipation for use in third-generation (3G) mobile terminals. The DSP core employs a 4-way VLIW (very long instruction word) approach, as well as a dual-multiply-accumulate (dual-MAC) architecture with good orthogonality. It is able to perform both video ...
This paper describes an analysis-based method for optimizing the timing of decisions regarding early termination of block matching (BM) in the application of a successive similarity detection algorithm (SSDA). Although the SSDA reduces BM computational costs, deciding whether to terminate BM consumes additional processor cycles. Here, total costs, ...
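The trade-off described above can be sketched as follows. This is a minimal illustration of SSDA-style early termination, not the paper's optimization method: the function name `sad_with_early_termination` and the `check_interval` parameter are assumptions introduced here. The sum of absolute differences (SAD) is accumulated row by row, and the comparison against the best SAD found so far is made only every `check_interval` rows, so fewer checks cost fewer decision cycles but may let a hopeless candidate run longer.

```python
def sad_with_early_termination(block, candidate, best_so_far, check_interval=4):
    """Accumulate the SAD between two equally sized blocks row by row.

    Returns (sad, rows_examined). Matching stops early once the partial
    SAD reaches best_so_far, but the test is only made every
    check_interval rows (a hypothetical tuning parameter).
    """
    total = 0
    rows = 0
    for row_a, row_b in zip(block, candidate):
        total += sum(abs(a - b) for a, b in zip(row_a, row_b))
        rows += 1
        # Termination decision: this comparison is the extra cost that
        # the timing of the checks tries to balance.
        if rows % check_interval == 0 and total >= best_so_far:
            return total, rows
    return total, rows
```

With a 4x4 block whose rows each contribute a difference of 40 and a best-so-far SAD of 50, checking every row stops after two rows, while checking every four rows examines the whole block.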