As the GPU becomes an integrated component in handheld devices like smartphones, we have been investigating the opportunities and limitations of using the ultra-low-power GPU in a mobile platform as a general-purpose accelerator, similar to its role in desktop and server platforms. The special focus of our investigation has been on the mobile GPU’s role for…
This paper presents an efficient VLSI architecture design of MPEG-4 shape coding, which is the key technology for supporting the content-based functionality of the MPEG-4 Video standard. The real-time constraint of MPEG-4 shape coding leads to a heavy computational bottleneck on today’s computer architectures. To overcome this problem, design analysis and…
The Graphics Processor Unit (GPU) has expanded its role from an accelerator for rendering graphics into an efficient parallel processor for general-purpose computing. The GPU, an indispensable component in desktop and server-class computers as well as game consoles, has also become an integrated component in handheld devices, such as smartphones. Since the…
A modern mobile application processor is a heterogeneous multi-core SoC that integrates a CPU and application-specific accelerators such as a GPU and a DSP. It provides an opportunity to accelerate other compute-intensive applications, yet mapping an algorithm to such a heterogeneous platform is not a straightforward task and involves many design decisions. In…
Modern smartphones use a heterogeneous multi-core SoC that includes a CPU, GPU, DSP, and various application-specific accelerators. It provides opportunities to realize compute-intensive applications on a battery-powered and resource-limited mobile device by assigning each sub-task to the most suitable computing core. To meet the performance requirement with…
This paper presents an efficient architecture of binary motion estimation (BME) for MPEG-4 shape coding. This architecture, called DDBME, mainly consists of a data-dispatch-based 1-D systolic array and a 16×32 bit search range buffer. In DDBME, a bit-parallelism technique is applied to the SAD calculation of the block-matching algorithm. In order to support…
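As a rough software illustration of bit-parallel SAD on binary data (not the DDBME hardware itself, whose details are not given in the abstract), the sketch below packs each row of a binary block into an integer, so that the sum of absolute differences reduces to counting the set bits of an XOR. The function names, the horizontal-only search, and the 16-bit block and 32-bit search widths are assumptions made for the example.

```python
def binary_sad(cur_rows, ref_rows):
    """SAD of two binary blocks whose rows are packed into integers.

    For 1-bit pixels, |a - b| equals a XOR b, so the SAD of a whole row
    is the number of set bits in the XOR of the two packed rows.
    """
    return sum(bin(c ^ r).count("1") for c, r in zip(cur_rows, ref_rows))


def best_horizontal_match(cur_rows, search_rows, block_width=16, search_width=32):
    """Try every horizontal offset in the search rows (hypothetical helper).

    Returns (shift, sad) for the first offset with the lowest SAD.
    Vertical search is omitted to keep the sketch short.
    """
    mask = (1 << block_width) - 1
    best = None
    for shift in range(search_width - block_width + 1):
        # Extract a block_width-bit candidate window from each search row.
        candidate = [(row >> shift) & mask for row in search_rows]
        sad = binary_sad(cur_rows, candidate)
        if best is None or sad < best[1]:
            best = (shift, sad)
    return best
```

Each XOR compares a full row of pixels at once, which is the software analogue of processing many bits per cycle; a hardware design would typically replace the bit count with an adder tree.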
We believe that by adapting architectures to fit the requirements of a given application domain, we can significantly improve the efficiency of computation. To validate the idea for our application domain, we evaluate a wide spectrum of commodity computing platforms to quantify the potential benefits of heterogeneity and customization for the…
In this paper, we present a scalable module-based architecture for the block-matching motion estimation algorithm of MPEG-4. The basic module comprises one set of processing elements based on a one-dimensional systolic array architecture. To support various applications, modules of processing elements can be configured to form the processing element array to meet…
Face annotation makes it easy to share and manage digital photos and videos. While state-of-the-art face recognition algorithms can achieve high accuracy to support automatic face annotation, their implementations on an embedded platform cannot achieve real-time performance due to the demanding computational requirement. However, the availability of an…