Learn More
DSP architectures typically provide indirect addressing modes with autoincrement and decrement. In addition, indexing mode is generally not available, and there are usually few, if any, general-purpose registers. Hence, it is necessary to use address registers and perform address arithmetic to access automatic variables. Subsuming the address arithmetic(More)
System-level design issues become critical as implementation technology evolves towards increasingly complex integrated circuits and the time-to-market pressure continues relentlessly. To cope with these issues, new methodologies that emphasize re-use at all levels of abstraction are a " must " , and this is a major focus of our work in the Gigascale(More)
Recent developments in programmable, highly parallel Graphics Processing Units (GPUs) have enabled high performance implementations of machine learning algorithms. We describe a solver for Support Vector Machine training running on a GPU, using the Sequential Minimal Optimization algorithm and an adaptive first and second order working set selection(More)
Dense and accurate motion tracking is an important requirement for many video feature extraction algorithms. In this paper we provide a method for computing point trajectories based on a fast parallel implementation of a recent optical flow algorithm that tolerates fast motion. The parallel implementation of large displacement optical flow runs about 78×(More)
Communication-based design represents a formal method approach to of system-on-a-chip design that considers communication between components as important as the computations they perform. “Our network-on-chip&rdqo ; approach partitions the communication into layers to maximize reuse and provide a programmer with an abstraction of the underlying(More)
Modern parallel microprocessors deliver high performance on applications that expose substantial fine-grained data parallelism. Although data parallelism is widely available in many computations, implementing data parallel algorithms in low-level languages is often an unnecessarily difficult task. The characteristics of parallel microprocessors and the(More)
—Global interconnect is commonly regarded as a key potential bottleneck to the advancing performance of high-speed integrated circuits. Our previous work has suggested that local interconnect effects can be managed through a deep submicron design hierarchy that uses 50 000 to 100 000 gate modules as primitive building blocks. The primary goal of this paper(More)