Learn More
The propagation delay across long on-chip buses is increasingly becoming a limiting factor in high-speed designs. Crosstalk between adjacent wires on the bus may create a significant portion of this delay. Placing a shield wire between each signal wire alleviates the crosstalk problem but doubles the area used by the bus, an unacceptable consequence when(More)
DSP architectures typically provide indirect addressing modes with autoincrement and decrement. In addition, indexing mode is generally not available, and there are usually few, if any, general-purpose registers. Hence, it is necessary to use address registers and perform address arithmetic to access automatic variables. Subsuming the address arithmetic(More)
System-level design issues become critical as implementation technology evolves towards increasingly complex integrated circuits and the time-to-market pressure continues relentlessly. To cope with these issues, new methodologies that emphasize re-use at all levels of abstraction are a " must " , and this is a major focus of our work in the Gigascale(More)
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to(More)
Recent developments in programmable, highly parallel Graphics Processing Units (GPUs) have enabled high performance implementations of machine learning algorithms. We describe a solver for Support Vector Machine training running on a GPU, using the Sequential Minimal Optimization algorithm and an adaptive first and second order working set selection(More)
Communication-based design represents a formal method approach to of system-on-a-chip design that considers communication between components as important as the computations they perform. “Our network-on-chip&rdqo ; approach partitions the communication into layers to maximize reuse and provide a programmer with an abstraction of the underlying(More)
Convolutional Neural Networks (CNNs) can provide accurate object classification. They can be extended to perform object detection by iterating over dense or selected proposed object regions. However, the runtime of such detectors scales as the total number and/or area of regions to examine per image, and training such detectors may be prohibitively slow.(More)
—Global interconnect is commonly regarded as a key potential bottleneck to the advancing performance of high-speed integrated circuits. Our previous work has suggested that local interconnect effects can be managed through a deep submicron design hierarchy that uses 50 000 to 100 000 gate modules as primitive building blocks. The primary goal of this paper(More)