Alok Garg

Continued device scaling enables microprocessors and other systems-on-chip (SoCs) to increase their performance, functionality, and hence, complexity. Simultaneously, relentless scaling, if uncompensated, degrades the performance and signal integrity of on-chip metal interconnects. These systems have therefore become increasingly communications-limited. The…
An efficient mechanism to track and enforce memory dependences is crucial to an out-of-order microprocessor. The conventional approach of using cross-checked load queue and store queue, while very effective in earlier processor incarnations, suffers from scalability problems in modern high-frequency designs that rely on buffering many in-flight instructions…
There is a critical need to securely store, manage, share and analyze massive amounts of complex (e.g., semi-structured and unstructured) data to determine patterns and trends in order to improve the quality of healthcare, better safeguard the nation and explore alternative energy. Because of the critical nature of the applications, it is important that…
In high-end processors, increasing the number of in-flight instructions can improve performance by overlapping useful processing with long-latency accesses to the main memory. Buffering these instructions requires a tremendous amount of microarchitectural resources. Unfortunately, large structures negatively impact processor clock speed and energy…
Buffering more in-flight instructions in an out-of-order microprocessor is a straightforward and effective method to help tolerate the long latencies generally associated with off-chip memory accesses. One of the main challenges of buffering a large number of instructions, however, is the implementation of a scalable and efficient mechanism to detect memory…
Optimizing the common case has been an adage in decades of processor design practice. However, as system complexity and the sophistication of optimization techniques have increased substantially, maintaining correctness under all situations, however unlikely, is contributing to the necessity of extra conservatism in all layers of the system design. The…
This letter presents a new optical interconnect system for intrachip communications based on free-space optics. It provides all-to-all direct communications using dedicated lasers and photodetectors, hence avoiding packet switching while offering ultra-low latency and scalable bandwidth. A technology demonstration prototype is built on a circuit board…
While a canonical out-of-order engine can effectively exploit implicit parallelism in sequential programs, its effectiveness is often hindered by instruction and data supply imperfections manifested as branch mispredictions and cache misses. Accurate and deep look-ahead guided by a slice of the executed program is a simple yet effective approach to mitigate…