Impulse is a new memory system architecture that adds two important features to a traditional memory controller. First, Impulse supports application-specific optimizations through configurable physical address remapping. By remapping physical addresses, applications control how their data is accessed and cached, improving their cache and bus utilization. …
We present a brute-force ray tracing system for interactive volume visualization. The system runs on a conventional (distributed) shared-memory multiprocessor machine. For each pixel we trace a ray through a volume to compute the color for that pixel. Although this method has high intrinsic computational cost, its simplicity and scalability make it ideal …
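The per-pixel ray march described in this abstract can be sketched roughly as follows. The volume contents, step size, and transfer function here are illustrative assumptions for the sketch, not the paper's actual parameters:

```python
import numpy as np

def trace_ray(volume, origin, direction, step=0.5, max_t=64.0):
    """March one ray front-to-back through a scalar volume, compositing
    color and opacity under a toy emission-absorption model."""
    color, alpha = 0.0, 0.0
    t = 0.0
    while t < max_t and alpha < 0.99:  # early ray termination
        p = origin + t * direction
        i, j, k = int(p[0]), int(p[1]), int(p[2])  # nearest-neighbor sample
        if all(0 <= c < n for c, n in zip((i, j, k), volume.shape)):
            s = volume[i, j, k]             # scalar sample
            a = s * 0.1                     # toy transfer function: opacity
            color += (1.0 - alpha) * a * s  # front-to-back compositing
            alpha += (1.0 - alpha) * a
        t += step
    return color

# One independent ray per pixel: a tiny 2x2 image through a uniform volume.
vol = np.full((8, 8, 8), 0.5)
image = [[trace_ray(vol, np.array([x, y, 0.0]), np.array([0.0, 0.0, 1.0]))
          for x in (2.0, 5.0)] for y in (2.0, 5.0)]
```

Because every ray is computed independently, pixels (or tiles of pixels) can be distributed across processors with no communication, which is the scalability property the abstract points to.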
Impulse is a memory system architecture that adds an optional level of address indirection at the memory controller. Applications can use this level of indirection to remap their data structures in memory. As a result, they can control how their data is accessed and cached, which can improve cache and bus utilization. The Impulse design does not require any …
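The indirection idea can be illustrated with a toy model; the names and word-granularity mapping here are invented for the sketch, not Impulse's actual interface. Accesses to a dense "shadow" region are redirected, element by element, to scattered physical locations, so a strided structure fills a contiguous run with useful data:

```python
# Toy model of controller-level address remapping: reads from a dense
# shadow region are redirected to scattered physical addresses
# (here, one column of a row-major matrix).

class RemappedRegion:
    def __init__(self, memory, mapping):
        self.memory = memory    # backing "physical" memory (a flat list)
        self.mapping = mapping  # shadow index -> physical index

    def __getitem__(self, shadow_index):
        return self.memory[self.mapping(shadow_index)]

ROWS, COLS = 4, 4
memory = list(range(ROWS * COLS))  # row-major 4x4 matrix, flattened

# Remap shadow index i to column 2 of row i: physical = i*COLS + 2.
column_view = RemappedRegion(memory, lambda i: i * COLS + 2)

# The scattered column now reads as one dense, cache-line-friendly run.
dense_column = [column_view[i] for i in range(ROWS)]
```

In the hardware setting the gather happens at the memory controller, so each cache line brought across the bus carries only the words the application will actually touch.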
We have developed a new way to look at real-time and embedded software: as a collection of execution environments created by a hierarchy of schedulers. Common schedulers include those that run interrupts, bottom-half handlers, threads, and events. We have created algorithms for deriving response times, scheduling overheads, and blocking terms for tasks in …
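As one illustration of the kind of derivation mentioned here, the classic fixed-priority response-time recurrence computes how long a task takes to finish under preemption and blocking. This is the standard single-scheduler analysis, shown only as background; it is not the paper's hierarchical algorithm:

```python
from math import ceil

def response_time(C, B, higher_prio):
    """Iterate R = C + B + sum(ceil(R/T_j) * C_j) over the tasks in
    higher_prio (a list of (C_j, T_j) pairs) until it converges.
    Assumes the task set is schedulable, so the iteration terminates."""
    R = C + B
    while True:
        interference = sum(ceil(R / T) * Cj for Cj, T in higher_prio)
        R_next = C + B + interference
        if R_next == R:
            return R
        R = R_next

# Example: a task with C=3 and blocking term B=1, preempted by two
# higher-priority tasks (C=1, T=4) and (C=2, T=10).
R = response_time(3, 1, [(1, 4), (2, 10)])
```

A hierarchy of schedulers composes terms like these: each level's scheduling overhead and blocking feed into the analysis of the tasks running beneath it.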
Recognizing speech, gestures, and visual features is an important interface capability for future embedded mobile systems. Unfortunately, the real-time performance requirements of complex perception applications cannot be met by current embedded processors, and often even exceed the performance of high-performance microprocessors whose energy consumption far …
Because irregular applications have unpredictable memory access patterns, their performance is dominated by memory behavior. The Impulse configurable memory controller will enable significant performance improvements for irregular applications, because it can be configured to optimize memory accesses on an application-by-application basis. In this paper we …
The key to crafting an effective scalable parallel computing system lies in minimizing the delays imposed by the system. Of particular importance are communications delays, since parallel algorithms must communicate frequently. The communication delay is a system-imposed latency. The existence of relatively inexpensive high-performance workstations and …
Inability to hide main memory latency has been increasingly limiting the performance of modern processors. The problem is worse in large-scale shared memory systems, where remote memory latencies are hundreds, and soon thousands, of processor cycles. To mitigate this problem, we propose an intelligent memory and cache coherence controller (AMC) that can …
Synchronization is a crucial operation in many parallel applications. Conventional synchronization mechanisms are failing to keep up with the increasing demand for efficient synchronization operations as systems grow larger and network latency increases. The contributions of this paper are threefold. First, we revisit some representative synchronization …