B+ tree-structured index searches are among the fundamental database operations, and hence accelerating them is essential. GPUs provide a compelling mix of performance per watt and performance per dollar, and thus are an attractive platform for accelerating B+ tree searches. However, tree search on discrete GPUs presents significant challenges for …
The emergence of die-stacking technology with mixed logic and memory processes has brought about a renaissance in “processing in memory” (PIM) concepts, first envisioned decades ago. For some, the PIM concept conjures an image of a complete processing unit (e.g., CPU, GPU) integrated directly with memory, perhaps on a logic chip 3D-stacked under one or more …
As computation becomes increasingly limited by data movement and energy consumption, exploiting locality throughout the memory hierarchy becomes critical for maintaining the performance scaling that many have come to expect from the computing industry. Moving computation closer to main memory presents an opportunity to reduce the overheads associated with …
Accelerating breadth-first search (BFS) can be a compelling value-add given its pervasive deployment. The current state-of-the-art hybrid BFS algorithm selects different traversal directions based on graph properties, thereby possessing heterogeneous characteristics. Related work has studied this heterogeneous BFS algorithm on homogeneous processors. In …
The Cell Broadband Engine provides the first implementation of a chip multiprocessor with a significant number of general-purpose programmable cores targeting a broad set of workloads. Open source software played a critical role in the development of the Cell software stack. The system includes a Power Architecture processor and eight attached processor …
In this paper we provide a detailed description of a Field Programmable Gate Array (FPGA) based reconfigurable system which has been used in the development of a System on a Chip (SoC) processor and corresponding applications targeted for network computing appliances. The complexity of the processor, in terms of number of hardware threads (64), integrated …
Software and hardware simulators and emulators have traditionally been used for chip-level analysis and verification. However, prototyping and bring-up requirements often demand system- or platform-level integration and analysis, requiring new uses of these traditional pre-silicon methods along with novel interpretations of existing hardware to …
In this paper we have described the two-sided 200 × 35 cm² cold cathode used in the LASL dual-beam module 2.5 kJ short pulse laser. The characteristics may be summarized as follows: a) Electron emission begins on both sides within ~0.1 μs of application of the 50 ns rise diode voltage. b) The currents from each side of the gun are …
In this paper, we propose ExtraV, a framework for near-storage graph processing. It is based on the novel concept of graph virtualization, which efficiently utilizes a cache-coherent hardware accelerator at the storage side to achieve performance and flexibility at the same time. ExtraV consists of four main components: 1) host processor, 2) main memory, 3) …