Larrabee: A many-Core x86 architecture for visual computing

@article{Seiler2008LarrabeeAM,
  title={Larrabee: A many-Core x86 architecture for visual computing},
  author={Larry Seiler and Douglas M. Carmean and Eric Sprangle and Tom Forsyth and Michael Abrash and Pradeep K. Dubey and Stephen Junkins and Adam T. Lake and Jeremy Sugerman and Robert Cavin and Roger Espasa and Edward T. Grochowski and Toni Juan and Pat Hanrahan},
  journal={2008 IEEE Hot Chips 20 Symposium (HCS)},
  year={2008},
  pages={1--30}
}
This article consists of a collection of slides from the authors' conference presentation. Topics discussed include architecture convergence, the Larrabee architecture, and the graphics pipeline.
Efficient Processing and Delivery of Multimedia Data
TLDR
Novel approaches are presented that improve the overall efficiency of the stack by tailoring software design to hardware properties, and that optimize systems by exploiting workload characteristics with learning-based methods.
MacroSS: macro-SIMDization of streaming applications
TLDR
MacroSS is introduced, which is capable of performing macro-SIMDization on high-level streaming graphs, along with low-overhead architectural modifications that accelerate shuffling of data elements between the scalar and vectorized parts of a streaming program.
Putting 'p' in RabbitCT - Fast CT Reconstruction Using a Standardized Benchmark
Computational architectures and processors are an ever-changing field of research and development. Standardized and representable problem-dependent tests are required to find the optimal design of a
State-of-the-art in heterogeneous computing
TLDR
An overview of the state-of-the-art in heterogeneous computing, focusing on three commonly found architectures: the Cell Broadband Engine Architecture, graphics processing units (GPUs), and field programmable gate arrays (FPGAs).
Ray casting of multiple volumetric datasets with polyhedral boundaries on manycore GPUs
TLDR
This work presents a new GPU-based rendering system for ray casting of multiple volumes which provides interactive frame rates when concurrently rendering more than 50 arbitrarily overlapping volumes on current graphics hardware.
CMA: Chip multi-accelerator
TLDR
This paper shows that starting from a baseline description of several SDR waveforms and candidate tasks for acceleration, it is able to map the different waveforms on the heterogeneous multi-accelerator architecture while keeping a logical view of a regular multi-core architecture, thus simplifying the mapping of the waveforms onto the multi-accelerator.
Using program behaviour to exploit heterogeneous multi-core processors
TLDR
The findings of this work demonstrate that a runtime system with a homogeneous virtual machine interface can reduce the challenge of application development for HMA processors, whilst still being able to exploit such a processor by taking program behaviour into account.
Devices and architectures for photonic chip-scale integration
TLDR
This paper presents a design study for a many-core architecture called Corona which utilizes dense wavelength division multiplexing (DWDM) for on- and off-chip communication together with the devices which will be needed to implement such a communication infrastructure.
Sort vs. Hash Revisited: Fast Join Implementation on Modern Multi-Core CPUs
TLDR
This paper re-examines two popular join algorithms to determine if the latest computer architecture trends shift the tide that has favored hash join for many years and offers multicore implementations of hash join and sort-merge join which consistently outperform all previously reported results.
Monolithically Integrating Non-Volatile Main Memory over the Last-Level Cache
TLDR
This article exploits non-volatile memories employing 3D crosspoint subarrays, such as resistive RAM (ReRAM), and integrates them over the CPU’s last-level cache (LLC), and develops a streamlined LLC/main memory interface that employs a single shared internal interconnect for both the cache and main memory arrays.

References

Convergence of Recognition, Mining, and Synthesis Workloads and Its Implications
TLDR
A diverse set of emerging RMS applications from market segments like graphics, gaming, media-mining, unstructured information management, financial analytics, and interactive virtual communities presents a relatively focused, highly overlapping set of common platform challenges.
Scalable parallel programming with CUDA
Presents a collection of slides covering the following topics: CUDA parallel programming model; CUDA toolkit and libraries; performance optimization; and application development.
A Survey of General‐Purpose Computation on Graphics Hardware
TLDR
This report describes, summarizes, and analyzes the latest research in mapping general‐purpose computation to graphics hardware.
Future Proof Data Parallel Algorithms and Software on Intel Multicore Architecture
TLDR
This paper describes how Ct is designed for minimal effort by the developer, while providing forward scaling on multi-core IA, and describes how a sampling of key application spaces can be easily written using Ct to achieve high performance.
Intel Performance Libraries MultiCoreReady Software for Numeric Intensive Computation
TLDR
This paper discusses some of the methods used to improve performance, which largely focus on cache utilization and minimization of translation look-aside buffer (TLB) misses, and discusses how this concept of ease of use will be expanded to provide more flexibility in the use of the library without greatly expanding its size.
Intel threading building blocks - outfitting C++ for multi-core processor parallelism
TLDR
This guide explains how to maximize the benefits of multi-core chips through a portable C++ library that works on Windows, Linux, Macintosh, and Unix systems, and reveals the gotchas in TBB.
Multi-fragment effects on the GPU using the k-buffer
TLDR
The goal of this work is to demonstrate the large number of graphics algorithms that the k-buffer enables and that the efficiency is superior to current multipass approaches.
Physical simulation for animation and visual effects: parallelization and characterization for chip multiprocessors
TLDR
This work studies a set of three workloads that exemplify the span and complexity of physical simulation applications used in a production environment: fluid dynamics, facial animation, and cloth simulation, which are computationally demanding and can benefit greatly from the acceleration possible with large scale CMPs.
Practical logarithmic rasterization for low-error shadow maps
TLDR
The rasterizer is modified to support rendering to a nonuniform grid with the same watertight rasterization properties as current rasterizers, and a depth compression scheme is described to handle the nonlinear primitives produced by logarithmic rasterization.
Ray-Triangle Intersection Algorithm for Modern CPU Architectures
We present an algorithm for determining if a ray intersects a triangle interior; and computing intersection point parameters as well as distance of intersection in response to the ray intersecting a