Texture Caches
@article{Doggett2012TextureC, title={Texture Caches}, author={Michael C. Doggett}, journal={IEEE Micro}, year={2012}, volume={32}, pages={136-141} }
This column examines the texture cache, an essential component of modern GPUs that plays an important role in achieving real-time performance when generating realistic images. The texture cache is only one of the GPU's many components, but it has a real impact on overall GPU performance when rasterization and memory tiling are set up correctly.
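As a rough illustration of why memory tiling matters for the texture cache, the C sketch below compares row-major texel addresses with Morton (Z-order) addresses for a 2x2 bilinear footprint; the texture width, footprint position, and function names are illustrative and not taken from the column.

```c
#include <stdint.h>
#include <stdio.h>

/* Row-major address of a texel: neighboring rows land far apart in memory. */
static uint32_t linear_addr(uint32_t x, uint32_t y, uint32_t width)
{
    return y * width + x;
}

/* Interleave the bits of x and y (Morton/Z-order) so that texels that are
 * close in 2D also sit close together in memory, which maps far better onto
 * cache lines than full rows do. */
static uint32_t morton_addr(uint32_t x, uint32_t y)
{
    uint32_t addr = 0;
    for (uint32_t bit = 0; bit < 16; ++bit) {
        addr |= ((x >> bit) & 1u) << (2 * bit);
        addr |= ((y >> bit) & 1u) << (2 * bit + 1);
    }
    return addr;
}

int main(void)
{
    const uint32_t width = 1024;
    /* A 2x2 bilinear footprint at (2,2): row-major puts the two rows 1024
     * texels apart (2050, 2051, 3074, 3075), while Morton order maps all
     * four texels to the consecutive addresses 12..15. */
    for (uint32_t y = 2; y <= 3; ++y)
        for (uint32_t x = 2; x <= 3; ++x)
            printf("texel (%u,%u): linear %u, morton %u\n",
                   x, y, linear_addr(x, y, width), morton_addr(x, y));
    return 0;
}
```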
21 Citations
Architecture Design for a Four-Way Pipelined Parallel Texture Engine
- Computer Science · 2017 International Conference on Computer Systems, Electronics and Control (ICCSEC)
- 2017
A dedicated hardware texture engine for a 3D graphics engine based on OpenGL 3.0 and GLSL 1.40 is presented, with a number of novel features such as optimized, full-purpose, four-way pipelined parallel texel data formatters and filters and a multi-port, multi-bank, non-blocking texture cache.
Efficient management of last-level caches in graphics processors for 3D scene rendering workloads
- Computer Science · 2013 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO)
- 2013
This paper characterizes intra-stream and inter-stream reuse in 52 frames captured from eight DirectX game titles and four DirectX benchmark applications, and proposes graphics stream-aware probabilistic caching (GSPC), which dynamically learns reuse probabilities and accordingly manages the LLC of the GPU.
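The general flavor of reuse-probability-driven cache management can be sketched as follows; this is an illustrative model of probabilistic insertion, not the GSPC policy itself, and the stream categories and probabilities are made up.

```c
#include <stdlib.h>
#include <stdio.h>

/* Illustrative reuse probabilities per graphics stream (color, depth, texture).
 * In a stream-aware policy these would be learned online; here they are fixed. */
enum stream_id { STREAM_COLOR = 0, STREAM_DEPTH = 1, STREAM_TEXTURE = 2 };
static const double reuse_prob[3] = { 0.20, 0.65, 0.90 };

/* On a last-level-cache miss, insert the block only with the stream's reuse
 * probability; otherwise bypass the LLC and leave existing blocks untouched. */
static int should_insert(enum stream_id s)
{
    return (double)rand() / RAND_MAX < reuse_prob[s];
}

int main(void)
{
    srand(42);
    int inserted[3] = { 0, 0, 0 };
    for (int i = 0; i < 10000; ++i)
        for (int s = 0; s < 3; ++s)
            inserted[s] += should_insert((enum stream_id)s);
    printf("inserted per 10000 misses: color=%d depth=%d texture=%d\n",
           inserted[0], inserted[1], inserted[2]);
    return 0;
}
```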
An efficient GPU approach for designing 3D cultural heritage information systems
- Computer Science
- 2020
GPUpd: A Fast and Scalable Multi-GPU Architecture Using Cooperative Projection and Distribution
- Computer Science · 2017 50th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO)
- 2017
GPUpd, a novel multi-GPU architecture for fast and scalable split frame rendering (SFR), is proposed; it introduces a new graphics pipeline stage called Cooperative Projection & Distribution (C-PD) in which all GPUs cooperatively project 3D objects to the 2D screen and efficiently redistribute the objects to their corresponding GPUs.
Quantifying the NUMA Behavior of Partitioned GPGPU Applications
- Computer Science · GPGPU@ASPLOS
- 2019
A framework for analyzing the internal communication behavior of GPGPU applications is introduced, consisting of an open-source memory tracing plugin for Clang/LLVM and a simple communication model based on summaries of a kernel's memory accesses, which allows reasoning about virtual bandwidth-limited communication paths between NUMA nodes under different partitioning strategies.
Romou: rapidly generate high-performance tensor kernels for mobile GPUs
- Computer Science · MobiCom
- 2022
A mobile-GPU-specific kernel compiler, Romou, is proposed; it supports unique mobile-GPU hardware features in kernel implementations, prunes inefficient candidates against hardware resource limits, and can thus rapidly generate high-performance kernels.
Reviewing GPU architectures to build efficient back projection for parallel geometries
- Computer Science · Journal of Real-Time Image Processing
- 2019
This article builds a performance model to find hardware hotspots and proposes several optimizations to balance the load between the texture engine, the computational and special function units, and the different types of memory, maximizing the utilization of all GPU subsystems in parallel.
GPU-based implementation of an optimized nonparametric background modeling for real-time moving object detection
- Computer Science · IEEE Transactions on Consumer Electronics
- 2013
This paper presents a novel real-time implementation of an optimized spatio-temporal nonparametric moving object detection strategy that features smart cooperation between a computer's or device's central and graphics processing units and extensive use of the latter's texture mapping and filtering units, including a novel method for fast evaluation of Gaussian functions.
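One common way to exploit filtering hardware for this kind of workload is to replace expf() with a small lookup table sampled by linear interpolation, which GPU texture units provide essentially for free. The CPU-side sketch below shows the idea; the table size and value range are illustrative assumptions, not the paper's parameters.

```c
#include <math.h>
#include <stdio.h>

#define LUT_SIZE 256
#define X_MAX    4.0f   /* table covers x in [0, X_MAX); exp(-x*x) ~ 0 beyond */

static float lut[LUT_SIZE];

static void build_lut(void)
{
    for (int i = 0; i < LUT_SIZE; ++i) {
        float x = X_MAX * (float)i / (float)(LUT_SIZE - 1);
        lut[i] = expf(-x * x);
    }
}

/* Piecewise-linear lookup: on a GPU the same interpolation is done by the
 * texture filtering unit, so a shader issues a single 1D texture fetch. */
static float gauss_lut(float x)
{
    if (x < 0.0f) x = -x;
    if (x >= X_MAX) return 0.0f;
    float t = x / X_MAX * (float)(LUT_SIZE - 1);
    int   i = (int)t;
    float f = t - (float)i;
    return lut[i] + f * (lut[i + 1] - lut[i]);
}

int main(void)
{
    build_lut();
    for (float x = 0.0f; x < 3.0f; x += 0.5f)
        printf("x=%.1f  exact=%.5f  lut=%.5f\n", x, expf(-x * x), gauss_lut(x));
    return 0;
}
```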
AVR: Reducing Memory Traffic with Approximate Value Reconstruction
- Computer Science · ICPP
- 2019
Approximate Value Reconstruction (AVR) reduces the memory traffic of applications that tolerate approximations in their datasets, significantly improving system performance and energy efficiency, and supports the compression scheme in a way that maximizes its effect and minimizes its overheads.
MemSZ: Squeezing Memory Traffic with Lossy Compression
- Computer Science · ACM Trans. Archit. Code Optim.
- 2020
MemSZ introduces a low latency, parallel design of the Squeeze (SZ) algorithm offering aggressive compression ratios, up to 16:1 in the authors' implementation, and improves the execution time, energy, and memory traffic by up to 15%, 9%, and 64%, respectively.
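The core idea behind SZ-style lossy compression is to predict each value from its neighbors and store only a small quantized correction within a user-set error bound. The 1D sketch below illustrates that idea under simple assumptions (a last-value predictor and a fixed error bound) and does not reproduce the MemSZ hardware design.

```c
#include <math.h>
#include <stdio.h>

#define N 16
#define ERROR_BOUND 0.05

/* Encode: predict each value as the previous reconstructed value and store a
 * small integer quantization code for the residual. Values whose residual
 * exceeds the representable code range would be stored uncompressed (not shown). */
static void encode(const double *in, int *codes, double *recon, int n)
{
    double prev = 0.0;
    for (int i = 0; i < n; ++i) {
        double pred = prev;                           /* last-value predictor */
        double diff = in[i] - pred;
        int    code = (int)round(diff / (2.0 * ERROR_BOUND));
        codes[i] = code;
        recon[i] = pred + code * 2.0 * ERROR_BOUND;   /* what a decoder sees  */
        prev = recon[i];                              /* error stays bounded  */
    }
}

int main(void)
{
    double in[N], recon[N];
    int codes[N];
    for (int i = 0; i < N; ++i)
        in[i] = sin(0.3 * i);                         /* smooth, compressible */
    encode(in, codes, recon, N);
    for (int i = 0; i < N; ++i)
        printf("in=%7.4f code=%3d recon=%7.4f err=%7.4f\n",
               in[i], codes[i], recon[i], fabs(in[i] - recon[i]));
    return 0;
}
```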
References
The Design And Analysis Of A Cache Architecture For Texture Mapping
- Computer Science · Conference Proceedings. The 24th Annual International Symposium on Computer Architecture
- 1997
The use of texture image caches is proposed to alleviate the memory bottlenecks of texture mapping, and the results indicate that caching is a promising approach to designing memory systems for texture mapping.
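To make the caching argument concrete, the toy model below counts cache-line hits for the 2x2 bilinear footprints of a diagonal walk across a tiled texture; the cache size, line size, and tiling are invented for illustration and are not the parameters studied in the paper.

```c
#include <stdio.h>
#include <string.h>

#define TEX_W      256
#define TILE       4                /* 4x4 texels share one 64-byte line   */
#define NUM_LINES  16               /* tiny fully associative LRU cache    */

static int lines[NUM_LINES];        /* tags: tile index, -1 = empty        */
static int stamp[NUM_LINES], clk;
static int hits, misses;

/* Look up the cache line holding texel (x, y); simulate LRU replacement. */
static void access_texel(int x, int y)
{
    int tag = (y / TILE) * (TEX_W / TILE) + (x / TILE);
    int lru = 0;
    for (int i = 0; i < NUM_LINES; ++i) {
        if (lines[i] == tag) { hits++; stamp[i] = ++clk; return; }
        if (stamp[i] < stamp[lru]) lru = i;
    }
    misses++;
    lines[lru] = tag;
    stamp[lru] = ++clk;
}

int main(void)
{
    memset(lines, -1, sizeof lines);
    /* Walk diagonally across the texture, touching a 2x2 bilinear footprint
     * per sample; neighboring samples mostly reuse the same 4x4 tiles. */
    for (int s = 0; s < 250; ++s)
        for (int dy = 0; dy < 2; ++dy)
            for (int dx = 0; dx < 2; ++dx)
                access_texel(s + dx, s + dy);
    printf("hits=%d misses=%d hit rate=%.1f%%\n",
           hits, misses, 100.0 * hits / (hits + misses));
    return 0;
}
```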
Fermi GF100 GPU Architecture
- Computer Science · IEEE Micro
- 2011
The Fermi GF100 is a GPU architecture that provides several new capabilities beyond the Nvidia GT200 or Tesla architecture, including tessellation, physics processing, and computational graphics.
Prefetching in a texture cache architecture
- Computer Science · Workshop on Graphics Hardware
- 1998
This paper introduces a prefetching texture cache architecture designed to take advantage of the access characteristics of texture mapping, and demonstrates that even in the presence of a high-latency memory system, the architecture can attain at least 97% of the performance of a zero-latency memory system.
Rise of the Graphics Processor
- Computer Science · Proceedings of the IEEE
- 2008
This work examines the evolution of hardware for accelerating graphics processing operations, looks at the structure of a modern GPU, and discusses how graphics processing exploits this structure and how nongraphical applications can take advantage of this capability.
Larrabee: A Many-Core x86 Architecture for Visual Computing
- Computer Science · IEEE Micro
- 2009
The Larrabee many-core visual computing architecture uses multiple in-order x86 cores augmented by wide vector processor units, together with some fixed-function logic. This increases the…
Demystifying GPU microarchitecture through microbenchmarking
- Computer Science · 2010 IEEE International Symposium on Performance Analysis of Systems & Software (ISPASS)
- 2010
This work develops a microbenchmark suite and measures the CUDA-visible architectural characteristics of the Nvidia GT200 (GTX280) GPU, exposing undocumented features that impact program performance and correctness.
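Microbenchmarks of this kind typically rely on pointer chasing, where each load depends on the previous one, so the measured time exposes memory latency. The C sketch below shows a CPU-side version of the technique (the GT200 study implements it as CUDA kernels); the array size and iteration count are arbitrary assumptions.

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N      (1 << 22)            /* 4M elements, well past typical caches */
#define ITERS  (1 << 24)

int main(void)
{
    /* Build a random cyclic pointer-chasing chain (Sattolo's algorithm) so
     * each load depends on the previous one and prefetchers cannot help. */
    size_t *chain = malloc(N * sizeof *chain);
    if (!chain) return 1;
    for (size_t i = 0; i < N; ++i) chain[i] = i;
    srand(1);
    for (size_t i = N - 1; i > 0; --i) {
        size_t j = (size_t)rand() % i;               /* j in [0, i)          */
        size_t tmp = chain[i]; chain[i] = chain[j]; chain[j] = tmp;
    }

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    size_t p = 0;
    for (long i = 0; i < ITERS; ++i)
        p = chain[p];                                /* serial dependent loads */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
    /* Print p so the compiler cannot drop the loop as dead code. */
    printf("avg latency: %.2f ns per load (p=%zu)\n", ns / ITERS, p);
    free(chain);
    return 0;
}
```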
Reducing shading on GPUs using quad-fragment merging
- Computer Science · ACM Trans. Graph.
- 2010
It is found that a fragment-shading pipeline with this optimization is competitive with the REYES pipeline approach of shading at micropolygon vertices and, in cases of complex occlusion, can perform up to two times less shading work.
Hardware for Superior Texture Performance
- Computer Science · Workshop on Graphics Hardware
- 1995
This work focuses on the use of a specific compression scheme for texture mapping, which allows very simple and fast decompression hardware, bringing high-performance texture mapping to low-cost systems.
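Hardware-friendly texture decompression usually operates on small fixed-size blocks so that a texel can be reconstructed with a few shifts and adds. The sketch below decodes a BTC/S3TC-style 4x4 grayscale block with two endpoint values and 2-bit selectors; this format is an illustrative assumption, not the scheme proposed in the paper.

```c
#include <stdint.h>
#include <stdio.h>

/* A 4x4 block stored as two 8-bit grayscale endpoints plus a 2-bit selector
 * per texel; decoding needs only shifts and a small fixed-point lerp. */
struct block {
    uint8_t  c0, c1;        /* endpoint values                              */
    uint32_t indices;       /* sixteen 2-bit selectors, texel 0 in bits 0-1 */
};

static uint8_t decode_texel(const struct block *b, int x, int y)
{
    unsigned sel = (b->indices >> (2 * (y * 4 + x))) & 3u;
    /* Selector picks a point on the line between the endpoints: 0, 1/3, 2/3, 1. */
    return (uint8_t)((b->c0 * (3u - sel) + b->c1 * sel) / 3u);
}

int main(void)
{
    struct block b = { .c0 = 0, .c1 = 255, .indices = 0xE4E4E4E4u };
    for (int y = 0; y < 4; ++y) {
        for (int x = 0; x < 4; ++x)
            printf("%4u", decode_texel(&b, x, y));
        printf("\n");
    }
    return 0;
}
```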
Neon: a single-chip 3D workstation graphics accelerator
- Computer Science · Workshop on Graphics Hardware
- 1998
High-performance 3D graphics accelerators traditionally require multiple chips on multiple boards, including geometry, rasterizing, pixel processing, and texture mapping chips. These designs are…
Larrabee: A many-Core x86 architecture for visual computing
- Art · 2008 IEEE Hot Chips 20 Symposium (HCS)
- 2008
This article consists of a collection of slides from the author's conference presentation. Some of the topics discussed include: architecture convergence; Larrabee architecture; and graphics pipeline.