Texture Caches

@article{Doggett2012TextureC,
  title={Texture Caches},
  author={Michael C. Doggett},
  journal={IEEE Micro},
  year={2012},
  volume={32},
  pages={136-141}
}
  • M. Doggett
  • Published 1 May 2012
  • Computer Science
  • IEEE Micro
This column examines the texture cache, an essential component of modern GPUs that plays an important role in achieving real-time performance when generating realistic images. The texture cache is only one of a GPU's many components, but it has a real impact on overall GPU performance when rasterization and memory tiling are set up correctly.
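As a concrete illustration of the memory tiling the column refers to, the host-side sketch below computes a Morton (Z-order) offset so that texels that are neighbours in 2D land at nearby addresses and therefore in the same cache lines. The layout and the 4x4 example are illustrative assumptions, not the specific scheme described in the column.

    // Illustrative sketch only: interleave the bits of x and y (Morton / Z-order)
    // so that texels that are close in 2D end up close in memory, the basic idea
    // behind tiled texture layouts used with texture caches.
    #include <cstdint>
    #include <cstdio>

    static uint32_t part1by1(uint32_t v)          // spread the low 16 bits of v
    {                                             // into the even bit positions
        v &= 0x0000ffffu;
        v = (v | (v << 8)) & 0x00ff00ffu;
        v = (v | (v << 4)) & 0x0f0f0f0fu;
        v = (v | (v << 2)) & 0x33333333u;
        v = (v | (v << 1)) & 0x55555555u;
        return v;
    }

    static uint32_t morton2d(uint32_t x, uint32_t y)   // Z-order texel offset
    {
        return part1by1(x) | (part1by1(y) << 1);
    }

    int main()
    {
        // A 4x4 neighbourhood maps into one contiguous 16-texel block.
        for (uint32_t y = 0; y < 4; ++y) {
            for (uint32_t x = 0; x < 4; ++x)
                printf("%3u ", morton2d(x, y));
            printf("\n");
        }
        return 0;
    }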
Citations

Architecture Design for a Four-Way Pipelined Parallel Texture Engine
TLDR
A dedicated hardware texture-engine architecture for a 3D graphics engine based on OpenGL 3.0 and GLSL 1.40, with a number of novel features such as optimized, full-purpose, four-way pipelined parallel texel data formatters and filters and a multi-port, multi-bank, non-blocking texture cache.
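A multi-bank cache can serve several texel requests per cycle only when they fall into distinct banks. The sketch below is a toy conflict check under assumed parameters (64-byte lines, 8 banks, bank chosen from the bits just above the line offset); none of these values come from the paper.

    // Hypothetical bank-conflict check for a multi-bank texture cache.
    // Assumed parameters (not from the paper): 64-byte lines, 8 banks.
    #include <cstdint>
    #include <cstdio>

    constexpr uint32_t LINE_BYTES = 64;
    constexpr uint32_t NUM_BANKS  = 8;

    static uint32_t bankOf(uint32_t addr)
    {
        return (addr / LINE_BYTES) % NUM_BANKS;
    }

    // True if all four texel addresses hit distinct banks and can be serviced
    // in parallel; otherwise at least one request must be replayed.
    static bool conflictFree(const uint32_t addr[4])
    {
        bool used[NUM_BANKS] = {false};
        for (int i = 0; i < 4; ++i) {
            uint32_t b = bankOf(addr[i]);
            if (used[b]) return false;
            used[b] = true;
        }
        return true;
    }

    int main()
    {
        uint32_t quad[4] = {0x1000, 0x1040, 0x1080, 0x10c0};  // four bilinear taps
        printf("conflict free: %d\n", conflictFree(quad));
        return 0;
    }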
Efficient management of last-level caches in graphics processors for 3D scene rendering workloads
TLDR
This paper characterizes the intra-stream and inter-stream reuses in 52 frames captured from eight DirectX game titles and four DirectX benchmark applications, and proposes graphics stream-aware probabilistic caching (GSPC), which dynamically learns the reuse probabilities and manages the LLC of the GPU accordingly.
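A minimal sketch of the general idea behind probabilistic, stream-aware insertion: track how often lines from a given stream are reused and insert new lines into the LLC only with that probability. The update rule, constants, and names below are assumptions for illustration, not GSPC's actual mechanism.

    // Toy illustration of stream-aware probabilistic insertion: each graphics
    // stream (color, depth, texture, ...) keeps an estimated reuse probability,
    // and a line from that stream is inserted into the LLC only with that
    // probability. The update rule and constants are illustrative assumptions.
    #include <cstdlib>
    #include <cstdio>

    struct StreamStats {
        double reuseProb = 0.5;          // learned estimate, starts neutral
        void observe(bool reused)        // exponential moving average
        {
            const double alpha = 0.05;
            reuseProb = (1.0 - alpha) * reuseProb + alpha * (reused ? 1.0 : 0.0);
        }
    };

    static bool shouldInsert(const StreamStats& s)
    {
        return (double)rand() / RAND_MAX < s.reuseProb;   // probabilistic bypass
    }

    int main()
    {
        StreamStats depth;
        for (int i = 0; i < 1000; ++i)
            depth.observe(i % 10 == 0);   // pretend 10% of depth lines get reused
        printf("depth reuse estimate: %.2f, insert this line: %d\n",
               depth.reuseProb, shouldInsert(depth));
        return 0;
    }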
Polyhedral Model Guided Automatic GPU Cache Exploitation Framework
We propose a compiler-driven acceleration of parallel computations on GPUs that exploits the special varieties of caches (texture, surface, and constant caches on NVIDIA GPUs).
An efficient GPU approach for designing 3D cultural heritage information systems
TLDR
This article describes a new architecture for 3D information systems that takes advantage of the inherent parallelism of GPUs, and details the GPU algorithms required to edit these layers, allowing a level of detail independent of the resolution of the meshes.
GPUpd: A Fast and Scalable Multi-GPU Architecture Using Cooperative Projection and Distribution
TLDR
GPUpd, a novel multi-GPU architecture for fast and scalable split frame rendering (SFR), is proposed; it introduces a new graphics pipeline stage called Cooperative Projection & Distribution (C-PD) in which all GPUs cooperatively project 3D objects to the 2D screen and efficiently redistribute the objects to their corresponding GPUs.
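A toy sketch of the cooperative projection-and-distribution idea: project each object's bounding-sphere centre and hand the object to the GPU that owns the screen region it lands in. The two-way left/right split, the struct layout, and the function names are illustrative assumptions, not GPUpd's actual pipeline.

    // Toy sketch of cooperative projection & distribution for split-frame
    // rendering: project each object's bounding-sphere centre, then assign the
    // object to the GPU that owns that part of the screen.
    #include <cstdio>

    struct Vec4 { float x, y, z, w; };

    static Vec4 project(const float m[16], Vec4 v)        // column-major MVP
    {
        return { m[0]*v.x + m[4]*v.y + m[8]*v.z  + m[12]*v.w,
                 m[1]*v.x + m[5]*v.y + m[9]*v.z  + m[13]*v.w,
                 m[2]*v.x + m[6]*v.y + m[10]*v.z + m[14]*v.w,
                 m[3]*v.x + m[7]*v.y + m[11]*v.z + m[15]*v.w };
    }

    // 0 = GPU owning the left half of the screen, 1 = right half.
    static int ownerGpu(const float mvp[16], Vec4 centre)
    {
        Vec4 clip = project(mvp, centre);
        float ndcX = clip.x / clip.w;                      // [-1, 1] after divide
        return ndcX < 0.0f ? 0 : 1;
    }

    int main()
    {
        const float identity[16] = {1,0,0,0, 0,1,0,0, 0,0,1,0, 0,0,0,1};
        Vec4 c = { -0.3f, 0.1f, 0.5f, 1.0f };
        printf("object goes to GPU %d\n", ownerGpu(identity, c));
        return 0;
    }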
Quantifying the NUMA Behavior of Partitioned GPGPU Applications
TLDR
A framework for analyzing the internal communication behavior of GPGPU applications is introduced, consisting of an open-source memory-tracing plugin for Clang/LLVM and a simple communication model based on summaries of a kernel's memory accesses; the model allows reasoning about virtual, bandwidth-limited communication paths between NUMA nodes under different partitioning strategies.
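A minimal sketch of reasoning about inter-node traffic from per-block access summaries: record which pages each thread block touches, home each page on the node that first touches it, and count every access from a different node as remote. The page size, two-node split, and first-touch policy are assumptions, not the paper's model.

    // Minimal sketch of estimating inter-node traffic from access summaries.
    // All sizes and policies below are assumptions for illustration.
    #include <cstdint>
    #include <cstdio>
    #include <map>
    #include <vector>

    int main()
    {
        const uint64_t PAGE = 64 * 1024;                 // assumed page granularity
        // summary[b] = pages touched by thread block b
        std::vector<std::vector<uint64_t>> summary = {
            {0, 1}, {1, 2}, {2, 3}, {3, 0} };
        auto nodeOfBlock = [](size_t b) { return (int)(b % 2); };   // 2-node split

        std::map<uint64_t, int> home;                    // first-touch page homing
        uint64_t remoteBytes = 0;
        for (size_t b = 0; b < summary.size(); ++b)
            for (uint64_t page : summary[b]) {
                if (!home.count(page)) home[page] = nodeOfBlock(b);
                if (home[page] != nodeOfBlock(b)) remoteBytes += PAGE;
            }
        printf("estimated remote traffic: %llu KiB\n",
               (unsigned long long)(remoteBytes / 1024));
        return 0;
    }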
Reviewing GPU architectures to build efficient back projection for parallel geometries
TLDR
This article builds a performance model to find hardware hotspots and proposes several optimizations to balance the load between the texture engine, computational units, and special-function units, as well as the different types of memory, maximizing the utilization of all GPU subsystems in parallel.
GPU-based implementation of an optimized nonparametric background modeling for real-time moving object detection
TLDR
This paper presents a novel real-time implementation of an optimized spatio-temporal nonparametric moving-object detection strategy that features smart cooperation between a computer or device's central and graphics processing units and extensive use of the texture mapping and filtering units of the latter, including a novel method for fast evaluation of Gaussian functions.
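The "fast evaluation of Gaussian functions" mentioned above is commonly done with a precomputed table sampled through the texture unit's linear filter. The CPU-side sketch below shows the same table-plus-interpolation idea; the table size, range, and sigma handling are assumptions, not the paper's implementation.

    // Sketch of Gaussian evaluation through a lookup table with linear
    // interpolation, the idea that maps onto GPU texture filtering units.
    #include <cmath>
    #include <cstdio>

    constexpr int   TABLE_N   = 256;
    constexpr float MAX_SIGMA = 4.0f;     // table covers |x| up to 4 sigma

    static float table[TABLE_N];

    static void buildTable()
    {
        for (int i = 0; i < TABLE_N; ++i) {
            float t = (float)i / (TABLE_N - 1) * MAX_SIGMA;    // t = |x| / sigma
            table[i] = expf(-0.5f * t * t);
        }
    }

    static float fastGaussian(float x, float sigma)
    {
        float t = fabsf(x) / sigma;
        if (t >= MAX_SIGMA) return 0.0f;
        float pos = t / MAX_SIGMA * (TABLE_N - 1);
        int   i0  = (int)pos;
        if (i0 >= TABLE_N - 1) return table[TABLE_N - 1];
        float f   = pos - i0;                                  // linear filter
        return table[i0] * (1.0f - f) + table[i0 + 1] * f;
    }

    int main()
    {
        buildTable();
        printf("approx %.4f exact %.4f\n",
               fastGaussian(1.0f, 1.0f), expf(-0.5f));
        return 0;
    }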
AVR: Reducing Memory Traffic with Approximate Value Reconstruction
TLDR
Approximate Value Reconstruction (AVR) reduces the memory traffic of applications that tolerate approximations in their datasets, significantly improving system performance and energy efficiency, and supports the compression scheme by maximizing its effect and minimizing its overheads.
MemSZ
This article describes Memory Squeeze (MemSZ), a new approach for lossy general-purpose memory compression. MemSZ introduces a low-latency, parallel design of the Squeeze (SZ) algorithm.
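A toy 1D sketch of the error-bounded predict-and-quantize principle behind SZ-style compression: predict each value from the previously reconstructed one and store only an integer quantization code whose reconstruction error stays within a user bound. This illustrates the principle only, not MemSZ's parallel hardware design; the predictor and constants are assumptions.

    // Toy 1D predict-and-quantize sketch in the spirit of SZ-style compression.
    #include <cmath>
    #include <cstdio>
    #include <vector>

    int main()
    {
        const float bound = 0.01f;                     // user error bound
        std::vector<float> data = {1.00f, 1.02f, 1.05f, 1.04f, 1.10f};

        std::vector<int>   codes;                      // compressed stream
        std::vector<float> recon;                      // what the decoder sees
        float prev = 0.0f;
        for (float v : data) {
            float pred = prev;                         // simple 1-step predictor
            int   q    = (int)lroundf((v - pred) / (2.0f * bound));
            float r    = pred + q * 2.0f * bound;      // decoder's reconstruction
            codes.push_back(q);
            recon.push_back(r);
            prev = r;                                  // predict from recon values
        }
        for (size_t i = 0; i < data.size(); ++i)
            printf("orig %.3f recon %.3f code %d\n", data[i], recon[i], codes[i]);
        return 0;
    }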

References

Showing 10 of 30 references.
The Design and Analysis of a Cache Architecture for Texture Mapping
TLDR
This paper proposes the use of texture image caches to alleviate the memory bottlenecks of texture mapping, and indicates that caching is a promising approach to designing memory systems for texture mapping.
Fermi GF100 GPU Architecture
TLDR
The Fermi GF100 is a GPU architecture that provides several new capabilities beyond the Nvidia GT200 or Tesla architecture, including tessellation, physics processing, and computational graphics.
Prefetching in a texture cache architecture
TLDR
This paper introduces a prefetching texture cache architecture designed to take advantage of the access characteristics of texture mapping, and demonstrates that even in the presence of a high-latency memory system, this architecture can attain at least 97% of the performance of a zero-latency memory system.
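The key idea is decoupling: on a miss, the texel request goes to memory immediately while the fragment is parked in a FIFO and is only filtered once its data has arrived, so a single miss never stalls the pipeline. The sketch below is a toy model of that decoupling; the structure names and the fixed request list are assumptions, not the paper's design.

    // Toy sketch of the decoupling behind a prefetching texture cache.
    #include <cstdio>
    #include <queue>
    #include <set>

    struct Fragment { int id; int texelAddr; };

    int main()
    {
        std::queue<Fragment> pending;        // fragments waiting for their texels
        std::set<int> cache = {100, 104};    // lines already resident
        std::set<int> inFlight;              // misses already requested

        Fragment incoming[3] = { {0, 100}, {1, 200}, {2, 104} };
        for (const Fragment& f : incoming) {
            if (!cache.count(f.texelAddr) && !inFlight.count(f.texelAddr))
                inFlight.insert(f.texelAddr);          // issue the fetch early
            pending.push(f);                           // park the fragment in order
        }

        // Later, as memory responses arrive, fragments drain from the FIFO.
        while (!pending.empty()) {
            Fragment f = pending.front(); pending.pop();
            if (inFlight.count(f.texelAddr)) {         // data has now returned
                cache.insert(f.texelAddr);
                inFlight.erase(f.texelAddr);
            }
            printf("filter fragment %d with texel line %d\n", f.id, f.texelAddr);
        }
        return 0;
    }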
Rise of the Graphics Processor
  • D. Blythe
  • Computer Science
    Proceedings of the IEEE
  • 2008
TLDR
This work examines the evolution of hardware for accelerating graphics processing operations, looks at the structure of a modern GPU, and discusses how graphics processing exploits this structure and how nongraphical applications can take advantage of it.
Demystifying GPU microarchitecture through microbenchmarking
TLDR
This work develops a microbenchmark suite and measures the CUDA-visible architectural characteristics of the Nvidia GT200 (GTX280) GPU, exposing undocumented features that impact program performance and correctness.
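Memory latency in such suites is typically measured with a pointer-chasing kernel whose loads are serially dependent, timed with the SM clock. The sketch below shows that style of measurement; the array size, stride, and iteration count are arbitrary assumptions and error checking is omitted for brevity.

    // Pointer-chasing latency microbenchmark sketch (CUDA).
    #include <cstdio>
    #include <cuda_runtime.h>

    __global__ void chase(const unsigned* next, int iters,
                          unsigned* sink, long long* cyclesPerLoad)
    {
        unsigned j = 0;
        long long start = clock64();
        for (int i = 0; i < iters; ++i)
            j = next[j];                       // each load depends on the previous
        long long stop = clock64();
        *sink = j;                             // keep the loop from being removed
        *cyclesPerLoad = (stop - start) / iters;
    }

    int main()
    {
        const int n = 1 << 20, stride = 1024, iters = 10000;
        unsigned* h = new unsigned[n];
        for (int i = 0; i < n; ++i) h[i] = (i + stride) % n;   // strided cycle

        unsigned *dNext, *dSink; long long* dCycles;
        cudaMalloc(&dNext, n * sizeof(unsigned));
        cudaMalloc(&dSink, sizeof(unsigned));
        cudaMalloc(&dCycles, sizeof(long long));
        cudaMemcpy(dNext, h, n * sizeof(unsigned), cudaMemcpyHostToDevice);

        chase<<<1, 1>>>(dNext, iters, dSink, dCycles);          // one thread only
        long long cycles = 0;
        cudaMemcpy(&cycles, dCycles, sizeof(long long), cudaMemcpyDeviceToHost);
        printf("~%lld clocks per dependent load\n", cycles);

        cudaFree(dNext); cudaFree(dSink); cudaFree(dCycles); delete[] h;
        return 0;
    }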
Hardware for Superior Texture Performance
TLDR
A rapidly emerging technology that combines enormous transfer rates with computing power, logic-embedded memory, is examined, bringing high-performance texture mapping to low-cost systems, and a specific compression scheme for texture mapping is described.
NVIDIA Tesla: A Unified Graphics and Computing Architecture
TLDR
To enable flexible, programmable graphics and high-performance computing, NVIDIA has developed the Tesla scalable unified graphics and parallel computing architecture, which is massively multithreaded and programmable in C or via graphics APIs.
Neon: a single-chip 3D workstation graphics accelerator
High-performance 3D graphics accelerators traditionally require multiple chips on multiple boards, including geometry, rasterizing, pixel processing, and texture mapping chips.
GPU Computing
TLDR
The background, hardware, and programming model for GPU computing is described, the state of the art in tools and techniques are summarized, and four GPU computing successes in game physics and computational biophysics that deliver order-of-magnitude performance gains over optimized CPU applications are presented.
Texram: a smart memory for texturing
Logic-embedded memory is an emerging technology that combines high transfer rates and computing power. Texram implements this technology and a new filtering algorithm to achieve high-speed, high-quality texture mapping.
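Filtering schemes like Texram's build on the basic bilinear step: fetch the four texels around a sample point and blend them with the fractional coordinates. The sketch below shows only that building block over a plain row-major array, not Texram's actual footprint-assembly algorithm.

    // Bilinear filtering building block (clamp addressing, row-major texture).
    #include <cmath>
    #include <cstdio>

    static float texel(const float* tex, int w, int h, int x, int y)
    {
        if (x < 0) x = 0;
        if (x >= w) x = w - 1;                         // clamp addressing mode
        if (y < 0) y = 0;
        if (y >= h) y = h - 1;
        return tex[y * w + x];
    }

    static float bilinear(const float* tex, int w, int h, float u, float v)
    {
        float x = u * w - 0.5f, y = v * h - 0.5f;      // texel-centre convention
        int   x0 = (int)floorf(x), y0 = (int)floorf(y);
        float fx = x - x0, fy = y - y0;
        float t00 = texel(tex, w, h, x0,     y0);
        float t10 = texel(tex, w, h, x0 + 1, y0);
        float t01 = texel(tex, w, h, x0,     y0 + 1);
        float t11 = texel(tex, w, h, x0 + 1, y0 + 1);
        return (t00 * (1 - fx) + t10 * fx) * (1 - fy)
             + (t01 * (1 - fx) + t11 * fx) * fy;
    }

    int main()
    {
        const float tex[4] = {0.0f, 1.0f, 1.0f, 0.0f};     // 2x2 checker
        printf("%.3f\n", bilinear(tex, 2, 2, 0.5f, 0.5f)); // centre sample -> 0.5
        return 0;
    }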