Thread block
A thread block is a programming abstraction that represents a group of threads that can be executed serially or in parallel. For better process and…
Wikipedia
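As a rough illustration of the grouping described above, the following sketch emulates in plain Python how a one-dimensional launch splits work into blocks and how each thread derives a global index from its block index and its index within the block. It mirrors the common CUDA convention `blockIdx.x * blockDim.x + threadIdx.x`; the function names `blocks_needed` and `global_thread_ids` are hypothetical and chosen only for this example, and the serial loops stand in for what real hardware may run in parallel and in any block order.

```python
def blocks_needed(n_items, threads_per_block):
    """Number of blocks required to cover n_items (ceiling division)."""
    return (n_items + threads_per_block - 1) // threads_per_block


def global_thread_ids(num_blocks, threads_per_block):
    """List (block_idx, thread_idx, global_idx) for a 1D launch.

    Serial emulation only: on a GPU, blocks are scheduled independently
    and their relative execution order is not guaranteed.
    """
    ids = []
    for block_idx in range(num_blocks):
        for thread_idx in range(threads_per_block):
            # Same arithmetic as blockIdx.x * blockDim.x + threadIdx.x
            global_idx = block_idx * threads_per_block + thread_idx
            ids.append((block_idx, thread_idx, global_idx))
    return ids


if __name__ == "__main__":
    n_items = 10
    threads_per_block = 4
    num_blocks = blocks_needed(n_items, threads_per_block)
    for block_idx, thread_idx, global_idx in global_thread_ids(num_blocks, threads_per_block):
        # Threads whose global index falls past the data do no work.
        active = global_idx < n_items
        print(block_idx, thread_idx, global_idx, "active" if active else "idle")
```

Because the block count is rounded up, the last block can contain threads with no element to process, which is why the bounds check on the global index appears in the usage loop.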
Related topics
16 relations
CUDA
Cache (computing)
Central processing unit
Fermi (microarchitecture)
Broader (1)
Parallel computing
Papers overview
Semantic Scholar uses AI to extract papers important to this topic.
2020
Enabling Highly Efficient Batched Matrix Multiplications on SW26010 Many-core Processor
Lijuan Jiang, Chao Yang, Wenjing Ma
ACM Transactions on Architecture and Code…
2020
Corpus ID: 212549604
We present a systematic methodology for optimizing batched matrix multiplications on SW26010 many-core processor of the Sunway…
Highly Cited
2019
Kernel Tuner: A search-optimizing GPU code auto-tuner
B. V. Werkhoven
Future generations computer systems
2019
Corpus ID: 52898424
2015
Brute-Force k-Nearest Neighbors Search on the GPU
Shengren Li, N. Amenta
Similarity Search and Applications
2015
Corpus ID: 16734835
We present a brute-force approach for finding k-nearest neighbors on the GPU for many queries in parallel. Our program takes…
2015
GPU voltage noise: Characterization and hierarchical smoothing of spatial and temporal voltage noise interference in GPU architectures
Jingwen Leng, Yazhou Zu, V. Reddi
International Symposium on High-Performance…
2015
Corpus ID: 3175226
Energy efficiency is undoubtedly important for GPU architectures. Besides the traditionally explored energy-efficiency…
Highly Cited
2015
Dynamic Thread Block Launch: A lightweight execution mechanism to support irregular applications on GPUs
Jin Wang, Norman Rubin, A. Sidelnik, S. Yalamanchili
International Symposium on Computer Architecture
2015
Corpus ID: 207225377
GPUs have been proven effective for structured applications that map well to the rigid 1D-3D grid of threads in modern bulk…
2013
3D Non-Local Means denoising via multi-GPU
G. Palma, F. Piccialli, +4 authors, B. Alfano
Conference on Computer Science and Information…
2013
Corpus ID: 16207054
Non-Local Means (NLM) algorithm is widely considered as a state-of-the-art denoising filter in many research fields. High…
2013
Efficient 3D stencil computations using CUDA
M. Krotkiewski, M. Dąbrowski
Parallel Computing
2013
Corpus ID: 12494820
2011
Source-to-Source Code Translator: OpenMP C to CUDA
Gabriel Noaje, Christophe Jaillet, M. Krajecki
IEEE International Conference on High Performance…
2011
Corpus ID: 1536288
In recent years hardware accelerators have become a full part of the HPC domain as their peak performance has increased steadily…
Highly Cited
2011
On the Development of a High-Order, Multi-GPU Enabled, Compressible Viscous Flow Solver for Mixed Unstructured Grids
P. Castonguay, D. M. Williams, P. Vincent, Manuel E. Lopez, A. Jameson
2011
Corpus ID: 15569250
This work discusses the development of a three-dimensional, high-order, compressible viscous flow solver for mixed unstructured…
Highly Cited
2008
Efficient computation of sum-products on GPUs through software-managed cache
M. Silberstein, A. Schuster, D. Geiger, Anjul Patney, John Douglas Owens
International Conference on Supercomputing
2008
Corpus ID: 3178362
We present a technique for designing memory-bound algorithms with high data reuse on Graphics Processing Units (GPUs) equipped…