BLAS
Known as: AXPY, CGEMM, DGEMM
Basic Linear Algebra Subprograms (BLAS) is a specification that prescribes a set of low-level routines for performing common linear algebra…
Wikipedia
Related topics
50 relations
AMD Core Math Library
ARM architecture
ATLAS
Amortized analysis
Papers overview
Semantic Scholar uses AI to extract papers important to this topic.
2015
maxDNN: An Efficient Convolution Kernel for Deep Learning with Maxwell GPUs
Andrew Lavin
arXiv.org, 2015
Corpus ID: 14595069
This paper describes maxDNN, a computationally efficient convolution kernel for deep learning with the NVIDIA Maxwell GPU. maxDNN…
2015
Accelerating LINPACK with MPI-OpenCL on Clusters of Multi-GPU Nodes
Gangwon Jo, J. Nah, Jun Lee, Jungwon Kim, Jaejin Lee
IEEE Transactions on Parallel and Distributed…, 2015
Corpus ID: 18418343
OpenCL is an open standard to write parallel applications for heterogeneous computing systems. Since its usage is restricted to a…
2012
Implementing a Code Generator for Fast Matrix Multiplication in OpenCL on the GPU
Kazuya Matsumoto, N. Nakasato, S. Sedukhin
IEEE 6th International Symposium on Embedded…, 2012
Corpus ID: 16408566
This paper presents results of an implementation of code generator for fast general matrix multiply (GEMM) kernels. When a set of…
2011
A high-performance, low-power linear algebra core
A. Pedram, A. Gerstlauer, R. V. D. Geijn
IEEE International Conference on Application…, 2011
Corpus ID: 1771707
Achieving high-performance while reducing power consumption is a key concern as technology scaling is reaching its limits. It is…
2011
Autotuning GEMMs for Fermi
J. Kurzak, S. Tomov, J. Dongarra
Corpus ID: 9943454
In recent years, the use of graphics chips has been recognized as a viable way of accelerating scientific and engineering…
2007
PRACTICAL TUNING OF FRACTIONAL ORDER PROPORTIONAL AND INTEGRAL CONTROLLER (I): TUNING RULE DEVELOPMENT
T. Bhaskaran, YangQuan Chen, Dingyu Xue
Corpus ID: 15108932
This paper presents a new practical tuning method for fractional order proportional and integral controller (FO-PI). The plant to…
2005
Design and exploitation of a high-performance SIMD floating-point unit for Blue Gene/L
S. Chatterjee, Leonardo R. Bachega, +11 authors, Peng Wu
IBM Journal of Research and Development, 2005
Corpus ID: 14480219
We describe the design of a dual-issue single-instruction, multiple-data-like (SIMD-like) extension of the IBM PowerPC® 440…
2004
Fault–Tolerant High–Performance Matrix Multiplication
John A. Gunnels, D. Katz, E. S. Q. Ortí, R. Geijn
Corpus ID: 15405059
In this paper, we extend the theory of algorithmic fault-tolerant matrix-matrix multiplication, C = AB, in a number of ways…
1999
Efficient eigenvalue and singular value computations on shared memory machines
B. Lang
Parallel Computing, 1999
Corpus ID: 44564161
1984
Bimas-A Basic Mathematical Package for Computer-Aided Systems Analysis and Design
A. Varga, V. Sima
Corpus ID: 59786729