Cyclops Tensor Framework: Reducing Communication and Eliminating Load Imbalance in Massively Parallel Contractions

  • E. Solomonik, D. Matthews, J. Hammond, J. Demmel
  • Published 2013
  • Computer Science
  • 2013 IEEE 27th International Symposium on Parallel and Distributed Processing
  • Cyclops (cyclic-operations) Tensor Framework (CTF) is a distributed library for tensor contractions. [...] The mapping framework decides on the best mapping for each tensor contraction at run-time via explicit calculations of memory usage and communication volume. CTF employs a general redistribution kernel, which transposes tensors of any dimension between arbitrary distributed layouts, yet touches each piece of data only once.
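
The run-time mapping decision described in the abstract can be illustrated with a small sketch. This is not CTF's actual cost model or API. It assumes the simplest case, a matrix-multiplication-like contraction C[m,n] += A[m,k] * B[k,n] on a 2D processor grid, with a simplified SUMMA-style estimate of per-process memory and communication volume. The names `factor_pairs`, `estimate`, `choose_mapping`, and the `panel` parameter are hypothetical.

```python
def factor_pairs(p):
    """Return every (pr, pc) processor grid with pr * pc == p."""
    return [(r, p // r) for r in range(1, p + 1) if p % r == 0]


def estimate(m, n, k, pr, pc, panel=64):
    """Toy per-process costs, in tensor elements, on a pr x pc grid.

    Assumed model (not taken from the paper):
      memory = local blocks of A, B, C plus two panel buffers
      comm   = elements of A and B each process receives in total
    """
    p = pr * pc
    memory = (m * k + k * n + m * n) / p + panel * (m / pr + n / pc)
    comm = (m * k) / pr + (k * n) / pc
    return memory, comm


def choose_mapping(m, n, k, p, mem_limit):
    """Pick the grid with the least estimated communication that fits in memory.

    Returns (pr, pc), or None if no candidate grid satisfies mem_limit.
    """
    best = None
    best_comm = None
    for pr, pc in factor_pairs(p):
        memory, comm = estimate(m, n, k, pr, pc)
        if memory > mem_limit:
            continue  # this mapping would not fit in per-process memory
        if best_comm is None or comm < best_comm:
            best, best_comm = (pr, pc), comm
    return best


if __name__ == "__main__":
    # Square problem: a square grid minimizes the estimated communication.
    print(choose_mapping(1024, 1024, 1024, 16, mem_limit=10**9))
    # Tall-skinny problem: the estimate favors splitting only the long dimension.
    print(choose_mapping(4096, 64, 1024, 16, mem_limit=10**9))
```

The sketch only shows the selection pattern: enumerate candidate mappings, discard those that exceed the memory budget, and keep the one with the lowest estimated communication. A real framework would search over far more layouts and over tensors of higher dimension.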
    73 Citations
    Optimizing Tensor Contractions in CCSD(T) for Efficient Execution on GPUs
    Tensor Contractions with Extended BLAS Kernels on CPU and GPU
    Design of a High-Performance Tensor-Vector Multiplication with BLAS
    A framework for load balancing of Tensor Contraction expressions via dynamic task partitioning
    A Communication-Optimal Framework for Contracting Distributed Tensors
    HPTT: a high-performance tensor transposition C++ library
    Design of a High-Performance GEMM-like Tensor–Tensor Multiplication
    CAST: Contraction Algorithm for Symmetric Tensors
    Generating Efficient Tensor Contractions for GPUs


    Minimizing Communication in Linear Algebra
    Improving communication performance in dense linear algebra via topology aware collectives
    Brief announcement: strong scaling of matrix multiplication algorithms and memory-independent communication lower bounds
    Elemental: A New Framework for Distributed Memory Dense Matrix Computations
    Automatic code generation for many-body electronic structure methods: the tensor contraction engine
    Minimizing the Communication Time for Matrix Multiplication on Multiprocessors
    • S. Johnsson
    • Mathematics, Computer Science
    • Parallel Comput.
    • 1993
    An infrastructure for scalable and portable parallel programs for computational chemistry
    Efficient Search-Space Pruning for Integrated Fusion and Tiling Transformations
    Global arrays: A nonuniform memory access programming model for high-performance computers