Empirical performance model-driven data layout optimization and library call selection for tensor contraction expressions

@article{Lu2012EmpiricalPM,
  title={Empirical performance model-driven data layout optimization and library call selection for tensor contraction expressions},
  author={Qingda Lu and X. Gao and S. Krishnamoorthy and G. Baumgartner and J. Ramanujam and P. Sadayappan},
  journal={J. Parallel Distributed Comput.},
  year={2012},
  volume={72},
  pages={338--352}
}
Empirical optimizers like ATLAS have been very effective in optimizing computational kernels in libraries. The best choice of parameters such as tile size and degree of loop unrolling is determined in ATLAS by executing different versions of the computation. In contrast, optimizing compilers use a model-driven approach to program transformation. While the model-driven approach of optimizing compilers is generally orders of magnitude faster than ATLAS-like library generators, its effectiveness…
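
The contrast drawn in the abstract can be illustrated with a minimal Python sketch. It is not the paper's method and not ATLAS code: the function names (`tiled_matmul`, `empirical_tile`, `model_tile`) are hypothetical, and the cache model (three tile-sized blocks of 8-byte elements must fit in cache) is a simplifying assumption chosen only to show the two styles of parameter selection side by side.

```python
import time


def tiled_matmul(a, b, n, tile):
    """Multiply two n x n matrices (lists of lists) using square tiles."""
    c = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, tile):
        for kk in range(0, n, tile):
            for jj in range(0, n, tile):
                # Work on one tile of A, B and C at a time.
                for i in range(ii, min(ii + tile, n)):
                    row_a, row_c = a[i], c[i]
                    for k in range(kk, min(kk + tile, n)):
                        aik, row_b = row_a[k], b[k]
                        for j in range(jj, min(jj + tile, n)):
                            row_c[j] += aik * row_b[j]
    return c


def empirical_tile(a, b, n, candidates):
    """Empirical search: execute every candidate version and keep the fastest."""
    timings = {}
    for tile in candidates:
        start = time.perf_counter()
        tiled_matmul(a, b, n, tile)
        timings[tile] = time.perf_counter() - start
    return min(timings, key=timings.get)


def model_tile(cache_bytes, candidates, elem_bytes=8):
    """Model-driven choice (assumed model): the largest tile for which
    three tile x tile blocks fit in the cache. Nothing is executed."""
    fitting = [t for t in candidates if 3 * t * t * elem_bytes <= cache_bytes]
    return max(fitting) if fitting else min(candidates)
```

Calling `empirical_tile(a, b, n, [4, 8, 16, 32])` runs the kernel once per candidate, so its cost grows with the size of the search space, whereas `model_tile(32 * 1024, [4, 8, 16, 32, 64])` returns immediately but is only as good as the assumed cache model.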
Generating Efficient Tensor Contractions for GPUs
Automatic Data Layout Transformations in the ExaStencils Code Generator
Using autotuning for accelerating tensor contraction on graphics processing units (GPUs)
Generating Efficient Quantum Chemistry Codes for Novel Architectures.
Tensor Contractions with Extended BLAS Kernels on CPU and GPU

References

Integrated compiler optimizations for tensor contractions
Data layout optimization techniques for modern and emerging architectures
A comparison of empirical and model-driven optimization
Combining analytical and empirical approaches in tuning matrix transposition
Combining models and guided empirical search to optimize for multiple levels of the memory hierarchy
Memory-Constrained Data Locality Optimization for Tensor Contractions
Data Locality Optimization for Synthesis of Efficient Out-of-Core Algorithms
Timing high performance kernels through empirical compilation
Global communication optimization for tensor contraction expressions under memory constraints