Accelerating BLAS and LAPACK via Efficient Floating Point Architecture Design

@article{Merchant2017AcceleratingBA,
  title={Accelerating BLAS and LAPACK via Efficient Floating Point Architecture Design},
  author={Farhad Merchant and Anupam Chattopadhyay and Soumyendu Raha and S. K. Nandy and Ranjani Narayan},
  journal={Parallel Process. Lett.},
  year={2017},
  volume={27},
  pages={1750006:1--1750006:17}
}
  • Farhad Merchant, Anupam Chattopadhyay, Soumyendu Raha, S. K. Nandy, Ranjani Narayan
  • Published in Parallel Process. Lett. 2017
  • Computer Science
  • Basic Linear Algebra Subprograms (BLAS) and Linear Algebra Package (LAPACK) form basic building blocks for several High Performance Computing (HPC) applications and hence dictate performance of the HPC applications. Performance in such tuned packages is attained through tuning of several algorithmic and architectural parameters such as number of parallel operations in the Directed Acyclic Graph of the BLAS/LAPACK routines, sizes of the memories in the memory hierarchy of the underlying platform…

    Citations

    Publications citing this paper.
    SHOWING 1-6 OF 6 CITATIONS

    Efficient Realization of Householder Transform Through Algorithm-Architecture Co-Design for Acceleration of QR Factorization

    Applying Modified Householder Transform to Kalman Filter

    A Systematic Approach for Acceleration of Matrix-Vector Operations in CGRA through Algorithm-Architecture Co-Design

    References

    Publications referenced by this paper.
    SHOWING 1-10 OF 20 REFERENCES

    Optimizing pipelines for power and performance

    The optimum pipeline depth for a microprocessor

    A Linear Algebra Core Design for Efficient Level-3 BLAS

    Efficient Realization of Table Look-Up Based Double Precision Floating Point Arithmetic

    Micro-architectural Enhancements in Distributed Memory CGRAs for LU and QR Factorizations

    Floating Point Architecture Extensions for Optimized Matrix Factorization
