We report on a GPU implementation of the condensation method designed by Abdelmalek Salem and Kouachi Said for computing the determinant of a matrix. We consider two types of coefficients: modular…
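To illustrate what a condensation method for determinants does, here is a minimal sequential sketch over Z/pZ with p prime. It uses the classical Chiò condensation as a stand-in. It is not the Salem–Said method, and it is not the GPU implementation described above. Each step shrinks an n×n matrix to (n-1)×(n-1) using 2×2 minors anchored at a nonzero pivot. The function name `det_mod_p_chio` is hypothetical, and `pow(x, -1, p)` requires Python 3.8+.

```python
def det_mod_p_chio(a, p):
    """Hypothetical helper: determinant of a square integer matrix over Z/pZ
    (p prime) via classical Chio condensation (not the Salem-Said variant)."""
    m = [[x % p for x in row] for row in a]
    n = len(m)
    if n == 0:
        return 1  # determinant of the empty matrix
    sign = 1   # tracks row swaps
    scale = 1  # accumulated product of pivot**(k - 2) factors to divide out
    while n > 2:
        # Find a nonzero pivot in the first column and swap it into row 0.
        piv = next((i for i in range(n) if m[i][0] != 0), None)
        if piv is None:
            return 0  # zero first column: the matrix is singular
        if piv != 0:
            m[0], m[piv] = m[piv], m[0]
            sign = -sign
        a11 = m[0][0]
        # Chio: det(M) = det(B) / a11**(n - 2), where
        # B[i][j] = a11 * M[i+1][j+1] - M[i+1][0] * M[0][j+1].
        scale = scale * pow(a11, n - 2, p) % p
        m = [[(a11 * m[i][j] - m[i][0] * m[0][j]) % p for j in range(1, n)]
             for i in range(1, n)]
        n -= 1
    if n == 1:
        d = m[0][0]
    else:
        d = (m[0][0] * m[1][1] - m[0][1] * m[1][0]) % p
    # Undo the row-swap signs and divide out the condensation factors.
    return sign * d * pow(scale, -1, p) % p
```

For example, `det_mod_p_chio([[1, 2, 3], [4, 5, 6], [7, 8, 10]], 7)` returns 4, which is -3 mod 7. Every entry of the condensed matrix depends only on its own row, the pivot row and the first column. That independence is what makes condensation a natural fit for data-parallel hardware.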

- Sardar Anisul Haque, Marc Moreno Maza, Ning Xie
- PARCO
- 2015

We present a model of multithreaded computation with an emphasis on estimating parallelism overheads of programs written for modern many-core architectures. We establish a Graham-Brent theorem so as…
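For reference, the classical Graham-Brent bound for a greedy scheduler on $p$ processors is shown below. Here $T_1$ is the work (the time on one processor) and $T_\infty$ is the span (the critical-path length). The abstract establishes a variant adapted to its own many-core model, so this is the textbook statement rather than the paper's. The numbers in the worked example are illustrative assumptions.

$$
\max\!\left(\frac{T_1}{p},\, T_\infty\right) \;\le\; T_p \;\le\; \frac{T_1}{p} + T_\infty
$$

$$
T_1 = 10^6,\; T_\infty = 10^3,\; p = 100 \;\Longrightarrow\; 10^4 \le T_p \le 10^4 + 10^3 = 1.1 \times 10^4
$$

Because the upper bound is at most twice the lower bound, a greedy schedule is always within a factor of 2 of optimal.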

- Sardar Anisul Haque, Xin Li, Farnam Mansouri, Marc Moreno Maza, Wei Pan, Ning Xie
- ICMS
- 2014

CUMODP is a CUDA library for exact computations with dense polynomials over finite fields. A variety of operations like multiplication, division, computation of subresultants, multi-point evaluation,…

- Sardar Anisul Haque, Shahadat Hossain
- 2009 International Conference on Computing…
- 2009

We revisit ordering techniques as a preprocessing step for improving the performance of sparse matrix-vector multiplication (SpM×V) on modern hierarchical memory computers. In computing…

With the advent of hardware accelerator technologies, multi-core processors and GPUs, much effort has been made to take advantage of those architectures by designing parallel algorithms. To…

- Sardar Anisul Haque, Shahadat Hossain, Marc Moreno Maza
- PASCO
- 2010

Sparse matrix-vector multiplication, or *SpMXV*, is an important kernel in scientific computing. For example, the conjugate gradient method (CG) is an iterative linear system solving process where…
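The sketch below shows why SpMXV dominates CG: each iteration performs exactly one sparse matrix-vector product. It assumes compressed sparse row (CSR) storage, which the abstract does not specify. It implements plain unpreconditioned CG for a symmetric positive definite matrix. The function names are hypothetical, and the code does not attempt any of the ordering or cache optimizations studied in these papers.

```python
import math


def csr_spmv(row_ptr, col_idx, values, x):
    """Hypothetical helper: y = A @ x for A stored in CSR form."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        s = 0.0
        # Nonzeros of row i occupy positions row_ptr[i] .. row_ptr[i+1]-1.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            s += values[k] * x[col_idx[k]]
        y[i] = s
    return y


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradient(row_ptr, col_idx, values, b, tol=1e-10, max_iter=None):
    """Unpreconditioned CG for a symmetric positive definite CSR matrix.
    Each iteration performs exactly one SpMXV (A @ d)."""
    n = len(b)
    if max_iter is None:
        max_iter = 10 * n
    x = [0.0] * n
    r = list(b)          # residual b - A x, with x = 0 initially
    d = list(r)          # search direction
    rs_old = _dot(r, r)
    for _ in range(max_iter):
        if math.sqrt(rs_old) < tol:
            break
        ad = csr_spmv(row_ptr, col_idx, values, d)  # the SpMXV kernel
        alpha = rs_old / _dot(d, ad)
        x = [xi + alpha * di for xi, di in zip(x, d)]
        r = [ri - alpha * adi for ri, adi in zip(r, ad)]
        rs_new = _dot(r, r)
        d = [ri + (rs_new / rs_old) * di for ri, di in zip(r, d)]
        rs_old = rs_new
    return x
```

For the matrix [[4, 1], [1, 3]] (`row_ptr=[0, 2, 4]`, `col_idx=[0, 1, 0, 1]`, `values=[4.0, 1.0, 1.0, 3.0]`) and b = [1, 2], CG returns approximately [1/11, 7/11]. The irregular accesses `x[col_idx[k]]` are where row and column ordering affects memory locality.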

We propose parallel algorithms for operations on univariate polynomials (multi-point evaluation, interpolation) based on subproduct tree techniques and targeting many-core GPUs. On those…
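A subproduct tree stores the products of the linear factors (x - u_i) pairwise up to the root. Multi-point evaluation then reduces f modulo each node on the way down, and each leaf remainder equals f(u_i). The sketch below assumes coefficients in Z/pZ with p prime, stored lowest degree first. It uses plain quadratic multiplication and division, so it shows the tree structure but not the asymptotic speedup or the GPU parallelization in the papers. All function names are hypothetical.

```python
def poly_mul(a, b, p):
    """Plain (schoolbook) product of coefficient lists mod p."""
    res = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            res[i + j] = (res[i + j] + ai * bj) % p
    return res


def poly_rem(a, b, p):
    """Remainder of a divided by b (b nonzero leading coefficient) mod p."""
    a = [c % p for c in a]
    inv_lead = pow(b[-1], -1, p)
    db = len(b) - 1
    for i in range(len(a) - 1, db - 1, -1):
        q = a[i] * inv_lead % p
        if q:
            for j in range(db + 1):
                a[i - db + j] = (a[i - db + j] - q * b[j]) % p
    return a[:db] if db > 0 else [0]


def build_subproduct_tree(points, p):
    """Level 0 holds the leaves (x - u); each level multiplies pairs."""
    level = [[(-u) % p, 1] for u in points]
    tree = [level]
    while len(level) > 1:
        nxt = [poly_mul(level[i], level[i + 1], p)
               for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])  # odd node is carried up unchanged
        tree.append(nxt)
        level = nxt
    return tree


def multipoint_eval(f, points, p):
    """Evaluate f at every point by reducing f down the subproduct tree."""
    if not points:
        return []
    tree = build_subproduct_tree(points, p)
    rems = [poly_rem(f, tree[-1][0], p)]
    for level in reversed(tree[:-1]):
        # Node k at this level has its parent at index k // 2 one level up.
        rems = [poly_rem(rems[k // 2], node, p) for k, node in enumerate(level)]
    return [r[0] % p for r in rems]
```

For example, `multipoint_eval([3, 2, 1], [1, 2], 97)` evaluates 3 + 2x + x^2 and returns `[6, 11]`. All remainders on one level are independent of each other, and this is the parallelism a GPU can exploit.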

- Sardar Anisul Haque, X. Li, Farnam Mansouri, Marc Moreno Maza, Davood Mohajerani, Wei Pan
- ACM Comm. Computer Algebra
- 2017

The CUDA Modular Polynomial (CUMODP) Library implements arithmetic operations for dense matrices and dense polynomials, primarily with modular integer coefficients. Some operations are available for…

As for serial code on CPUs, parallel code on GPUs for dense polynomial arithmetic relies on a combination of asymptotically fast and plain algorithms. Those are employed for data of large and small…
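The hybrid strategy described above can be sketched sequentially as follows. A plain quadratic multiplication handles small inputs, and a divide-and-conquer algorithm handles large ones. Karatsuba multiplication is used here only as a simple stand-in for an asymptotically fast algorithm. The crossover value `PLAIN_THRESHOLD = 32` is an arbitrary assumption, not a tuned figure from the paper, and all names are hypothetical.

```python
PLAIN_THRESHOLD = 32  # hypothetical crossover; real values are tuned per machine


def plain_mul(a, b, p):
    """Schoolbook product of coefficient lists (lowest degree first) mod p."""
    if not a or not b:
        return []
    res = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            res[i + j] = (res[i + j] + ai * bj) % p
    return res


def _add(a, b, p):
    n = max(len(a), len(b))
    return [((a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0)) % p
            for i in range(n)]


def _sub(a, b, p):
    n = max(len(a), len(b))
    return [((a[i] if i < len(a) else 0) - (b[i] if i < len(b) else 0)) % p
            for i in range(n)]


def hybrid_mul(a, b, p, threshold=PLAIN_THRESHOLD):
    """Karatsuba above the threshold, plain multiplication at or below it."""
    if threshold < 1:
        raise ValueError("threshold must be at least 1")
    if min(len(a), len(b)) <= threshold:
        return plain_mul(a, b, p)
    m = min(len(a), len(b)) // 2  # split point; both high halves stay nonempty
    a0, a1 = a[:m], a[m:]
    b0, b1 = b[:m], b[m:]
    z0 = hybrid_mul(a0, b0, p, threshold)
    z2 = hybrid_mul(a1, b1, p, threshold)
    z1 = hybrid_mul(_add(a0, a1, p), _add(b0, b1, p), p, threshold)
    z1 = _sub(_sub(z1, z0, p), z2, p)  # middle term: a0*b1 + a1*b0
    res = [0] * (len(a) + len(b) - 1)
    for i, c in enumerate(z0):
        res[i] += c
    for i, c in enumerate(z1):
        res[i + m] += c
    for i, c in enumerate(z2):
        res[i + 2 * m] += c
    return [c % p for c in res]
```

On small inputs the recursion overhead outweighs Karatsuba's saving of one sub-multiplication, which is why the plain algorithm is kept for the base case.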

- Sardar Anisul Haque, Amir Hashemi, Davood Mohajerani, Marc Moreno Maza
- PASCO@ISSAC
- 2017

We present multithreaded adaptations of the Euclidean plain division and the Euclidean GCD algorithms to the many-core GPU architectures. We report on an implementation with NVIDIA CUDA and complexity…
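For reference, the sequential algorithms being adapted can be sketched as follows. The sketch performs plain (long) division of univariate polynomials over Z/pZ with p prime, and computes the Euclidean GCD by repeated division, normalized to be monic. It is not the multithreaded GPU adaptation. The coefficient layout (lowest degree first) and function names are assumptions.

```python
def _trim(a):
    """Drop trailing zero coefficients; [] represents the zero polynomial."""
    a = list(a)
    while a and a[-1] == 0:
        a.pop()
    return a


def poly_divmod(a, b, p):
    """Plain division mod p: returns (q, r) with a = q*b + r, deg r < deg b."""
    a = _trim([c % p for c in a])
    b = _trim([c % p for c in b])
    if not b:
        raise ZeroDivisionError("division by the zero polynomial")
    inv = pow(b[-1], -1, p)
    db = len(b) - 1
    q = [0] * max(len(a) - db, 0)
    r = a[:]
    # Eliminate the leading coefficient of r, one degree at a time.
    for i in range(len(a) - 1, db - 1, -1):
        c = r[i] * inv % p
        q[i - db] = c
        if c:
            for j in range(db + 1):
                r[i - db + j] = (r[i - db + j] - c * b[j]) % p
    return _trim(q), _trim(r[:db])


def poly_gcd(a, b, p):
    """Monic GCD of a and b over Z/pZ via the Euclidean algorithm."""
    a = _trim([c % p for c in a])
    b = _trim([c % p for c in b])
    while b:
        _, r = poly_divmod(a, b, p)
        a, b = b, r
    if not a:
        return []
    inv = pow(a[-1], -1, p)
    return [c * inv % p for c in a]
```

For example, over Z/7Z, `poly_gcd([2, 4, 1], [3, 3, 1], 7)` returns the GCD of (x-1)(x-2) and (x-1)(x-3). The result is x - 1, written `[6, 1]`. Each elimination step updates all coefficients of r independently, and that inner loop is the part a GPU can spread across threads.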