Learn More
Communication-avoiding linear algebra algorithms with low communication latency and high memory bandwidth requirements like Tall-Skinny QR factorization (TSQR) are highly appropriate for acceleration using FPGAs. TSQR parallelizes QR factorization of tall-skinny matrices in a divide-and-conquer fashion by decomposing them into sub-matrices, performing local(More)
Trading communication with redundant computation can increase the silicon efficiency of FPGAs and GPUs in accelerating communication-bound sparse iterative solvers. While k iterations of the iterative solver can be unrolled to provide O(k) reduction in communication cost, the extent of this unrolling depends on the underlying architecture, its memory model,(More)
Iterative numerical algorithms with high memory bandwidth requirements but medium-size data sets (matrix size ∼ a few 100s) are highly appropriate for FPGA acceleration. This paper presents a streaming architecture comprising floating-point operators coupled with high-bandwidth on-chip memories for the Lanczos method, an iterative algorithm for symmetric(More)
We consider the problem of minimizing communication with off-chip memory and composition of multiple linear algebra kernels in iterative solvers for solving large-scale eigenvalue problems and linear systems of equations. While GPUs may offer higher throughput for individual kernels, overall application performance is limited by the inability to support(More)
To evaluate the role of chest CT in the initial staging of testicular seminomatous germ cell tumours. All patients referred to Addenbrooke's Hospital with testicular seminoma from 1 January 2000 to 31 December 2005 were included and case notes retrospectively reviewed. One hundred and eighty-two patients with testicular seminoma were identified, with a(More)
  • 1