Everett H. Phillips

Learn More
In this paper we present a thorough experience on tuning double-precision matrix-matrix multiplication (DGEM-M) on the Fermi GPU architecture. We choose an optimal algorithm with blocking in both shared memory and registers to satisfy the constraints of the Fermi memory hierarchy. Our optimization strategy is further guided by a performance modeling based(More)
This paper presents an implementation of the Least Squares Monte Carlo (LSMC) method by Longstaff and Schwartz [1] to price American options on GPU using CUDA. We focused our attention to the calibration phase and performed several experiments to assess the quality of the results. The implementation can price a put option with 200,000 paths and 50 time(More)
The High Performance Conjugate Gradient (HPCG) benchmark has been recently proposed as a complement to the High Performance Linpack (HPL) benchmark currently used to rank supercomput-ers in the Top500 list. This new benchmark solves a large sparse linear system using a multigrid preconditioned conjugate gradient (PCG) algorithm. The PCG algorithm contains(More)
—This paper presents the details of a Synthetic Aperture Radar (SAR) imaging on the smallest CUDA-capable platform available, the Jetson TK1. The results indicate that GPU accelerated embedded platforms have considerable potential for this type of workload and in conjunction with low power consumption, light weight and standard programming tools, could open(More)
  • 1