Dynamic programming matrices and the P7Viterbi algorithm of HMMER 3.0 show high parallelism in its code. Within the code, every query can have its score calculated in parallel with one thread per query. In this paper, these parallel features were exploited through the use of CUDA and a GPGPU. The CUDA implementation of this algorithm being performed on the Tesla C1060 enabled a 10–15x speedup depending on the number of queries. Without concurrent kernel execution and memory transfers a speedup of over 4x was achieved in terms of the total execution time. With a wide range of data sizes where the CPU has greater performance, it would be important that CUDA enabled programs properly select when to and not utilize the GPU for acceleration.