The enumeration of Costas arrays is a problem whose search space grows factorially with input size and that has recently been completed for sizes up to 28 using computer clusters. This paper presents designs for solving this problem using, separately, GPUs and FPGAs. Both implementations rely on Costas array symmetries to reduce the search space, and both explore the remaining candidate solutions concurrently. The fine-grained parallelism used to evaluate and advance the exploration, coupled with the additional concurrency provided by multiple instantiated cores, allowed the FPGA (XC5VLX330-2) implementation to achieve speedups of up to 40 times over the GPU (GeForce GTX 480). Estimates for larger sizes, up to N=28, indicate a speedup of 4.44 times over the fastest reported software implementation.