Memory-related constraints (memory bandwidth, cache size) are nowadays the performance bottleneck of most computational applications. Especially in the scenario of multiple cores, the performance does not scale with the number of cores in many cases. In our work, we present our FPGA-based solution for the 3D Reverse Time Migration (RTM) algorithm. As the most computationally demanding imaging algorithm in current oil and gas exploration, RTM involves various computational challenges, such as a high demand for storage size and bandwidth, and a poor cache behavior. Combining optimizations from both the algorithmic and architectural perspectives, our FPGA-based solution manages to remove the memory constraints and provide a high performance that can scale well with the amount of computational resources available. Compared with an optimized CPU implementation using two quad-core Intel Nehalem CPUs, our solution achieves 4x speedup on two Virtex-5 FPGAs, and 8x speedup on two Virtex-6 FPGAs. Our projection demonstrates that the performance will continue to scale with the future increase of FPGA capacities.