Modular inversion is an operation frequently used in many contemporary cryptographic applications, especially in public-key crypto-systems. In this paper, we present an efficient, scalable and generic hardware implementation of modular inversion operation optimized for a class of FPGA (Field Programmable Gate Array) devices. The long carry chains, which increase critical path delay, are avoided by utilizing generic block adder and subtract or circuits that exploit the hardwired carry logic of the FPGA devices. In our design, we utilize the Montgomery modular inversion that is chosen for compatibility with Montgomery multiplication operation. The effectiveness and efficiency of our methods are explored by realizing our design on a Xilinx Spartan-6 FPGA, which is a recent, low-end reconfigurable logic device popular in embedded applications for its power efficiency. Timing simulation demonstrate that our design achieves maximum clock frequency of 280 MHz. The implementation performs one modular inversion operation in a considerably small amount of time and it takes a negligible amount of resources on FPGA.