Problem statement: Ultra Wide Band (UWB) technology has attracted many researchers’ attention due to its advantages and its great potential for future applications. The physical layer standard of the Multi-band Orthogonal Frequency Division Multiplexing (MB-OFDM) UWB system is defined by ECMA International. In this standard, the data sampling rate from the analog-to-digital converter to the physical layer is up to 528 Msample/s. It is therefore a challenge to realize the physical layer in a Very Large Scale Integration (VLSI) implementation, especially the components with high computational complexity. The Fast Fourier Transform (FFT) block, which plays an important role in the MB-OFDM system, is one of these components. Furthermore, the execution time available to this module is only 312.5 ns. With the traditional approach, meeting the strict specifications of the UWB system would therefore require a processor with high power consumption and hardware cost. The objective of this study was to design an Application Specific Integrated Circuit (ASIC) FFT processor for this system. The specification was defined from the system analysis and literature research. Approach: Based on the algorithm and architecture analysis, a novel Genetic Algorithm (GA) based Canonical Signed Digit (CSD) multiplierless 128-point FFT processor and its inverse (IFFT) for MB-OFDM UWB systems was proposed. The proposed pipelined architecture was based on a modified Radix-2 algorithm that had the same number of multipliers as the conventional Radix-2 algorithm. However, the multiplication complexity and the ROM needed for storing twiddle factor coefficients could be eliminated by replacing the conventional complex multipliers with newly proposed GA optimized CSD constant multipliers. The design was coded in Verilog HDL and targeted the Xilinx Virtex-II FPGA series. It was fully implemented and tested on real hardware using a Virtex-II FG456 prototype board and a logic analyzer.
Results: According to the synthesis reports, the proposed GA optimized CSD constant complex multiplier achieved 79% and 50% efficiency in equivalent gates and latency, respectively, when compared to the conventional complex multiplier. Conclusion: We successfully implemented a 128-point FFT/IFFT processor with the proposed architecture that meets the requirements of the MB-OFDM UWB system with higher throughput and less area than the conventional architecture.
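
The idea behind a CSD constant multiplier, as described in the Approach, can be illustrated with a minimal Python sketch. It is not the authors' Verilog design and it does not reproduce the GA optimization; it only shows how a fixed coefficient can be recoded into signed digits {-1, 0, +1} with no two adjacent nonzero digits, so that multiplication reduces to shifts, additions and subtractions. The function names `to_csd` and `csd_multiply`, and the choice of 8 fractional bits for the example coefficient cos(π/4), are assumptions made for illustration.

```python
def to_csd(value):
    """Recode an integer into canonical signed digits, least significant first.

    Each digit is -1, 0 or +1, and no two adjacent digits are both nonzero.
    """
    digits = []
    while value != 0:
        if value & 1:
            # value mod 4 == 1 gives digit +1, value mod 4 == 3 gives digit -1
            digit = 2 - (value & 3)
            value -= digit
        else:
            digit = 0
        digits.append(digit)
        value >>= 1
    return digits


def csd_multiply(x, digits):
    """Multiply x by the constant encoded in `digits` using shifts and adds only."""
    acc = 0
    for position, digit in enumerate(digits):
        if digit == 1:
            acc += x << position
        elif digit == -1:
            acc -= x << position
    return acc


# Hypothetical example coefficient: cos(pi/4) quantized to 8 fractional bits.
COEFF = round(0.7071067811865476 * 256)  # 181
COEFF_CSD = to_csd(COEFF)
```

In hardware, each nonzero digit corresponds to one adder or subtractor fed by a hard-wired shift, which is why a constant coefficient needs neither a general multiplier nor a ROM entry.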