A single-chip FPGA implementation of a vision core is an efficient way to design fast and compact embedded vision systems from the PCB design level. The scope of the research is to design a novel FPGA-based parallel architecture for embedded vision entirely with on-chip FPGA resources. We designed it by utilizing block-RAMs and IO interfaces on the FPGA. As a result, the system is compact, fast and flexible. We evaluated this architecture for several mid-level neighborhood algorithms using Xilinx Virtex-2 Pro (XC2VP30) FPGA. Our algorithm uses a vision core with a 100 MHz system clock which supports image processing on a low-resolution image of 128×128 pixels up to 200 images per second. The results are accurate. We have compared our results with existing FPGA implementations. The performance of the algorithms could be substantially improved by applying sufficient parallelism.