This paper introduces an embedded hybrid processor that increases performance by more than an order of magnitude while reducing power consumption by a similar amount, even when running at the higher performance level. This is achieved by moving key software functions into hardware through an automated process. These functions, which we call hardware functions, are controlled by the software processor. Specialized dual-ported memories enable data sharing between the software processor and the hardware functions, so that there is no overhead associated with calling the hardware functions. Adding hardware functions dramatically improves performance and reduces energy consumption. Our processor has been synthesized for a 160 nm standard-cell ASIC fabrication process from OKI and for a 90 nm Stratix II FPGA, with a core operating frequency of 167 MHz for both technologies. We present multimedia and signal processing benchmarks that show kernel performance improvements over a single processor ranging from 9X to 332X, and entire-application speedups ranging from 4X to 127X. Hardware functions also improve power for the computational kernels by one to two orders of magnitude, ranging from 42X to over 418X.