External memory bandwidth is an important issue in System-onChip (SoC) systems. Especially in high definition (HD) video coding, the bandwidth requirement of off-chip memory is critical in video processing. In recent researches, embedded compression shows high potential on off-chip memory bandwidth reduction. Works about embedded compression have been done for low power applications. However, there is no suitable efficient embedded compression with good rate-distortion performance for high throughput applications. In this paper, an algorithm and hardware architecture of high performance lossy embedded compression is proposed to ease the bus congestion problem while keeping the latency low. Using the proposed algorithm, not only the high throughput requirement of HD video encoder is met, but also the hardware cost is relatively low. From our simulation, about 70% memory bandwidth is reduced with only 0.1dB PSNR degradation in 1080p HD video.