We present an architecture for embedded systems that decompresses offline-compressed instructions at runtime. This is useful for compressed code systems, where instructions are stored in a compressed format and decompressed on demand. The result is a significant reduction in power consumption and, in most cases, a performance improvement. The stand-alone decompression engine is placed between the instruction cache and the CPU (post-cache architecture), as we have found this to be the most power-efficient architecture. This paper describes the design of this unit in detail and analyzes its power consumption and performance.