Memory latency limits program performance. Object-oriented languages such as C# and Java exacerbate this problem, but their software engineering benefits make them increasingly popular. We show that current memory hierarchies are not particularly well suited to Java, in which object streams write and read a window of short-lived objects that pollute the cache. These observations motivate the exploration of transient caches, which assist a parent cache. For an L1 parent cache, transient caches are positioned similarly to a classic L0, providing one-cycle access time. Their distinguishing features are (1) they are tiny (4 to 8 lines), (2) they are highly associative, and (3) the processor may access them in parallel with their parent. They can assist any cache level. To address object stream behavior, we explore policies for read and write instantiation, promotion, filtering, and valid bits to implement no-fetch on write. Good design points include a parallel L0 (PL0), which in cycle-accurate simulation improves Java programs by 3% on average and C programs by 2% over a two-cycle, 32KB, 2-way L1 with 128B lines. A transient qualifying cache (TQ) improves performance further by (a) minimizing pollution in the parent by filtering short-lived lines without temporal reuse, and (b) using a write no-fetch policy with per-byte valid bits to eliminate wasted fetch bandwidth. TQs at L1 and L2 improve Java programs by 5% on average and by up to 15%. The TQ achieves improvements even when the parent has half the capacity or associativity of the original L1. The one-cycle access time, the write no-fetch policy, and filtering bestow these benefits. Java motivates this approach, but it also improves C programs.
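
To make the write no-fetch and filtering ideas concrete, the following is a minimal sketch of a toy transient cache, not the simulated design evaluated here. Only the 128-byte line size and the 4-to-8-line capacity come from the text above. The class and method names, full associativity with LRU replacement, the rule that any later hit counts as temporal reuse, and the modelling of a parent fetch as zero-filled bytes are all assumptions made for illustration.

```python
from collections import OrderedDict

LINE_BYTES = 128  # line size of the L1 configuration quoted above


class TransientLine:
    """One line with per-byte valid bits (hypothetical layout)."""

    def __init__(self):
        self.data = bytearray(LINE_BYTES)
        self.valid = [False] * LINE_BYTES  # per-byte valid bits
        self.reused = False  # assumption: any hit after allocation is reuse


class TransientCache:
    """Toy fully associative transient cache; LRU replacement is assumed."""

    def __init__(self, num_lines=8):
        self.num_lines = num_lines
        self.lines = OrderedDict()  # line address -> TransientLine, LRU first
        self.fetches = 0            # number of line fills from the parent
        self.promoted = []          # evicted lines handed to the parent
        self.filtered = []          # evicted lines kept out of the parent

    def _lookup(self, line_addr):
        """Return (line, hit); allocate on a miss, evicting the LRU line."""
        line = self.lines.get(line_addr)
        if line is not None:
            line.reused = True
            self.lines.move_to_end(line_addr)
            return line, True
        if len(self.lines) >= self.num_lines:
            victim_addr, victim = self.lines.popitem(last=False)
            # Filtering: only lines that showed reuse are promoted. A real
            # design would still write dirty bytes back to a lower level.
            target = self.promoted if victim.reused else self.filtered
            target.append(victim_addr)
        line = TransientLine()
        self.lines[line_addr] = line
        return line, False

    def _fill(self, line):
        """Model a fetch from the parent: missing bytes arrive as zeros."""
        self.fetches += 1
        line.valid = [True] * LINE_BYTES  # bytes already written are kept

    def write(self, addr, value):
        line_addr, offset = divmod(addr, LINE_BYTES)
        line, _ = self._lookup(line_addr)
        # Write no-fetch: store the byte and set its valid bit, no fill.
        line.data[offset] = value & 0xFF
        line.valid[offset] = True

    def read(self, addr):
        line_addr, offset = divmod(addr, LINE_BYTES)
        line, _ = self._lookup(line_addr)
        if not line.valid[offset]:
            self._fill(line)  # fetch only when an invalid byte is read
        return line.data[offset]
```

In this sketch, a line that is written once and then evicted never triggers a fetch and never enters the parent, which is the behavior the abstract attributes to short-lived objects in an object stream. Reading a byte that was never written is the only event that costs a fill.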

