Small-scale chip multiprocessors are being shipped in volume by all microprocessor vendors. Many of these vendors are also investigating large-scale chip multiprocessors targeted at highly parallel workloads in media, graphics, and other domains. One of the most challenging aspects of architecting terascale processors is the design of a scalable memory hierarchy. Current proposals for providing coherent shared memory in terascale systems require a sophisticated coherence protocol and memory hierarchy. In this paper we propose an alternative memory configuration, along with a programming model, that significantly simplifies the terascale memory hierarchy. Our proposal still provides fully coherent shared memory but eliminates the hardware coherence protocol. Our programming model enables the programmer to better express the memory characteristics of terascale workloads. Finally, our proposed memory hierarchy performs better and is more scalable than conventional designs.