Computer systems are moving towards a heterogeneous architecture with a combination of one or more CPUs and one or more accelerator processors. Such heterogeneous systems pose a new challenge to the parallel programming community. Languages such as OpenCL and CUDA provide a program environment for such systems. However, they focus on data parallel programming where the majority of computation is carried out by the accelerators. Our view is that, in the future, accelerator processors will be tightly coupled with the CPUs, be available in different system architectures (e.g., integrated and discrete), and systems will be dynamically reconfigurable. In this paper we advocate a balanced programming model where computation is balanced between the CPU and its accelerators. This model supports sharing virtual memory between the CPU and the accelerator processors so the same data structures can be manipulated by both sides. It also supports task-parallel as well as data-parallel programming, fine-grained synchronization, thread scheduling, and load balancing. This model not only leverages the computational capability of CPUs, but also allows dynamic system reconfiguration, and supports different platform configurations. To help demonstrate the practicality of our programming model, we present performance results for a preliminary implementation on a computer system with an Intel® CoreTM i7 processor and a discrete Larrabee processor. These results show that the model’s most performance-critical part, its shared virtual memory implementation, simplifies programming without hurting performance.