As the most accurate model for simulating light propagation in heterogeneous tissues, the Monte Carlo (MC) method has been widely used in the field of optical molecular imaging. However, the MC method is time-consuming because it must compute the propagation of a large number of photons through tissue, and the structural complexity of heterogeneous tissues further increases the computation time. In this paper, we present a parallel implementation of MC simulation of light propagation in heterogeneous tissues whose surfaces are constructed from different numbers of triangle meshes. The code runs on graphics processing units (GPUs), is implemented on the Compute Unified Device Architecture (CUDA) platform, and is optimized to reduce memory access latency as much as possible by making full use of the constant memory and texture memory on the GPU. We test the implementation on homogeneous and heterogeneous mouse models with an NVIDIA GTX 260 card and a 2.40 GHz Intel Xeon CPU. The experimental results demonstrate the feasibility and efficiency of the parallel MC simulation on the GPU.
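To illustrate why MC photon transport is both expensive and well suited to parallel hardware, the following minimal sketch tracks independent photon packets through an infinite homogeneous medium. It is not the implementation described in this paper: it is a serial Python sketch that assumes isotropic scattering, has no mesh boundaries, and uses illustrative function names (`simulate_photon`, `run`) and arbitrary optical coefficients. Each packet is processed without reference to any other, which is the property a GPU implementation can exploit by assigning packets to threads.

```python
import math
import random


def simulate_photon(mu_a, mu_s, rng, w_min=1e-4, survive=0.1):
    """Track one photon packet; return (absorbed weight, final position).

    Assumptions (not from the paper): infinite homogeneous medium,
    isotropic scattering, weight-based absorption, Russian roulette.
    """
    mu_t = mu_a + mu_s            # total interaction coefficient
    albedo = mu_s / mu_t          # fraction of weight kept per interaction
    x = y = z = 0.0               # packet starts at the origin
    ux, uy, uz = 0.0, 0.0, 1.0    # initial direction along +z
    weight = 1.0
    absorbed = 0.0

    while weight > 0.0:
        # Sample the free path length from an exponential distribution.
        step = -math.log(1.0 - rng.random()) / mu_t
        x += ux * step
        y += uy * step
        z += uz * step

        # Deposit part of the packet weight at the interaction site.
        absorbed += weight * (1.0 - albedo)
        weight *= albedo

        # Draw a new direction uniformly on the unit sphere.
        cos_theta = 2.0 * rng.random() - 1.0
        sin_theta = math.sqrt(1.0 - cos_theta * cos_theta)
        phi = 2.0 * math.pi * rng.random()
        ux = sin_theta * math.cos(phi)
        uy = sin_theta * math.sin(phi)
        uz = cos_theta

        # Russian roulette terminates low-weight packets without bias.
        if weight < w_min:
            if rng.random() < survive:
                weight /= survive
            else:
                weight = 0.0

    return absorbed, (x, y, z)


def run(n_photons, mu_a, mu_s, seed=0):
    """Return the mean absorbed weight per packet over n_photons packets."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_photons):
        absorbed, _position = simulate_photon(mu_a, mu_s, rng)
        total += absorbed
    return total / n_photons


if __name__ == "__main__":
    # Arbitrary illustrative coefficients (per unit length).
    print(run(1000, mu_a=0.1, mu_s=1.0))
```

Because the medium in this sketch has no boundaries, all launched energy is eventually absorbed, so the mean absorbed weight per packet should be close to 1; this serves as a simple energy-conservation check. A heterogeneous model would additionally require testing each step against the triangle-mesh surfaces, which is where most of the extra cost arises.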