Data race detection has become an important problem in GPU programming. This paper presents a novel solution aimed mainly at detecting data races in shared memory accesses that do not use atomic primitives. It relies on compiler support to privatize shared data and then parallelizes data race checking at run time. The approach has two distinctive features. First, unlike previous work, it eliminates per-memory-access monitoring through data privatization, which yields very low performance overhead and good scalability. Second, data race checking exploits the massively parallel resources of the GPU. Preliminary results show a performance improvement of two orders of magnitude over an existing approach.