Unified Parallel C for GPU Clusters: Language Extensions and Compiler Implementation