Yangzhao Yang

A typical decoupled access/execute (DAE) architecture processor consists of Access Processors (APs) and Execute Processors (EPs). The memory-access overhead of the AP can be hidden by the computation of the EP. Based on this principle, this paper introduces a new optimization algorithm for the general dense matrix multiplication (GEMM) operation. The…
The instruction-level parallelism (ILP) of Very Long Instruction Word (VLIW) DSP processors is obtained through operation partitioning and software pipelining. Previous research on clustering has focused on reducing move operations between clusters, but has rarely considered the effect of a heterogeneous architecture combined with SIMD structure and…