Massively parallel, throughput-oriented systems such as graphics processing units (GPUs) offer high performance for a broad range of programs. They are, however, complex to program, especially because of their intricate memory hierarchies with multiple address spaces. In response, modern GPUs have widely adopted caches, hoping to provide smoother…
Initially introduced as special-purpose accelerators for games and graphics code, graphics processing units (GPUs) have emerged as widely used high-performance parallel computing platforms. GPUs traditionally provided only software-managed local memories (or scratchpads) instead of demand-fetched caches. Increasingly, however, GPUs are being used in broader…
Graphics processing units (GPUs) are of increasing interest because they offer massive parallelism for high-throughput computing. While GPUs promise high peak performance, the challenge lies in their less familiar programming model, with more complex and irregular performance trade-offs than traditional CPUs or CMPs. In particular, modest changes in software or…
Graphics processing units (GPUs) are in increasingly wide use, but significant hurdles lie in selecting the appropriate algorithms, runtime parameter settings, and hardware configurations to achieve power and performance goals with them. Exploring hardware and software choices requires time-consuming simulations or extensive real-system measurements. While…
GPU performance and power tuning is difficult, requiring extensive user expertise and time-consuming trial and error. To accelerate design tuning, statistical design space exploration methods have been proposed. This article presents Starchart, a novel design space partitioning tool that uses regression trees to approach GPU tuning problems. Improving on…
Wireless Sensor Networks (WSNs) consist of small power-constrained nodes with sensing, computation, and wireless communication capabilities. These nodes are deployed in the sensing region to monitor specific events such as temperature, pollution, etc. They transmit their sensed data to the sink in a multi-hop manner. The sink is the interface between sensor…
Today's computer systems often employ high-throughput accelerators (such as Intel Xeon Phi coprocessors and NVIDIA Tesla GPUs) to improve the performance of some applications or portions of applications. While such accelerators are useful for suitable applications, it remains challenging to predict which workloads will run well on these platforms and to…