This paper describes and evaluates a compiler transformation that improves the performance of parallel programs on Network-of-Workstations (NOW) shared-memory multiprocessors. The transformation overlaps the communication time resulting from non-local memory accesses with the computation time in parallel loops to effectively hide the latency of the remote…
This thesis describes a novel approach for distributing low-skew clock signals across large digital systems, independent of environmental and process variations. The technique is integrated into a multi-output clock buffer circuit that can handle a scalable number of clock loads in a point-to-point configuration. The circuit contains an impedance-locked loop…
Developing suitable processes for thin-wafer fast-recovery power diodes is important for modern production plants as the substrate dimension increases. A set of emerging technologies has been employed here to fabricate 1700 V rated fast-recovery diodes from standard Si IGBT substrates without a pre-diffused backside n-type buffer. Wafer grinding, …
A recent article in HPCWire recognizes ORNL's Scalable Heterogeneous Computing benchmarks, which are sponsored by DOE ASCR and NSF, for performance testing on emerging architectures with OpenCL and CUDA. Kyle Spafford from ORNL's Future Technologies group has been benchmarking the two technologies for some time and is now convinced that OpenCL performance is…