Learn More
Multicore systems are becoming ubiquituous in scientificcomputing. As performance libraries are adapted to such systems, thedifficulty to extract the best performance out of them is quite high. Indeed,performance libraries such as Intel's MKL, while performing verywell on unicore architectures, see their behaviour degrade when used onmulticore systems.(More)
This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may(More)
Thanks to the improvements in semiconductor technologies, extreme-scale systems such as teradevices (i.e., composed by 1000 billion of transistors) will enable systems with 1000+ general purpose cores per chip, probably by 2020. Three major challenges have been identified: programmability, manageable architecture design, and reliability. TERAFLUX is a(More)
As computing has moved relentlessly through giga-, tera-, and peta-scale systems, exa-scale (a million trillion opera-tions/sec.) computing is currently under active research. DARPA has recently sponsored the " UHPC " [1] — ubiquitous high-performance computing — program, encouraging partnership with academia and industry to explore such systems. Among the(More)
Chip architectures are shifting from few, faster, functionally heavy cores to abundant, slower, simpler cores to address pressing physical limitations such as energy consumption and heat expenditure. As architectural trends continue to fluctuate, we propose a novel program execution model, the Codelet model, which is designed for new systems tasked with(More)
Current hardware trends place increasing pressure on programmers and tools to optimize scientific code. Numerous tools and techniques exist, but no single tool is a panacea; instead, different tools have different strengths. Therefore, an assortment of performance tuning utilities and strategies are necessary to best utilize scarce resources (e.g.,(More)
Developing parallel high-performance applications is an error-prone and time-consuming challenge. Performance tuning can be alleviated considerably by using optimisation tools, either by simply applying a stand-alone tool or by applying a tool chain with a number of more or less integrated tools covering different aspects of the optimisation process. In the(More)
Dissemination Level: PU PU Public PP Restricted to other programs participant (including the Commission Services) RE Restricted to a group specified by the consortium (including the Commission Services) CO Confidential, only for members of the consortium (including the Commission Services) Version# Author Organization Change History 1 Stephane Zuckerman UD(More)
Future extreme-scale supercomputers will feature arrays of general-purpose and specialized many-core processors, totaling thousands of cores on a single chip. In general, many-core chips will most likely resemble a "hierarchical and distributed system on chip." It is expected that such systems will be hard to exploit not only for performance, but will also(More)
The code let model is a fine-grain dataflow-inspired program execution model that balances the parallelism and overhead of the runtime system. It plays an important role in terms of performance, scalability, and energy efficiency in exascale studies such as the DARPA UHPC project and the DOE X-Stack project. As an important application, the Fast Fourier(More)