Multicore systems are becoming ubiquitous in scientific computing. As performance libraries are adapted to such systems, extracting the best performance from them becomes quite difficult. Indeed, performance libraries such as Intel's MKL, while performing very well on unicore architectures, see their behaviour degrade when used on multicore systems. …
Thanks to improvements in semiconductor technologies, extreme-scale systems such as teradevices (i.e., chips composed of 1,000 billion transistors) will enable systems with 1,000+ general-purpose cores per chip, probably by 2020. Three major challenges have been identified: programmability, manageable architecture design, and reliability. TERAFLUX is a …
As computing has moved relentlessly through giga-, tera-, and peta-scale systems, exa-scale (a million trillion operations/sec.) computing is currently under active research. DARPA has recently sponsored the "UHPC" (ubiquitous high-performance computing) program [1], encouraging partnerships with academia and industry to explore such systems. Among the …
Current hardware trends place increasing pressure on programmers and tools to optimize scientific code. Numerous tools and techniques exist, but no single tool is a panacea; instead, different tools have different strengths. Therefore, an assortment of performance tuning utilities and strategies are necessary to best utilize scarce resources (e.g., …
Chip architectures are shifting from a few fast, functionally heavy cores to abundant slower, simpler cores to address pressing physical limitations such as energy consumption and heat dissipation. As architectural trends continue to shift, we propose a novel program execution model, the Codelet model, which is designed for new systems tasked with …
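The core idea shared by the Codelet-model abstracts above is dataflow-inspired firing: a codelet is a small, non-preemptive task that becomes runnable only once all of its input dependencies are satisfied. The toy sketch below illustrates that firing rule only; the `Codelet` and `Scheduler` names are illustrative and are not the TERAFLUX or Codelet runtime API.

```python
# Toy sketch of dataflow firing: a codelet runs only after all of its
# dependencies have signalled completion. Single-threaded for clarity.
from collections import deque

class Codelet:
    def __init__(self, name, n_deps, fn):
        self.name = name
        self.remaining = n_deps   # unsatisfied input dependencies
        self.fn = fn
        self.successors = []      # codelets that consume this one's output

class Scheduler:
    def __init__(self):
        self.ready = deque()      # codelets whose dependencies are all met
        self.log = []             # execution order, for inspection

    def signal(self, codelet):
        codelet.remaining -= 1
        if codelet.remaining == 0:
            self.ready.append(codelet)

    def run(self, roots):
        self.ready.extend(roots)
        while self.ready:
            c = self.ready.popleft()
            c.fn()                # codelets run to completion, non-preemptively
            self.log.append(c.name)
            for succ in c.successors:
                self.signal(succ)

# Usage: a diamond dependency graph A -> (B, C) -> D.
sched = Scheduler()
a = Codelet("A", 0, lambda: None)
b = Codelet("B", 1, lambda: None)
c = Codelet("C", 1, lambda: None)
d = Codelet("D", 2, lambda: None)
a.successors = [b, c]
b.successors, c.successors = [d], [d]
sched.run([a])
print(sched.log)   # ['A', 'B', 'C', 'D'] -- D fires only after both B and C
```

A real runtime would dispatch ready codelets across many workers; the point here is only that readiness is driven by dependency counts, not by program order.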
Developing parallel high-performance applications is an error-prone and time-consuming challenge. Performance tuning can be alleviated considerably by using optimisation tools, either by applying a single stand-alone tool or a tool chain of more or less integrated tools covering different aspects of the optimisation process. In the …
Future extreme-scale supercomputers will feature arrays of general-purpose and specialized many-core processors, totaling thousands of cores on a single chip. In general, many-core chips will most likely resemble a "hierarchical and distributed system on chip." It is expected that such systems will be hard to exploit not only for performance, but will also …
The codelet model is a fine-grain dataflow-inspired program execution model that balances the parallelism and overhead of the runtime system. It plays an important role in terms of performance, scalability, and energy efficiency in exascale studies such as the DARPA UHPC project and the DOE X-Stack project. As an important application, the Fast Fourier …
The dataflow programming paradigm offers an important way to improve programming productivity for streaming systems. In this paper we propose COStream, a programming language based on the synchronous dataflow execution model for data-driven applications. We also propose a compiler framework for COStream on general-purpose multi-core architectures. It features an …
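The execution model the COStream abstract refers to, synchronous dataflow (SDF), fixes how many tokens each actor consumes and produces per firing, so a static repetition schedule can be computed before the program runs. The sketch below illustrates that property with a hypothetical three-actor pipeline; the actor names and rates are invented for illustration and are not COStream syntax.

```python
# Sketch of synchronous dataflow: fixed token rates per firing allow a
# static schedule. Pipeline: src produces 2 tokens/firing; dup consumes
# 1 and produces 3; sink consumes 3.
from collections import deque

ch1, ch2 = deque(), deque()    # channels: src -> dup, dup -> sink
received = []

def src():                     # produces 2 tokens per firing
    ch1.extend([1, 2])

def dup():                     # consumes 1 token, produces 3 copies
    x = ch1.popleft()
    ch2.extend([x, x, x])

def sink():                    # consumes 3 tokens per firing
    received.append(tuple(ch2.popleft() for _ in range(3)))

# Balance equations 2*r_src = 1*r_dup and 3*r_dup = 3*r_sink give the
# repetition vector r = (src: 1, dup: 2, sink: 2); one full iteration of
# the static schedule returns every channel to its initial (empty) state.
for actor in [src, dup, dup, sink, sink]:
    actor()

print(received)            # [(1, 1, 1), (2, 2, 2)]
print(len(ch1), len(ch2))  # 0 0 -- buffers back to their initial state
```

Because the rates are known statically, buffer sizes and the schedule are decided at compile time, which is what makes SDF attractive for compiling stream programs to multi-core targets.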