Andreas Diavastos

Low-Density Parity-Check (LDPC) codes are powerful error-correcting codes used today in communication standards such as DVB-S2 and WiMAX to transmit data over noisy channels with high error probability. LDPC decoding is computationally demanding and requires irregular accesses to memory, which makes it suitable for parallelization. The recent introduction...
The introduction of multi-core processors has renewed interest in programming models that can efficiently exploit general-purpose parallelism. Data-Flow is one such model, and it has demonstrated significant potential in the past. However, it is generally associated with functional styles of programming, which do not deal well with shared mutable state...
Decision Support System (DSS) workloads are known to be among the most time-consuming database workloads that process large data sets. Traditionally, DSS queries have been accelerated using large-scale multiprocessors. In this work we exploit the benefits of using future many-core architectures, more specifically on-chip clustered many-core architectures...
The number of computational units integrated in a single processor is rapidly increasing. This suggests that applications will require efficient and effective ways to exploit the parallelism to achieve the performance offered by large-scale multicore processors. The efficient parallelization of the applications relies on the programming and execution...
Our target in this work is to study ways of exploring the parallelism offered by vectorization on accelerators with very wide vector units. To this end, we implemented two kernels that derive from the Wilson Dslash operator and investigated several data layout techniques for increasing the scalability of lattice QCD scientific kernels suitable for the Intel...
The current trend in processor design is to increase the number of cores so as to achieve a desired performance. While having a large number of cores on a chip seems to be feasible in terms of the hardware, developing software that is able to exploit that parallelism is one of the biggest challenges. In this paper we propose a Data-Flow based...
The increasing parallelism offered by the parallel architectures introduced by processor vendors, coupled with the need to extract more parallelism out of the applications, has led the community to examine more efficient programming and execution models. The Dataflow Multithreading model is known to be the model that can exploit the most parallelism out of...
Exploiting the recently introduced very wide vector units of the Xeon Phi coprocessor can potentially increase the scalability of scientific applications. Using lattice QCD compute kernels, the authors find that the performance achieved using the Xeon Phi coprocessor's wide vector units is similar to GPGPU performance after appropriate code refactoring,...
SWITCHES is a task-based dataflow runtime that implements a lightweight distributed triggering system for runtime dependence resolution and uses static scheduling and compile-time assignment policies to reduce runtime overheads. Unlike other systems, the granularity of loop-tasks can be increased to favor data locality, even when having dependences across...