Philippe Olivier Alexandre Navaux

One of the main challenges for parallel architectures is the increasing complexity of the memory hierarchy, which consists of several levels of private and shared caches, as well as interconnections between separate memories in NUMA machines. To make full use of this hierarchy, it is necessary to improve the locality of memory accesses by reducing accesses...
Parallel applications use grid infrastructures to obtain more performance during their execution. The success of these executions depends directly on a performance analysis that takes into account grid characteristics such as network topology and resource location. This paper presents Triva, a software analysis tool that implements a...
One of the new research trends within the well-established cluster computing area is the growing interest in using multiple workstation clusters as a single virtual parallel machine, in much the same way as individual workstations are nowadays connected to build a single parallel cluster. In this paper we present an analysis of several aspects...
A Simulator for SMT Architectures: Evaluating Instruction Cache Topologies. Ronaldo Gonçalves (1), Eduard Ayguadé (2), Mateo Valero (2), Philippe Navaux. (1) Departamento de Informática, Universidade Estadual de Maringá, Avenida Colombo 5790, Maringá, Brazil {ronaldo@din.uem.br}. (2) Departament d’Arquitectura de Computadors, Universitat Politècnica de Catalunya, Jordi Girona...
Graphics Processing Units (GPUs) offer high computational power, but managing their parallel processes places a high strain on the scheduler, which increases the GPU cross section. The results of extensive neutron radiation experiments performed on NVIDIA GPUs confirm this hypothesis. Reducing the application Degree Of Parallelism (DOP) reduces the scheduling strain but...
Cache memories have traditionally been designed to exploit spatial locality by fetching entire cache lines from memory upon a miss. However, recent studies have shown that often the number of sub-blocks within a line that are actually used is low. Furthermore, those sub-blocks that are used are accessed only a few times before becoming dead (i.e., never...
This paper argues that connectionist systems are a good approach to implementing a computational model of speech understanding. To this end, we propose SUM, a speech understanding model, which is a software architecture based on neurocognitive research. SUM's computational implementation applies wavelet transforms to speech signal processing and...
Research has focused on high-performance on-chip interconnections with low cost and energy consumption for the next generation of many-core processors. Likewise, parallel applications will exploit thread-level parallelism and message-passing communication through a Network-on-Chip (NoC) to achieve high data throughput. Due to the dynamic...
Increases in graphics hardware performance and improvements in programmability have enabled GPUs to evolve from graphics-specific accelerators into general-purpose computing devices. Titan, the world's second fastest supercomputer for open science in 2014, consists of more than 18,000 GPUs that scientists from various domains such as astrophysics, fusion,...