High-Performance Computing (HPC) in the cloud has reached the mainstream and is currently a hot topic in the research community and industry. The attractiveness of the cloud for HPC is the capability to run large applications on powerful, scalable hardware without needing to own or maintain this hardware. In this paper, we conduct a detailed …
Using the Cloud Computing paradigm for High-Performance Computing (HPC) is currently a hot topic in the research community and industry. The attractiveness of Cloud Computing for HPC is the capability to run large applications on powerful, scalable hardware without needing to own or maintain this hardware. Most current research focuses on …
One of the main challenges for parallel architectures is the increasing complexity of the memory hierarchy, which consists of several levels of private and shared caches, as well as interconnections between separate memories in NUMA machines. To make full use of this hierarchy, it is necessary to improve the locality of memory accesses by reducing accesses …
Highly distributed systems such as Grids are used today for the execution of large-scale parallel applications. Analyzing the behavior of these applications is not trivial. The complexity arises from the event correlation among processes, external influences such as time-sharing mechanisms and saturation of network links, and also the amount of data that …
Increases in graphics hardware performance and improvements in programmability have enabled GPUs to evolve from graphics-specific accelerators to general-purpose computing devices. Titan, the world's second-fastest supercomputer for open science in 2014, consists of more than 18,000 GPUs that scientists from various domains such as astrophysics, fusion, …
User-level threads have been used to implement migratable MPI processes, a strategy well suited to load balancing because such threads are, in general, faster to create, manage, and migrate than heavyweight processes and kernel threads. However, they raise issues concerning private data, because they break the private …
Cache memories have traditionally been designed to exploit spatial locality by fetching entire cache lines from memory upon a miss. However, recent studies have shown that often the number of sub-blocks within a line that are actually used is low. Furthermore, those sub-blocks that are used are accessed only a few times before becoming dead (i.e., never(More)
The communication latency between the cores in multiprocessor architectures differs depending on the memory hierarchy and the interconnections. As the number of cores per chip and the number of threads per core increase, this difference between communication latencies grows. Therefore, it is important to map the threads of parallel …
Graphics Processing Units (GPUs) offer high computational power but require high scheduling strain to manage parallel processes, which increases the GPU cross section. The results of extensive neutron radiation experiments performed on NVIDIA GPUs confirm this hypothesis. Reducing the application Degree Of Parallelism (DOP) reduces the scheduling strain but …
This paper argues that connectionist systems are a good approach to implementing a speech understanding computational model. In this direction, we propose SUM, a speech understanding model, which is a software architecture based on neurocognitive research. SUM's computational implementation applies wavelet transforms to speech signal processing and …