Davide Rossetti

Learn More
Lab-scale tests were set up to investigate different options for achieving sustainability by reducing long-term landfill emissions. The options which have been studied and compared with the traditional anaerobic landfill for unprocessed refuse were: mechanical-biological pretreatment, landfill aeration with forced and with natural advective air flow(More)
Modern GPUs support special protocols to exchange data directly across the PCI Express bus. While these protocols could be used to reduce GPU data transmission times, basically by avoiding staging to host memory, they require specific hardware features which are not available on current generation network adapters. In this paper we describe the(More)
QUonG is an INFN (Istituto Nazionale di Fisica Nucleare) initiative targeted to develop a high performance computing system dedicated to Lattice QCD computations. QUonG is a massively parallel computing platform that lever-ages on commodity multi-core processors coupled with last generation GPUs. Its network mesh exploits the characteristics of LQCD(More)
We describe herein the APElink+ board, a PCIe interconnect adapter featuring the latest advances in wire speed and interface technology plus hardware support for a RDMA programming model and experimental acceleration of GPU networking; this design allows us to build a low latency, high bandwidth PC cluster, the APEnet+ network, the new generation of our(More)
Increasing number of MPI applications are being ported to take advantage of the compute power offered by GPUs. Data movement on GPU clusters continues to be the major bottleneck that keeps scientific applications from fully harnessing the potential of GPUs. Earlier, GPU-GPU inter-node communication has to move data from GPU memory to host memory before(More)
Many scientific computations need multi-node parallelism for matching up both space (memory) and time (speed) ever-increasing requirements. The use of GPUs as accelerators introduces yet another level of complexity for the programmer and may potentially result in large overheads due to the complex memory hierarchy. Additionally, top-notch problems may(More)
We present preliminary results of a multi-GPU code for exploring large graphs (hundreds of millions vertices and billions of edges) by using the Breadth First Search algorithm. The GPU hosts are connected by APEnet+, a custom interconnection network that has full support for NVIDIA GPUDirect peer-topeer communication, i.e. the technology allowing a third(More)
— This paper presents Unified Communication X (UCX), a set of network APIs and their implementations for high throughput computing. UCX comes from the combined efforts of national laboratories, industry, and academia to design and implement a high-performing and highly-scalable network stack for next generation applications and systems. UCX design provides(More)