Corpus ID: 237278282

The benefits of prefetching for large-scale cloud-based neuroimaging analysis workflows

Valérie Hayot-Sasson, Tristan Glatard, Ariel S. Rokem

To support the growing demands of neuroscience applications, researchers are transitioning to cloud computing for its scalable, robust and elastic infrastructure. Nevertheless, large datasets residing in object stores may result in significant data transfer overheads during workflow execution. Prefetching, a method to mitigate the cost of reading in mixed workloads, masks data transfer costs within processing time of prior tasks. We present an implementation of “Rolling Prefetch”, a Python…
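
The idea in the abstract, hiding transfer time behind the processing of earlier data, can be illustrated with a minimal sketch. This is not the paper's Rolling Prefetch implementation: the function name `prefetch_iter`, the `fetch` callable, and the `depth` parameter are hypothetical, and a real workflow would substitute an object-store read for the stand-in fetch function.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the stream


def prefetch_iter(keys, fetch, depth=2):
    """Yield fetch(key) for each key, in order, fetching ahead in a thread.

    Hypothetical helper for illustration only. Up to `depth` fetched items
    are buffered, so the transfer of later items overlaps with the caller's
    processing of earlier ones.
    """
    buf = queue.Queue(maxsize=depth)

    def producer():
        for key in keys:
            try:
                item = fetch(key)
            except Exception as exc:  # hand the error to the consumer
                buf.put(exc)
                return
            buf.put(item)  # blocks while the buffer is full
        buf.put(_DONE)

    # Daemon thread: it will not keep the process alive if the
    # consumer stops iterating early.
    threading.Thread(target=producer, daemon=True).start()

    while True:
        item = buf.get()
        if item is _DONE:
            return
        if isinstance(item, Exception):
            raise item
        yield item


if __name__ == "__main__":
    import time

    def slow_fetch(key):
        time.sleep(0.1)  # stands in for a remote read
        return f"block-{key}"

    for block in prefetch_iter(range(3), slow_fetch):
        time.sleep(0.1)  # stands in for processing; overlaps the next fetch
        print(block)
```

The bounded queue caps how much data is held locally at once; choosing `depth` trades memory for how much transfer latency can be hidden.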

Netco: Cache and I/O Management for Analytics over Disaggregated Stores
Experiments on a public cloud, with production-trace inspired workloads, show that Netco uses up to 5x less remote I/O compared to existing techniques and increases the number of jobs that meet their deadlines by up to 80%.
Design and evaluation of a compiler algorithm for prefetching
This paper proposes a compiler algorithm to insert prefetch instructions into code that operates on dense matrices, and shows that this algorithm significantly improves the execution speed of the benchmark programs; some of the programs improve by as much as a factor of two.
Improving the Effectiveness of Burst Buffers for Big Data Processing in HPC Systems with Eley
Eley embraces an interference-aware prefetching technique that makes reading input data faster while introducing low interference for HPC applications, and improves the performance of Big Data applications by up to 30% compared to existing BBs while maintaining the QoS of HPC applications.
Software prefetching
These simulations show that, even when generated by a very simple compiler algorithm, prefetch instructions can eliminate nearly all cache misses, while causing only modest increases in data traffic between memory and cache.
Dipy, a library for the analysis of diffusion MRI data
Dipy aims to provide transparent implementations for all the different steps of dMRI analysis with a uniform programming interface, and has implemented classical signal reconstruction techniques, such as the diffusion tensor model and deterministic fiber tractography.
Recognition of white matter bundles using local and global streamline-based registration and clustering
The purpose of the proposed method, named RecoBundles, is to segment white matter bundles and make virtual dissection easier to perform, while remaining robust and adaptive to incomplete data and bundles with missing components.
Evaluating the reliability of human brain white matter tractometry
The overall approach taken here both demonstrates the specific trustworthiness of tractometry analysis and outlines what researchers can do to demonstrate the reliability of computational analysis pipelines in neuroimaging.
Probabilistic streamline q-ball tractography using the residual bootstrap
The proposed residual bootstrap method utilizes a spherical harmonic representation for high angular resolution diffusion imaging (HARDI) data in order to estimate the uncertainty in multimodal q-ball reconstructions.
Network neuroscience
This work reviews emerging trends in network neuroscience and attempts to chart a path toward a better understanding of the brain as a multiscale networked system.
GPU-accelerated diffusion MRI tractography in DIPY. International Society for Magnetic Resonance in Medicine, May 2019