High-performance parallel computing infrastructures, such as computing clusters, have recently become freely available for scientific researchers to solve problems of unprecedented scale through data parallelization. However, scientists are not necessarily skilled in writing efficient parallel code, especially when dealing with spatial datasets. Two…
Disk and network latency must be taken into account when applying parallel computing to large multidimensional datasets because they can hinder performance by reducing the rate at which data can be fed to the compute nodes. Existing methods aggregate some number of data requests from cluster nodes to improve overall performance by reducing the number of…
We describe IDEA, an API designed specifically for the parallel processing of large spatial datasets on a cluster. Because such datasets present special challenges for efficient I/O and communication, it is especially valuable to provide an API that frees the user from the burden of partitioning the data among the processors. IDEA allows the user to…
Due to disk and network latencies, I/O performance remains a major bottleneck for HPC on large datasets. As important I/O optimization techniques, prefetching and caching are widely employed in modern file systems to speed up data access. However, they are optimized for sequential locality and are usually not effective for volumetric scientific data retrieval…