A Case Study Competition Among Methods for Analyzing Large Spatial Data

Matthew J. Heaton, Abhirup Datta, Andrew O. Finley, Reinhard Furrer, Joseph Guinness, Rajarshi Guhaniyogi, Florian Gerber, Robert B. Gramacy, Dorit M. Hammerling, Matthias Katzfuss, Finn Lindgren, Douglas W. Nychka, Furong Sun, and Andrew Zammit-Mangion. Journal of Agricultural, Biological, and Environmental Statistics, pp. 398–425.
The Gaussian process is an indispensable tool for spatial data analysts. The onset of the “big data” era, however, has led to the traditional Gaussian process being computationally infeasible for modern spatial data. As such, various alternatives to the full Gaussian process that are more amenable to handling big spatial data have been proposed. These modern methods often exploit low-rank structures and/or multi-core and multi-threaded computing environments to facilitate computation. This…

Competition on Spatial Statistics for Large Datasets

All the competition details and results along with some analysis of the competition outcomes are disclosed and made publicly available to serve as a benchmark for other approximation methods.

Computationally efficient joint species distribution modeling of big spatial data

A practical alleviation of this scalability constraint for joint species modeling is proposed by exploiting two spatial‐statistics techniques that facilitate the analysis of large spatial data sets: Gaussian predictive process and nearest‐neighbor Gaussian process.
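The Gaussian predictive process mentioned above replaces the full covariance with a low-rank surrogate built from a small set of "knot" locations. A minimal numerical sketch of that construction, assuming an exponential covariance and arbitrary illustrative knot/location choices (not the paper's actual species model):

```python
import numpy as np

def exp_cov(a, b, sigma2=1.0, phi=1.0):
    """Exponential covariance between two sets of 1-D locations."""
    d = np.abs(a[:, None] - b[None, :])
    return sigma2 * np.exp(-phi * d)

rng = np.random.default_rng(0)
s = np.sort(rng.uniform(0, 10, 500))   # n = 500 observation locations
knots = np.linspace(0, 10, 25)         # m = 25 knots, m << n

C_star = exp_cov(knots, knots)         # m x m knot covariance
C_cross = exp_cov(s, knots)            # n x m cross-covariance

# Predictive-process (low-rank) covariance: c(s, S*) C*^{-1} c(S*, s')
C_pp = C_cross @ np.linalg.solve(C_star, C_cross.T)

C_full = exp_cov(s, s)
# The low-rank surrogate underestimates the diagonal (the variance),
# a known bias that the "modified" predictive process corrects.
```

All expensive linear algebra now involves only the m x m knot matrix, which is the source of the scalability gain.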

Machine learning and geospatial methods for large-scale mining data

An out-of-sample validation exercise shows how GP methods can be more economical (requiring fewer human and compute resources) and more accurate than kriging-based alternatives, while also providing better uncertainty quantification.

Computationally Efficient Estimation of Non-stationary Gaussian Process Models for Large Spatial Data

This dissertation explores novel computation for non-stationary models when the field size is too large to compute the multivariate normal likelihood directly, and develops appropriately constrained spatially varying coefficient models to first normalize the data, select the most likely chemical contributors, and determine the most likely percent contributions of those chemicals across a sample.

Grid-Parametrize-Split (GriPS) for Improved Scalable Inference in Spatial Big Data Analysis

The Grid-Parametrize-Split (GriPS) approach is introduced for conducting Bayesian inference in spatially oriented big data settings, combining careful model construction with algorithm design to effect substantial improvements in MCMC convergence.

Spatial Multivariate Trees for Big Data Bayesian Regression

This work proposes Bayesian multivariate regression models based on spatial multivariate trees (SpamTrees) which achieve scalability via conditional independence assumptions on latent random effects following a treed directed acyclic graph.

A multi-resolution approximation via linear projection for large spatial datasets

A multi-resolution approximation via linear projection ($M$-RA-lp) is proposed, which applies a linear projection on each subregion whenever the spatial domain is subdivided, yielding an approximated covariance function that captures both the large- and small-scale spatial variation.

Practical Bayesian modeling and inference for massive spatial data sets on modest computing environments

This article devises massively scalable Bayesian approaches that can rapidly deliver inference on spatial processes that is practically indistinguishable from inference obtained using more expensive alternatives.

Bayesian Levy-Dynamic Spatio-Temporal Process: Towards Big Data Analysis

A new nonparametric, nonstationary, and nonseparable dynamic spatio-temporal process is constructed with the additional realistic property that the lagged spatio-temporal correlations converge to zero as the lag tends to infinity.

Parallel inference for massive distributed spatial data using low-rank models

It is shown that for a very widely used class of spatial low-rank models, which can be written as a linear combination of spatial basis functions plus a fine-scale-variation component, parallel spatial inference and prediction for massive distributed data can be carried out exactly, meaning that the results are the same as for a traditional, non-distributed analysis.
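The exactness claim above has a simple mechanical core: for a basis-function model, the posterior of the basis weights depends on the data only through shard-wise summaries that sum exactly across machines. A toy sketch under an assumed Gaussian linear model with a synthetic basis matrix (not a real spatial basis):

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, tau2 = 1200, 10, 0.5                  # n obs, m basis functions
B = rng.normal(size=(n, m))                  # basis-function evaluations
y = B @ rng.normal(size=m) + np.sqrt(tau2) * rng.normal(size=n)
K_inv = np.eye(m)                            # prior precision of the weights

# Full-data posterior precision and linear term for the weights eta
P_full = K_inv + B.T @ B / tau2
b_full = B.T @ y / tau2

# Distributed version: each "machine" holds one shard and returns only
# its m x m and m x 1 summaries, which are summed exactly.
P_dist, b_dist = K_inv.copy(), np.zeros(m)
for Bj, yj in zip(np.array_split(B, 4), np.array_split(y, 4)):
    P_dist += Bj.T @ Bj / tau2
    b_dist += Bj.T @ yj / tau2
```

Because the summaries are m x m rather than n x n, communication cost is independent of the (possibly massive) local data sizes.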

Gaussian predictive process models for large spatial data sets

This work provides a computational template that accommodates non-stationary, non-Gaussian, possibly multivariate, and possibly spatiotemporal processes in the context of large data sets, encompassing these diverse settings.

Meta-Kriging: Scalable Bayesian Modeling and Inference for Massive Spatial Datasets

A divide-and-conquer strategy within the Bayesian paradigm that offers full posterior predictive inference at arbitrary locations for the outcome as well as the residual spatial surface after accounting for spatially oriented predictors is proposed.
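The divide-and-conquer idea can be illustrated on a toy model. The paper combines subset posteriors via a geometric (Wasserstein-barycenter) average; the sketch below instead uses the simpler precision-weighted pooling of Gaussian subset posteriors, which happens to be exact in this conjugate toy example:

```python
import numpy as np

rng = np.random.default_rng(4)
theta_true = 2.0
y = theta_true + rng.normal(size=3000)     # toy data: unknown mean, sd 1

# Under a flat prior, shard j's posterior for the mean is
# N(ybar_j, 1/n_j).  Pool the shard posteriors by precision weighting.
prec, mean_num = 0.0, 0.0
for yj in np.array_split(y, 10):           # 10 "machines"
    prec_j = len(yj)                       # posterior precision on shard j
    prec += prec_j
    mean_num += prec_j * yj.mean()

combined_mean = mean_num / prec
full_mean = y.mean()                       # full-data posterior mean
```

Here the pooled subset-posterior mean recovers the full-data posterior mean exactly; for non-Gaussian spatial posteriors, this is where the more careful combination schemes earn their keep.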

A Multi-Resolution Approximation for Massive Spatial Datasets

A multi-resolution approximation (M-RA) of Gaussian processes observed at irregular locations in space is proposed, which can capture spatial structure from very fine to very large scales.

laGP: Large-Scale Spatial Modeling via Local Approximate Gaussian Processes in R

This work discusses an implementation of local approximate Gaussian process models, in the laGP package for R, that offers a particular sparse-matrix remedy uniquely positioned to leverage modern parallel computing architectures.
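The local approximate GP idea can be caricatured in a few lines: prediction at each location uses only a small nearest-neighbor subset of the data, so cost per prediction is independent of the full data size. (laGP itself builds its local subdesigns greedily and runs in compiled, parallel code; this Python sketch uses plain nearest neighbors and an assumed exponential covariance.)

```python
import numpy as np

def exp_cov(a, b, sigma2=1.0, phi=1.0):
    """Exponential covariance between two sets of 1-D locations."""
    d = np.abs(a[:, None] - b[None, :])
    return sigma2 * np.exp(-phi * d)

def local_gp_predict(x_new, s, y, n_local=25, nugget=1e-4):
    """Kriging at x_new using only its n_local nearest neighbors,
    so each prediction costs O(n_local^3) regardless of len(s)."""
    idx = np.argsort(np.abs(s - x_new))[:n_local]
    S, Y = s[idx], y[idx]
    K = exp_cov(S, S) + nugget * np.eye(n_local)
    k = exp_cov(np.array([x_new]), S).ravel()
    w = np.linalg.solve(K, k)
    return w @ Y, 1.0 + nugget - w @ k     # predictive mean, variance

rng = np.random.default_rng(2)
s = rng.uniform(0, 10, 2000)               # 2000 scattered locations
y = np.sin(s) + 0.05 * rng.normal(size=s.size)
mu, var = local_gp_predict(5.0, s, y)      # mu should land near sin(5)
```

Each prediction location gets its own small, independently solvable system, which is what makes the approach embarrassingly parallel.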

On nearest‐neighbor Gaussian process models for massive spatial data

A multivariate data analysis is presented which demonstrates how the nearest-neighbor approach yields inference indistinguishable from that of the full-rank GP while being several times faster.

Improving the performance of predictive process modeling for large datasets

A comparison of spatial predictors when datasets could be very large

This article reviews and compares a number of methods of spatial prediction, and provides technical material for the unfamiliar, which includes the definition and motivation for each (deterministic and stochastic) spatial predictor.

Hierarchical Nearest-Neighbor Gaussian Process Models for Large Geostatistical Datasets

A class of highly scalable nearest-neighbor Gaussian process (NNGP) models is developed to provide fully model-based inference for large geostatistical datasets, and the NNGP is established to be a well-defined spatial process providing legitimate finite-dimensional Gaussian densities with sparse precision matrices.
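The sparse-precision construction comes from writing the joint density as a product of conditionals, each restricted to a few nearest previously-ordered neighbors (a Vecchia-type factorization). A 1-D sketch under an assumed exponential covariance:

```python
import numpy as np

def exp_cov(a, b):
    """Exponential (Ornstein-Uhlenbeck) covariance in 1-D."""
    return np.exp(-np.abs(a[:, None] - b[None, :]))

def vecchia_loglik(s, y, m):
    """NNGP/Vecchia-style log-likelihood: each ordered point conditions
    on at most m of its nearest previously-ordered neighbors."""
    ll = 0.0
    for i in range(len(s)):
        past = np.argsort(np.abs(s[:i] - s[i]))[:m]   # nearest prior points
        if past.size == 0:
            mu, var = 0.0, 1.0                        # marginal for point 0
        else:
            K = exp_cov(s[past], s[past])
            k = exp_cov(np.array([s[i]]), s[past]).ravel()
            w = np.linalg.solve(K, k)
            mu, var = w @ y[past], 1.0 - w @ k
        ll += -0.5 * (np.log(2 * np.pi * var) + (y[i] - mu) ** 2 / var)
    return ll

rng = np.random.default_rng(3)
s = np.linspace(0.0, 10.0, 200)            # ordered 1-D locations
y = np.linalg.cholesky(exp_cov(s, s)) @ rng.normal(size=200)

# Exact Gaussian log-likelihood for comparison
sign, logdet = np.linalg.slogdet(exp_cov(s, s))
exact = -0.5 * (logdet + y @ np.linalg.solve(exp_cov(s, s), y)
                + 200 * np.log(2 * np.pi))
approx = vecchia_loglik(s, y, m=10)
```

For this 1-D exponential covariance the process is Markov, so conditioning on the nearest prior point is already exact and `approx` matches `exact` to numerical precision; in two or more dimensions, small m instead gives a close sparse-precision approximation at a fraction of the cost.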

A Divide-and-Conquer Bayesian Approach to Large-Scale Kriging

A three-step divide-and-conquer strategy within the Bayesian paradigm is proposed to achieve massive scalability for any spatial process model; it offers significant advantages in applications where the entire dataset is, or can be, stored on multiple machines.