# Parallel Approximation of the Maximum Likelihood Estimation for the Prediction of Large-Scale Geostatistics Simulations

@article{Abdulah2018ParallelAO, title={Parallel Approximation of the Maximum Likelihood Estimation for the Prediction of Large-Scale Geostatistics Simulations}, author={Sameh Abdulah and Hatem Ltaief and Ying Sun and Marc G. Genton and David E. Keyes}, journal={2018 IEEE International Conference on Cluster Computing (CLUSTER)}, year={2018}, pages={98-108} }

Maximum likelihood estimation is an important statistical technique for estimating missing data, for example in climate and environmental applications, which are usually large and feature data points that are irregularly spaced. In particular, the Gaussian log-likelihood function is the de facto model, which operates on the resulting sizable dense covariance matrix. The advent of high performance systems with advanced computing power and memory capacity has enabled full simulations only for…
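As a rough illustration of why the dense covariance matrix dominates the cost, the sketch below evaluates a zero-mean Gaussian log-likelihood through a Cholesky factorization. It is not the paper's implementation: the exponential covariance kernel, the function names `exp_covariance` and `gaussian_loglik`, and the synthetic locations are all assumptions chosen for brevity.

```python
import numpy as np

def exp_covariance(locs, variance, length_scale):
    """Dense covariance matrix from an exponential kernel (illustrative choice)."""
    diff = locs[:, None, :] - locs[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    return variance * np.exp(-dist / length_scale)

def gaussian_loglik(z, cov):
    """Zero-mean Gaussian log-likelihood evaluated via Cholesky factorization."""
    n = z.shape[0]
    chol = np.linalg.cholesky(cov)               # cov = L L^T, O(n^3) work
    alpha = np.linalg.solve(chol, z)             # alpha = L^{-1} z
    log_det = 2.0 * np.log(np.diag(chol)).sum()  # log|cov| from the diagonal of L
    return -0.5 * (n * np.log(2.0 * np.pi) + log_det + alpha @ alpha)

# Hypothetical synthetic data: 200 irregularly spaced locations in the unit square.
rng = np.random.default_rng(0)
locs = rng.uniform(size=(200, 2))
cov = exp_covariance(locs, variance=1.0, length_scale=0.1)
z = np.linalg.cholesky(cov) @ rng.standard_normal(200)
print(gaussian_loglik(z, cov))
```

The covariance matrix needs O(n^2) memory and the factorization O(n^3) operations for n locations, which is the scaling that motivates approximating the likelihood on large datasets.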

#### 15 Citations

Accelerating Geostatistical Modeling and Prediction With Mixed-Precision Computations: A High-Productivity Approach With PaRSEC

- Computer Science
- IEEE Transactions on Parallel and Distributed Systems
- 2022

This contribution reduces the precision of weakly correlated locations to single- or half-precision based on distance, and exploits mathematical structure to migrate MLE to a three-precision approximation that takes advantage of contemporary architectures offering BLAS3-like operations in a single instruction that are extremely fast for reduced precision.

ExaGeoStatR: A Package for Large-Scale Geostatistics in R

- Computer Science, Mathematics
- ArXiv
- 2019

The ExaGeoStatR package is presented, a package for large-scale Geostatistics in R that supports parallel computation of the maximum likelihood function on shared memory, GPU, and distributed systems and assesses its accuracy using both synthetic datasets and a sea surface temperature dataset.

Geostatistical Modeling and Prediction Using Mixed Precision Tile Cholesky Factorization

- Computer Science
- 2019 IEEE 26th International Conference on High Performance Computing, Data, and Analytics (HiPC)
- 2019

A mixed-precision tile algorithm to accelerate the Cholesky factorization during the log-likelihood function evaluation is presented and an average of 1.6X performance speedup is obtained on massively parallel architectures while maintaining the accuracy necessary for modeling and prediction.

ExaGeoStatR: Harnessing HPC Capabilities for Large Scale Geospatial Modeling Using R

Large scale simulations and parallel computing techniques are becoming essential in Gaussian process calculations applications. The Gaussian log-likelihood function is used in Geospatial applications…

High Performance Multivariate Geospatial Statistics on Manycore Systems

- Computer Science
- IEEE Transactions on Parallel and Distributed Systems
- 2021

Large-scale multivariate spatial modeling and inference on parallel hardware architectures is developed, and a novel algorithm is proposed to assess the prediction accuracy after the online parameter estimation, which demonstrates accuracy robustness and performance scalability on a variety of computer systems.

Competition on Spatial Statistics for Large Datasets

- Computer Science
- Journal of Agricultural, Biological and Environmental Statistics
- 2021

All the competition details and results along with some analysis of the competition outcomes are disclosed and made publicly available to serve as a benchmark for other approximation methods.

Comments on: Data science, big data and statistics

- Computer Science
- TEST
- 2019

Functional data analysis techniques have been increasingly used for space–time data, where data are typically considered to be functional but are correlated in time and/or space and the data structure can be even more complicated.

Scalable3-BO: Big Data meets HPC - A scalable asynchronous parallel high-dimensional Bayesian optimization framework on supercomputers

- Computer Science, Mathematics
- ArXiv
- 2021

This work proposes the Scalable-BO framework, which employs sparse GP as the underlying surrogate model to cope with Big Data and is equipped with a random embedding to efficiently optimize high-dimensional problems with low effective dimensionality.

Nonstationary cross-covariance functions for multivariate spatio-temporal random fields

- Computer Science
- 2020

A review of the state-of-the-art methods and technical progress regarding model construction and a rich class of multivariate spatio-temporal asymmetric nonstationary models stemming from the Lagrangian framework are introduced.

High Performance Multivariate Spatial Modeling for Geostatistical Data on Manycore Systems

- Computer Science
- ArXiv
- 2020

This research used GPU-based systems as well as the Shaheen supercomputer hosted at the Supercomputing Laboratory at King Abdullah University of Science and Technology (KAUST) for computer time.

#### References

SHOWING 1-10 OF 45 REFERENCES

Hierarchical Low Rank Approximation of Likelihoods for Large Spatial Datasets

- Mathematics, Computer Science
- 2016

This work develops a new approximation scheme for maximum likelihood estimation and shows how the composite likelihood method can be adapted to provide different types of hierarchical low rank approximations that are both computationally and statistically efficient.

Statistically and Computationally Efficient Estimating Equations for Large Spatial Datasets

- Mathematics
- 2016

For Gaussian process models, likelihood-based methods are often difficult to use with large irregularly spaced spatial datasets, because exact calculations of the likelihood for n observations…

ExaGeoStat: A High Performance Unified Framework for Geostatistics on Manycore Systems

- Computer Science
- ArXiv
- 2017

The ExaGeoStat framework takes a first step in the merger of large-scale data analytics and extreme computing for geospatial statistical applications, to be followed by additional complexity reducing improvements from the solver side that can be implemented under the same interface.

Covariance Tapering for Likelihood-Based Estimation in Large Spatial Data Sets

- Mathematics
- 2008

Maximum likelihood is an attractive method of estimating covariance parameters in spatial models based on Gaussian processes. But calculating the likelihood can be computationally infeasible for…

Gaussian predictive process models for large spatial data sets.

- Mathematics, Medicine
- Journal of the Royal Statistical Society. Series B, Statistical methodology
- 2008

This work achieves the flexibility to accommodate non-stationary, non-Gaussian, possibly multivariate, possibly spatiotemporal processes in the context of large data sets in the form of a computational template encompassing these diverse settings.

A full scale approximation of covariance functions for large spatial data sets

- Mathematics
- 2012

Summary. Gaussian process models have been widely used in spatial statistics but face tremendous computational challenges for very large data sets. The model fitting and spatial prediction of such…

STOCHASTIC APPROXIMATION OF SCORE FUNCTIONS FOR GAUSSIAN PROCESSES

- Mathematics
- 2013

We discuss the statistical properties of a recently introduced unbiased stochastic approximation to the score equations for maximum likelihood calculation for Gaussian processes. Under certain…

Fixed rank kriging for very large spatial data sets

- Mathematics
- 2008

Spatial statistics for very large spatial data sets is challenging. The size of the data set, "n", causes problems in computing optimal spatial predictors such as kriging, since its computational…

ExaGeoStat: A High Performance Unified Software for Geostatistics on Manycore Systems

- Computer Science
- IEEE Transactions on Parallel and Distributed Systems
- 2018

The ExaGeoStat software takes a first step in the merger of large-scale data analytics and extreme computing for geospatial statistical applications, to be followed by additional complexity reducing improvements from the solver side that can be implemented under the same interface.

Application of hierarchical matrices for computing the Karhunen–Loève expansion

- Mathematics, Computer Science
- Computing
- 2008

A log-linear computational cost of the matrix-vector product and a log-linear storage requirement yield an efficient and fast discretisation of the random fields presented.