Parallel Approximation of the Maximum Likelihood Estimation for the Prediction of Large-Scale Geostatistics Simulations

  title={Parallel Approximation of the Maximum Likelihood Estimation for the Prediction of Large-Scale Geostatistics Simulations},
  author={Sameh Abdulah and Hatem Ltaief and Ying Sun and Marc G. Genton and David E. Keyes},
  journal={2018 IEEE International Conference on Cluster Computing (CLUSTER)},
  • Published 2018
  • Computer Science, Mathematics
Maximum likelihood estimation is an important statistical technique for estimating missing data, for example in climate and environmental applications, which are usually large and feature irregularly spaced data points. In particular, the Gaussian log-likelihood function is the de facto model, which operates on the resulting sizable dense covariance matrix. The advent of high-performance systems with advanced computing power and memory capacity has enabled full simulations only for…
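The Gaussian log-likelihood mentioned above, for n zero-mean observations z with covariance matrix Σ(θ), is ℓ(θ) = -(n/2) log(2π) - ½ log|Σ(θ)| - ½ zᵀΣ(θ)⁻¹z. Each evaluation needs a Cholesky factorization of the dense n × n matrix, which costs O(n³) operations and O(n²) memory; this is the bottleneck that parallel and approximate methods target. The following is a minimal pure-Python sketch, assuming an exponential covariance C(d) = σ² exp(-d/β) (the Matérn model with smoothness 1/2) and two-dimensional locations; the function names are illustrative and are not ExaGeoStat's API.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
Point = Tuple[float, float]


def exp_cov(locs: Sequence[Point], sigma2: float, beta: float) -> Matrix:
    """Dense covariance for an assumed exponential kernel
    C(d) = sigma2 * exp(-d / beta), i.e. Matern with smoothness 1/2."""
    return [[sigma2 * math.exp(-math.dist(p, q) / beta) for q in locs] for p in locs]


def cholesky(a: Matrix) -> Matrix:
    """Lower-triangular L with a = L L^T; O(n^3) flops, O(n^2) memory."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        s = a[j][j] - sum(L[j][k] ** 2 for k in range(j))
        if s <= 0.0:
            raise ValueError("matrix is not positive definite")
        L[j][j] = math.sqrt(s)
        for i in range(j + 1, n):
            L[i][j] = (a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))) / L[j][j]
    return L


def forward_solve(L: Matrix, b: Sequence[float]) -> List[float]:
    """Solve L y = b for lower-triangular L."""
    y: List[float] = []
    for i, bi in enumerate(b):
        y.append((bi - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y


def gaussian_loglik(z: Sequence[float], locs: Sequence[Point],
                    sigma2: float, beta: float) -> float:
    """Zero-mean Gaussian log-likelihood
    l = -n/2 log(2 pi) - 1/2 log|Sigma| - 1/2 z^T Sigma^{-1} z."""
    n = len(z)
    L = cholesky(exp_cov(locs, sigma2, beta))
    y = forward_solve(L, z)  # y = L^{-1} z, hence z^T Sigma^{-1} z = y^T y
    logdet = 2.0 * sum(math.log(L[i][i]) for i in range(n))  # log|Sigma|
    return -0.5 * n * math.log(2.0 * math.pi) - 0.5 * logdet - 0.5 * sum(v * v for v in y)


if __name__ == "__main__":
    locs = [(0.1, 0.2), (0.4, 0.7), (0.9, 0.3), (0.5, 0.5)]
    z = [0.3, -0.1, 0.8, 0.2]
    # An MLE driver would maximize this over (sigma2, beta); here beta is only scanned.
    for beta in (0.05, 0.1, 0.3):
        print(beta, gaussian_loglik(z, locs, sigma2=1.0, beta=beta))
```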
Accelerating Geostatistical Modeling and Prediction With Mixed-Precision Computations: A High-Productivity Approach With PaRSEC
This contribution reduces the precision of weakly correlated locations to single- or half-precision based on distance, and exploits mathematical structure to migrate MLE to a three-precision approximation that takes advantage of contemporary architectures whose BLAS3-like operations, executed in a single instruction, are extremely fast in reduced precision.
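The distance-based precision choice can be pictured on a tiled covariance matrix. The sketch below is a hedged illustration, not the paper's actual rule: it assumes locations are ordered so that tiles near the diagonal hold strongly correlated pairs, keeps diagonal tiles in double precision, rounds tiles one or two steps from the diagonal to single precision and all others to half precision (the thresholds are hypothetical), and emulates float32/float16 storage with a round trip through Python's `struct` module (format characters `f` and `e`).

```python
import struct
from typing import List

Matrix = List[List[float]]


def tile_precision(i: int, j: int, single_from: int = 1, half_from: int = 3) -> str:
    """Pick a precision for tile (i, j) from its distance to the diagonal.

    Assumes locations are ordered so that nearby tile indices mean nearby
    (strongly correlated) locations; both thresholds are hypothetical.
    """
    d = abs(i - j)
    if d < single_from:
        return "double"
    if d < half_from:
        return "single"
    return "half"


def round_to(x: float, precision: str) -> float:
    """Emulate float32 / float16 storage by a struct round trip.
    Half precision only holds |x| <= 65504, which covariances scaled by sigma2 satisfy."""
    if precision == "double":
        return x
    fmt = "<f" if precision == "single" else "<e"  # 'e' is IEEE 754 binary16
    return struct.unpack(fmt, struct.pack(fmt, x))[0]


def mixed_precision_copy(a: Matrix, tile: int) -> Matrix:
    """Round every entry of a to the precision chosen for its tile."""
    return [[round_to(v, tile_precision(r // tile, c // tile)) for c, v in enumerate(row)]
            for r, row in enumerate(a)]


if __name__ == "__main__":
    nt = 5  # print the precision map of a 5 x 5 tile grid (D/S/H)
    for i in range(nt):
        print(" ".join(tile_precision(i, j)[0].upper() for j in range(nt)))
```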
ExaGeoStatR: A Package for Large-Scale Geostatistics in R
ExaGeoStatR, a package for large-scale geostatistics in R, is presented. It supports parallel computation of the maximum likelihood function on shared-memory, GPU, and distributed systems, and its accuracy is assessed using both synthetic datasets and a sea surface temperature dataset.
Geostatistical Modeling and Prediction Using Mixed Precision Tile Cholesky Factorization
A mixed-precision tile algorithm that accelerates the Cholesky factorization during evaluation of the log-likelihood function is presented; it achieves an average speedup of 1.6X on massively parallel architectures while maintaining the accuracy necessary for modeling and prediction.
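The tile Cholesky factorization behind this kind of algorithm splits the matrix into nb × nb tiles and expresses the factorization as four kernels, named after the LAPACK/BLAS routines POTRF (diagonal tile), TRSM (tiles below it), and SYRK/GEMM (trailing update); a runtime system such as PaRSEC can then schedule each kernel call as a task. The following plain double-precision sketch shows only that right-looking tile structure; a mixed-precision variant would run selected TRSM/GEMM calls on reduced-precision copies of the tiles. The kernel functions are simplified stand-ins written for illustration, not the real library routines.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]
Span = Tuple[int, int]  # half-open index range [start, stop) of one tile


def potrf(a: Matrix, k: Span) -> None:
    """Cholesky of the diagonal tile A_kk, in place (lower triangle)."""
    k0, k1 = k
    for j in range(k0, k1):
        s = a[j][j] - sum(a[j][p] ** 2 for p in range(k0, j))
        if s <= 0.0:
            raise ValueError("matrix is not positive definite")
        a[j][j] = math.sqrt(s)
        for i in range(j + 1, k1):
            a[i][j] = (a[i][j] - sum(a[i][p] * a[j][p] for p in range(k0, j))) / a[j][j]


def trsm(a: Matrix, rows: Span, k: Span) -> None:
    """A_ik <- A_ik L_kk^{-T}, in place."""
    k0, k1 = k
    for r in range(*rows):
        for j in range(k0, k1):
            a[r][j] = (a[r][j] - sum(a[r][p] * a[j][p] for p in range(k0, j))) / a[j][j]


def update(a: Matrix, rows: Span, cols: Span, k: Span) -> None:
    """A_ij <- A_ij - L_ik L_jk^T (SYRK when i == j, GEMM otherwise)."""
    k0, k1 = k
    for r in range(*rows):
        for c in range(cols[0], min(cols[1], r + 1)):  # lower triangle only
            a[r][c] -= sum(a[r][p] * a[c][p] for p in range(k0, k1))


def tile_cholesky(a: Matrix, nb: int) -> Matrix:
    """Right-looking tile Cholesky with tile size nb; returns L, leaves a untouched."""
    n = len(a)
    w = [row[:] for row in a]
    nt = (n + nb - 1) // nb

    def span(t: int) -> Span:
        return t * nb, min((t + 1) * nb, n)

    for k in range(nt):
        potrf(w, span(k))
        for i in range(k + 1, nt):
            trsm(w, span(i), span(k))
        for j in range(k + 1, nt):
            for i in range(j, nt):
                update(w, span(i), span(j), span(k))
    for r in range(n):  # clear the unused strict upper triangle
        for c in range(r + 1, n):
            w[r][c] = 0.0
    return w
```

Each call inside the `k` loop touches only a few tiles, which is what lets a task-based runtime overlap independent TRSM and GEMM calls across cores and nodes.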
ExaGeoStatR: Harnessing HPC Capabilities for Large Scale Geospatial Modeling Using R
Large-scale simulations and parallel computing techniques are becoming essential in applications that involve Gaussian process calculations. The Gaussian log-likelihood function is used in geospatial applications…
High Performance Multivariate Geospatial Statistics on Manycore Systems
Large-scale multivariate spatial modeling and inference are developed on parallel hardware architectures, and a novel algorithm is proposed to assess prediction accuracy after online parameter estimation; the approach demonstrates robust accuracy and scalable performance on a variety of computer systems.
Competition on Spatial Statistics for Large Datasets
All the competition details and results, along with some analysis of the competition outcomes, are disclosed and made publicly available to serve as a benchmark for other approximation methods.
Comments on: Data science, big data and statistics
Functional data analysis techniques have been increasingly used for space–time data, where the data are typically considered functional but are correlated in time and/or space, and the data structure can be even more complicated.
Scalable3-BO: Big Data meets HPC - A scalable asynchronous parallel high-dimensional Bayesian optimization framework on supercomputers
  • Anh Tran
  • Computer Science, Mathematics
  • ArXiv
  • 2021
This work proposes the Scalable-BO framework, which employs a sparse GP as the underlying surrogate model to cope with Big Data and is equipped with a random embedding to efficiently optimize high-dimensional problems with low effective dimensionality.
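The random-embedding idea can be sketched in a few lines: draw a random D × d matrix A, search only over a low-dimensional y, and evaluate the objective at x = clip(Ay). This is a minimal sketch, assuming a box domain [-1, 1]^D, standard Gaussian entries for A, and the commonly suggested low-dimensional box [-√d, √d]^d; plain random search stands in for the sparse-GP Bayesian optimizer, and the test function is hypothetical.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def make_embedding(D: int, d: int, seed: int = 0) -> List[Vector]:
    """Random D x d matrix with i.i.d. N(0, 1) entries."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(D)]


def lift(A: List[Vector], y: Sequence[float], lo: float = -1.0, hi: float = 1.0) -> Vector:
    """Map a low-dimensional point y to x = clip(A y) inside [lo, hi]^D."""
    return [min(hi, max(lo, sum(a * yi for a, yi in zip(row, y)))) for row in A]


def embedded_random_search(f: Callable[[Vector], float], D: int, d: int,
                           n_iter: int = 500, seed: int = 0) -> Tuple[Vector, Vector, float]:
    """Minimize f over [-1, 1]^D by searching only the d-dimensional box
    [-sqrt(d), sqrt(d)]^d; random search stands in for the GP surrogate."""
    A = make_embedding(D, d, seed)
    rng = random.Random(seed + 1)
    bound = math.sqrt(d)
    best_y = [0.0] * d
    best_x = lift(A, best_y)
    best_f = f(best_x)
    for _ in range(n_iter):
        y = [rng.uniform(-bound, bound) for _ in range(d)]
        x = lift(A, y)
        fx = f(x)
        if fx < best_f:
            best_y, best_x, best_f = y, x, fx
    return best_y, best_x, best_f


if __name__ == "__main__":
    # D = 100 inputs, but only x[3] and x[17] matter (low effective dimensionality).
    def f(x: Vector) -> float:
        return (x[3] - 0.2) ** 2 + (x[17] + 0.5) ** 2

    y, x, fx = embedded_random_search(f, D=100, d=2)
    print(fx, x[3], x[17])
```

The rationale is that when f effectively varies along only a few directions, a random low-dimensional subspace still reaches near-optimal points with high probability, so the optimizer (here random search, in the framework a sparse GP) works in d rather than D dimensions.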
Nonstationary cross-covariance functions for multivariate spatio-temporal random fields
State-of-the-art methods and technical progress regarding model construction are reviewed, and a rich class of multivariate spatio-temporal asymmetric nonstationary models stemming from the Lagrangian framework is introduced.
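The Lagrangian construction builds a space-time covariance by transporting a purely spatial random field. In its simplest form (univariate, stationary, a single constant velocity vector v; all assumptions made here), C(h, u) = C_S(‖h - v u‖) for spatial lag h and time lag u. This covariance is asymmetric, C(h, u) ≠ C(-h, u) in general, although C(h, u) = C(-h, -u) always holds. The sketch below, with an assumed exponential C_S and a hypothetical velocity, shows only that building block; the models summarized above are multivariate and nonstationary.

```python
import math
from typing import Callable, Tuple

Vec2 = Tuple[float, float]


def exp_spatial(d: float, sigma2: float = 1.0, beta: float = 0.3) -> float:
    """Assumed purely spatial covariance C_S(d) = sigma2 * exp(-d / beta)."""
    return sigma2 * math.exp(-d / beta)


def lagrangian_cov(h: Vec2, u: float, v: Vec2,
                   spatial: Callable[[float], float] = exp_spatial) -> float:
    """Frozen-field covariance C(h, u) = C_S(||h - v u||) for a field transported
    with constant velocity v (spatial lag h, time lag u)."""
    return spatial(math.hypot(h[0] - v[0] * u, h[1] - v[1] * u))


if __name__ == "__main__":
    v = (0.5, 0.0)  # hypothetical eastward transport
    print(lagrangian_cov((0.5, 0.0), 1.0, v))   # downstream lag: shifted distance is 0
    print(lagrangian_cov((-0.5, 0.0), 1.0, v))  # upstream lag: shifted distance is 1
```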
High Performance Multivariate Spatial Modeling for Geostatistical Data on Manycore Systems
This research used GPU-based systems as well as the Shaheen supercomputer hosted at the Supercomputing Laboratory at King Abdullah University of Science and Technology (KAUST) for computer time.


Hierarchical Low Rank Approximation of Likelihoods for Large Spatial Datasets
This work develops a new approximation scheme for maximum likelihood estimation and shows how the composite likelihood method can be adapted to provide different types of hierarchical low rank approximations that are both computationally and statistically efficient.
Statistically and Computationally Efficient Estimating Equations for Large Spatial Datasets
For Gaussian process models, likelihood-based methods are often difficult to use with large irregularly spaced spatial datasets, because exact calculations of the likelihood for n observations…
ExaGeoStat: A High Performance Unified Framework for Geostatistics on Manycore Systems
The ExaGeoStat framework takes a first step in the merger of large-scale data analytics and extreme computing for geospatial statistical applications, to be followed by additional complexity-reducing improvements from the solver side that can be implemented under the same interface.
Covariance Tapering for Likelihood-Based Estimation in Large Spatial Data Sets
Maximum likelihood is an attractive method of estimating covariance parameters in spatial models based on Gaussian processes. But calculating the likelihood can be computationally infeasible for…
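Covariance tapering multiplies the covariance elementwise by a compactly supported correlation function, so that pairs farther apart than the taper range γ get an exact zero and sparse factorizations become usable; by the Schur product theorem the tapered matrix stays positive definite. A minimal sketch, assuming an exponential covariance and the Wendland taper (1 - r)₊⁴(1 + 4r), one common choice among several:

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def wendland1(d: float, gamma: float) -> float:
    """Compactly supported Wendland taper (1 - r)_+^4 (1 + 4 r), r = d / gamma;
    positive definite in up to three dimensions and exactly 0 for d >= gamma."""
    r = d / gamma
    return (1.0 - r) ** 4 * (1.0 + 4.0 * r) if r < 1.0 else 0.0


def tapered_cov(locs: Sequence[Point], sigma2: float, beta: float,
                gamma: float) -> List[List[float]]:
    """Elementwise (Schur) product of an assumed exponential covariance and the taper;
    the product stays positive definite and is sparse beyond range gamma."""
    out = []
    for p in locs:
        row = []
        for q in locs:
            d = math.dist(p, q)
            t = wendland1(d, gamma)
            row.append(sigma2 * math.exp(-d / beta) * t if t > 0.0 else 0.0)
        out.append(row)
    return out


if __name__ == "__main__":
    grid = [(i / 10, j / 10) for i in range(10) for j in range(10)]
    C = tapered_cov(grid, sigma2=1.0, beta=0.3, gamma=0.15)
    nnz = sum(1 for row in C for v in row if v != 0.0)
    print(f"nonzero fraction: {nnz / len(C) ** 2:.3f}")
```

Shrinking γ increases sparsity (and speed) at the price of a larger distortion of the original covariance, which is the trade-off tapered likelihoods have to manage.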
Gaussian predictive process models for large spatial data sets
This work achieves the flexibility to accommodate non-stationary, non-Gaussian, possibly multivariate, possibly spatiotemporal processes in the context of large data sets in the form of a computational template encompassing these diverse settings.
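The predictive process replaces the parent covariance C by its projection onto m knots S*: C̃(s, s') = c(s, S*)ᵀ C*⁻¹ c(S*, s'), where C* is the m × m knot covariance. Only C* is factorized, so the cost drops from O(n³) to roughly O(nm²). A minimal sketch, assuming an exponential parent covariance and a hypothetical 2 × 2 knot grid:

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Matrix = List[List[float]]


def cov(p: Point, q: Point, sigma2: float = 1.0, beta: float = 0.3) -> float:
    """Assumed parent covariance: exponential kernel."""
    return sigma2 * math.exp(-math.dist(p, q) / beta)


def cholesky(a: Matrix) -> Matrix:
    """Lower-triangular L with a = L L^T."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        L[j][j] = math.sqrt(a[j][j] - sum(L[j][k] ** 2 for k in range(j)))
        for i in range(j + 1, n):
            L[i][j] = (a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))) / L[j][j]
    return L


def forward_solve(L: Matrix, b: Sequence[float]) -> List[float]:
    """Solve L y = b for lower-triangular L."""
    y: List[float] = []
    for i, bi in enumerate(b):
        y.append((bi - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y


def predictive_process_cov(locs: Sequence[Point], knots: Sequence[Point]) -> Matrix:
    """C~(s, s') = c(s, S*)^T C*^{-1} c(S*, s'): a rank <= m covariance; only the
    m x m knot matrix C* is factorized."""
    L = cholesky([[cov(a, b) for b in knots] for a in knots])
    W = [forward_solve(L, [cov(k, s) for k in knots]) for s in locs]  # rows L^{-1} c(S*, s)
    return [[sum(x * y for x, y in zip(wi, wj)) for wj in W] for wi in W]


if __name__ == "__main__":
    knots = [(0.25, 0.25), (0.75, 0.25), (0.25, 0.75), (0.75, 0.75)]
    locs = [(0.1, 0.1), (0.25, 0.25), (0.6, 0.4)]
    for row in predictive_process_cov(locs, knots):
        print([round(v, 4) for v in row])
```

The full n × n matrix is formed here only to make the result visible; a practical implementation keeps the n × m factor W and never materializes C̃. The approximation reproduces C exactly at the knots and never overstates the variance, since C̃(s, s) ≤ C(s, s).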
A full scale approximation of covariance functions for large spatial data sets
Gaussian process models have been widely used in spatial statistics but face tremendous computational challenges for very large data sets. The model fitting and spatial prediction of such…
We discuss the statistical properties of a recently introduced unbiased stochastic approximation to the score equations for maximum likelihood calculation for Gaussian processes. Under certain…
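For a zero-mean Gaussian process, the score equations are ∂ℓ/∂θᵢ = ½ zᵀΣ⁻¹ΣᵢΣ⁻¹z - ½ tr(Σ⁻¹Σᵢ) = 0, with Σᵢ = ∂Σ/∂θᵢ. The trace term is what makes exact evaluation expensive; a stochastic approximation replaces it by an average over random probe vectors u with E[uuᵀ] = I, which keeps the estimate unbiased and needs only matrix-vector products. The sketch below shows that core idea with Hutchinson's estimator and Rademacher (±1) probes; the exact estimator in the work summarized above may differ (for example, a symmetrized form), and `dense_matvec` is a hypothetical stand-in for a solver-based operator.

```python
import random
from typing import Callable, List, Sequence

MatVec = Callable[[Sequence[float]], List[float]]


def hutchinson_trace(matvec: MatVec, n: int, n_probes: int, seed: int = 0) -> float:
    """Unbiased estimate of tr(A) from matrix-vector products only:
    E[u^T A u] = tr(A) when u has i.i.d. +/-1 (Rademacher) entries, since E[u u^T] = I."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_probes):
        u = [rng.choice((-1.0, 1.0)) for _ in range(n)]
        total += sum(ui * ai for ui, ai in zip(u, matvec(u)))
    return total / n_probes


def dense_matvec(a: List[List[float]]) -> MatVec:
    """Wrap an explicit matrix; in the GP setting the operator would instead apply
    Sigma^{-1} dSigma/dtheta_i, e.g. through an iterative solver (hypothetical here)."""
    return lambda x: [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


if __name__ == "__main__":
    A = [[2.0, 0.5, 0.1], [0.5, 1.0, 0.3], [0.1, 0.3, 3.0]]
    print(hutchinson_trace(dense_matvec(A), n=3, n_probes=1000), "vs exact", 6.0)
```

With Rademacher probes the diagonal contributes exactly tr(A) on every probe; all the variance comes from the off-diagonal entries, which is why the estimator works well when the operator is diagonally dominant.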
Fixed rank kriging for very large spatial data sets
Spatial statistics for very large spatial data sets is challenging. The size of the data set, "n", causes problems in computing optimal spatial predictors such as kriging, since its computational…
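For reference, the kriging predictor in its simplest form (simple kriging: zero mean, known covariance) is ẑ(s₀) = c₀ᵀΣ⁻¹z with variance σ² - c₀ᵀΣ⁻¹c₀, where c₀ holds the covariances between s₀ and the n data locations. The Cholesky factorization of Σ is the O(n³) step that becomes prohibitive for large n and that fixed rank kriging avoids by using a low-rank covariance. The sketch below is plain simple kriging, not fixed rank kriging, and assumes an exponential covariance:

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Matrix = List[List[float]]


def cov(p: Point, q: Point, sigma2: float, beta: float) -> float:
    """Assumed exponential covariance sigma2 * exp(-d / beta)."""
    return sigma2 * math.exp(-math.dist(p, q) / beta)


def cholesky(a: Matrix) -> Matrix:
    """Lower-triangular L with a = L L^T (the O(n^3) step)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        L[j][j] = math.sqrt(a[j][j] - sum(L[j][k] ** 2 for k in range(j)))
        for i in range(j + 1, n):
            L[i][j] = (a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))) / L[j][j]
    return L


def forward_solve(L: Matrix, b: Sequence[float]) -> List[float]:
    """Solve L y = b for lower-triangular L."""
    y: List[float] = []
    for i, bi in enumerate(b):
        y.append((bi - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y


def simple_kriging(locs: Sequence[Point], z: Sequence[float], s0: Point,
                   sigma2: float, beta: float) -> Tuple[float, float]:
    """Return (prediction, kriging variance) at s0."""
    L = cholesky([[cov(p, q, sigma2, beta) for q in locs] for p in locs])
    w = forward_solve(L, [cov(p, s0, sigma2, beta) for p in locs])  # L^{-1} c0
    y = forward_solve(L, z)                                          # L^{-1} z
    pred = sum(a * b for a, b in zip(w, y))  # c0^T Sigma^{-1} z
    var = sigma2 - sum(a * a for a in w)     # sigma2 - c0^T Sigma^{-1} c0
    return pred, var


if __name__ == "__main__":
    locs = [(0.1, 0.2), (0.4, 0.7), (0.9, 0.3), (0.5, 0.5)]
    z = [0.3, -0.1, 0.8, 0.2]
    print(simple_kriging(locs, z, (0.3, 0.4), sigma2=1.0, beta=0.3))
```

Without a nugget term the predictor interpolates: at an observed location it returns the observation with zero variance, and far from all data it falls back to the mean (zero) with variance σ².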
ExaGeoStat: A High Performance Unified Software for Geostatistics on Manycore Systems
The ExaGeoStat software takes a first step in the merger of large-scale data analytics and extreme computing for geospatial statistical applications, to be followed by additional complexity-reducing improvements from the solver side that can be implemented under the same interface.
Application of hierarchical matrices for computing the Karhunen–Loève expansion
A log-linear computational cost of the matrix-vector product and a log-linear storage requirement yield an efficient and fast discretisation of the random fields presented.
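Hierarchical matrices store well-separated ("admissible") blocks of a covariance matrix in low-rank form, which is what gives the log-linear storage and matrix-vector cost. One widely used way to compute such low-rank factors is adaptive cross approximation (ACA). The sketch below is the simplest full-pivoting variant with an assumed exponential kernel; it inspects the whole block, so it only illustrates the idea (practical ACA uses partial pivoting), and it is not necessarily the compression used in the work summarized above.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def aca_full_pivot(a: Matrix, tol: float = 1e-8) -> Tuple[Matrix, Matrix]:
    """Cross approximation with full pivoting: returns U and V with
    a ~= sum_k outer(U[k], V[k]), stopping once max|residual| <= tol * max|a|."""
    m, n = len(a), len(a[0])
    r = [row[:] for row in a]  # residual, updated in place
    scale = max(abs(v) for row in a for v in row)
    us: Matrix = []
    vs: Matrix = []
    for _ in range(min(m, n)):
        i, j = max(((p, q) for p in range(m) for q in range(n)),
                   key=lambda pq: abs(r[pq[0]][pq[1]]))
        pivot = r[i][j]
        if abs(pivot) <= tol * scale:
            break
        u = [r[p][j] for p in range(m)]          # pivot column of the residual
        v = [r[i][q] / pivot for q in range(n)]  # scaled pivot row
        for p in range(m):
            for q in range(n):
                r[p][q] -= u[p] * v[q]
        us.append(u)
        vs.append(v)
    return us, vs


if __name__ == "__main__":
    # Two well-separated 6 x 6 point clusters give an admissible kernel block.
    src = [(0.1 * i, 0.1 * j) for i in range(6) for j in range(6)]
    dst = [(3.0 + 0.1 * i, 0.1 * j) for i in range(6) for j in range(6)]
    block = [[math.exp(-math.dist(p, q)) for q in dst] for p in src]
    U, V = aca_full_pivot(block, tol=1e-8)
    print("rank", len(U), "of", len(block))
```

Storing the k factor pairs costs k(m + n) numbers instead of mn, and a matrix-vector product with the block costs the same, which is where the savings of the hierarchical format come from.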