Gradient boosting with extreme-value theory for wildfire prediction

  • Jonathan Koh
  • Published 18 October 2021
  • Computer Science
  • Extremes
This paper details the approach of the team Kohrrelation in the 2021 Extreme Value Analysis data challenge, dealing with the prediction of wildfire counts and sizes over the contiguous US. Our approach uses ideas from extreme-value theory in a machine learning context with theoretically justified loss functions for gradient boosting. We devise a spatial cross-validation scheme and show that in our setting it provides a better proxy for test set performance than naive cross-validation. The… 
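The abstract contrasts a spatial cross-validation scheme with naive cross-validation. The paper's exact scheme is not reproduced on this page. The following is a generic block-based sketch under stated assumptions: points are binned into square blocks of a hypothetical side `block_size` in coordinate units, and whole blocks are assigned to folds. The function names `spatial_block_folds` and `random_folds` are illustrative, not from the paper.

```python
import random
from collections import defaultdict


def spatial_block_folds(coords, block_size, k, seed=0):
    """Assign each (lon, lat) point to one of k folds by spatial block (sketch).

    Points are binned into square blocks of side `block_size`; every block goes
    wholesale to one fold, so nearby points are not split between training and
    validation the way random K-fold splits them.
    """
    blocks = defaultdict(list)
    for i, (x, y) in enumerate(coords):
        key = (int(x // block_size), int(y // block_size))
        blocks[key].append(i)
    keys = sorted(blocks)
    random.Random(seed).shuffle(keys)  # random block-to-fold assignment
    folds = [0] * len(coords)
    for j, key in enumerate(keys):
        for i in blocks[key]:
            folds[i] = j % k
    return folds


def random_folds(n, k, seed=0):
    """Naive K-fold baseline: folds assigned independently of location."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [0] * n
    for j, i in enumerate(idx):
        folds[i] = j % k
    return folds


coords = [(-120.3, 38.1), (-120.1, 38.4), (-95.0, 30.2), (-80.5, 27.9)]
print(spatial_block_folds(coords, block_size=2.0, k=2))
```

The first two points fall in the same 2-degree block and therefore always share a fold. Under random K-fold they could land on opposite sides of the split, which is one way naive validation scores become optimistic when neighbouring observations are correlated.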

A marginal modelling approach for predicting wildfire extremes across the contiguous United States

Details a methodology, proposed for the EVA 2021 conference data challenge, for predicting the number and size of wildfires over the contiguous US between 1993 and 2015, with more weight placed on extreme events.

Insights into the drivers and spatio-temporal trends of extreme Mediterranean wildfires with statistical deep-learning

Extreme wildfires continue to be a significant cause of human death and biodiversity destruction within countries that encompass the Mediterranean Basin. Recent worrying trends in wildfire activity

A unifying partially-interpretable framework for neural network-based extreme quantile regression

A new methodological framework for extreme quantile regression using artificial neural networks, which can capture complex non-linear relationships and scale well to high-dimensional data, together with a novel point process model for extreme values that overcomes the finite lower-endpoint problem associated with the generalised extreme value class of distributions.

Estimating the prediction performance of spatial models via spatial k-fold cross validation

A modified version of the cross-validation (CV) method, called spatial k-fold cross validation (SKCV), is proposed. It provides a useful estimate of model prediction performance without the optimistic bias caused by spatial autocorrelation (SAC), and can be used as a criterion for selecting the data sampling density for a new research area.
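A common way to describe SKCV is as K-fold CV with a "dead zone": training points lying within some radius of any held-out point are discarded for that fold. The sketch below assumes that reading. The function name `skcv_splits`, the `radius` parameter and the use of plain Euclidean distance are illustrative assumptions; for longitude/latitude data a great-circle distance would be more appropriate.

```python
import math


def skcv_splits(coords, folds, radius):
    """Yield (train_idx, test_idx) for spatial k-fold CV with a dead zone (sketch).

    For each fold, training points within `radius` (Euclidean distance) of any
    test point are dropped, so the model is not trained on observations that
    are spatially autocorrelated with the held-out ones.
    """
    k = max(folds) + 1
    for f in range(k):
        test_idx = [i for i, g in enumerate(folds) if g == f]
        train_idx = [
            i
            for i, g in enumerate(folds)
            if g != f
            and all(math.dist(coords[i], coords[t]) > radius for t in test_idx)
        ]
        yield train_idx, test_idx


coords = [(0.0, 0.0), (0.5, 0.0), (5.0, 5.0), (6.0, 5.0)]
folds = [0, 1, 0, 1]
for train, test in skcv_splits(coords, folds, radius=0.8):
    print(train, test)
```

With `radius=0.8`, point 1 is removed from the training set of the fold that tests point 0 (they are 0.5 apart), while point 3, at distance 1.0 from point 2, survives. With `radius=0`, the scheme reduces to ordinary K-fold CV on the given fold labels.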

A review of machine learning applications in wildfire science and management

A scoping review of ML in wildfire science and management that identified 298 relevant publications; the most frequently used ML methods were random forests, MaxEnt, artificial neural networks, decision trees, support vector machines, and genetic algorithms.

Comparing Density Forecasts Using Threshold- and Quantile-Weighted Scoring Rules

We propose a method for comparing density forecasts that is based on weighted versions of the continuous ranked probability score. The weighting emphasizes regions of interest, such as the tails or
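For reference, the standard threshold- and quantile-weighted forms of the continuous ranked probability score (CRPS) from the scoring-rule literature are written out below. The symbols u, v and the threshold r are our notation. With u ≡ 1 or v ≡ 1, both reduce to the ordinary CRPS.

```latex
% Threshold-weighted CRPS of a predictive CDF F at observation y,
% with a nonnegative weight function u on the outcome scale
\mathrm{twCRPS}(F, y) = \int_{-\infty}^{\infty} \bigl(F(z) - \mathbf{1}\{y \le z\}\bigr)^2 \, u(z) \, dz,
\qquad \text{e.g. } u(z) = \mathbf{1}\{z \ge r\} \text{ to emphasise the right tail}

% Quantile-weighted CRPS: quantile scores integrated against a weight v on [0, 1]
\mathrm{qwCRPS}(F, y) = \int_0^1 \mathrm{QS}_\alpha\bigl(F^{-1}(\alpha), y\bigr) \, v(\alpha) \, d\alpha,
\qquad \mathrm{QS}_\alpha(q, y) = 2\bigl(\mathbf{1}\{y \le q\} - \alpha\bigr)(q - y)
```

Choosing u to vanish below r, or v to concentrate near α = 1, is how these scores "emphasize regions of interest, such as the tails".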

A data-driven approach to assess large fire size generation in Greece

Identifying factors and drivers which control large fire size generation is critical for planning fire management activities. This study attempts to determine the role of fire suppression tactics and

INLA goes extreme: Bayesian tail regression for the estimation of high spatio-temporal quantiles

This work estimates a high non-stationary threshold using a gamma distribution for precipitation intensities that incorporates spatial and temporal random effects, and develops a penalized complexity (PC) prior for the tail index that shrinks the generalized Pareto (GP) model towards the exponential distribution, thus preventing unrealistically heavy tails.
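To make "shrinks the GP model towards the exponential distribution" concrete, here is the generalized Pareto distribution for threshold exceedances, together with the generic PC-prior construction. The exact distance function used in the paper is not given on this page. The form of d below is the general PC-prior definition, not a formula quoted from the paper.

```latex
% Generalized Pareto (GP) CDF for exceedances y > 0, scale sigma > 0, tail index xi
H(y; \sigma, \xi) =
\begin{cases}
1 - \left(1 + \xi y / \sigma\right)_+^{-1/\xi}, & \xi \neq 0,\\[4pt]
1 - \exp(-y / \sigma), & \xi = 0 \quad \text{(exponential base model)}.
\end{cases}

% Generic PC prior: an exponential prior with rate lambda on the distance
% d(xi) = sqrt(2 KLD(GP_xi || GP_0)) from the base model xi = 0
\pi(\xi) = \lambda \exp\{-\lambda \, d(\xi)\} \left| \frac{\partial d(\xi)}{\partial \xi} \right|
```

Because the prior density on d decays exponentially away from zero, values of ξ far from the exponential case, which correspond to very heavy tails, receive little prior mass unless the data support them.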

Spatiotemporal prediction of wildfire size extremes with Bayesian finite sample maxima

  • M. Joseph, M. Rossi, J. Balch
  • Environmental Science
    Ecological applications : a publication of the Ecological Society of America
  • 2019
It is concluded that recent extremes should not be surprising, and that the contiguous United States may be on the verge of even larger wildfire extremes.

A Neural Network Model for Wildfire Scale Prediction Using Meteorological Factors

This model lets firefighters take appropriate measures in a fire's early stages, based on the wildfire scale it predicts from meteorological information, to minimize the damage the fire causes.


We present a statistical perspective on boosting. Special emphasis is given to estimating potentially complex parametric or nonparametric models, including generalized linear and additive models as

Greedy function approximation: A gradient boosting machine.

A general gradient descent boosting paradigm is developed for additive expansions based on any fitting criterion, and specific algorithms are presented for least-squares, least absolute deviation, and Huber-M loss functions for regression, and multiclass logistic likelihood for classification.
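The generic paradigm fits each new base learner to the negative gradient of the loss, then sets leaf values by minimising the loss within each leaf. Below is a toy sketch with one-dimensional regression stumps for two of the losses named above. For least squares (`"ls"`), the negative gradient is the residual and the leaf value is its mean. For least absolute deviation (`"lad"`), the split is chosen on the sign of the residual and the leaf value is the median residual. The stump learner, shrinkage `nu=0.1`, `n_rounds=100` and all function names are illustrative choices, not taken from the paper.

```python
import statistics


def fit_stump(x, r):
    """Threshold t on 1-D feature x minimising squared error of a two-leaf fit to r."""
    best_t, best_sse = float("inf"), float("inf")  # inf -> no split (all points left)
    levels = sorted(set(x))
    for a, b in zip(levels, levels[1:]):
        t = (a + b) / 2
        left = [ri for xi, ri in zip(x, r) if xi <= t]
        right = [ri for xi, ri in zip(x, r) if xi > t]
        ml, mr = statistics.fmean(left), statistics.fmean(right)
        sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
        if sse < best_sse:
            best_t, best_sse = t, sse
    return best_t


def gradient_boost(x, y, loss="ls", n_rounds=100, nu=0.1):
    """Toy gradient boosting with stumps for "ls" or "lad" loss; returns a predictor."""
    # Initial constant minimising the loss: mean for LS, median for LAD
    f0 = statistics.fmean(y) if loss == "ls" else statistics.median(y)
    agg = statistics.fmean if loss == "ls" else statistics.median
    F = [f0] * len(y)
    stumps = []
    for _ in range(n_rounds):
        # Negative gradient of the loss with respect to F(x_i)
        if loss == "ls":
            r = [yi - fi for yi, fi in zip(y, F)]
        else:
            r = [(yi > fi) - (yi < fi) for yi, fi in zip(y, F)]  # sign(y - F)
        t = fit_stump(x, r)
        # Leaf values: loss-optimal constant of the current residuals in each leaf
        gamma = {}
        for side in (True, False):
            resid = [yi - fi for xi, yi, fi in zip(x, y, F) if (xi <= t) == side]
            gamma[side] = agg(resid) if resid else 0.0
        stumps.append((t, gamma[True], gamma[False]))
        F = [fi + nu * (gamma[True] if xi <= t else gamma[False]) for xi, fi in zip(x, F)]

    def predict(x_new):
        return f0 + nu * sum(gl if x_new <= t else gr for t, gl, gr in stumps)

    return predict


x = [0, 1, 2, 3, 4, 5, 6, 7]
y = [0, 0, 0, 0, 10, 10, 10, 100]  # one gross outlier at x = 7
lad = gradient_boost(x, y, loss="lad")
print(round(lad(0), 3), round(lad(7), 3))
```

On this toy data the LAD fit places x = 7 at about 10, the level of its neighbours, because the median leaf update ignores the outlier. A least-squares fit would instead be pulled towards 100 as it keeps refitting the large residual.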

A Space–Time Conditional Intensity Model for Evaluating a Wildfire Hazard Index

Numerical indices are commonly used as tools to aid wildfire management and hazard assessment. Although the use of such indices is widespread, assessment of these indices in their respective regions