Sampling bias in presence-only data used for species distribution modelling: theory and methods for detecting sample bias and its effects on models

Authors: Bente Støa, Rune Halvorsen, Sabrina Mazzoni, Vladimir I. Gusarov
Pages: 1-53
Abstract This paper provides a theoretical understanding of sampling bias in presence-only data in the context of species distribution modelling. This understanding forms the basis for two integrated frameworks, one for detecting sampling bias of different kinds in presence-only data (the bias assessment framework) and one for assessing potential effects of sampling bias on species distribution models (the bias effects framework). We exemplify the use of these frameworks to museum data for nine… 

Presence-only species distribution models are sensitive to sample prevalence: Evaluating models using spatial prediction stability and accuracy metrics
A key recommendation of this study is the use of metrics to assess agreement between replicate predictions as a measure of spatial stability, rather than relying solely on performance metrics such as area under the curve (AUC).
Species distribution modelling of the Southern Ocean benthos: a review on methods, cautions and solutions
Abstract Species distribution modelling studies the relationship between species occurrence records and their environmental setting, providing a valuable approach to predicting species distribution
Bunching up the background betters bias in species distribution models
Sets of presence records used to model species’ distributions typically consist of observations collected opportunistically rather than systematically. As a result, sampling probability is
Modelling European small pelagic fish distribution: Methodological insights
Abstract The distribution of marine organisms is strongly influenced by climatic gradients worldwide. The ecological niche (sensu Hutchinson) of a species, i.e. the combination of environmental
The MIAmaxent R package: Variable transformation and model selection for species distribution models
The MIAmaxent R package is introduced, which provides a statistical approach to modeling species distributions similar to Maxent's, but with subset selection instead of lasso regularization, and decouples variable transformation, model fitting, and model selection.
Oh the places they’ll go: improving species distribution modelling for invasive forest pests in an uncertain world
The effects of various SDM design strategies on distribution mapping of four forest invasive species (FIS) in Canada are explored; simplifying SDM complexity and including biologically informed assumptions are recommended to achieve more accurate dispersal predictions, particularly when projecting FIS spread across time.
Reliability in Distribution Modeling—A Synthesis and Step-by-Step Guidelines for Improved Practice
Information about the distribution of a study object (e.g., species or habitat) is essential in face of increasing pressure from land or sea use, and climate change. Distribution models are
Analysis of potentially suitable habitat within migration connections of an intra-African migrant-the Blue Swallow (Hirundo atrocaerulea)
Climate change negatively affects the distribution of Blue Swallow habitat: any increase in temperature expands the unsuitable area, so unless the current suitable habitat is strictly protected, the suitable habitat and population of the Blue Swallow will continue to decline.
European cephalopods distribution under climate-change scenarios
This study focuses on three widely harvested, common cephalopod species in Europe and models their contemporary and potential future distributional ranges over the twenty-first century using a recently improved species ensemble modelling framework coupled with five atmosphere–ocean general circulation models.
Comparing maximum entropy modelling methods to inform aquaculture site selection for novel seaweed species
Abstract Maximum entropy (maxent) modelling is a widely used method for developing species distribution models (SDMs), but default maxent modelling methods can result in overly complex models with


Mapping Species Distributions with MAXENT Using a Geographically Biased Sample of Presence Data: A Performance Assessment of Methods for Correcting Sampling Bias
The ability of methods to correct the initial sampling bias varied greatly depending on bias type, bias intensity and species, but the simple systematic sampling of records consistently ranked among the best performing across the range of conditions tested, whereas other methods performed more poorly in most cases.
The Effects of Sampling Bias and Model Complexity on the Predictive Performance of MaxEnt Species Distribution Models
Correcting for geographical sampling bias led to major improvements in goodness of fit, but did not entirely resolve the problem: predictions made with clustered ecological data were inferior to those made with the herbarium dataset, even after sampling bias correction.
Sampling bias in geographic and environmental space and its effect on the predictive power of species distribution models
It is argued that species reproductive biology should be taken into account when distributional data are analysed in terms of their suitability for species distribution modelling, and will inform biodiversity conservation assessments, particularly those using data from natural history collections.
The importance of correcting for sampling bias in MaxEnt species distribution models
It is concluded that a substantial improvement in the quality of model predictions can be achieved if uneven sampling effort is taken into account, thereby improving the efficacy of species conservation planning.
Species distribution modelling—Effect of design and sample size of pseudo-absence observations
We explored the effect of varying pseudo-absence data in species distribution modelling using empirical data for four real species and simulated data for two imaginary species. In all analyses we
Species-specific tuning increases robustness to sampling bias in models of species distributions: An implementation with Maxent
Various methods exist to model a species’ niche and geographic distribution using environmental data for the study region and occurrence localities documenting the species’ presence (typically from
The effects of small sample size and sample bias on threshold selection and accuracy assessment of species distribution models
Species distribution models are used for a range of ecological and evolutionary questions, but often are constructed from few and/or biased species occurrence records. Recent work has shown that the
Sample selection bias and presence-only distribution models: implications for background and pseudo-absence data.
It is argued that increased awareness of the implications of spatial bias in surveys, and of possible modeling remedies, will substantially improve predictions of species distributions and have as large an effect on predictive performance as the choice of modeling method.
Species distribution models and ecological theory: A critical assessment and some possible new approaches
Given the importance of knowledge of species distribution for conservation and climate change management, continuous and progressive evaluation of the statistical models predicting species
Spatial prediction of species distribution: an interface between ecological theory and statistical modelling
Neglect of ecological knowledge is a limiting factor in the use of statistical modelling to predict species distribution. Three components are needed for statistical modelling, an ecological model