Do not log‐transform count data

  • Robert B. O'Hara, D. Johan Kotze
  • Published 24 March 2010
  • Environmental Science
  • Methods in Ecology and Evolution
1. Ecological count data (e.g. number of individuals or species) are often log‐transformed to satisfy parametric test assumptions. 
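One way to see why this practice can mislead is to compute, for a known mean, what the back-transformed average of the logged counts looks like. The sketch below is an illustration only, not the authors' analysis: it assumes Poisson-distributed counts and the common log(y + 1) variant of the transformation, and the function names are hypothetical.

```python
import math


def poisson_pmf(k: int, mu: float) -> float:
    """Probability of observing k under a Poisson distribution with mean mu > 0."""
    return math.exp(-mu + k * math.log(mu) - math.lgamma(k + 1))


def back_transformed_log_mean(mu: float, k_max: int = 500) -> float:
    """Return exp(E[log(Y + 1)]) - 1 for Y ~ Poisson(mu).

    The expectation is computed by summing over k = 0..k_max, which is
    effectively exact for small and moderate mu (assumed here).
    """
    expected_log = sum(
        math.log(k + 1) * poisson_pmf(k, mu) for k in range(k_max + 1)
    )
    return math.exp(expected_log) - 1.0


# Compare the true mean with the value recovered from the log scale.
for mu in (0.5, 2.0, 10.0):
    recovered = back_transformed_log_mean(mu)
    print(f"true mean {mu:5.1f}  back-transformed {recovered:.3f}")
```

By Jensen's inequality the back-transformed value always falls below the true mean under these assumptions, so an analysis on the log scale does not estimate the mean count directly.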

Distance‐based multivariate analyses confound location and dispersion effects

1. A critical property of count data is its mean–variance relationship, yet this is rarely considered in multivariate analysis in ecology.

mvabund– an R package for model‐based analysis of multivariate abundance data

1. The mvabund package for R provides tools for model‐based analysis of multivariate abundance data in ecology.

A comparison of statistical approaches for analysis of count and proportion data in ecotoxicology

Ecotoxicologists often encounter count and proportion data that are rarely normally distributed. To meet the assumptions of the linear model, such data are usually transformed or non-parametric

Do Not Divide Count Data with Count Data; A Story from Pollination Ecology with Implications Beyond

If the data structure, statistical analyses and interpretations of results are mixed up, valuable information can be lost and the traditional approach using number of visits per flower and count data models gives a much better chance of detecting effects.

Count transformation models

In ecology studies, uncertainties regarding whether and how to transform count data can be resolved in the framework of count transformation models, which were designed to simultaneously estimate an appropriate transformation and the linear effects of environmental variables by maximizing the exact count log‐likelihood.

Interpretation and Implications of Lognormal Linear Regression Used for Bacterial Enumeration.

The implications of the slope and intercept from an unweighted linear regression are explored and compared with the results of a regression on log-transformed data, to show how to interpret statistical results developed in the arithmetic domain.

The Effects of Normalization, Transformation, and Rarefaction on Clustering of OTU Abundance

Three common methods for handling outliers and erroneous abundances in Operational Taxonomic Unit (OTU) tables (normalization, log transformation, and rarefaction) are tested. Log transformation is found to stay closest to the unmodified OTU data, indicating that a log transformation may be applied when outliers are to be adjusted within certain types of clustering analysis on OTU data.

Should ecologists prefer model‐ over distance‐based multivariate methods?

This study shows that both model‐ and distance‐based methods have their place in the ecologist's statistical toolbox, whereas both CQO and CCA exhibited considerable flaws, especially with linear environmental gradients.

Effect of pitfall trap type and diameter on vertebrate by‐catches and ground beetle (Coleoptera: Carabidae) and spider (Araneae) sampling

1. To determine the diversity of epigaeic arthropods, pitfall traps are a suitable and widely accepted sampling method. Unfortunately small mammals such as shrews, voles and mice are accidentally

Range geometry and socio‐economics dominate species‐level biases in occurrence information

Aim Despite the central role of species distributions in ecology and conservation, occurrence information remains geographically and taxonomically incomplete and biased. Efforts to address this



Analysis of Frequency Count Data Using the Negative Binomial Distribution

A likelihood-ratio testing framework based on the negative binomial distribution that tests for the goodness of fit of this distribution to the observed counts, and then tests for differences in the mean and/or aggregation of the counts among treatments.

The generalized linear model for spatial data: assessing the effects of environmental covariates on population density in the field

The generalized linear model is adapted to accommodate spatially correlated, discrete data. P-values for the overall significance of the models depended heavily on whether the GLM assumed a discrete or continuous response variable, and on whether spatial autocorrelation in the response variable was accounted for.

How to Make Models Add Up — A Primer on GLMMs

Many problems in the analysis of ecological data have the format where there is an observed response that may be predicted by several covariates. Although the response can take several forms (e.g.

Impacts of Leaf-litter Addition on Carabids in a Conifer Plantation

Enhanced habitat heterogeneity (leaf-litter addition) in homogeneous plantations influenced the spatial distribution and composition of carabids, through altered abiotic (lower ground temperature in the leaf-litter plots) and biotic (more prey items) factors.

A protocol for data exploration to avoid common statistical problems

A protocol for data exploration is provided; current tools to detect outliers, heterogeneity of variance, collinearity, dependence of observations, problems with interactions, double zeros in multivariate analysis, zero inflation in generalized linear modelling, and the correct type of relationships between dependent and independent variables are discussed; and advice on how to address these problems when they arise is provided.

Analysing ecological data

As stated in the preface, finding a suitable and satisfactory model to explain the underlying patterns in ecological data is always a big challenge. The book tackles this difficulty and provides a

Analysing Ecological Data

Introduction.- Data management and software.- Advice for teachers.- Exploration.- Linear regression.- Generalised linear modelling.- Additive and generalised additive modelling.- Introduction to

Testing abundance-range size relationships in European carabid beetles (Coleoptera, Carabidae)

Examination of species’ characteristics revealed that widespread species are generally large bodied, generalists and are little influenced by human-altered landscapes, while species with restricted distributions are smaller bodied, specialists, and favour natural habitat.

Quasi-Poisson vs. negative binomial regression: how should we model overdispersed count data?

A dramatic difference in the estimated abundance of harbor seals when using quasi-Poisson vs. negative binomial regression is presented and explained in light of the different weighting used in each regression method.
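The summary above does not spell out the weighting, so the following is a minimal sketch under stated assumptions rather than the paper's own code. It assumes the usual variance functions, θμ for quasi-Poisson and μ + κμ² for the negative binomial, together with a log link, in which case the iteratively reweighted least squares weight is μ² / Var(μ). The function names and the values of θ and κ are hypothetical.

```python
def quasi_poisson_weight(mu: float, theta: float) -> float:
    """IRLS weight mu^2 / Var(mu) with Var(mu) = theta * mu (log link assumed)."""
    return mu / theta


def negative_binomial_weight(mu: float, kappa: float) -> float:
    """IRLS weight mu^2 / Var(mu) with Var(mu) = mu + kappa * mu^2 (log link assumed)."""
    return mu / (1.0 + kappa * mu)


# Illustrative dispersion values only; they are not taken from any data set.
THETA = 2.0
KAPPA = 0.5

for mu in (1.0, 10.0, 100.0, 1000.0):
    qp = quasi_poisson_weight(mu, THETA)
    nb = negative_binomial_weight(mu, KAPPA)
    print(f"mu={mu:7.1f}  quasi-Poisson weight={qp:8.2f}  negative binomial weight={nb:6.3f}")
```

Under these assumptions the quasi-Poisson weight grows in proportion to the mean, while the negative binomial weight levels off near 1/κ, so large counts influence the two fits very differently.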