Data diving with cross-validation: an investigation of broad-scale gradients in Swedish weed communities

Abstract

1 Multivariate analysis of complex data sets is plagued by problems of subjectivity and of finding statistically valid ways to test a large number of plausible hypotheses. We show how patterns in the data can be identified (data diving) as well as rigorously tested statistically by subdividing the data set.

2 We analysed data on weed biomass and environmental variables from more than 2000 plots in cereal and oil-seed crops in Sweden during 1970–94. Half the data set was used in an exploratory phase while the other half was used in a subsequent confirmatory phase.

3 The exploratory analyses included multivariate statistics [detrended correspondence analysis (DCA) and canonical correspondence analysis (CCA)] with various options and combinations of variables, and led to the formation of hypotheses that were then tested.

4 We tested the hypotheses in a sequential analysis with CCA and Monte Carlo permutation tests: after establishing the influence of one set of environmental variables, this set was covaried out in subsequent analyses. In this way the influence of (i) season of sowing of the crop; (ii) geographical region; (iii) soil type; (iv) crop species; and (v) temporal trends was tested. The four latter were tested separately for spring- and autumn-sown crops.

5 The sowing season of the crop had an overwhelming influence on the weed flora, and many weed species, both annual and perennial, showed strong associations with either autumn or spring. There were significant differences in weed flora composition between the geographical regions and soil types as well as between crop species. There were significant temporal trends only in the weed flora of autumn-sown crops.

6 This study provides a protocol that combines exploratory 'data diving' with strict hypothesis testing using direct gradient analysis methods such as CCA. Such two-phase analysis should improve the way complex data are analysed and patterns are interpreted.
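The two-phase idea described above (explore one half of the plots, then test on the untouched half with a Monte Carlo permutation test) can be illustrated with a minimal sketch. This is not the authors' analysis: the data are synthetic, every function and variable name is hypothetical, and a simple between-group sum of squares stands in for the CCA-based test statistic used in the study. Covarying out previously tested variables is not shown.

```python
import numpy as np

def split_halves(n_plots, rng):
    """Randomly assign plot indices to an exploratory and a confirmatory half."""
    order = rng.permutation(n_plots)
    half = n_plots // 2
    return order[:half], order[half:]

def between_group_ss(Y, groups):
    """Between-group sum of squares of a plots-by-species matrix Y.

    Assumption: a simple stand-in statistic, not the CCA statistic.
    """
    grand_mean = Y.mean(axis=0)
    total = 0.0
    for g in np.unique(groups):
        block = Y[groups == g]
        total += len(block) * np.sum((block.mean(axis=0) - grand_mean) ** 2)
    return total

def permutation_p_value(Y, groups, rng, n_perm=999):
    """Monte Carlo p-value: share of label permutations whose statistic
    is at least as large as the observed one (observed value counted once)."""
    observed = between_group_ss(Y, groups)
    hits = 0
    for _ in range(n_perm):
        if between_group_ss(Y, rng.permutation(groups)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Synthetic stand-in data: 200 plots, 5 species, one binary factor
# (for example sowing season) that shifts the response of every species.
rng = np.random.default_rng(0)
n_plots, n_species = 200, 5
season = rng.integers(0, 2, size=n_plots)
Y = rng.normal(size=(n_plots, n_species)) + 2.0 * season[:, None]

# Hypotheses would be formed on the exploratory half only;
# the confirmatory half is touched once, for the formal test.
explore_idx, confirm_idx = split_halves(n_plots, rng)
p_confirm = permutation_p_value(Y[confirm_idx], season[confirm_idx], rng)
print(f"confirmatory-half permutation p-value: {p_confirm:.3f}")
```

Because the confirmatory half plays no part in choosing the hypothesis, the p-value computed on it is not inflated by the many comparisons tried during exploration, which is the point of subdividing the data set.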

Cite this paper

@inproceedings{Hallgren1999DataDW, title={Data diving with cross-validation: an investigation of broad-scale gradients in Swedish weed communities}, author={E J Hallgren and Michael W. Palmer}, year={1999} }