Applying the right statistics: analyses of measurement studies

  • J. Martin Bland, D. G. Altman
  • Published 1 July 2003
  • Mathematics
  • Ultrasound in Obstetrics and Gynecology
Measurement error, observer variation and agreement between different methods of measurement are frequent topics in the imaging literature. We describe the problems of some applications of correlation and regression methods to these studies, using recent examples from this literature. We use a simulated example to show how these problems and misinterpretations arise. We describe the 95% limits of agreement approach and a similar, appropriate, regression technique. We discuss the…

Agreed statistics: measurement method comparison.

An alternative approach, based on graphical techniques and simple calculations, is described, together with the relation between this analysis and the assessment of repeatability.

Conducting correlation analysis: important limitations and pitfalls

This paper discusses why the correlation coefficient is invalid for assessing agreement between two methods that aim to measure the same quantity, and presents better alternatives, such as the intraclass correlation coefficient and Bland–Altman’s limits of agreement.
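The point about correlation and agreement can be illustrated with a small sketch. The data below are made up for illustration, and `pearson_r` is a hypothetical helper written for this example, not code from any of the papers listed here. The second method is an exact linear function of the first, so the correlation is perfect even though the two methods never give the same reading.

```python
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Illustrative, invented readings: method B is a scaled and shifted copy of A.
method_a = [10.0, 20.0, 30.0, 40.0, 50.0]
method_b = [1.5 * v + 5.0 for v in method_a]

r = pearson_r(method_a, method_b)

# Paired differences show the disagreement that r hides.
diffs = [b - a for a, b in zip(method_a, method_b)]
mean_diff = mean(diffs)

print(f"r = {r:.3f}, mean difference = {mean_diff:.1f}")
print("differences:", diffs)
```

Here the correlation measures only how closely the points follow a straight line, not whether that line is the line of equality, which is why the paired differences are the more informative quantity for agreement.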

Measurement Consistency from Magnetic Resonance Images

In almost all cases, using only one method is insufficient and it is recommended that several methods be used simultaneously, and in general, ANOVA performs the best.

Reliability, repeatability and reproducibility: analysis of measurement errors in continuous variables

  • J. Bartlett, C. Frost
  • Psychology
  • Ultrasound in Obstetrics & Gynecology: the official journal of the International Society of Ultrasound in Obstetrics and Gynecology
  • 2008
The general concepts of agreement and reliability are distinguished to aid researchers in considering which are relevant for their particular application, and the fact that reliability depends on the population in which measurements are made, and not just on the measurement errors of the measurement method is highlighted.

Statistical Methods: Reliability Assessment and Method Comparison

Common statistical methods for assessing reliability and agreement between methods, including the intraclass correlation coefficient, coefficient of variation, Bland-Altman plot, limits of agreement, percent agreement, and the kappa statistic are described.

How Replicates Can Inform Potential Users of a Measurement Procedure about Measurement Error: Basic Concepts and Methods

This paper encourages clearly conceptually distinguishing between investigations of the measurement error of a single measurement procedure and the comparison between different measurement procedures or observers and describes the link to the existing general statistical methodology.

Improvements in the application and reporting of advanced Bland–Altman methods of comparison

A freely available implementation of the more advanced Bland–Altman comparison methods is provided together with a formal description, and a standard reporting format is proposed that would improve the analysis and interpretation of comparison studies.

Longitudinal MRI data analysis in presence of measurement error but absence of replicates

This article proposes a novel method for the analysis of unreplicated longitudinal data under the presence of measurement errors using mixed-effect regression and develops a new EM-Variogram technique to estimate regression coefficients as well as variance components.

The Case for Using the Repeatability Coefficient When Calculating Test–Retest Reliability

A case is made for clinicians to prefer measurement error (ME) indices, the Coefficient of Repeatability (CR) or the Smallest Real Difference (SRD), over relative reliability coefficients such as Pearson’s r and the Intraclass Correlation Coefficient (ICC) when selecting tools to measure change and when judging whether an observed change is real.



Statistics Notes: Measurement error and correlation coefficients

This work considers the use of correlation coefficients to quantify measurement error, a variation between measurements of the same quantity on the same individual which has a simple clinical interpretation.

Measuring agreement in method comparison studies

The 95% limits of agreement, estimated by mean difference ± 1.96 standard deviations of the differences, provide an interval within which 95% of differences between measurements by the two methods are expected to lie.
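The calculation described above (mean difference plus or minus 1.96 standard deviations of the differences) can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name `limits_of_agreement` and the sample values are assumptions made for this example, and the sketch assumes the differences are roughly normally distributed with no trend against the magnitude of the measurement.

```python
from statistics import mean, stdev


def limits_of_agreement(method_a, method_b, z=1.96):
    """Return (bias, lower, upper) for paired measurements by two methods."""
    if len(method_a) != len(method_b):
        raise ValueError("paired measurements must have equal length")
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = mean(diffs)   # mean difference between the methods
    sd = stdev(diffs)    # sample standard deviation (n - 1 denominator)
    return bias, bias - z * sd, bias + z * sd


# Invented paired readings, for illustration only.
a = [10.0, 12.0, 14.0, 16.0, 18.0]
b = [9.0, 12.5, 13.0, 16.5, 17.0]

bias, lower, upper = limits_of_agreement(a, b)
print(f"bias = {bias:.2f}, limits of agreement = ({lower:.2f}, {upper:.2f})")
```

In practice the differences would usually also be plotted against the mean of the two methods to check that the bias and spread are stable across the measurement range before these limits are reported.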

Measurement in Medicine: The Analysis of Method Comparison Studies

This paper shall describe what is usually done, show why this is inappropriate, suggest a better approach, and ask why such studies are done so badly.

Renal volume measurements: accuracy and repeatability of US compared with that of MR imaging.

Renal volume was underestimated with US and a comparable underestimation was found when the ellipsoid formula was applied to MR images, indicating that the inaccuracy of US renal volume measurements occurred because the kidney does not resemble an ellipsoid and was not primarily related to the imaging modality.

Statistics Notes: Some examples of regression towards the mean

It is shown that regression towards the mean occurs whenever the authors select an extreme group based on one variable and then measure another variable for that group, and that the mean of the extreme group is now closer to the mean of the whole population.

Single X-ray absorptiometry: Performance characteristics and comparison with single photon absorptiometry

The new single X-ray absorptiometry forearm bone densitometer described in this paper has performance characteristics which allow it to be used both for diagnostic purposes and for the follow-up of treatment.

Statistics Notes: Regression towards the mean

The statistical term “regression,” from a Latin root meaning “going back,” was first used by Francis Galton in his paper “Regression towards Mediocrity in Hereditary Stature.”1 Galton related the

Pulmonary emphysema: subjective visual grading versus objective quantification with macroscopic morphometry and thin-section CT densitometry.

Systematic overestimation and moderate interobserver agreement may compromise subjective visual grading of emphysema, which suggests that subjective visual grading should be supplemented with objective methods to achieve precise, reader-independent quantification of emphysema.