The Two Sample Problem with Censored Data


A medical investigator attempting to compare two different treatments for, say, prolongation of life among disease victims, often finds himself in the following situation: at time T, when it is necessary to end the experiment, or at least evaluate the results up to that time, a certain number of the patients in each treatment group will still be alive. His data will then be represented by two sets of numbers which might look like x1, x2, x3+, x4, x5+, x6, ..., xm and y1, y2+, y3+, y4, ..., yn. Here x1 and x2 would represent actual lifetimes, while x3+, a "censored" observation, represents a lifetime known only to exceed x3. If all the patients in both treatment groups were treated at time 0, then every + value would be equal to T, a situation that has been investigated by Halperin [1]. Frequently, however, patients enter the investigation at different times after it has begun, and the x+ and y+ values may range from 0 to T. Such a situation, of course, complicates the comparison of the two treatments, particularly if the mechanism censoring the x values is different from that censoring the y values. This may happen, for instance, if the x sequence was run some time ago, so that nearly all the patients have been observed to their death times, while the y sequence is begun later, and contains many censored observations. Gehan [2] and Gilbert [3] have independently proposed the same extension of the Wilcoxon statistic as a solution to the two sample problem with censored data. In this paper the problem is discussed further, and a different test statistic is proposed, which is shown to be, in some ways, superior to the Gehan-Gilbert statistic.
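To make the censored-data notation concrete, the sketch below scores every (x, y) pair in the way the Gehan-Gilbert extension of the Wilcoxon statistic is usually described: +1 when the x lifetime is known to exceed the y lifetime, -1 when it is known to be shorter, and 0 when censoring leaves the order undetermined. It is a minimal illustration, not the new statistic proposed in the paper. The names `Obs`, `gehan_score`, `gehan_statistic`, the sample values, and the handling of ties between a censored and an uncensored time are all assumptions made for this example.

```python
from typing import List, Tuple

# An observation is (time, censored). censored=True plays the role of the
# "+" mark: the true lifetime is known only to exceed `time`.
Obs = Tuple[float, bool]


def gehan_score(x: Obs, y: Obs) -> int:
    """Return +1 if x is known to outlive y, -1 if y is known to outlive x, else 0."""
    (tx, cx), (ty, cy) = x, y
    # x outlives y only if y is an observed death and x's lifetime exceeds ty.
    # A censored x with tx == ty still exceeds ty (assumed tie convention).
    if not cy and (tx > ty or (cx and tx >= ty)):
        return 1
    # Symmetric case: x is an observed death and y's lifetime exceeds tx.
    if not cx and (ty > tx or (cy and ty >= tx)):
        return -1
    # Both censored, a tie between deaths, or the censoring hides the order.
    return 0


def gehan_statistic(xs: List[Obs], ys: List[Obs]) -> int:
    """Sum the pairwise scores over all m * n pairs."""
    return sum(gehan_score(x, y) for x in xs for y in ys)


# Hypothetical data: 2, 4, 5+, 7 for treatment x and 3, 4+, 6+ for treatment y.
xs = [(2.0, False), (4.0, False), (5.0, True), (7.0, False)]
ys = [(3.0, False), (4.0, True), (6.0, True)]

if __name__ == "__main__":
    print(gehan_statistic(xs, ys))
```

A positive total suggests longer lifetimes in the x group and a negative total suggests longer lifetimes in the y group. Turning the total into a significance test requires a variance, which this sketch does not attempt.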

Cite this paper

@inproceedings{Efron2005TheTS, title={The Two Sample Problem with Censored Data}, author={Bradley Efron}, year={2005} }