1. We study the patients in a trial not to find out anything about them but to predict what may happen to future patients given these treatments. When a large sample size is needed to detect a statistically significant difference between treatments, the magnitude of the beneficial effect is moderate. This is reflected by the 95% confidence interval: if the upper limit is close to one (the no-risk line), the effect estimate, although statistically significant (p<0.05), is imprecise. In other words, the clinical relevance of the intervention under consideration might be very small for part of the reference population. This may be the case for the ARDSnet and PROWESS trials, where the 95% confidence intervals were 0.65–0.93 and 0.69–0.94, respectively. Simply shifting ten events from one group to the other can change the conclusions. Could these results reliably and easily be extrapolated and generalised?

2. The extent to which it is wise or safe to generalise should be judged in individual circumstances, and there may not be a consensus. Arguably, many RCTs use over-restrictive inclusion criteria to maximise the effect of the intervention under investigation (or the power of the study), so that the degree of safe generalisability is reduced. For example, in the ARDSnet trial only 13% of the ARDS patients admitted to the participating intensive care units were enrolled in the trial. Such data are not available for the PROWESS trial. Not surprisingly, phase IV trials, in which the inclusion/exclusion criteria are not strictly controlled, may return controversial or even opposite results, and the intervention may not be implemented in daily care. Methodological rigour in performing (multicentre) RCTs is welcome, but criteria for the applicability of results to the local patient should be investigated more extensively. Generalisation of RCT results remains an intriguing, open issue.
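The sensitivity claimed in point 1, that moving ten events between arms can overturn a significant result, can be illustrated numerically. The sketch below uses purely hypothetical counts (120/500 deaths on treatment vs 155/500 on control, not data from ARDSnet or PROWESS) and the standard log-scale confidence interval for a relative risk:

```python
import math

def rr_ci(a, n1, b, n2, z=1.96):
    """Relative risk of events (a/n1 vs b/n2) with a 95% CI on the log scale."""
    rr = (a / n1) / (b / n2)
    se = math.sqrt(1 / a - 1 / n1 + 1 / b - 1 / n2)  # SE of log(RR)
    return (rr,
            math.exp(math.log(rr) - z * se),
            math.exp(math.log(rr) + z * se))

# Hypothetical trial: 120/500 deaths on treatment vs 155/500 on control.
rr, lo, hi = rr_ci(120, 500, 155, 500)
print(f"RR {rr:.2f} (95% CI {lo:.2f}-{hi:.2f})")   # → RR 0.77 (95% CI 0.63-0.95)

# Shift ten deaths from the control arm to the treatment arm.
rr2, lo2, hi2 = rr_ci(130, 500, 145, 500)
print(f"RR {rr2:.2f} (95% CI {lo2:.2f}-{hi2:.2f})")  # → RR 0.90 (95% CI 0.73-1.10)
```

With the original counts the upper confidence limit sits just below one; after shifting ten events it crosses one and the difference is no longer statistically significant, which is the fragility the commentary describes.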