OBJECTIVE Comparisons of the performance of multiple health care providers are often based on hypothesis tests, with providers whose P-values fall below some critical threshold being identified as potentially extreme. Because of the multiple testing involved, the classical P-value threshold of, say, 0.05 may not be considered strict enough, as it will tend to produce too many "false positives." However, we argue that the commonly used Bonferroni-corrected threshold is in general too strict for the problem at hand. The purpose of this article is to demonstrate a suitable alternative thresholding procedure that is already well established in other fields.

STUDY DESIGN AND SETTING The suggested procedure involves control of an error measure called the "false discovery rate" (FDR). We present a worked example involving a comparison of risk-adjusted mortality rates following heart surgery in New York State hospitals during 2000-2002. We show that the FDR critical threshold lines can be drawn on a "funnel plot," providing a simple graphical presentation of the results.

RESULTS The FDR procedure identified more providers as potentially extreme than the Bonferroni correction, while maintaining control of an intuitively sensible error measure.

CONCLUSION Control of the FDR offers a simple guideline for determining where to draw critical thresholds when comparing multiple health care providers.
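
The contrast between the two thresholds can be illustrated with a short sketch. The abstract does not say which FDR-controlling procedure is used; the code below assumes the Benjamini-Hochberg step-up procedure, which is one common choice. The function names and the P-values are hypothetical, invented purely for illustration, and are not taken from the New York State data.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Flag a P-value if it is at or below alpha / m (Bonferroni threshold)."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg_reject(p_values, alpha=0.05):
    """Flag P-values using the Benjamini-Hochberg step-up procedure.

    Sort the P-values, find the largest rank k such that
    p_(k) <= k * alpha / m, and flag the k smallest P-values.
    (Assumed procedure; the abstract only says "control of the FDR".)
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            k_max = rank  # keep the largest rank that satisfies the bound
    reject = [False] * m
    for idx in order[:k_max]:
        reject[idx] = True
    return reject


# Hypothetical P-values for ten providers (made up for illustration only).
hypothetical_p = [0.0004, 0.003, 0.009, 0.018, 0.04, 0.2, 0.35, 0.5, 0.7, 0.9]

n_bonferroni = sum(bonferroni_reject(hypothetical_p))
n_fdr = sum(benjamini_hochberg_reject(hypothetical_p))
print("Flagged by Bonferroni:", n_bonferroni)
print("Flagged by Benjamini-Hochberg:", n_fdr)
```

With these made-up values, the Bonferroni threshold flags two providers and the Benjamini-Hochberg procedure flags four, which mirrors the qualitative pattern reported in the RESULTS: the FDR threshold is less strict while still controlling a well-defined error measure.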