The authors evaluated the reproducibility of a consensus process for developing clinical algorithms across three physician panels at a health maintenance organization. Each panel was composed of primary care internists and received identical selections from the medical literature and first-draft "seed" algorithms on the management of two common clinical problems: acute sinusitis and dyspepsia. Each panel used nominal group process and a modified Delphi method to create final algorithm drafts. To compare the clinical logic in the final algorithms, the authors applied a new qualitative and quantitative comparison method, the Clinical Algorithm Patient Abstraction (CAPA). Dyspepsia algorithms from all three panels recommended empiric anti-acid therapy for most patients, favored endoscopy over barium swallow, and had very similar indications for endoscopy. The average CAPA comparison score among the final dyspepsia algorithms was 6.1 on a scale of 0 (different) to 10 (identical). Sinusitis algorithms from all three panels proposed empiric antibiotic therapy for most patients. Indications for sinus radiographs were similar in two of the algorithms (CAPA = 4.9) but differed significantly in the third, resulting in lower CAPA scores (average CAPA = 1.9, P < 0.03). The clinical similarity of the algorithms produced by these panels suggests a high level of reproducibility in this consensus-driven algorithm development process. However, the differences among the sinusitis algorithms suggest that physician panels using a consensus process feasible for a health maintenance organization with limited resources will produce some guidelines that vary because of differences in interpretation of evidence and in physician experience.