OBJECTIVES Reducing preventable deaths from uncontrolled hemorrhage, tension pneumothorax, and airway loss is a priority. As part of a research initiative comparing different training models, this study evaluated the reliability and validity of a test that assesses combat medic performance during a polytrauma scenario using live animal models.
METHODS Nine procedural checklists and seven global rating scales were piloted with four cohorts of soldiers (n = 94) at two U.S. training sites. The cohorts ranged from "novice" to "proficient" trainees. Procedure scores and a mean global score were calculated for each subject. An intraclass correlation coefficient was calculated for each procedure, with 0.70 as the threshold for acceptability. Cohort performance was hypothesized to follow the order Cohort 4 (proficient) > Cohort 3 (competent) > Cohort 2 (beginner) > Cohort 1 (novice). Data were analyzed using Kruskal-Wallis tests and analysis of variance.
RESULTS At Site A, intraclass correlation coefficients ranged from 0.74 to 0.93 for six of the nine procedures. Cohorts differed significantly on hemorrhage control, needle decompression, cricothyrotomy, amputation management, chest tube insertion, and mean global scores. Cohort 4 outperformed the other cohorts, and Cohorts 2 and 3 outperformed Cohort 1.
CONCLUSION The test differentiates novices from beginner, competent, and proficient trainees on difficult procedures and on overall performance.
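
The per-procedure reliability check described in the methods can be illustrated with a minimal sketch. The abstract does not state which form of the intraclass correlation was used or how many raters scored each trainee, so the code below assumes the two-way random-effects, absolute-agreement, single-rater form, often written ICC(2,1). The function names, the two-rater layout, and the example scores are hypothetical and are not data from the study; only the 0.70 threshold comes from the text.

```python
from typing import Sequence

# Acceptability threshold stated in the abstract.
ACCEPTABLE_ICC = 0.70


def icc_2_1(ratings: Sequence[Sequence[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings[i][j] is the score rater j gave to subject i. The choice of
    this ICC form is an assumption; the source does not specify one.
    """
    n = len(ratings)            # number of subjects (trainees)
    k = len(ratings[0])         # number of raters
    if n < 2 or k < 2:
        raise ValueError("need at least two subjects and two raters")
    if any(len(row) != k for row in ratings):
        raise ValueError("every subject needs a score from every rater")

    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares: subjects, raters, residual.
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_err = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))

    denom = ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    if denom == 0:
        raise ValueError("ICC is undefined when all ratings are identical")
    return (ms_rows - ms_err) / denom


def is_acceptable(icc: float, threshold: float = ACCEPTABLE_ICC) -> bool:
    """Return True when the coefficient meets the acceptability threshold."""
    return icc >= threshold


# Hypothetical checklist scores: five trainees, two raters (illustration only).
example_scores = [[8, 8], [5, 6], [9, 9], [3, 4], [7, 7]]
example_icc = icc_2_1(example_scores)
print(f"ICC(2,1) = {example_icc:.2f}, acceptable: {is_acceptable(example_icc)}")
```

In this sketch the coefficient would be computed once per procedure checklist and compared with the threshold. A different ICC form, for example a consistency or average-rater form, would change the denominator and can give noticeably different values for the same ratings.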