The use of simulation for competency assessment requires validation of the simulator's performance metrics. This study evaluated whether the metrics of the Simbionix GI Mentor II virtual reality simulator differentiate gastrointestinal endoscopists with varying clinical experience (known-groups construct validity). Twenty subjects from medical and surgical specialties were classified into two groups based on self-reported clinical experience with colonoscopy: a novice group (<5 procedures, n = 12) and an experienced group (>50 procedures, n = 8). Three virtual colonoscopy modules of increasing difficulty were used (modules I-1, II-2, and I-7, referred to below as modules 1, 2, and 3, respectively). The metrics reported by the simulator after each module were compared between groups using the Wilcoxon–Mann–Whitney test. Data are expressed as median and interquartile range (IQR), and a p value less than 0.05 was considered statistically significant. With module 1, the groups differed only in the time taken to reach the cecum (experienced group: 1.6 min; IQR, 1.2–1.9 min vs novice group: 3.2 min; IQR, 2.4–4 min; p < 0.01). With module 2, the groups differed only in the time needed to reach the cecum (experienced group: 2.3 min; IQR, 1.6–2.3 min vs novice group: 3.3 min; IQR, 2.3–4.2 min; p = 0.03) and in overall efficiency (experienced group: 94%; IQR, 94–96% vs novice group: 88%; IQR, 69–92%; p < 0.01). In contrast, with module 3 (the most difficult), the groups differed in most parameters. The experienced group reached the cecum faster (5.7 min; IQR, 3.6–6.6 min vs 14 min; IQR, 9–16 min; p < 0.01), lost the view less often (0.5; IQR, 0–1 vs 2; IQR, 2–3; p < 0.01), had fewer episodes of excessive pressure (2; IQR, 1–2 vs 4.5; IQR, 2.5–6; p < 0.01), and achieved greater overall efficiency (87%; IQR, 82–89% vs 29%; IQR, 23–55%; p < 0.01). There were no significant differences in the percentage of time the patient was in pain or in the total time the colon was looped.
However, the experienced group visualized slightly less of the mucosa (91%; IQR, 89–92% vs 94%; IQR, 93–95%; p = 0.01). The GI Mentor II metrics differentiated novice colonoscopists from those with more clinical experience, but primarily in the more complex scenarios. The case scenario must therefore be taken into account when setting performance benchmarks.
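The between-group comparison described above (medians with IQRs, compared using the Wilcoxon–Mann–Whitney test) can be sketched in Python as follows. This is a minimal, standard-library-only illustration, not the analysis software used in the study. The function names `median_iqr` and `mann_whitney_u` are hypothetical. The p value uses the large-sample normal approximation with a tie correction; for groups as small as 8 and 12 subjects, statistical packages often report an exact p value instead. Quartiles follow the "inclusive" method of Python's `statistics.quantiles`, so IQR endpoints may differ slightly from other software. The input values are invented for illustration and are not data from the study.

```python
import math
import statistics


def median_iqr(values):
    """Return (median, Q1, Q3) using the 'inclusive' quartile method.

    Statistical packages use different quartile definitions, so the IQR
    endpoints may differ slightly from those reported by other software.
    """
    q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return med, q1, q3


def mann_whitney_u(x, y):
    """Two-sided Wilcoxon-Mann-Whitney test (normal approximation).

    Returns (U, p), where U is the statistic for sample x. Ties receive
    average ranks and the variance includes the usual tie correction;
    no continuity correction is applied.
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    # Pool both samples, remembering which group (0 = x, 1 = y) each came from.
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * n
    tie_term = 0.0  # sum of (t^3 - t) over runs of tied values
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        t = j - i + 1
        tie_term += t**3 - t
        i = j + 1
    # Rank sum of sample x, converted to the U statistic.
    r1 = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:  # every value tied: no evidence of a difference
        return u1, 1.0
    z = (u1 - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p = 2 * (1 - Phi(|z|))
    return u1, p


# Hypothetical values (minutes to reach the cecum), NOT data from the study.
experienced = [1.2, 1.4, 1.6, 1.9, 2.0]
novice = [2.4, 3.0, 3.2, 3.8, 4.1]
print(median_iqr(experienced))  # (1.6, 1.4, 1.9)
u, p = mann_whitney_u(experienced, novice)
print(f"U = {u}, two-sided p = {p:.4f}")
```

Because every hypothetical experienced value lies below every hypothetical novice value, U for the experienced group is 0, the most extreme separation possible for two groups of five.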