A distribution-based method for assessing the differences between clinical trial target populations and patient populations in electronic health records.
Randomized controlled trials generate high-quality medical evidence. However, unjustified inclusion/exclusion criteria may compromise the external validity of a study. We previously introduced a method that uses electronic health record (EHR) data to assess the population representativeness of related clinical trials. Because EHR data may not perfectly represent the real-world patient population, in this work we further validated the method and its results using data from the National Health and Nutrition Examination Survey (NHANES). We visualized and quantified the differences in the distributions of age, HbA1c, and BMI among three groups: the target population of Type 2 diabetes trials, diabetics in the NHANES databases, and a convenience sample of patients enrolled in selected Type 2 diabetes trials. The results are consistent with those of the previous study.
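
To make the idea of quantifying a difference between two distributions concrete, the following is a minimal sketch in Python. It is not the method used in this work: the histogram overlap measure, the function names, the bin edges, and the age values are all hypothetical and chosen only for illustration. The sketch bins two samples of one variable (here, age) and sums the bin-wise minimum of the two proportion histograms, so that 1.0 means identical binned distributions and 0.0 means no overlap.

```python
from bisect import bisect_right


def normalized_histogram(values, bin_edges):
    """Return the proportion of values in each bin [edge_i, edge_{i+1})."""
    counts = [0] * (len(bin_edges) - 1)
    for v in values:
        # Ignore values that fall outside the overall bin range.
        if v < bin_edges[0] or v >= bin_edges[-1]:
            continue
        counts[bisect_right(bin_edges, v) - 1] += 1
    total = sum(counts)
    if total == 0:
        raise ValueError("no values fall inside the bin range")
    return [c / total for c in counts]


def histogram_overlap(sample_a, sample_b, bin_edges):
    """Sum of bin-wise minimum proportions (illustrative measure only)."""
    p = normalized_histogram(sample_a, bin_edges)
    q = normalized_histogram(sample_b, bin_edges)
    return sum(min(pi, qi) for pi, qi in zip(p, q))


# Hypothetical ages, invented for this example; not data from the study.
trial_ages = [45, 50, 55, 60, 62]
survey_ages = [30, 45, 58, 70, 82]
age_bins = [20, 40, 60, 80, 100]  # assumed bin edges, in years

overlap = histogram_overlap(trial_ages, survey_ages, age_bins)
print(f"overlap = {overlap:.2f}")
```

The same function could be applied to HbA1c or BMI by changing the bin edges. A lower overlap would suggest that the two groups differ more on that variable, although the value depends on the chosen bins.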