How medical AI devices are evaluated: limitations and recommendations from an analysis of FDA approvals.

Eric Wu, Kevin Wu, Roxana Daneshjou, David Ouyang, Daniel E. Ho and James Zou
Nature Medicine
Medical artificial-intelligence (AI) algorithms are increasingly being proposed for the assessment and care of patients. Although the academic community has started to develop reporting guidelines for AI clinical trials [1–3], there are no established best practices for evaluating commercially available algorithms to ensure their reliability and safety. The path to safe and robust clinical AI requires that important regulatory questions be addressed. Are medical devices able to demonstrate…


Ensuring that biomedical AI benefits diverse populations
Key challenges to biomedical AI in outcome design, data collection and technology evaluation are outlined, with examples from precision health illustrating how bias and health disparities may arise at each stage.
Evaluation and Real-World Performance Monitoring of Artificial Intelligence Models in Clinical Practice Purchase: Try It, Buy It, Check It.
The authors discuss why regulatory clearance alone may not be enough to ensure that AI will be safe and effective in all radiology practices, and review strategies and available resources for evaluating AI models before clinical use and for monitoring their performance afterward to ensure efficacy and patient safety.
Lack of Transparency and Potential Bias in Artificial Intelligence Data Sets and Algorithms: A Scoping Review.
This scoping review identified three issues in the data sets used to develop and test clinical AI algorithms for skin disease that should be addressed before clinical translation: sparse data set characterization and lack of transparency, nonstandard and unverified disease labels, and an inability to fully assess the diversity of the patients used for algorithm development and testing.
Algorithm Change Protocols in the Regulation of Adaptive Machine Learning-Based Medical Devices.
The authors argue that the EU must specify in detail how algorithm change protocols will be implemented, so that EU patients and health care systems can realize the full benefits of AI/ML-based innovation.
A comprehensive evaluation methodology for the publicly accessible AI services for medical diagnostics
The study develops and tests an original methodology for the comprehensive evaluation of open-access AI services in teleradiology, assessing user experience, accessibility, safety, and diagnostic accuracy on an independent reference dataset.
Towards Clinical Application of Artificial Intelligence in Ultrasound Imaging
This review introduces global trends in medical AI research in ultrasound (US) imaging from both clinical and basic perspectives, and discusses US image preprocessing, algorithms suited to US image analysis, AI explainability for obtaining informed consent, the approval process for medical AI devices, and future perspectives on the clinical application of AI-based US diagnostic support technologies.
A scoping review of artificial intelligence applications in thoracic surgery.
  • K. Seastedt, Dana Moukheiber, +7 authors L. Celi
  • European journal of cardio-thoracic surgery : official journal of the European Association for Cardio-thoracic Surgery
  • 2021
ML holds promise for thoracic surgery but also faces challenges: the transparency of data and algorithm design, and the systemic biases on which models depend, remain to be addressed.
Low adherence to existing model reporting guidelines by commonly used clinical prediction models
The study assesses whether the documentation available for commonly used machine learning models developed by an electronic health record (EHR) vendor provides the information requested by model reporting guidelines.
Artificial intelligence for mechanical ventilation: systematic review of design, reporting standards, and bias.
Development of algorithms should involve prospective and external validation, with greater code and data availability to improve confidence in and translation of this promising approach.


The state of artificial intelligence-based FDA-approved medical devices and algorithms: an online database
The authors launch the first comprehensive, open-access database of strictly AI/ML-based medical technologies approved by the FDA, aiming to raise awareness of the importance of regulatory bodies clearly stating whether a medical device is AI/ML-based.
Reporting guidelines for clinical trial reports for interventions involving artificial intelligence: the CONSORT-AI extension
The CONSORT-AI (Consolidated Standards of Reporting Trials–Artificial Intelligence) extension is a new reporting guideline for clinical trials evaluating interventions with an AI component that will assist editors and peer reviewers to understand, interpret and critically appraise the quality of clinical trial design and risk of bias in the reported outcomes.
Guidelines for clinical trial protocols for interventions involving artificial intelligence: the SPIRIT-AI extension
The SPIRIT-AI (Standard Protocol Items: Recommendations for Interventional Trials–Artificial Intelligence) extension is a new reporting guideline for clinical trial protocols evaluating interventions with an AI component that will help promote transparency and completeness in clinical trial protocol reporting.
Addressing health disparities in the Food and Drug Administration's artificial intelligence and machine learning regulatory framework
  • Kadija Ferryman
  • J. Am. Medical Informatics Assoc.
  • 2020
Using the Food and Drug Administration's proposed framework for regulating machine learning tools in medicine, the author shows that addressing health disparities during the premarket and postmarket stages of review can help anticipate and mitigate group harms.
MIMIC-CXR, a de-identified publicly available database of chest radiographs with free-text reports
The authors describe a large dataset of 227,835 imaging studies for 65,379 patients presenting to the Beth Israel Deaconess Medical Center Emergency Department between 2011 and 2016, made freely available to facilitate and encourage a wide range of research in computer vision, natural language processing, and clinical data mining.
Minimum information about clinical artificial intelligence modeling: the MI-CLAIM checklist.
The authors present the MI-CLAIM checklist, a tool intended to improve the transparent reporting of AI algorithms in medicine and the transparency of their evaluation.
Geographic Distribution of US Cohorts Used to Train Deep Learning Algorithms.
This study describes the US geographic distribution of patient cohorts used to train deep learning algorithms in published radiology, ophthalmology, dermatology, pathology, gastroenterology, and…
Cross-site performance of an algorithm (sites: SHC, N = 18,688; BIDMC, N = 23,204; NIH, N = 11,196)
  • Nat. Med
  • 2020
The Thirty-Third AAAI Conference on Artificial Intelligence, pp. 590–597 (Association for the Advancement of Artificial Intelligence, 2019)
  • 2019