Artificial intelligence in colonoscopy: polyp fiction or clinical reality?
We read the article by Rønborg et al.,1 “Assessing the potential of artificial intelligence to enhance colonoscopy adenoma detection in clinical practice: a prospective observational trial,” with great interest. The authors reported a modest and statistically non-significant improvement in adenoma detection rate (ADR) using a computer-aided detection (CADe) system (34.7% vs. 30.5%; adjusted risk ratio, 1.12; p=0.37), concluding that its practical impact in real-world settings is limited.
We commend this realistic evaluation and concur that experienced endoscopists generally gain minimal additional diagnostic benefit from artificial intelligence (AI) support, as their high baseline detection rates leave limited room for improvement.2 However, the true potential of AI-assisted systems likely emerges more distinctly among moderately experienced or less-experienced endoscopists, who may initially have higher adenoma miss rates.2 Although AI could significantly aid inexperienced endoscopists, reliance on AI alone is problematic; as their skills improve, their dependence on AI should correspondingly decrease.
A key point is that AI can only detect adenomas that are clearly presented on the screen. Optimal visualization, which depends critically on endoscopic technique, directly determines the effectiveness of AI. Reducing blind spots remains a matter of procedural expertise rather than technological enhancement alone. As procedural proficiency improves, endoscopists naturally present more mucosa clearly, inherently increasing ADR and minimizing AI's incremental contribution. Therefore, sound foundational colonoscopy technique remains essential.
Moreover, certain studies speculate that variability in AI effectiveness results from using research-grade rather than commercial systems, raising questions about their direct applicability in clinical practice owing to inherent differences in internal and external validity.3 However, we find this less likely to significantly impact general AI effectiveness because sufficient training data typically yields moderate-to-high detection accuracy even with basic convolutional neural network architectures.4,5 Variability in reported effectiveness likely reflects differences in clinical workflows, patient populations, and study designs rather than the fundamental limitations of AI systems themselves.6
Notably, another recent real-world study by Park and Bae3 demonstrated significant ADR improvements with a different AI system (45.1% vs. 38.8%, p=0.010). Their results notably showed increased detection of smaller and non-polypoid adenomas and sessile serrated lesions, suggesting the potential advantages of optimized AI systems, even in clinical practice. This contrasting finding underscores the fact that AI can provide substantial diagnostic benefits under certain conditions or specific system optimizations.7
In addition, variations in real-world studies may reflect the integration and acceptance of AI in clinical practice. Differences in endoscopist attitudes, procedural workflows, and patient populations can significantly affect CADe performance outcomes. Rønborg et al.1 utilized a pseudo-randomized design with real-world conditions and pragmatic patient selection, enhancing external validity and generalizability. However, inherent limitations such as lack of blinding and potential observer bias may have influenced their results.
In contrast, Park and Bae3 applied propensity score matching and strict adherence to a structured workflow, possibly contributing to clearer positive outcomes. Such methodological differences between studies highlight the importance of carefully considering design nuances when interpreting AI effectiveness. Therefore, future studies should clearly document the procedural conditions, endoscopist behaviors, and patient characteristics to better contextualize their findings.
Thus, while Rønborg et al.’s findings1 caution against exaggerated expectations, they should not overly diminish confidence in AI. Existing literature consistently shows that AI achieves moderate-to-high diagnostic performance.4,5 With the forthcoming era of AI-integrated clinical practice, the focus should shift towards understanding how to strategically deploy AI tools that target specific clinical scenarios and user groups to maximize patient benefit.8
We encourage future large-scale multicenter trials with stratified analyses based on endoscopist experience, comprehensive clinical outcomes beyond ADR, and evaluation of the long-term impact of AI on training and clinical outcomes. Moreover, exploring integration strategies such as combining AI with blind-spot monitoring or automatic quality feedback systems may further enhance AI's practical utility.
In addition, CADe systems could enhance training programs for less-experienced endoscopists by providing real-time feedback and fostering procedural discipline from the outset of their careers. Further research should explicitly investigate the educational impact of integrating AI into routine training curricula and examine whether this accelerates the development of endoscopic proficiency and enhances overall detection performance.8
Finally, it is important to assess clinical outcomes beyond ADR, such as interval colorectal cancer rates or polyp miss rates, using tandem colonoscopy studies to validate whether modest ADR improvements translate into meaningful patient benefits. Such outcomes ultimately determine the true clinical value of integrating AI into colonoscopy practice.
In conclusion, we thank the authors for their valuable real-world evidence highlighting both the strengths and limitations of current AI-assisted colonoscopy practices. We advocate continued nuanced research and strategic applications to fully realize AI’s potential in improving colorectal cancer prevention.
Notes
Conflicts of Interest
Chang Seok Bang is currently serving as a section editor for Clinical Endoscopy; however, he was not involved in the peer reviewer selection, evaluation, or decision process of this article.
Funding
This research was supported by the Bio and Medical Technology Development Program of the National Research Foundation (NRF), funded by the Korean government (MSIT) (No. RS-2023-00223501).
Author Contributions
Conceptualization: CSB; Data curation: all authors; Investigation: all authors; Methodology: all authors; Project administration: CSB; Resources: all authors; Supervision: CSB; Validation: CSB; Writing–original draft: all authors; Writing–review & editing: all authors.
