Effectiveness of a novel artificial intelligence-assisted colonoscopy system for adenoma detection: a prospective, propensity score-matched, non-randomized controlled study in Korea
Abstract
Background/Aims
The real-world effectiveness of computer-aided detection (CADe) systems during colonoscopies remains uncertain. We assessed the effectiveness of the novel CADe system, ENdoscopy as AI-powered Device (ENAD), in enhancing the adenoma detection rate (ADR) and other quality indicators in real-world clinical practice.
Methods
We enrolled patients who underwent elective colonoscopies between May 2022 and October 2022 at a tertiary healthcare center. Standard colonoscopy (SC) was compared to ENAD-assisted colonoscopy. Eight experienced endoscopists performed the procedures in randomly assigned CADe- and non-CADe-assisted rooms. The primary outcome was a comparison of ADR between the ENAD and SC groups.
Results
A total of 1,758 sex- and age-matched patients were included and evenly distributed into two groups. The ENAD group had a significantly higher ADR (45.1% vs. 38.8%, p=0.010), higher sessile serrated lesion detection rate (SSLDR) (5.7% vs. 2.5%, p=0.001), higher mean number of adenomas per colonoscopy (APC) (0.78±1.17 vs. 0.61±0.99; incidence risk ratio, 1.27; 95% confidence interval, 1.13–1.42), and longer withdrawal time (9.0±3.4 vs. 8.3±3.1 min, p<0.001) than the SC group. However, the mean withdrawal times were not significantly different between the two groups in cases where no polyps were detected (6.9±1.7 vs. 6.7±1.7 min, p=0.058).
Conclusions
ENAD-assisted colonoscopy significantly improved the ADR, APC, and SSLDR in real-world clinical practice, particularly for smaller and nonpolypoid adenomas.
INTRODUCTION
Colorectal cancer (CRC) is the third most common cancer and the second leading cause of cancer-related deaths worldwide.1 Detecting and resecting precancerous polyps using endoscopic methods are crucial for CRC risk reduction.2 Hence, colonoscopy has become the standard procedure for diagnosing and preventing CRC.3,4 However, 24% to 26% of adenomas may be missed during colonoscopy, significantly contributing to the occurrence of interval CRC.5,6 To standardize colonoscopy screening and enhance its efficacy, various quality metrics have been established, with the adenoma detection rate (ADR) being the most critical.7,8 An inverse relationship has been established between the ADR calculated by endoscopists and the incidence and mortality of interval CRC.9,10 Various approaches and advancements in colonoscopy techniques have been developed to increase ADR, including improvements in bowel preparation, enhanced imaging techniques, and the use of additional endoscopic devices.11-14
In many recent randomized controlled trials (RCTs), the implementation of computer-aided detection (CADe) systems based on artificial intelligence (AI) technology during real-time colonoscopy has shown promise in increasing adenoma detection.15-19 This new technique, CADe-assisted colonoscopy, has been recognized to be more effective in improving ADR than traditional techniques.20 However, several real-world studies have not shown substantial advantages in adenoma detection, raising concerns regarding its applicability in routine clinical practice.21-24 This variability in the effectiveness of CADe-assisted colonoscopy in real-world clinical practice may be attributed to integration and workflow challenges, differences in patient populations, and the risk of ascertainment bias.24 Additionally, CADe systems and versions with suboptimal performance, including high false-positive rates, might contribute to discrepancies in the effectiveness of CADe-assisted colonoscopy between RCTs and real-world clinical practice.23,25-27 Therefore, the limited generalizability of RCTs necessitates conducting additional studies using more recent CADe systems in a pragmatic setting.24 In our study, we utilized the ENdoscopy as AI-powered Device (ENAD system; AINEX Corporation), a novel CADe software aimed at reducing false-positive rates and increasing per-frame sensitivity, while also implementing less stringent exclusion criteria and providing autonomy for endoscopists. We aimed to evaluate the effectiveness of this novel AI-assisted detection system in enhancing ADR and other quality indicators in real-world clinical practice.
METHODS
Study design and population
This prospective, propensity score-matched, non-randomized controlled study was conducted at the Gangnam Healthcare System, a tertiary healthcare center of Seoul National University Hospital (SNUH), between May 2022 and October 2022. The institution provides comprehensive medical check-ups and conducts approximately 10,000 screening and surveillance colonoscopies annually.
This study enrolled patients aged ≥20 years who underwent elective colonoscopy for CRC screening, surveillance, or symptom diagnosis. Individuals aged <45 years generally visit for screening purposes due to concerns such as a family history or for diagnostic colonoscopy due to symptoms such as diarrhea and abdominal discomfort. Patients were excluded from the propensity score-matching analysis if they had a history of inflammatory bowel disease, CRC, or colorectal resection; if total colonoscopy failed; or if they declined to participate in the study.
CADe system
We utilized the ENAD system, an updated version of previously reported software (SCAI System) developed collaboratively by the Division of Bioengineering at SNUH and AINEX Corporation.28,29 This system, based on a convolutional neural network-based CADe platform, was trained on the same database as the prototype, which comprised 197,673 images from 3,121 polyps sourced from the AI database of the SNUH Gangnam Center. Additionally, it included a different dataset containing 66,397 images of 8,756 polyps. The CADe system employs an advanced deep learning-based object detection algorithm (YOLOv4) that identifies objects by localizing them with green boxes on the screen for polyp detection. The novel updated system was trained using only high-quality, well-focused frames and post-processed using the Kalman filter algorithm. The validation process involved 15,863 images from 80 polyp video clips and 90,144 images from 50 non-polyp clips during the withdrawal times. This version achieved a lower false-positive rate, decreasing from 3.2% to 0.6% and reducing the number of false alarms while increasing the per-frame sensitivity from 86.4% to 87.1%.
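The two validation metrics quoted above can be illustrated with a minimal sketch. It assumes that per-frame sensitivity is the share of polyp frames on which a detection box is shown and that the false-positive rate is the share of non-polyp frames that trigger a box; the text does not spell out either definition. The frame totals come from the validation set described above, whereas the true-positive and false-positive counts are hypothetical values chosen only for illustration.

```python
def per_frame_metrics(tp, fn, fp, tn):
    """Return (sensitivity, false_positive_rate) from frame-level counts."""
    sensitivity = tp / (tp + fn)          # detected polyp frames / all polyp frames
    false_positive_rate = fp / (fp + tn)  # alarmed non-polyp frames / all non-polyp frames
    return sensitivity, false_positive_rate


# Frame totals from the validation set described in the text.
polyp_frames = 15863
non_polyp_frames = 90144

# Hypothetical counts, NOT reported in the study; chosen for illustration only.
tp = 13817
fp = 541

sens, fpr = per_frame_metrics(
    tp, polyp_frames - tp, fp, non_polyp_frames - fp
)
print(f"per-frame sensitivity: {sens:.1%}, false-positive rate: {fpr:.1%}")
```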
Colonoscopy
Colonoscopies were conducted by eight board-certified experienced gastrointestinal endoscopists, who had each performed >2,000 colonoscopies, using high-definition equipment (CF-HQ260 or CF-HQ290; Olympus Co., Ltd.). In our institution, six of the 12 colonoscopy rooms were equipped with the CADe system. Patients who underwent colonoscopy in these rooms were classified into the ENAD group in which the CADe system assisted endoscopists. Patients who underwent colonoscopy in the non-CADe-equipped rooms were classified into the standard colonoscopy (SC) group. Endoscopists were randomly assigned to rooms by the scheduling nurse and performed the procedures in both the CADe-equipped and non-CADe-equipped rooms. The participating endoscopists received thorough training on the CADe system and used it in clinical practice for 8 months prior to the study. All endoscopists were informed of the study and were aware that their performance would be evaluated.
Details of each detected polyp, including size, location, and morphology according to the Paris classification,30 as well as withdrawal time and bowel preparation quality using the Boston bowel preparation scale (BBPS),31 were recorded. All the polyps were histologically diagnosed. Small polyps were resected at the discretion of the endoscopist. Unresected large polyps that required endoscopic submucosal dissection or piecemeal mucosal resection were biopsied. All specimens were sent to the pathology department and histopathologically evaluated by an experienced pathologist at SNUH.
Outcome measures
The primary outcome was a comparison of ADR between the SC and ENAD groups. ADR was defined as the proportion of patients with one or more histologically confirmed adenomas. The secondary outcomes included the mean number of adenomas per colonoscopy (APC), defined as the total number of adenomas divided by the number of colonoscopies performed. Other secondary outcomes included advanced ADR (AADR), sessile serrated lesion detection rate (SSLDR), advanced SSLDR (ASSLDR), non-neoplastic lesion detection rate (no clinically significant histology, including hyperplastic or other benign polyps), and withdrawal time. Advanced adenoma was defined as the presence of any of the following factors: high-grade adenoma (intramucosal carcinoma), adenomatous lesions with a villous component, adenomatous lesions >10 mm in size, or invasive cancer. Advanced sessile serrated lesions (SSLs) included those with dysplasia or >10 mm. Each detection rate was calculated by dividing the number of patients with pathologically confirmed lesions by the total number of patients who underwent colonoscopy. Additional exploratory endpoints included the evaluation of ADR and APC based on polyp size, morphology, and location (defined as proximal if the polyp was located from the cecum to the transverse colon and distal if it was located from the splenic flexure to the rectum).
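As an illustration of the definitions above, the following minimal sketch computes per-patient detection rates and APC from a small set of made-up colonoscopy records. The record structure (`Colonoscopy`) and the helper names are hypothetical and are not taken from the study's analysis code.

```python
from dataclasses import dataclass


@dataclass
class Colonoscopy:
    """Hypothetical per-patient record of histologically confirmed lesions."""
    adenomas: int
    advanced_adenomas: int
    ssls: int


def detection_rate(records, count_of):
    """Proportion of patients with at least one lesion of the given type."""
    return sum(1 for r in records if count_of(r) >= 1) / len(records)


def adenomas_per_colonoscopy(records):
    """Total number of adenomas divided by the number of colonoscopies."""
    return sum(r.adenomas for r in records) / len(records)


# Made-up example data: four colonoscopies.
records = [
    Colonoscopy(adenomas=0, advanced_adenomas=0, ssls=0),
    Colonoscopy(adenomas=2, advanced_adenomas=0, ssls=1),
    Colonoscopy(adenomas=1, advanced_adenomas=1, ssls=0),
    Colonoscopy(adenomas=0, advanced_adenomas=0, ssls=0),
]

adr = detection_rate(records, lambda r: r.adenomas)
aadr = detection_rate(records, lambda r: r.advanced_adenomas)
ssldr = detection_rate(records, lambda r: r.ssls)
apc = adenomas_per_colonoscopy(records)
print(f"ADR={adr:.2f} AADR={aadr:.2f} SSLDR={ssldr:.2f} APC={apc:.2f}")
```

Note that ADR counts a patient once regardless of how many adenomas were found, whereas APC counts every adenoma, which is why the two measures can diverge.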
Statistical analysis
Propensity scores were matched using age and sex as independent variables in a logistic regression model to minimize the bias between the SC and ENAD groups. Patients in the SC group were matched with those in the ENAD group at a 1:1 ratio without replacement using the nearest neighbor matching method with calipers of width equal to 0.2 of the standard deviation of the logit of the propensity score. The balance of the covariates was then assessed using standardized mean differences, of which all were <0.2 for the baseline variables, indicating an adequate balance between the groups. After extracting the matched pairs, outcome measurements were calculated. Categorical variables were described as frequency counts and percentages, whereas continuous variables were described as means and standard deviations. The Mann-Whitney U-test and Student t-test were used for nonparametric and parametric continuous variables, respectively. Fisher exact test or chi-square test was used for categorical data. Differences were expressed as relative risk (RR) with 95% confidence intervals (CIs). A two-sided p<0.05 was considered statistically significant. All statistical analyses were performed using R software ver. 4.3.1 (The R Foundation for Statistical Computing).
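The matching step described above can be sketched as follows. This is a simplified Python illustration, not the R code used in the study. It assumes that the propensity scores have already been estimated (for example, by logistic regression on age and sex), that matching is greedy in the order the ENAD patients are listed, and that the caliper is computed from the pooled standard deviation of the logit across both groups. The function names and the example scores are hypothetical.

```python
import math
import statistics


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def match_nearest_neighbor(treated, control, caliper_sd=0.2):
    """Greedy 1:1 nearest-neighbor matching without replacement.

    treated, control: dicts mapping patient id -> propensity score in (0, 1).
    Returns a list of (treated_id, control_id) pairs within the caliper.
    """
    logits_t = {k: logit(v) for k, v in treated.items()}
    logits_c = {k: logit(v) for k, v in control.items()}
    pooled = list(logits_t.values()) + list(logits_c.values())
    caliper = caliper_sd * statistics.stdev(pooled)

    available = dict(logits_c)  # controls not yet used
    pairs = []
    for t_id, t_logit in logits_t.items():
        if not available:
            break
        c_id = min(available, key=lambda k: abs(available[k] - t_logit))
        if abs(available[c_id] - t_logit) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]  # without replacement
    return pairs


def standardized_mean_difference(x_t, x_c):
    """Difference in means divided by the pooled standard deviation."""
    pooled_sd = math.sqrt((statistics.variance(x_t) + statistics.variance(x_c)) / 2)
    return (statistics.mean(x_t) - statistics.mean(x_c)) / pooled_sd


# Hypothetical propensity scores for three ENAD and three SC patients.
enad_scores = {"E1": 0.40, "E2": 0.55, "E3": 0.90}
sc_scores = {"S1": 0.42, "S2": 0.50, "S3": 0.58}
pairs = match_nearest_neighbor(enad_scores, sc_scores)
print(pairs)
```

In this toy example, the third ENAD patient has no SC patient within the caliper and is therefore left unmatched, which mirrors how matching reduces the enrolled cohort to a smaller analysis set.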
Ethical statements
This study was approved by the Institutional Review Board of SNUH (No. H-2107-235-1240) and complied with the principles of the Declaration of Helsinki. Informed consent was obtained from all patients before their inclusion in the study.
RESULTS
In total, 2,105 patients underwent colonoscopy. After excluding 254 patients who did not meet the inclusion criteria, 1,851 patients were enrolled. Propensity scores were matched based on sex and age, which could influence ADR. Finally, 1,758 patients were included in the analysis and evenly distributed into the ENAD and SC groups (Fig. 1). The baseline characteristics of the enrolled patients are shown in Table 1. Table 2 outlines the clinical outcomes of colonoscopies in each group. No significant differences in age or sex were observed between the two groups. BBPS scores were adequate in 98.6% (867/879) of patients in the ENAD group and 99.2% (872/879) in the SC group, with no significant difference between the two groups. Compared with the SC group, the ENAD group had a longer mean total withdrawal time, including polypectomy time (9.0±3.4 vs. 8.3±3.1 min, p<0.001), and a longer mean inspection time (total withdrawal time minus total polypectomy time) (7.5±2.3 vs. 7.1±2.1 min, p<0.001). No significant differences in the mean withdrawal times were observed between the two groups in cases where no polyps were detected (6.9±1.7 vs. 6.7±1.7 min, p=0.058).

Fig. 1. Flowchart showing participant selection. SC, standard colonoscopy; ENAD, ENdoscopy as AI-powered Device.
The ENAD group had significantly higher ADR (45.1% vs. 38.8%, p=0.010) and SSLDR (5.7% vs. 2.5%, p=0.001) than the SC group. However, no significant differences were observed between the groups in the AADR and ASSLDR. Moreover, the ENAD group demonstrated a higher ADR for adenomas <10 mm, nonpolypoid lesions, and proximal and distal colon lesions than the SC group (Table 3, Fig. 2).

Fig. 2. Comparison of polyp detection rates (per-patient analysis) between the SC and ENAD groups. SC, standard colonoscopy; ENAD, ENdoscopy as AI-powered Device; SSL, sessile serrated lesion.
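A per-patient comparison of detection rates such as the one above can be sketched with a 2×2 chi-square test and a relative risk with a Wald 95% confidence interval. The counts used below (396/879 and 341/879) are an assumption: they are reconstructed from the rounded ADR percentages and the group size of 879, not taken from the study tables. The sketch also assumes a Pearson chi-square test without continuity correction, so its output need not reproduce the reported p-value or RR exactly.

```python
import math


def compare_proportions(events_a, n_a, events_b, n_b):
    """Pearson chi-square test (1 df) and relative risk with a 95% CI."""
    a, b = events_a, n_a - events_a
    c, d = events_b, n_b - events_b
    n = n_a + n_b

    # Chi-square statistic for a 2x2 table, no continuity correction.
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper-tail probability of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))

    rr = (a / n_a) / (c / n_b)
    se_log_rr = math.sqrt(1 / a - 1 / n_a + 1 / c - 1 / n_b)
    ci_low = math.exp(math.log(rr) - 1.96 * se_log_rr)
    ci_high = math.exp(math.log(rr) + 1.96 * se_log_rr)
    return chi2, p_value, rr, (ci_low, ci_high)


# Assumed counts reconstructed from 45.1% and 38.8% of 879 patients per group.
chi2, p_value, rr, (ci_low, ci_high) = compare_proportions(396, 879, 341, 879)
print(f"chi2={chi2:.2f} p={p_value:.4f} RR={rr:.2f} 95% CI {ci_low:.2f}-{ci_high:.2f}")
```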
The APC was significantly higher in the ENAD group (0.78±1.17) than in the SC group (0.61±0.99) (incidence risk ratio, 1.27; 95% CI, 1.13–1.42). When stratified by morphology, size, and location, APC was higher in the ENAD group for nonpolypoid lesions and for both proximal and distal colon lesions. APC improved for both adenomas ≥10 mm and those <10 mm in diameter (Table 4).
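The ratio of APC values reported above can be approximated from summary data alone. The sketch below assumes that adenoma counts follow a Poisson distribution and uses a Wald interval on the log scale; the study does not state which model produced its interval. The total adenoma counts (686 and 536) are likewise an assumption, back-calculated from the rounded means and 879 colonoscopies per group. Because the reported standard deviations exceed what a Poisson model implies, this interval may be narrower than one from an overdispersion-aware model.

```python
import math


def rate_ratio(events_a, n_a, events_b, n_b, z=1.96):
    """Ratio of two event rates with a Wald CI under a Poisson assumption."""
    ratio = (events_a / n_a) / (events_b / n_b)
    se_log = math.sqrt(1 / events_a + 1 / events_b)
    ci_low = math.exp(math.log(ratio) - z * se_log)
    ci_high = math.exp(math.log(ratio) + z * se_log)
    return ratio, ci_low, ci_high


# Assumed totals: round(0.78 * 879) and round(0.61 * 879) adenomas.
apc_ratio, apc_low, apc_high = rate_ratio(686, 879, 536, 879)
print(f"ratio={apc_ratio:.2f} 95% CI {apc_low:.2f}-{apc_high:.2f}")
```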
DISCUSSION
The implementation of the novel CADe system, ENAD, during colonoscopy enhanced the ADR by 6.3 percentage points (RR, 1.17). Similarly, APC and SSLDR increased significantly in the ENAD group compared to the SC group. Moreover, compared to the SC group, the ENAD group demonstrated a significantly increased ADR for adenomas <10 mm and nonpolypoid lesions and a higher APC for nonpolypoid lesions across all sizes.
Several previous prospective RCTs have reported an increased ADR when comparing CADe with routine colonoscopy.15,16,32 Furthermore, a recent meta-analysis of 21 RCTs revealed that the use of CADe during colonoscopy increased the ADR compared with the control.18 However, most studies involved non-commercially available AI systems, and considering the inherent limitations of RCTs regarding internal and external validity, the results may not be directly applicable in clinical practice. Therefore, non-randomized observational studies that assess the effectiveness rather than the efficacy of CADe in real-world clinical practice, without an immediate and direct comparison between CADe-assisted colonoscopy and SC, are essential.18,21-23,33,34 The conclusions of previous studies, however, are inconsistent. In a pragmatic implementation study, Ladabaum et al.22 used a minimalist deployment strategy without additional measures that could affect endoscopic behavior, leaving CADe use at the discretion of each endoscopist for every colonoscopy performed. In real-world clinical practice, CADe implementation without considering the endoscopist’s inclination and behavior may not lead to an improvement in the ADR (CADe vs. control: 40.1% vs. 41.8%, p=0.44). Levy et al.21 conducted a retrospective observational study and reported that AI-assisted colonoscopy did not improve performance; in fact, the ADR in the CADe group was lower than that in the non-CADe group (CADe vs. control: 30.3% vs. 35.2%; p=0.001). In contrast, our study, which used real-world data with a concurrent comparator, demonstrated an increased ADR and APC in the ENAD group. We employed relatively less stringent exclusion criteria and a non-randomized study design to include all patients to confirm the effectiveness of CADe-assisted colonoscopy in a setting similar to routine clinical practice.
All eight endoscopists in our study were experts with approximately 8 months of experience with CADe prior to study initiation, allowing them to integrate seamlessly and comfortably into the workflow. The novel CADe system, with its improved performance, likely contributed to the increased ADR in a real-world environment, contrary to the results of other non-randomized studies.21-23
In a study by Levy et al.,21 the decrease in the ADR in the CADe group was attributed to diminished procedure time, potentially implying a false sense of confidence and overreliance on AI technology, resulting in less scrupulous performance. In contrast, in our study, the total withdrawal time, including polypectomy time, was longer in the ENAD group than in the SC group, which may be naturally attributed to the detection of more adenomas. The withdrawal time in cases in which no polyps were detected was not significantly different between the two groups. Thus, the use of the CADe system did not significantly prolong observation when no polyps were found, and the ADR improved with an observation time similar to that of colonoscopies without AI assistance. Another hypothesis is that the updated ENAD system used in this study, by reducing the false-positive rate, might reduce distractions for the endoscopist, decrease fatigue, and lessen the overlooking of alarms, potentially increasing the ADR. This could be a distinguishing feature compared with other CADe systems. Nonetheless, withdrawal time tended to be slightly longer in the CADe group even in the absence of polyps (p=0.058), although the actual difference was only 0.2 minutes (6.9 vs. 6.7 minutes). Therefore, CADe systems should be improved to minimize false positives as much as possible. Further research on the interaction between AI and human endoscopists and the clinical impact of reducing false-positive rates is also needed.
In this study, the detection rates of SSLs and nonpolypoid adenomas were significantly higher in the ENAD group than in the SC group. In the initial RCT, the detection rate of SSLs did not increase.16,17 However, Shah et al.35 reported a 78% reduction in the SSL miss rate with CADe. Many post-colonoscopy CRCs may originate from serrated polyps, in accordance with the serrated pathway of carcinogenesis, likely because SSLs are often difficult to detect and have a high risk of incomplete removal during colonoscopy. This difficulty arises from their flat shape, indistinct margins, and color similar to that of the surrounding mucosa.36 Thus, the higher SSLDR observed in the ENAD group in our study is considered significant and encouraging, as it may help reduce the risk of interval cancer.
Although the ENAD group demonstrated APC improvement for adenomas with diameters <10 mm and ≥10 mm, it did not show a higher AADR, ASSLDR, or ADR for polyps >10 mm. Other studies have also indicated that AI fails to improve the detection rate of advanced adenomas.16,32,34,37 The lack of an increase in the AADR or ASSLDR may be attributed to the fact that AI is proficient at detecting lesions within the observed mucosa but may miss those in blind spots not exposed by the endoscopist or obscured by poor bowel preparation. Therefore, endoscopists’ observational skills remain crucial, particularly during mucosal exposure. Some studies have addressed this limitation by introducing a blind spot monitoring technology based on deep learning.38,39 Combining CADe systems with this technology may enhance the detection rate of advanced lesions. Furthermore, automatic monitoring of bowel cleanliness and providing feedback on colonoscopy quality through AI could improve the overall quality of the procedure, reduce the need for bowel preparation recordings, and decrease the physician’s workload.40,41 This, in turn, might increase the detection rate of advanced and other lesion types. Since the prevalence of advanced lesions is relatively low,42 large-scale studies are required to statistically confirm the effect of CADe on increasing AADR/ASSLDR among high-performing expert groups.
We calculated the ADR and APC according to location and compared them between the ENAD and SC groups. ADR and APC for proximal and distal colon lesions were higher in the ENAD group than in the SC group. Wang et al.43 have reported significantly lower miss rates for right-sided colon cancer in a CADe group than in a conventional group. However, a recent meta-analysis has demonstrated that CADe systems significantly improve APC regardless of adenoma location, a finding that is consistent with ours.18 We hypothesized that over time, the performance of CADe systems has improved, or the awareness of the need to better expose the mucosa in the distal colon for effective AI recognition has increased. CADe may improve ADR and APC, regardless of the location.
This study had some limitations. As this was a propensity score-matched, prospective, non-blinded study, potential selection and observer biases among endoscopists who were aware of ENAD use may have been present. Additionally, the study was conducted at a single tertiary center by experienced endoscopists, which could limit the generalizability of our results. Future studies should focus on external validation, including diverse centers of various sizes and less-experienced endoscopists, to establish the effectiveness of AI in daily clinical settings. Finally, although this study confirmed an overall increase in the mean ADR among endoscopists, we did not examine the increase in ADR for each endoscopist. Future research should analyze the inter-observer reliability among endoscopists to identify practitioners experiencing a decrease in ADR. By understanding the characteristics of these practitioners, we can develop strategies to improve ADR using CADe. This approach would help ensure consistency in the effectiveness of the system across practitioners. Nevertheless, our study effectively demonstrated that the novel CADe-assisted colonoscopy was superior to SC in a setting that mirrored real-world clinical practice through propensity score matching to mitigate sex- and age-related biases that can affect ADR.
In conclusion, the use of the novel ENAD system during colonoscopy increased the ADR, APC, and SSLDR compared with SC in patients who underwent screening and surveillance colonoscopy. With the validation of these findings in more extensive studies involving various endoscopists, CADe-assisted colonoscopy could become the norm, thereby elevating colonoscopy quality and subsequently improving CRC prevention outcomes.
Notes
Conflicts of Interest
Jung Ho Bae holds equity in AINEX Corporation. The other author has no potential conflicts of interest.
Funding
This work was supported by the Seoul National University Medical Big Data Research Center (hellombrc@gmail.com). The funding source had no role in the study design, execution, data analyses and interpretation, or the decision to submit results.
Author Contributions
Conceptualization: JHB; Data curation: JHB; Formal analysis: JBP; Funding acquisition: JHB; Methodology: all authors; Writing–original draft: JBP; Writing–review & editing: JHB.