Hybrid AI-Human System Matches Experts in UC Scoring - EMJ

Hybrid AI-Human Scoring System Matches Expert Readers in Ulcerative Colitis Trials

A NEW study reports that a hybrid central reading approach, which combines two independently trained machine learning models with targeted human adjudication, can deliver endoscopic scoring accuracy comparable to traditional expert review in ulcerative colitis research, while reducing human reads by 81% and limiting reader-dependent variability.

Accuracy Matches Human Experts

Endoscopic scoring is critical for assessing disease activity in ulcerative colitis, but traditional central reading suffers from inter-reader variation and the operational strain of multiple human reviewers. In this evaluation of 150 full-length endoscopic videos, researchers compared the standard two-reader-plus-adjudicator model with a hybrid system known as “2M+1H,” in which two independent machine learning models each produce a score and a gastroenterologist adjudicates only when the two scores disagree.
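
The 2M+1H decision rule described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the study's implementation: the function name `hybrid_score`, the `adjudicate` callback and the integer scores are hypothetical, and it assumes that "disagreement" means the two model scores are not identical.

```python
from typing import Callable, Tuple


def hybrid_score(
    model_a_score: int,
    model_b_score: int,
    adjudicate: Callable[[int, int], int],
) -> Tuple[int, bool]:
    """Return (final_score, human_read_needed) for one video.

    Assumption: the models "agree" only when their scores are equal.
    `adjudicate` stands in for the gastroenterologist's read.
    """
    if model_a_score == model_b_score:
        # Concordant models: accept the shared score, no human read.
        return model_a_score, False
    # Discordant models: a human reader decides the final score.
    return adjudicate(model_a_score, model_b_score), True


def example_adjudicator(score_a: int, score_b: int) -> int:
    """Hypothetical stand-in for a human read: picks the higher score."""
    return max(score_a, score_b)


concordant = hybrid_score(2, 2, example_adjudicator)
discordant = hybrid_score(1, 2, example_adjudicator)
```

In this sketch the human reader is consulted only on the second call, which mirrors how the hybrid workflow confines expert time to discordant videos.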

The 2M+1H system achieved a quadratic weighted kappa of 0.78 against the reference standard, meeting the prespecified threshold for non-inferiority. Agreement remained high across key clinical endpoints, with 82.7% concordance for endoscopic improvement and 89.3% for remission.
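
For readers unfamiliar with the agreement statistic, the following is a minimal sketch of how a quadratic weighted kappa can be computed for two sets of ordinal scores. It uses the standard textbook definition; the function name, the category count and the example scores are assumptions for illustration and are not taken from the study.

```python
from typing import Sequence


def quadratic_weighted_kappa(
    ratings_a: Sequence[int],
    ratings_b: Sequence[int],
    n_categories: int,
) -> float:
    """Quadratic weighted kappa for ordinal ratings coded 0..n_categories-1."""
    n = len(ratings_a)
    if n == 0 or n != len(ratings_b):
        raise ValueError("ratings must be non-empty and of equal length")
    if n_categories < 2:
        raise ValueError("need at least two categories")

    k = n_categories
    # Observed confusion matrix between the two sets of ratings.
    observed = [[0] * k for _ in range(k)]
    for a, b in zip(ratings_a, ratings_b):
        observed[a][b] += 1

    row_totals = [sum(row) for row in observed]
    col_totals = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            # Penalty grows with the squared distance between categories.
            weight = (i - j) ** 2 / (k - 1) ** 2
            weighted_observed += weight * observed[i][j]
            # Expected count under chance agreement from the marginals.
            weighted_expected += weight * row_totals[i] * col_totals[j] / n

    if weighted_expected == 0:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return 1.0 - weighted_observed / weighted_expected


# Hypothetical scores on a four-level ordinal scale (not study data).
perfect = quadratic_weighted_kappa([0, 1, 2, 3], [0, 1, 2, 3], 4)
reversed_order = quadratic_weighted_kappa([0, 1, 2, 3], [3, 2, 1, 0], 4)
```

Because the weights are quadratic, a two-step disagreement is penalised four times as heavily as a one-step disagreement, which is why the statistic suits ordinal endoscopic scores.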

Major Reduction in Human Workload

Crucially, the hybrid model reduced human reads by 81% compared with conventional central reading. Only discordant model outputs required adjudication, greatly streamlining review without sacrificing accuracy.
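
The arithmetic behind a workload comparison of this kind can be sketched as follows. The sketch assumes that conventional central reading uses two human reads per video plus one adjudication read whenever the two readers disagree, and that the hybrid system uses one human read per discordant model pair. The function names and all counts are hypothetical illustrations, not figures from the study.

```python
def conventional_human_reads(n_videos: int, n_reader_disagreements: int) -> int:
    """Two reads per video plus one adjudication per reader disagreement (assumed)."""
    return 2 * n_videos + n_reader_disagreements


def hybrid_human_reads(n_model_disagreements: int) -> int:
    """One human read for each video on which the two models disagree (assumed)."""
    return n_model_disagreements


def relative_reduction(conventional_reads: int, hybrid_reads: int) -> float:
    """Fraction of human reads saved by the hybrid workflow."""
    if conventional_reads <= 0:
        raise ValueError("conventional_reads must be positive")
    return 1.0 - hybrid_reads / conventional_reads


# Illustrative counts only: 100 videos, 20 reader disagreements,
# 50 model disagreements.
conventional = conventional_human_reads(100, 20)
hybrid = hybrid_human_reads(50)
reduction = relative_reduction(conventional, hybrid)
```

Under these assumptions the saving depends on only two quantities: how often the human readers disagree with each other, and how often the two models do.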

The study also highlighted a notable weakness in current practice: 16% of videos received different final scores depending on which human readers were assigned. By contrast, integrating machine learning appears to stabilise outcomes and reduce dependence on reader-specific interpretation patterns.

Toward More Reproducible Trial Endpoints

The findings suggest that introducing machine learning as a first-pass scoring tool may enhance both operational efficiency and reproducibility in ulcerative colitis trials. The authors note that further prospective validation is needed, particularly comparing hybrid scoring outputs with clinical, biomarker and histological outcomes.

If confirmed, the 2M+1H paradigm could reshape central reading workflows, reduce trial costs and support more consistent interpretation of endoscopic endpoints in inflammatory bowel disease research.

Reference

Gottlieb KT et al. Comparative evaluation of a hybrid machine learning-human adjudication paradigm for endoscopic scoring in ulcerative colitis. BMJ Open Gastroenterology. 2025;12:e001959.

Each article is made available under the terms of the Creative Commons Attribution-Non Commercial 4.0 License.
