BACKGROUND AND AIMS
With recent emphasis on using stone volumes in surgical outcomes research,1-6 there is a need to assess the accuracy and utility of available tools. The authors sought to assess the accuracy of AI-calculated stone volumes compared with semi-automated (SA) methods5,6 and to evaluate whether cumulative stone diameter, SA stone volume, or a fully automated AI volume was the best predictor of stone-free status following suction-augmented ureteroscopy (Figure 1).

Figure 1: Study schema and primary aims.
SFR: Stone-Free Rate; SFR-A: Stone-Free Rate-Grade A; SFR-C: Stone-Free Rate-Grade C.
METHODS
A total of 171 CT scans were included (96 pre-operative and 75 post-operative). Cumulative stone diameter was measured manually. Stone volumes were assessed using two semi-automated segmentation applications (QSAS [Mayo Clinic, Rochester, Minnesota, USA] and 3D-Slicer [Chitubox, Shenzhen, China]), both of which require investigator annotation of the region of interest. A Mayo Clinic-developed AI programme calculated stone volumes in a fully automated fashion. Pearson correlation assessed the association between AI-estimated and semi-automated volumes. Sensitivity and specificity of the AI model were assessed for absolute stone-free status (SFR-Grade A) on post-operative scans. Receiver operating characteristic (ROC) analysis evaluated the accuracy of pre-operative stone burden metrics in predicting stone-free status under two criteria: SFR-Grade A (no residual fragments) and SFR-Grade C (residual fragments of 2.1–4.0 mm, with no fragment >4 mm).
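The stone-free criteria and the cumulative diameter metric can be written as simple rules. The sketch below is hypothetical: the function names are illustrative, cumulative diameter is assumed to be the sum of each stone's maximal diameter (a common convention that the abstract does not spell out), and "stone-free" under SFR-Grade C is read as "no residual fragment larger than 4.0 mm".

```python
from typing import Sequence

# Assumption: under SFR-Grade C, fragments up to 4.0 mm are tolerated.
GRADE_C_MAX_FRAGMENT_MM = 4.0


def cumulative_diameter(stone_diameters_mm: Sequence[float]) -> float:
    """Assumed definition: sum of the maximal diameter of every stone on a scan."""
    return float(sum(stone_diameters_mm))


def stone_free_grade_a(residual_fragments_mm: Sequence[float]) -> bool:
    """SFR-Grade A: no residual fragments of any size."""
    return len(residual_fragments_mm) == 0


def stone_free_grade_c(residual_fragments_mm: Sequence[float]) -> bool:
    """SFR-Grade C (as read here): no residual fragment larger than 4.0 mm."""
    return all(size <= GRADE_C_MAX_FRAGMENT_MM for size in residual_fragments_mm)


# Example: one 3.5 mm residual fragment fails Grade A but passes Grade C.
fragments = [3.5]
print(stone_free_grade_a(fragments), stone_free_grade_c(fragments))
```

Encoding the criteria this way makes explicit that every Grade A stone-free case is also stone-free under Grade C, but not vice versa.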
RESULTS
AI-estimated stone volumes showed strong linear correlations with volumes calculated by both 3D-Slicer (R=0.95; p<0.001; mean difference: -0.31 mm³; interquartile range: -13.06 to 25.81 mm³) and QSAS (R=0.95; p<0.001; mean difference: 0 mm³; interquartile range: -12.0 to 8.0 mm³). Among post-operative scans, strong correlations persisted with QSAS (R=0.88; p<0.001) and 3D-Slicer (R=0.86; p<0.001).
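The agreement results pair a correlation coefficient with a summary of per-scan differences. A minimal sketch of both, assuming differences are taken as AI minus semi-automated volume (the abstract does not state the sign convention) and that the interquartile range describes those paired differences; the function names are hypothetical. Because the abstract reports a mean difference alongside an IQR, the sketch returns both the mean and the median.

```python
import math
import statistics
from typing import Dict, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation between paired volume measurements."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 pairs")
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def paired_difference_summary(ai_mm3: Sequence[float],
                              reference_mm3: Sequence[float]) -> Dict[str, object]:
    """Mean, median and quartiles of (AI - reference) volume differences."""
    diffs = [a - r for a, r in zip(ai_mm3, reference_mm3)]
    q1, _, q3 = statistics.quantiles(diffs, n=4)  # default 'exclusive' method
    return {
        "mean": statistics.fmean(diffs),
        "median": statistics.median(diffs),
        "iqr": (q1, q3),
    }
```

Note that a high correlation and a small mean difference can coexist with a wide IQR, which is why the abstract reports both. Quartile definitions also differ between statistical packages, so IQR endpoints may not match other software exactly.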
Among 59 patients with eligible pre- and post-operative imaging, the AI model demonstrated a sensitivity of 85.7% and a specificity of 88.9% for SFR-Grade A. Among incorrect stone-free determinations, both false-negative cases involved parenchymal stones, and the largest false-positive residual burden (16 mm³) occurred in the right moiety of a horseshoe kidney.
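Sensitivity is TP / (TP + FN) and specificity is TN / (TN + FP). The sketch below computes both from per-patient calls. The abstract does not say which outcome (stone-free or residual stone) is treated as "positive", so the code is agnostic to that choice. The 2×2 counts used in the example are not reported in the abstract: they are inferred under two assumptions (exactly two false negatives, as "both" implies, and all 59 patients entering the table), and are the only integer split consistent with those assumptions and the reported percentages.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ConfusionCounts:
    tp: int  # AI positive, reference positive
    fn: int  # AI negative, reference positive
    tn: int  # AI negative, reference negative
    fp: int  # AI positive, reference negative

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)


def count_outcomes(predicted: Sequence[bool], reference: Sequence[bool]) -> ConfusionCounts:
    """Tally AI calls against the reference read, one boolean per patient."""
    pairs = list(zip(predicted, reference))
    return ConfusionCounts(
        tp=sum(1 for p, r in pairs if p and r),
        fn=sum(1 for p, r in pairs if not p and r),
        tn=sum(1 for p, r in pairs if not p and not r),
        fp=sum(1 for p, r in pairs if p and not r),
    )


# Hypothetical counts inferred from the abstract's percentages (not reported there).
inferred = ConfusionCounts(tp=12, fn=2, tn=40, fp=5)
print(f"sensitivity {inferred.sensitivity:.1%}, specificity {inferred.specificity:.1%}")
# prints: sensitivity 85.7%, specificity 88.9%
```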
Among 69 patients with a single ureteroscopy and complete follow-up imaging within 3 months, pre-operative volumetric and diameter measurements did not differ significantly in predicting SFR-Grade A. Cumulative pre-operative diameter outperformed QSAS-calculated volume in predicting SFR-Grade C (area under the curve: 0.78 versus 0.62; DeLong test p=0.037). A sub-analysis of cases using a flexible and navigable ureteric access sheath revealed no difference between any of the measurements in predicting SFR-Grade A or SFR-Grade C.
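Comparing the areas under the curve (AUCs) of two burden metrics measured on the same patients requires a paired test such as DeLong's, because the two AUCs are correlated. The following is a from-scratch sketch of the AUC (via the Mann-Whitney statistic) and DeLong's paired test, not the software used in the study, which the abstract does not name. It assumes higher scores indicate the positive label, so when predicting stone-free status from stone burden, the positive label would be "residual stone" (or the scores would be negated).

```python
import math
from typing import List, Sequence, Tuple


def _psi(pos_score: float, neg_score: float) -> float:
    """Mann-Whitney kernel: 1 if the positive case ranks higher, 0.5 for ties."""
    if pos_score > neg_score:
        return 1.0
    if pos_score == neg_score:
        return 0.5
    return 0.0


def _placements(scores: Sequence[float],
                labels: Sequence[bool]) -> Tuple[List[float], List[float]]:
    """DeLong structural components: V10 per positive case, V01 per negative case."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    v10 = [sum(_psi(x, y) for y in neg) / len(neg) for x in pos]
    v01 = [sum(_psi(x, y) for x in pos) / len(pos) for y in neg]
    return v10, v01


def auc(scores: Sequence[float], labels: Sequence[bool]) -> float:
    """Area under the ROC curve, equal to the mean of the V10 components."""
    v10, _ = _placements(scores, labels)
    return sum(v10) / len(v10)


def _cov(a: Sequence[float], b: Sequence[float]) -> float:
    """Sample covariance with an (n - 1) denominator."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def delong_paired(scores_1: Sequence[float], scores_2: Sequence[float],
                  labels: Sequence[bool]) -> Tuple[float, float, float]:
    """Two-sided DeLong test for two correlated AUCs; returns (auc_1, auc_2, p)."""
    v10_1, v01_1 = _placements(scores_1, labels)
    v10_2, v01_2 = _placements(scores_2, labels)
    m, n = len(v10_1), len(v01_1)
    if m < 2 or n < 2:
        raise ValueError("need at least two positive and two negative cases")
    auc_1, auc_2 = sum(v10_1) / m, sum(v10_2) / m
    # Var(AUC1 - AUC2) = (S10_11 + S10_22 - 2*S10_12)/m + (S01_11 + S01_22 - 2*S01_12)/n
    var_diff = (
        (_cov(v10_1, v10_1) + _cov(v10_2, v10_2) - 2 * _cov(v10_1, v10_2)) / m
        + (_cov(v01_1, v01_1) + _cov(v01_2, v01_2) - 2 * _cov(v01_1, v01_2)) / n
    )
    if var_diff <= 0:
        raise ValueError("variance of the AUC difference is zero; test undefined")
    z = (auc_1 - auc_2) / math.sqrt(var_diff)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return auc_1, auc_2, p_value
```

In the study's terms, `scores_1` and `scores_2` would hold cumulative diameter and QSAS volume for the same 69 patients, and `labels` their SFR-Grade C outcome.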
All pre-operative stone measurements correlated significantly with operative time; however, semi-automated volumes from 3D-Slicer (R²=0.41) and QSAS (R²=0.37) explained more of the variation in operative time than cumulative diameter (R²=0.25) or AI-estimated volume (R²=0.24).
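The R² values presumably come from a simple linear regression of operative time on each pre-operative measurement. This is an assumption, since the abstract does not specify the model. For a single predictor with an intercept, R² = 1 - SS_res / SS_tot, which equals the squared Pearson correlation. A minimal sketch with hypothetical function names:

```python
import statistics
from typing import Sequence, Tuple


def fit_simple_ols(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Least-squares intercept and slope for y = intercept + slope * x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """Share of variance in y (e.g. operative time) explained by the fitted line."""
    intercept, slope = fit_simple_ols(x, y)
    my = statistics.fmean(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot
```

Read this way, an R² of 0.41 for 3D-Slicer volume means roughly 41% of the between-patient variance in operative time is accounted for by a straight-line fit on that volume, compared with about 25% for cumulative diameter.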
CONCLUSION
A fully automated, AI-driven method for stone volume determination agreed closely with semi-automated methods (R=0.95 pre-operatively), offering an efficient option for estimating pre-operative or residual stone burden.
The AI model was 85.7% sensitive and 88.9% specific for determining SFR-Grade A without clinician annotation, with errors concentrated in small low-attenuation stones and anatomic variants. Pre-operative volumetric measurements did not outperform cumulative diameter in predicting stone-free status.



