Adaptive Medical AI Routing Improves Efficiency

MEDICAL question answering accuracy improved while inference time and token use fell substantially with a new intelligent routing framework designed to direct queries to the most appropriate large language model mode.

Why Medical Question Answering Needs Adaptive Routing

As large language models become more widely used in clinical and healthcare settings, selecting the most appropriate reasoning mode remains a significant challenge. Medical question answering tasks vary considerably in complexity, ranging from routine medication adjustments to rare disease diagnosis. These differences place varying demands on model reasoning capabilities.

Researchers developed SynapseRoute, an intelligent routing framework that dynamically assigns questions to either a high-reasoning “thinking” mode or a faster, lower-cost “non-thinking” mode. While the thinking mode can support more complex inference, it is associated with greater computational cost, longer response times and a higher risk of hallucination. In contrast, the non-thinking mode offers faster and less expensive responses but may struggle with more demanding reasoning tasks.

Evaluating Medical Question Answering Performance

The study assessed questions from four medical datasets: USMLE, MedMCQA, PubMedQA and CareQA. Questions were automatically labelled as requiring either the thinking or the non-thinking mode using the Qwen3-30B-A3B model.

Multiple routing approaches were tested, including traditional machine learning methods and fine-tuned large language models. Performance was measured using standard evaluation metrics and a newly developed Accuracy–Inference–Token index. This metric integrates accuracy, latency and token cost while allowing adjustable weighting according to deployment priorities.
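The idea of a single score with adjustable weights can be illustrated with a short sketch. The article does not give the published Accuracy–Inference–Token formula, so the function below is a hypothetical stand-in: the names `RunStats` and `weighted_tradeoff_score`, the weighted-average form and the choice to score everything relative to a thinking-only baseline are all assumptions made for illustration. The input figures are the ones reported in this article, with the baseline's time and token use normalised to 1.0.

```python
from dataclasses import dataclass


@dataclass
class RunStats:
    accuracy: float  # fraction of questions answered correctly (0 to 1)
    time: float      # inference time, in the same unit as the baseline
    tokens: float    # tokens consumed, in the same unit as the baseline


def weighted_tradeoff_score(run, baseline, w_acc=1.0, w_time=1.0, w_tok=1.0):
    """Hypothetical composite score; NOT the published AIT formula.

    Averages the accuracy gain and the fractional time and token savings
    relative to the baseline, using adjustable weights. Higher is better,
    and the baseline itself scores 0.
    """
    total = w_acc + w_time + w_tok
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    acc_gain = run.accuracy - baseline.accuracy
    time_saving = 1.0 - run.time / baseline.time
    token_saving = 1.0 - run.tokens / baseline.tokens
    return (w_acc * acc_gain + w_time * time_saving + w_tok * token_saving) / total


# Reported figures; thinking-only time and tokens are normalised to 1.0.
thinking_only = RunStats(accuracy=0.8272, time=1.0, tokens=1.0)
routed = RunStats(accuracy=0.8390, time=1.0 - 0.368, tokens=1.0 - 0.3966)

# Equal weights versus a deployment that cares mostly about accuracy.
balanced = weighted_tradeoff_score(routed, thinking_only)
accuracy_first = weighted_tradeoff_score(routed, thinking_only, w_acc=8.0)

print(f"balanced: {balanced:.4f}, accuracy-first: {accuracy_first:.4f}")
```

Under these assumptions, shifting weight towards accuracy lowers the score for the routed system, because most of its advantage comes from time and token savings rather than from the accuracy gain.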

The data showed that approximately 58% of medical questions could be answered correctly using the non-thinking mode alone.

Accuracy Improvements with Lower Resource Use

Among the routing methods evaluated, logistic regression delivered the strongest performance, achieving an area under the curve of 0.82.
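As a rough illustration of how a logistic regression router might work, the sketch below trains a tiny classifier and uses its predicted probability to pick a mode. It is a minimal sketch under stated assumptions, not the authors' implementation: the article does not describe the features or training setup, so the two features (question length and a multi-step reasoning flag), the toy training data, the 0.5 threshold and the function names are all hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(x, weights, bias):
    """Probability that a question needs the thinking mode."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit logistic regression with plain batch gradient descent."""
    n, d = len(X), len(X[0])
    weights, bias = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(xi, weights, bias) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def route(x, weights, bias, threshold=0.5):
    """Send the question to the thinking mode only when the router predicts it is needed."""
    return "thinking" if predict_proba(x, weights, bias) >= threshold else "non-thinking"


# Hypothetical toy data: [length in hundreds of words, multi-step reasoning flag].
# Label 1 means the question was tagged as needing the thinking mode.
X = [[0.2, 0], [0.3, 0], [0.4, 0], [0.9, 1], [1.0, 1], [1.2, 1]]
y = [0, 0, 0, 1, 1, 1]

weights, bias = train_logistic(X, y)
print(route([0.25, 0], weights, bias))  # short, single-step question
print(route([1.1, 1], weights, bias))   # long, multi-step question
```

In a real deployment the threshold would be a tuning knob: lowering it sends more questions to the thinking mode, trading cost for caution.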

Compared with relying exclusively on the thinking mode, SynapseRoute improved overall accuracy (0.8390 versus 0.8272). At the same time, the framework reduced inference time by 36.8% and lowered token consumption by 39.66%.
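To put these figures in perspective, the short calculation below separates the absolute accuracy gain (in percentage points) from the relative gain, and converts the reported reductions into the share of time and tokens still used. Only the four reported numbers come from the article; the function names are illustrative.

```python
def accuracy_gains(routed, baseline):
    """Return (gain in percentage points, relative gain in percent)."""
    absolute_pp = (routed - baseline) * 100
    relative_pct = (routed - baseline) / baseline * 100
    return absolute_pp, relative_pct


def remaining_share(reduction_pct):
    """Fraction of the baseline resource still used after a percentage reduction."""
    return 1.0 - reduction_pct / 100


absolute_pp, relative_pct = accuracy_gains(routed=0.8390, baseline=0.8272)
time_share = remaining_share(36.8)     # inference time relative to thinking-only
token_share = remaining_share(39.66)   # token use relative to thinking-only

print(f"accuracy: +{absolute_pp:.2f} points ({relative_pct:.2f}% relative)")
print(f"time used: {time_share:.3f} of baseline, tokens used: {token_share:.4f} of baseline")
```

The accuracy gain is therefore modest (about 1.18 percentage points), while the routed system uses roughly 63% of the inference time and 60% of the tokens of the thinking-only baseline.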

The findings suggest that dynamic routing can improve the practicality and cost efficiency of dual-mode large language model systems for medical question answering. By matching question complexity to the appropriate reasoning mode, SynapseRoute may enable more efficient deployment of artificial intelligence systems while maintaining or improving performance. The researchers also propose that the Accuracy–Inference–Token index could provide a practical method for evaluating performance and resource trade-offs in future adaptive medical question answering systems.

Reference

Zhang W et al. SynapseRoute: an auto-route switching framework on dual-state large language model for medical question answering. BMJ Innovations. 2026; doi: 10.1136/bmjinnov-2025-001529.

Featured image: MohamadFaizal on Adobe Stock

Author: Helena Bradbury
