Adaptive Medical AI Routing Improves Efficiency - EMJ

Dynamic Mode Selection Enhances Medical Question Answering

MEDICAL question answering accuracy improved while inference time and token use fell substantially with a new intelligent routing framework designed to direct queries to the most appropriate large language model mode. 

Why Medical Question Answering Needs Adaptive Routing 

As large language models become more widely used in clinical and healthcare settings, selecting the most appropriate reasoning mode remains a significant challenge. Medical question answering tasks vary considerably in complexity, ranging from routine medication adjustments to rare disease diagnosis. These differences place varying demands on model reasoning capabilities. 

Researchers developed SynapseRoute, an intelligent routing framework that dynamically assigns each question to either a high-reasoning “thinking” mode or a faster, lower-cost “non-thinking” mode. While the thinking mode can support more complex inference, it is associated with greater computational cost, longer response times and a higher risk of hallucination. In contrast, the non-thinking mode offers faster and less expensive responses but may struggle with more demanding reasoning tasks.
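The routing idea can be illustrated with a minimal sketch. It is not the SynapseRoute implementation: the function names, the 0.5 threshold and the word-count scorer are all hypothetical stand-ins, and a real router would plug in a trained classifier instead.

```python
from typing import Callable

THINKING = "thinking"
NON_THINKING = "non-thinking"


def route(question: str,
          needs_thinking_score: Callable[[str], float],
          threshold: float = 0.5) -> str:
    """Pick the mode for one question.

    needs_thinking_score is any classifier that maps a question to a
    probability-like value in [0, 1]. It is a placeholder (assumption).
    """
    score = needs_thinking_score(question)
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie in [0, 1]")
    return THINKING if score >= threshold else NON_THINKING


def toy_score(question: str) -> float:
    """Hypothetical stand-in scorer: longer questions count as harder.

    Used only to make the sketch runnable; not a method from the study.
    """
    return min(len(question.split()) / 40.0, 1.0)


short_q = "What is the usual adult dose of paracetamol?"
long_q = " ".join(["word"] * 60)  # placeholder for a long case vignette

print(route(short_q, toy_score))  # 8 words, score 0.2
print(route(long_q, toy_score))   # 60 words, score capped at 1.0
```

Raising the threshold sends more questions to the cheaper mode, so in a sketch like this it is the knob that trades cost against the risk of under-reasoning.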

Evaluating Medical Question Answering Performance 

The study assessed questions from four medical datasets: USMLE, MedMCQA, PubMedQA and CareQA. Questions were automatically labelled as requiring either the thinking or the non-thinking mode using the Qwen3-30B-A3B model.

Multiple routing approaches were tested, including traditional machine learning methods and fine-tuned large language models. Performance was measured using standard evaluation metrics and a newly developed Accuracy–Inference–Token index. This metric integrates accuracy, latency and token cost while allowing adjustable weighting according to deployment priorities. 
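The exact formula of the index is not given in this summary. As a rough illustration of how a weighted composite of this kind could work, the sketch below assumes a normalised weighted sum in which latency and token use are expressed as savings relative to a baseline run. The names RunStats and composite_score, and the weighting scheme itself, are hypothetical. The example values are the accuracies (0.8390 versus 0.8272) and relative savings (36.8% and 39.66%) reported by the study, with the always-thinking baseline scaled to 1.0 for time and tokens.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunStats:
    accuracy: float  # fraction of questions answered correctly, 0..1
    latency: float   # mean inference time per question (any unit)
    tokens: float    # mean tokens consumed per question


def composite_score(run: RunStats, baseline: RunStats,
                    w_acc: float = 1.0, w_lat: float = 1.0,
                    w_tok: float = 1.0) -> float:
    """Hypothetical weighted score; NOT the published index formula."""
    weights = (w_acc, w_lat, w_tok)
    if min(weights) < 0 or sum(weights) == 0:
        raise ValueError("weights must be non-negative and not all zero")
    # Savings are relative to the baseline: 0.0 means no saving at all.
    latency_saving = 1.0 - run.latency / baseline.latency
    token_saving = 1.0 - run.tokens / baseline.tokens
    weighted = (w_acc * run.accuracy
                + w_lat * latency_saving
                + w_tok * token_saving)
    return weighted / sum(weights)


# Baseline: every question goes to the thinking mode (scaled to 1.0).
always_thinking = RunStats(accuracy=0.8272, latency=1.0, tokens=1.0)
routed = RunStats(accuracy=0.8390, latency=1.0 - 0.368, tokens=1.0 - 0.3966)

equal = composite_score(routed, always_thinking)
accuracy_first = composite_score(routed, always_thinking, w_acc=4.0)
print(round(equal, 4), round(accuracy_first, 4))
```

Changing the weights models different deployment priorities: a cost-sensitive service might raise w_tok, while a setting where errors are most costly might raise w_acc, as in the second call.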

The data showed that approximately 58% of medical questions could be answered correctly using the non-thinking mode alone. 

Accuracy Improvements with Lower Resource Use 

Among the routing methods evaluated, logistic regression delivered the strongest performance, achieving an area under the curve of 0.82. 

Compared with relying exclusively on the thinking mode, SynapseRoute improved overall accuracy (0.8390 versus 0.8272). At the same time, the framework reduced inference time by 36.8% and lowered token consumption by 39.66%.

The findings suggest that dynamic routing can improve the practicality and cost efficiency of dual-mode large language model systems for medical question answering. By matching question complexity to the appropriate reasoning mode, SynapseRoute may enable more efficient deployment of artificial intelligence systems while maintaining or improving performance. The researchers also propose that the Accuracy–Inference–Token index could provide a practical method for evaluating performance and resource trade-offs in future adaptive medical question answering systems.

Reference 

Zhang W et al. SynapseRoute: an auto-route switching framework on dual-state large language model for medical question answering. BMJ Innovations. 2026; doi: 10.1136/bmjinnov-2025-001529. 

Each article is made available under the terms of the Creative Commons Attribution-Non Commercial 4.0 License.