AI Model Outperforms Physicians in Clinical Reasoning - EMJ



SUPERIOR performance to physicians has been demonstrated by an artificial intelligence (AI) language model across a wide range of clinical reasoning and diagnostic tasks, according to new research.

The findings suggested that advanced AI systems may already exceed human clinicians in certain structured decision-making scenarios, including diagnosis and management planning. 

Large Language Model in Medicine Showed Higher Diagnostic Accuracy 

The study evaluated an advanced large language model (LLM) across multiple clinical reasoning benchmarks, comparing its performance with hundreds of physicians at different training levels. Across diagnostic cases, management scenarios, and emergency department evaluations, the model consistently outperformed human clinicians, demonstrating strong overall clinical reasoning capability. 

In one set of established clinical case conferences, the LLM correctly included the final diagnosis in up to 78% of cases, with the top suggested diagnosis correct in more than half. Performance further improved when broader diagnostic criteria were applied. The model also showed strong accuracy in selecting appropriate diagnostic tests and management strategies, performing at or above physician baselines across multiple settings. 

AI in Healthcare Outperformed Physicians in Emergency Cases 

In real-world emergency department cases, the model was assessed alongside attending physicians in blinded evaluations. The AI system identified the correct or near-correct diagnosis in up to 81.6% of cases at hospital admission, outperforming physicians at multiple diagnostic stages. 

Researchers reported that the performance gap was most pronounced during early triage, when clinical information was limited and rapid decision-making was critical. The study authors noted that both human and AI performance improved as more clinical data became available.

Clinical Reasoning AI Showed Consistency Across Complex Tasks 

Beyond diagnosis, the large language model also demonstrated strong performance in estimating clinical probabilities, selecting appropriate investigations, and generating structured differential diagnoses. In several benchmarks, it exceeded the performance of previous AI models as well as physician groups using standard clinical resources.

The findings suggested that AI systems are increasingly capable of handling complex clinical reasoning tasks that traditionally rely on extensive medical training and experience. 

Implications and Future Role for AI in Healthcare and Clinical Decision-Making 

Despite strong performance, researchers highlighted that current evaluations of AI systems are largely based on structured, text-based clinical scenarios that may not fully capture the complexity of real-world patient care. They noted that clinical decision-making also relies on additional inputs such as imaging interpretation and bedside assessment, which remain challenging for current systems, and emphasised the need for prospective clinical trials to assess safety, efficiency, and real-world effectiveness before deployment. 

The study authors suggested that clinical reasoning AI could increasingly support diagnostic decision-making, particularly in settings with limited specialist access. However, they stressed that human–AI collaboration and rigorous real-world validation will be essential before wider clinical adoption. 

Reference 

Brodeur PG et al. Performance of a large language model on the reasoning tasks of a physician. Science. 2026;392:524-527. DOI:10.1126/science.adz4433. 



Each article is made available under the terms of the Creative Commons Attribution-Non Commercial 4.0 License.
