AI Model Outperforms Physicians in Clinical Reasoning - EMJ



SUPERIOR performance to physicians has been demonstrated by an artificial intelligence (AI) language model across a wide range of clinical reasoning and diagnostic tasks, according to new research.

The findings suggested that advanced AI systems may already exceed human clinicians in certain structured decision-making scenarios, including diagnosis and management planning. 

Large Language Model in Medicine Showed Higher Diagnostic Accuracy 

The study evaluated an advanced large language model (LLM) across multiple clinical reasoning benchmarks, comparing its performance with hundreds of physicians at different training levels. Across diagnostic cases, management scenarios, and emergency department evaluations, the model consistently outperformed human clinicians, demonstrating strong overall clinical reasoning capability. 

In one set of established clinical case conferences, the LLM correctly included the final diagnosis in up to 78% of cases, with the top suggested diagnosis correct in more than half. Performance further improved when broader diagnostic criteria were applied. The model also showed strong accuracy in selecting appropriate diagnostic tests and management strategies, performing at or above physician baselines across multiple settings. 

AI in Healthcare Outperformed Physicians in Emergency Cases 

In real-world emergency department cases, the model was assessed alongside attending physicians in blinded evaluations. The AI system identified the correct or near-correct diagnosis in up to 81.6% of cases at hospital admission, outperforming physicians at multiple diagnostic stages. 

Researchers reported that the performance gap was most pronounced during early triage, when clinical information was limited and rapid decision-making was critical. The study authors noted that both human and AI performance improved as more clinical data became available.

Clinical Reasoning AI Showed Consistency Across Complex Tasks 

Beyond diagnosis, the large language model also demonstrated strong performance in estimating clinical probabilities, selecting appropriate investigations, and generating structured differential diagnoses. In several benchmarks, it exceeded the performance of previous AI models as well as physician groups using standard clinical resources.

The findings suggested that AI systems are increasingly capable of handling complex clinical reasoning tasks that traditionally rely on extensive medical training and experience. 

Implications and Future Role for AI in Healthcare and Clinical Decision-Making 

Despite strong performance, researchers highlighted that current evaluations of AI systems are largely based on structured, text-based clinical scenarios that may not fully capture the complexity of real-world patient care. They noted that clinical decision-making also relies on additional inputs such as imaging interpretation and bedside assessment, which remain challenging for current systems, and emphasised the need for prospective clinical trials to assess safety, efficiency, and real-world effectiveness before deployment. 

The study authors suggested that clinical reasoning AI could increasingly support diagnostic decision-making, particularly in settings with limited specialist access. However, they stressed that human–AI collaboration and rigorous real-world validation will be essential before wider clinical adoption. 

Reference 

Brodeur PG et al. Performance of a large language model on the reasoning tasks of a physician. Science. 2026;392:524-527. DOI:10.1126/science.adz4433. 



Each article is made available under the terms of the Creative Commons Attribution-Non Commercial 4.0 License.
