Collaborative AI Exams Set Accuracy Record in Medicine

A council of AI models, working together in a structured dialogue, has set a new record on U.S. medical licensing exams, achieving up to 97% accuracy on questions spanning all three steps of the USMLE. In this multi-agent “AI exams” approach, five instances of GPT-4 iteratively deliberated, discussed, and self-corrected their answers, outperforming a single GPT-4 instance.

AI Exams: Redefining Model Collaboration

Past studies showed that large language models (LLMs) could pass medical licensing exams, but their responses to the same question varied, and some contained errors or hallucinations. By convening an “AI exams” council, the researchers turned this variability into collective reasoning: a facilitator algorithm summarised the models' responses and prompted them to deliberate and refine their answers. The council answered 97%, 93%, and 94% of questions correctly for Step 1, Step 2 CK, and Step 3, respectively, outperforming both earlier AI models and a single GPT-4 agent.
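The deliberation loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the study's prompts or code: each agent is a hypothetical callable standing in for a GPT-4 instance, the facilitator's "summary" is assumed to be a simple tally of answers, and the round limit (`max_rounds`) and majority-vote fallback are illustrative choices.

```python
from collections import Counter
from typing import Callable, List, Optional, Tuple

# Hypothetical agent interface: receives the question plus the facilitator's
# summary of the previous round (None in round 1) and returns an option letter.
Agent = Callable[[str, Optional[str]], str]


def summarise(answers: List[str]) -> str:
    """Assumed facilitator step: tally the council's current answers."""
    counts = Counter(answers)
    return ", ".join(f"{opt}: {n}" for opt, n in sorted(counts.items()))


def run_council(question: str, agents: List[Agent],
                max_rounds: int = 3) -> Tuple[str, int, bool]:
    """Return (final answer, rounds used, whether consensus was reached)."""
    summary: Optional[str] = None
    answers: List[str] = []
    for round_no in range(1, max_rounds + 1):
        answers = [agent(question, summary) for agent in agents]
        if len(set(answers)) == 1:  # unanimous: consensus reached
            return answers[0], round_no, True
        summary = summarise(answers)  # feed the disagreement back to every agent
    # No consensus within the round budget: fall back to a majority vote.
    return Counter(answers).most_common(1)[0][0], max_rounds, False


def make_stub(first: str, after_discussion: str) -> Agent:
    # Hypothetical stand-in for a model: answers `first` before any discussion
    # and `after_discussion` once it has seen the facilitator's summary.
    def agent(question: str, summary: Optional[str]) -> str:
        return first if summary is None else after_discussion
    return agent


if __name__ == "__main__":
    council = [make_stub("B", "B")] * 3 + [make_stub("C", "B")] * 2
    print(run_council("Sample USMLE-style question", council))  # ('B', 2, True)
```

Here the two dissenting stubs switch to the majority answer after seeing the tally, so consensus arrives in round 2. Real agents would of course reason over the question and each other's explanations rather than follow a fixed script.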

Results and Strengths of the AI Council

When the models' initial responses disagreed, the council debated and reached the correct answer 83% of the time, correcting more than half of the errors that a simple majority vote had made. After deliberation, the odds of a majority-vote answer changing from incorrect to correct were five times the odds of it changing from correct to incorrect. Deliberation also reduced semantic entropy, meaning the variability of the council's answers decreased as consensus emerged. The findings suggest that what was previously seen as unpredictable model behaviour can be channelled into a strength, using dialogue to self-correct and adapt reasoning.
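To make "reduced semantic entropy" concrete, the sketch below computes the Shannon entropy of the council's answer distribution in a single round. It rests on a simplifying assumption: for multiple-choice items, each option letter is treated as one meaning cluster, whereas the semantic entropy method in general groups free-text answers by meaning. The function name `answer_entropy` is hypothetical.

```python
import math
from collections import Counter
from typing import Sequence


def answer_entropy(answers: Sequence[str]) -> float:
    """Shannon entropy (bits) of the empirical distribution of answer options.

    Assumption: each multiple-choice option is treated as one meaning cluster,
    so no model is needed to decide which responses are equivalent.
    """
    n = len(answers)
    counts = Counter(answers)
    # Sum p * log2(1/p) over observed options; 0.0 when every answer agrees.
    return sum((c / n) * math.log2(n / c) for c in counts.values())


if __name__ == "__main__":
    # Illustrative rounds only, not data from the study.
    rounds = [["B", "B", "B", "C", "C"], ["B"] * 5]
    for i, r in enumerate(rounds, start=1):
        print(i, round(answer_entropy(r), 3))
    # 1 0.971
    # 2 0.0
```

A 3-to-2 split gives about 0.97 bits of uncertainty, and a unanimous council gives zero, which is the sense in which entropy falls as consensus emerges.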

Implications: Next Steps for Collaborative AI Exams

Although the approach has not yet been tested in real clinical settings, collaborative AI councils could make medical AI safer and more reliable. The study suggests that future tools for clinical education and patient care should embrace diverse AI perspectives, drawing on teamwork between models rather than demanding consistency from a single model.

Reference

Shaikh Y et al. Collaborative intelligence in AI: evaluating the performance of a council of AIs on the USMLE. PLOS Digital Health. 2025;DOI:10.1371/journal.pdig.0000787.