A COUNCIL of AI models, working together in a structured dialogue, has set a new record on U.S. medical licensing exams, achieving up to 97% accuracy on questions spanning all three steps of the USMLE. This multi-agent “AI exams” approach saw five GPT-4 instances iteratively deliberate, discuss, and self-correct their answers, outperforming any single-instance AI.
AI Exams: Redefining Model Collaboration
Past studies showed that large language models (LLMs) could pass medical licensing exams, but their responses to the same question varied and some contained errors or hallucinations. By building an “AI exams” council, researchers harnessed collective reasoning, with a facilitator algorithm prompting the models to deliberate, summarise responses, and refine answers. Consensus was reached in 97%, 93%, and 94% of cases for Step 1, Step 2 CK, and Step 3, respectively – significantly higher than the performance of earlier AI models and of single-agent approaches.
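The paper does not publish its facilitator code, but the loop it describes (independent answers, a shared summary of the disagreement, and renewed deliberation until consensus) can be sketched roughly as follows. The ask_model placeholder, the round limit, and the prompt wording are illustrative assumptions, not the authors' implementation.

```python
from collections import Counter

N_AGENTS = 5    # five GPT-4 instances, per the study design
MAX_ROUNDS = 3  # illustrative assumption; the paper's round limit may differ


def ask_model(prompt: str) -> str:
    """Placeholder for a call to one LLM instance.
    Expected to return a single multiple-choice letter such as 'A'."""
    raise NotImplementedError  # swap in a real model call here


def council_answer(question: str) -> str:
    # Round 0: each agent answers independently.
    answers = [ask_model(question) for _ in range(N_AGENTS)]

    for _ in range(MAX_ROUNDS):
        if len(set(answers)) == 1:  # unanimous consensus reached
            return answers[0]
        # Facilitator step: summarise the split and ask each agent to reconsider.
        summary = ", ".join(f"{a}: {c} votes" for a, c in Counter(answers).items())
        follow_up = (
            f"{question}\n\nThe council is split ({summary}). "
            "Explain your reasoning, consider the other answers, and give your final choice."
        )
        answers = [ask_model(follow_up) for _ in range(N_AGENTS)]

    # No consensus within the round limit: fall back to a majority vote.
    return Counter(answers).most_common(1)[0][0]
```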
Results and Strengths of AI Council
When initial responses did not agree, the council engaged in debate, reaching the right answer 83% of the time and correcting more than half of the errors made by a simple majority vote. After deliberation, the odds of an incorrect answer being converted to a correct one improved by a factor of five. This process also reduced semantic entropy, meaning answer variability decreased as consensus emerged. The findings reveal that what was previously seen as unpredictable model behaviour can be channelled into a strength: using dialogue to self-correct and adapt reasoning.
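Semantic entropy is ordinarily computed over clusters of semantically equivalent free-text answers; for a multiple-choice setting, a simple proxy is the Shannon entropy of the agents' vote distribution, which falls to zero as consensus emerges. The sketch below illustrates that proxy only and is not the paper's exact metric.

```python
import math
from collections import Counter


def vote_entropy(answers: list[str]) -> float:
    """Shannon entropy (in bits) of the agents' answer distribution.
    0.0 means full consensus; higher values mean more disagreement."""
    counts = Counter(answers)
    total = len(answers)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Example: disagreement before deliberation, consensus after.
print(vote_entropy(["A", "A", "B", "C", "A"]))  # ~1.37 bits
print(vote_entropy(["A", "A", "A", "A", "A"]))  # 0.0 bits
```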
Implications: Next Steps for Collaborative AI Exams
While not yet tested in real clinical settings, collaborative AI exams could make medical AI safer and more reliable for healthcare. The study suggests future tools in clinical education and patient care should embrace varied AI perspectives, unlocking new possibilities by leveraging teamwork rather than demanding consistency from a single model.
Reference
Shaikh Y et al. Collaborative intelligence in AI: evaluating the performance of a council of AIs on the USMLE. PLOS Digital Health. 2025;DOI:10.1371/journal.pdig.0000787.