AI Antibiotic Chatbot Shows Clinical Promise - EMJ


Antibiotic Chatbot Shows Promise but Safety Gaps Remain


AN ARTIFICIAL intelligence (AI) chatbot designed to deliver antibiotic prescribing advice based on local hospital guidelines provided largely accurate recommendations but still made clinically relevant errors and cannot yet be considered safe for routine clinical use, a new study finds.

The system uses a retrieval-augmented generation (RAG) approach, in which a large language model is restricted to answering questions using hospital-specific antimicrobial guidelines. Researchers evaluated the tool using both simulated clinical scenarios and real questions written by infection specialists. 
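As a rough illustration of the retrieval-augmented pattern described above, the following minimal sketch retrieves guideline passages by simple word overlap and falls back to a referral message when nothing relevant is found. It is not the study's implementation: the guideline snippets are placeholder text rather than clinical advice, and the names `GUIDELINES`, `FALLBACK`, `retrieve`, and `build_prompt` are hypothetical. A real system would use a proper retriever and pass the resulting prompt to a language model.

```python
import re

# Hypothetical guideline snippets: placeholder text only, not clinical advice.
GUIDELINES = {
    "cap": "Community-acquired pneumonia: placeholder regimen A, review at 48 hours.",
    "uti": "Lower urinary tract infection: placeholder regimen B, adjust for renal function.",
}

# Assumed fallback wording for questions the guidelines do not cover.
FALLBACK = "This question is outside the guideline scope; please seek specialist input."


def tokens(text):
    """Lowercase word tokens used for a crude overlap score."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(question, min_overlap=2):
    """Return (key, passage) pairs sharing at least min_overlap words with the question."""
    q = tokens(question)
    scored = [(len(q & tokens(p)), k, p) for k, p in GUIDELINES.items()]
    return [(k, p) for score, k, p in sorted(scored, reverse=True) if score >= min_overlap]


def build_prompt(question):
    """Build a prompt limited to retrieved passages, or return the fallback."""
    hits = retrieve(question)
    if not hits:
        return FALLBACK
    context = "\n".join(p for _, p in hits)
    return (
        "Answer using ONLY the guideline text below.\n"
        f"Guidelines:\n{context}\n"
        f"Question: {question}"
    )


print(build_prompt("Treatment for lower urinary tract infection?"))
print(build_prompt("What is the dose for malaria prophylaxis?"))
```

The overlap threshold stands in for the scope check: a question that matches no guideline passage never reaches the model and receives the referral message instead.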

High Accuracy in Routine Cases but Reduced Performance in Complex Scenarios 

Across 200 simulated cases, the chatbot attempted to answer 93% of queries and produced fully correct responses in 87% of those attempts. A further 8% of attempted answers were partially correct, while 5% were incorrect. When judged across all questions, including those the chatbot did not attempt, 81% of responses were fully correct. Performance was lower in more complex cases, particularly those involving renal impairment.
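The two headline figures for the simulated cases are consistent with each other, as this short check suggests. It works only from the rounded percentages reported above, so it is an approximation and does not reproduce the study's underlying counts; the variable names are illustrative.

```python
attempted_rate = 0.93        # share of simulated cases the chatbot attempted
correct_if_attempted = 0.87  # fully correct share among attempted answers

# Fully correct share across all questions, attempted or not.
overall_correct = attempted_rate * correct_if_attempted
print(f"{overall_correct:.0%}")  # prints 81%
```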

In a second evaluation using 66 clinician-written questions, 81% of answers were rated fully correct, while 18% were partially correct. A small proportion contained incorrect additional information, including inappropriate antibiotic choices or dosing errors. Around 86% of responses were judged to contain no incorrect content. 

The model also demonstrated limitations in recognising when questions fell outside its scope. Of nine out-of-scope queries, just over half were correctly identified and met with a fallback response advising specialist input. 

Guideline Grounding Significantly Improves Accuracy Compared with General AI Models 

Researchers found that the system performed substantially better than large language models used without local guideline context. A locally deployed model without retrieval support achieved only 11% fully correct answers, while a more advanced general-purpose model reached 46% when prompted to provide locally relevant advice. Performance was also influenced by how effectively relevant guideline sections were retrieved and applied. 

Despite these limitations, the system was able to generate responses quickly, typically within 10–15 seconds, suggesting potential for real-time clinical use if accuracy and safety can be improved. 

The authors conclude that while retrieval-based AI systems grounded in local antimicrobial guidelines can significantly improve the quality of automated infection advice, further refinement is required before deployment in clinical settings. 

Reference 

Eyre DW et al. An antibiotic chatbot: evaluation of a retrieval-augmented generation approach for providing guideline-based antimicrobial advice. Infect Dis Pract. 2026;93:106789. 



Each article is made available under the terms of the Creative Commons Attribution-Non Commercial 4.0 License.
