Doctors Make Better Clinical Notes than AI Scribes – EMJ

Doctors Make Better Clinical Notes than AI Scribes, New Study Finds

DOCTORS make higher-quality clinical notes than AI scribes across five standardised care cases, according to a 2026 study.

This comes amid new findings that an AI speech assistant reduces nursing documentation time.

Ambient AI Scribes

Ambient AI scribes use generative AI to listen to consultations in healthcare settings and create summarised notes.

Researchers carried out a cross-sectional evaluation of notes from 11 AI scribe tools and 18 human note takers. Thirty blinded human raters scored the notes using the modified Physician Documentation Quality Instrument (PDQI-9), rating 10 domains of note quality on a 5-point Likert scale.

Across all five audio-recorded clinical scenarios, the human-generated notes received higher overall modified PDQI-9 scores than AI-generated notes.

Results Across Variable Cases

The largest difference in note quality was in an acute low back pain case amid background noise. Clinicians scored an average of 43.8 out of 50 whilst AI attained just 20.3.

The same trend was observed in two other scenarios. In a chest pain case in which the clinician and patient were wearing masks, human notes scored 42.2 compared with 34.8 for AI. In a nurse care manager encounter with a patient with heart failure, human-generated notes scored 38.4 compared with 32.8 for AI.
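To make the size of these gaps easier to compare, the short Python sketch below subtracts the AI score from the human score for the three cases with reported figures. It is illustrative arithmetic only, not part of the study's analysis: the scores are those quoted above, while the names `reported_scores` and `score_gaps` are hypothetical.

```python
# Mean modified PDQI-9 totals (out of 50) as quoted in this article: (human, AI).
reported_scores = {
    "acute low back pain, background noise": (43.8, 20.3),
    "chest pain, masks worn": (42.2, 34.8),
    "nurse care manager, heart failure": (38.4, 32.8),
}


def score_gaps(scores):
    """Return the human-minus-AI gap per case, rounded to one decimal place."""
    return {case: round(human - ai, 1) for case, (human, ai) in scores.items()}


gaps = score_gaps(reported_scores)

# List the cases from the largest gap to the smallest.
for case, gap in sorted(gaps.items(), key=lambda item: item[1], reverse=True):
    print(f"{case}: human notes scored {gap} points higher")
```

On these figures, the gap in the low back pain case (23.5 points) is more than three times the gap in either of the other two cases (7.4 and 5.6 points).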

For the remaining two cases, a pharmacy encounter with a patient who spoke with an accent and a new patient consultation in which both clinician and patient spoke with accents, human notes again scored higher, but the differences were not statistically significant.

Pooled domain analysis revealed lower AI scores across all 10 assessed domains. The largest shortfalls for the AI scribes were in thoroughness, organisation, and usefulness.

Challenges in AI-Generated Clinical Notes

Researchers acknowledged that ambient AI scribes hold promise for reducing clinician burden.

However, they called for independent, vendor-neutral assessments of note quality as a critical step prior to large-scale clinical deployment.

Concerns persist regarding accuracy, completeness, and style.

Reference

Reddy A et al. Rapid evaluation of artificial intelligence technology used for ambient dictation in primary care: comparing the quality of documentation of artificial intelligence-generated and human-produced clinical notes. Ann Intern Med. 2026;DOI:10.7326/ANNALS-25-02772.

Each article is made available under the terms of the Creative Commons Attribution-Non Commercial 4.0 License.
