Domain-Aware Versus Machine Learning Imputation for Sparse Antimicrobial Susceptibility Data

Fredrick Mutisya; Taïoh Yokoyama; Sana Boujaafar; Cyprien de Turckheim; Mathieu Raad

doi:10.33590/emjmicrobiolinfectdis/XXJQ9810

BACKGROUND AND AIMS

Antimicrobial susceptibility testing datasets frequently have structured missingness because of selective testing practices. Filling these gaps with generic statistical imputation may violate well-established microbiological rules, such as intrinsic resistance or non-reportable combinations. This study aimed to compare domain-aware completion, based on established microbiology rules, with several machine learning (ML)-based imputation strategies under controlled conditions.¹

MATERIALS AND METHODS

The authors evaluated imputation strategies using the Atlas dataset 2022 (Pfizer, New York, USA) for levofloxacin, meropenem, and gentamicin, selected to represent different drug classes and resistance patterns. SmartBiotic’s (Montreal, Canada) rule engine, derived from the European Committee on Antimicrobial Susceptibility Testing (EUCAST) and Clinical and Laboratory Standards Institute (CLSI) expected resistance and susceptibility phenotypes, was used for domain-aware completion. For validation, 20% of truly observed results were randomly masked per antibiotic, simulating missing-completely-at-random (MCAR) conditions. Five strategies for reconstructing the masked values were compared: global frequency imputation, ML with listwise deletion, ML with random undersampling, ML with synthetic minority oversampling technique (SMOTE), and a rule-based strategy. Outcomes were binarised (susceptible versus resistant/intermediate) and evaluated using accuracy, sensitivity, specificity, predictive values, F1 score, and Cohen’s kappa.
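The masking and scoring steps described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names, the toy column, and the choice of resistant/intermediate as the positive class (coded 1) are assumptions made for the example.

```python
import random


def mask_observed(values, fraction=0.2, seed=0):
    """Hide a random `fraction` of the observed (non-None) entries."""
    rng = random.Random(seed)
    observed = [i for i, v in enumerate(values) if v is not None]
    hidden = sorted(rng.sample(observed, round(len(observed) * fraction)))
    hidden_set = set(hidden)
    masked = [None if i in hidden_set else v for i, v in enumerate(values)]
    return masked, hidden


def binary_metrics(truth, pred):
    """Score predictions, assuming 1 = resistant/intermediate is the positive class."""
    pairs = list(zip(truth, pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    n = tp + tn + fp + fn
    accuracy = (tp + tn) / n
    # Chance agreement expected from the marginal totals (used by Cohen's kappa).
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    return {
        "accuracy": accuracy,
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
        "kappa": (accuracy - expected) / (1 - expected) if expected != 1 else float("nan"),
    }


# Toy column (not study data): 1 = resistant/intermediate, 0 = susceptible,
# None = never tested. Only observed entries are eligible for masking.
column = [1, 0, None, 0, 1, 0, 0, 1, 0, 0, 0]
masked, hidden = mask_observed(column)

# Toy truth/prediction pair to show the metric calculation.
scores = binary_metrics([1, 1, 0, 0], [1, 0, 0, 0])
```

Because only observed entries are masked, each reconstructed value can be compared against a known result, which is what makes the evaluation possible.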

RESULTS

The authors’ inferred resistance rules added 75,730 new cells, mainly from intra-species inference (75,729 cells), while intrinsic resistance rules augmented all 55,549 rows with 502,847 additional cells (11.2%). In the masking experiment, domain rule completion demonstrated high performance for levofloxacin (accuracy: 95%; sensitivity: 92%; specificity: 96%), meropenem (91%/97%/90%), and gentamicin (86%/57%/96%). Among the ML strategies, SMOTE oversampling achieved moderately higher sensitivity (levofloxacin: 58%; meropenem: 92%; gentamicin: 54%) than unbalanced ML (43%/90%/41%). Random undersampling produced balanced profiles but lower overall performance (66–72% across metrics). Frequency imputation yielded 0% sensitivity for all three antibiotics despite acceptable accuracy (69–79%).
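The pattern reported for frequency imputation, acceptable accuracy alongside 0% sensitivity, follows directly from always predicting the majority class. The sketch below shows this on invented toy labels (not the study's data), assuming susceptible (0) is the majority class and resistant/intermediate (1) is the positive class; `majority_impute` is a hypothetical helper name.

```python
from collections import Counter


def majority_impute(observed_labels, n_missing):
    """Fill every missing cell with the most frequent observed label."""
    most_common = Counter(observed_labels).most_common(1)[0][0]
    return [most_common] * n_missing


# Toy labels: 75% susceptible (0), 25% resistant/intermediate (1).
observed = [0] * 60 + [1] * 20
truth = [0] * 15 + [1] * 5  # masked cells whose true values are known

pred = majority_impute(observed, len(truth))

accuracy = sum(t == p for t, p in zip(truth, pred)) / len(truth)
true_positives = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 1)
sensitivity = true_positives / sum(truth)
```

With these toy labels, accuracy equals the share of susceptible cells while no resistant cell is ever detected, so accuracy alone overstates how useful the imputation is.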

CONCLUSION

These findings suggest that antimicrobial susceptibility testing missingness should first be addressed as a microbiological problem and only then as a statistical one. Rule-based systems predict only when a rule applies, yielding high specificity but sensitivity that varies with coverage. ML required imbalance correction for meaningful resistance detection, with SMOTE oversampling offering the best compromise. These results, consistent with recent ML approaches in AMR surveillance, support hierarchical imputation: a deterministic approach first, then imbalance-aware ML for residual gaps. Future validation work on these approaches should assess performance in real-world settings.
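The hierarchical scheme proposed above can be sketched as a rule lookup followed by a model fallback. This is an illustrative outline under stated assumptions, not SmartBiotic's rule engine: the one-entry `RULES` table, the function names, and the stand-in fallback model are all hypothetical, and a real rule set would be built from the EUCAST and CLSI expected-phenotype tables.

```python
from typing import Callable, Optional, Tuple

# Hypothetical one-entry rule table: (organism, antibiotic) -> 1 (resistant).
# Carbapenem resistance in Stenotrophomonas maltophilia is a commonly cited
# intrinsic-resistance example; the table is a placeholder, not a real rule set.
RULES = {("Stenotrophomonas maltophilia", "meropenem"): 1}


def rule_predict(organism: str, antibiotic: str) -> Optional[int]:
    """Return a deterministic result when a rule applies, else None."""
    return RULES.get((organism, antibiotic))


def hierarchical_impute(
    organism: str,
    antibiotic: str,
    fallback: Callable[[str, str], int],
) -> Tuple[int, str]:
    """Apply rules first; use the fallback model only for residual gaps."""
    ruled = rule_predict(organism, antibiotic)
    if ruled is not None:
        return ruled, "rule"
    return fallback(organism, antibiotic), "model"


def toy_model(organism: str, antibiotic: str) -> int:
    """Stand-in for an imbalance-aware ML classifier; always predicts susceptible."""
    return 0


covered = hierarchical_impute("Stenotrophomonas maltophilia", "meropenem", toy_model)
residual = hierarchical_impute("Escherichia coli", "meropenem", toy_model)
```

Returning the source of each value ("rule" or "model") alongside the prediction keeps deterministic and statistical imputations distinguishable in downstream analysis.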
