The dataset comprises 14,441 clinicopathologic correlations (CPCs) linking clinical, morphological and molecular data for myeloid neoplasms, supported by complete blood counts, peripheral blood smear findings, flow cytometric dysplasia screening and next-generation sequencing or PCR results. Two haematopathologists contributed 10,794 real-world reports from patients served in Egypt, with molecular testing performed in laboratories in the United States and Europe. The real-world reports comprise 243 chronic myeloid leukaemia (CML) cases with 257 mutations or aberrations, 4,567 non-CML myeloid neoplasm cases with 7,883 tier I/II mutations, and 5,984 benign or inconclusive cases negative for myeloid mutations across 66 DNA/RNA genes.
Synthetic genomics for complex cases
In addition to the real-world data, the collection includes 3,647 CPCs generated with synthetic genomics to simulate new and follow-up CML cases, as well as complex or rare non-CML myeloid neoplasms. These synthetic records mirror challenging diagnostic scenarios, combining diverse blood count patterns, smear findings and curated mutational profiles to provide rich training material for clinicians and machine-learning models. The dataset is organised as a machine-readable Excel workbook with separate sheets for real non-CML cases, real CML cases, real NGS-negative cases and synthetic genomics.
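The reported figures can be cross-checked with simple arithmetic. The short Python sketch below adds up the three real-world groups and the synthetic records and compares the sums with the totals quoted in the text. The dictionary keys are illustrative labels chosen for this example and are not the actual sheet names in the published workbook.

```python
# Counts as quoted in the article. The keys are hypothetical labels for
# illustration only; they are not the real sheet names in the workbook.
REAL_CASES = {
    "real_cml": 243,
    "real_non_cml": 4_567,
    "real_ngs_negative": 5_984,
}
SYNTHETIC_CPCS = 3_647

# Totals as quoted in the article.
REPORTED_REAL_TOTAL = 10_794
REPORTED_GRAND_TOTAL = 14_441


def check_totals() -> dict:
    """Sum the component counts and compare them with the reported totals."""
    real_total = sum(REAL_CASES.values())
    grand_total = real_total + SYNTHETIC_CPCS
    return {
        "real_total": real_total,
        "grand_total": grand_total,
        "real_matches": real_total == REPORTED_REAL_TOTAL,
        "grand_matches": grand_total == REPORTED_GRAND_TOTAL,
    }


if __name__ == "__main__":
    for key, value in check_totals().items():
        print(f"{key}: {value}")
```

Both sums agree with the quoted totals: the three real-world groups add up to 10,794 reports, and adding the 3,647 synthetic CPCs gives 14,441.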
International validation and future uses
Fifty-one international experts from multiple centres stringently reviewed all records and confirmed 100% medical and clerical accuracy, benchmarking the content against the WHO 5th edition and other leading classifications. The authors state that the open-access dataset is designed for clinical practice support, research, teaching and exam preparation, and for developing and validating multimodal automated diagnostic systems in haematology. By unifying morphology, laboratory parameters and genomics at scale, it is expected to serve as a reference model for structured reporting in myeloid neoplasms worldwide.
Reference
Elsafty A et al. 14,441 Genomics-based validated automated comprehensive clinicopathologic correlations for myeloid neoplasms. Scientific Data. 2025;12:1819.
