A new dataset of bone marrow cell images promises to transform the way clinicians and researchers diagnose and study blood cancers. Developed from samples of 125 patients with myelodysplastic syndrome (MDS), the collection includes 25,009 meticulously curated images representing 27 distinct cell categories. These categories span both normal and abnormal cells, capturing subtle morphological differences that are often difficult to identify under a microscope.
Expert Validation and Precision Processing
To support label accuracy, each cell label was independently reviewed by up to three haematology experts. This validation process strengthens the dataset’s reliability for both clinical training and artificial intelligence (AI) model development. Cells were extracted with a cropping technique in which bounding boxes were converted into squares and expanded by 10 per cent. This adjustment kept cell boundaries and key structural features intact, while minimising the inclusion of neighbouring cells.
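The cropping step described above can be illustrated with a short Python sketch. It is not the authors' code: the function name square_crop_box is hypothetical, and several details are assumptions the article does not state, namely that the square is centred on the original box, that its side is taken from the longer edge, that the 10 per cent expansion applies to the side length, and that the result is clipped to the image borders.

```python
def square_crop_box(x_min, y_min, x_max, y_max, img_w, img_h, expand=0.10):
    """Turn a bounding box into an expanded square crop box.

    Assumptions (not confirmed by the source): the square is centred on
    the original box, uses the longer edge as its side, and is clipped
    to the image so the crop never reaches outside the picture.
    """
    # Centre of the original bounding box.
    cx = (x_min + x_max) / 2.0
    cy = (y_min + y_max) / 2.0

    # Longer edge becomes the square's side, then grow it by `expand`.
    side = max(x_max - x_min, y_max - y_min) * (1.0 + expand)
    half = side / 2.0

    # Round to whole pixels and clip to the image bounds.
    left = max(0, round(cx - half))
    top = max(0, round(cy - half))
    right = min(img_w, round(cx + half))
    bottom = min(img_h, round(cy + half))
    return left, top, right, bottom


if __name__ == "__main__":
    # A 40 x 20 pixel box inside a 100 x 100 image.
    print(square_crop_box(10, 20, 50, 40, 100, 100))
```

Clipping at the image border means a cell near the edge may yield a crop that is no longer exactly square; padding the crop instead would be an equally plausible choice.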
The result is a comprehensive and standardised image set that captures the complex cellular variations associated with pathological haematopoiesis in MDS. In particular, it documents abnormalities such as micromegakaryocytes, cells whose irregularities play a central role in diagnosing the condition.
AI Applications and Future Impact
Researchers suggest that this dataset could significantly accelerate innovation in AI-based diagnostics for blood disorders. Machine learning models trained on the images have the potential to recognise dysplastic cells automatically, supporting faster and more consistent diagnosis of MDS. Such tools could assist pathologists by highlighting suspect cells, thereby improving early detection and patient outcomes.
The creation of this dataset represents a key step toward integrating AI into routine haematology practice. By reducing subjectivity and improving efficiency in cytomorphological assessment, the technology could relieve growing diagnostic workloads across healthcare systems. Ultimately, it sets a new benchmark for how digital pathology and human expertise can work together to refine blood cancer diagnostics.
Reference
Shen D et al. A large dataset of bone marrow cells in myelodysplastic syndrome for classification systems. Scientific Data. 2025;12:1849.







