A GROWING integrity challenge is emerging across cancer research, as machine learning analysis reveals that nearly one in ten published papers shows similarities to suspected paper mill outputs, raising concerns about the reliability of decades of research.
A Thirty-Year Expansion Brings New Risks for Cancer Research
Over the past 30 years, cancer studies have expanded rapidly in volume and global reach, driven by rising disease burden and intense academic competition. Alongside this growth, concerns have mounted about paper mills, organisations that produce fraudulent manuscripts for sale. Previous estimates suggested that around 3% of biomedical research may be affected, but cancer research has long been suspected of facing higher risks because of publication pressures, template-driven experimental designs, and demand for rapid outputs. Until now, large-scale assessments capable of quantifying the problem across decades of cancer research have been limited.
AI Methods Reveal Concerning Patterns Across Cancer Research
Researchers fine-tuned a BERT-based machine learning model to distinguish known paper mill publications from genuine cancer studies, relying solely on article titles and abstracts. The model was trained on 2,202 retracted papers and validated using independent expert datasets, achieving an accuracy of 0.91. When applied to 2,647,471 cancer studies published between 1999 and 2024, the system flagged 261,245 papers, representing 9.87% of the literature. The number of flagged papers increased steadily over time, and they appeared not only in low-impact journals but also in journals within the top 10% by impact factor. Overrepresentation was observed in gastric, liver, and bone cancer research, as well as in fundamental laboratory studies. The analysis also identified substantial geographic clustering, with more than 170,000 flagged papers affiliated with Chinese institutions, accounting for 36% of that country’s cancer research output.
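The headline proportion can be checked directly from the two counts reported above. The short Python sketch below is only an arithmetic check on the article's figures, not part of the study's own method; the function name `flagged_share` is a hypothetical helper introduced for illustration.

```python
# Counts as quoted in the article (cancer studies published 1999-2024).
TOTAL_PAPERS = 2_647_471
FLAGGED_PAPERS = 261_245


def flagged_share(flagged: int, total: int) -> float:
    """Return flagged papers as a percentage of all screened papers.

    Hypothetical helper for illustration; not taken from the study.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    return 100 * flagged / total


share = flagged_share(FLAGGED_PAPERS, TOTAL_PAPERS)
print(f"Flagged share: {share:.2f}%")
```

Rounded to two decimal places, the result matches the 9.87% reported, which is the basis for the "nearly one in ten" description.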
Implications for Clinical Trust and Publishing Practice
The findings suggest that paper mills represent a significant and growing threat to the integrity of cancer studies, with potential downstream effects on evidence synthesis, clinical guidelines, and patient care. While the authors stress that flagged papers are not automatically fraudulent, the scale of the signal highlights the need for coordinated action by publishers, funders, and institutions. Integrating machine learning tools into editorial workflows, alongside expert human review, may help strengthen safeguards. As generative AI evolves, maintaining trust in cancer research will require sustained vigilance, transparency, and reform across the research ecosystem.
Reference
Scancar B et al. Machine learning based screening of potential paper mill publications in cancer research: methodological and cross sectional study. BMJ. 2026;392:e087581.





