Performance and Accuracy of Natural Language Processing to Identify Disease Aetiology from Non-Structured Cardiac MRI Electronic Medical Record Reports - European Medical Journal



*Duygu Kocyigit,1 Alex Milinovich,2 Chan Mi Lee,3 Michael Silverman,3 Maleeha Ahmad,3 Mazen Hanna,1 Andrej Gabrovsek,1 Jian Jin,2 WH Wilson Tang,1 Richard Grimm,1 Leslie Cho,1 Brian Griffin,1 Scott Flamm,1 Deborah Kwon1

The authors have declared no conflicts of interest.

Cardiol. ; DOI/10.33590/emjcardiol/2009142.

Each article is made available under the terms of the Creative Commons Attribution-Non Commercial 4.0 License.


The utility of cardiac MRI (CMR) in patients with heart failure has been well demonstrated and continues to expand as MRI techniques evolve. Its main advantages in this patient population include: accurate and reproducible quantification of ventricular systolic function; enhanced discrimination of abnormal myocardial tissue characteristics (i.e., oedema, interstitial fibrosis, and replacement fibrosis); and assessment of valvular function and morphology, the endocardium, and the pericardium in a single scan.1,2

CMR is now an essential part of the diagnosis of various heart failure aetiologies, including cardiac amyloidosis, cardiac sarcoidosis, myocarditis, arrhythmogenic right ventricular cardiomyopathy, and iron overload cardiomyopathy. CMR findings also have prognostic implications, such as in hypertrophic cardiomyopathy.1,2 These developments have increased the demand for, and use of, CMR in routine clinical practice. However, the synthesis of imaging findings into a final or differential diagnosis is typically written as free text, which makes it difficult to categorise cardiomyopathy types accurately using generic query algorithms.

Natural language processing (NLP) is an analytical method used to develop computer-based algorithms that handle and transform natural language so that the information can be used for computation.3 It enables information extracted from various online databases to be gathered and combined, and helps create robust outputs that could serve research purposes, including sample identification and variable collection. In the field of imaging, NLP may also have several clinical applications, such as highlighting and classifying imaging findings and generating follow-up recommendations, imaging protocols, and survival prediction models.4


Data on the utility of NLP in heart failure imaging are scarce and focus on extraction of left ventricular ejection fraction from echocardiography reports.5,6 In this study, the authors assessed the utility of NLP for extracting heart failure aetiology from CMR reports written in a free-text, non-structured format. For this purpose, CMR records at a single centre from May 1995 to May 2019 were searched for reports favouring or excluding diagnoses of cardiac amyloidosis, cardiac sarcoidosis, and myocarditis using NLP via cTAKES (clinical Text Analysis and Knowledge Extraction System). CMR reports of the extracted cases were reviewed manually (N=1262). Indeterminate cases, defined as those with at least two differential diagnoses on the CMR report, were excluded (n=339). The accuracy of NLP was determined separately for cardiac amyloidosis, cardiac sarcoidosis, and myocarditis. This initial review was followed by five iterations to improve the accuracy of NLP. The final algorithm used a gradient boosting machine model that combined a word2vec representation of the sentences of interest with indicators of the diagnosis identified, certainty, polarity, and section header.
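To make the workflow above more concrete, the following Python sketch illustrates two of its pieces: deriving diagnosis, polarity, certainty, and section-header indicators from a report sentence, and computing per-diagnosis accuracy against manual review after excluding indeterminate cases. It is not the authors' algorithm and does not use cTAKES, word2vec, or gradient boosting. The keyword patterns, cue phrases, field names, and toy cases are all hypothetical and chosen only for illustration.

```python
import re
from collections import defaultdict

# Hypothetical keyword patterns; the study's actual dictionaries are not given.
DIAGNOSES = {
    "amyloidosis": re.compile(r"\bamyloid", re.I),
    "sarcoidosis": re.compile(r"\bsarcoid", re.I),
    "myocarditis": re.compile(r"\bmyocarditis\b", re.I),
}
# Illustrative cue phrases only, not the cTAKES rule set.
NEGATION = re.compile(r"\b(no evidence of|not consistent with|negative for)\b", re.I)
UNCERTAIN = re.compile(r"\b(possible|may represent|cannot be excluded|suggestive of)\b", re.I)


def sentence_indicators(sentence, section):
    """Return one indicator dict per diagnosis mentioned in the sentence."""
    rows = []
    for name, pattern in DIAGNOSES.items():
        if pattern.search(sentence):
            rows.append({
                "diagnosis": name,
                "negated": bool(NEGATION.search(sentence)),      # polarity
                "uncertain": bool(UNCERTAIN.search(sentence)),   # certainty
                "section": section,                              # section header
            })
    return rows


def accuracy_by_diagnosis(cases):
    """Share of determinate cases where the NLP label matches manual review.

    Each case is a dict with the assumed keys 'diagnosis', 'nlp', 'manual',
    and 'n_differentials'. Cases with two or more differential diagnoses are
    treated as indeterminate and skipped, mirroring the exclusion rule above.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for case in cases:
        if case["n_differentials"] >= 2:
            continue
        total[case["diagnosis"]] += 1
        correct[case["diagnosis"]] += int(case["nlp"] == case["manual"])
    return {d: correct[d] / total[d] for d in total}


# Toy data for demonstration; these are not study results.
toy_cases = [
    {"diagnosis": "amyloidosis", "nlp": "favoured", "manual": "favoured", "n_differentials": 1},
    {"diagnosis": "amyloidosis", "nlp": "favoured", "manual": "excluded", "n_differentials": 1},
    {"diagnosis": "amyloidosis", "nlp": "favoured", "manual": "favoured", "n_differentials": 2},
    {"diagnosis": "myocarditis", "nlp": "excluded", "manual": "excluded", "n_differentials": 1},
]

print(sentence_indicators("No evidence of cardiac amyloidosis.", "impression"))
print(accuracy_by_diagnosis(toy_cases))
```

In a pipeline like the one described, indicators of this kind would be joined to a sentence embedding and passed to a trained classifier. The fixed cue lists here stand in for that step only to show what the indicator features might look like.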


Overall, this study demonstrates that NLP can be used as an accurate method to extract cardiac amyloidosis, cardiac sarcoidosis, and myocarditis diagnoses from CMR reports in patients with heart failure. Adjustments to the algorithm are essential to improve its accuracy because CMR readers vary in how they phrase their findings. Application of this analytical method enables time-saving and accurate documentation of various heart failure aetiologies, with the potential to improve both heart failure care quality and performance and to facilitate future heart failure research.

1. Karamitsos TD et al. The role of cardiovascular magnetic resonance imaging in heart failure. J Am Coll Cardiol. 2009;54(15):1407-24.
2. Peterzan MA et al. The role of cardiovascular magnetic resonance imaging in heart failure. Card Fail Rev. 2016;2(2):115-22.
3. Cai T et al. Natural language processing technologies in radiology research and clinical applications. Radiographics. 2016;36(1):176-91.
4. Sorin V et al. Deep learning for natural language processing in radiology-fundamentals and a systematic review. J Am Coll Radiol. 2020;17(5):639-48.
5. Wagholikar KB et al. Extraction of ejection fraction from echocardiography notes for constructing a cohort of patients having heart failure with reduced ejection fraction (HFrEF). J Med Syst. 2018;42(11):209.
6. Patterson OV et al. Unlocking echocardiogram measurements for heart disease research through natural language processing. BMC Cardiovasc Disord. 2017;17(1):151.