Life expectancy of patients with congenital heart disease (CHD) has increased in recent decades; however, late complications remain frequent and difficult to predict. Progress in data science has spurred the development of decision support systems and could aid physicians in predicting clinical deterioration and in the management of CHD patients. Newly developed artificial intelligence (AI) algorithms have shown performance comparable to that of humans in clinical diagnostics using statistical and computational algorithms and are expected to partly surpass human intelligence in the near future. Although much research on AI has been performed in patients with acquired heart disease, few data are available with respect to research on AI in patients with CHD. Learning algorithms in patients with CHD have been shown to be promising in the interpretation of ECG and cardiac imaging and in the prediction of surgical outcome. However, current learning algorithms are not accurate enough to be implemented into daily clinical practice. Data on AI possibilities remain scarce in patients with CHD, and studies on large data sets are warranted to increase sensitivity, specificity, accuracy, and clinical relevance of these algorithms.
Improved medical treatment and surgical techniques have significantly prolonged the life expectancy of patients with congenital heart disease (CHD).1-3 As these patients reach adulthood, late complications such as arrhythmias and congestive heart failure occur,1 resulting in reduced quality of life and life expectancy.4 Furthermore, these complications often result in unscheduled hospital visits or even emergency admissions.3,5-7 Although visits to the outpatient clinic are frequent, it remains difficult to predict and prevent clinical deterioration.
With the introduction of the electronic medical record and the ability to digitally store data for diagnostic modalities, such as ECG and echocardiography, large amounts of patient data have been generated over the past few decades. Using machine learning (ML) or deep learning (DL) for the analysis of these data has been a topic of interest for some years. Progress in data science has spurred the development of decision support systems which can aid physicians in the management of CHD patients and therapeutic decision making.4,8,9 Newly developed algorithms perform as well as humans in clinical diagnostics using statistical and computational algorithms to perform recognition, classification, and learning tasks, and are expected to outperform humans in the near future.10,11
Used in this context, artificial intelligence (AI) is an umbrella term for the use of computers to model intelligent behaviour. The terms ‘neural networks,’ DL, and ML are technical concepts that fall under this umbrella but are confusingly used interchangeably with the term AI in the literature. This review will focus on ML and DL. Learning algorithms learn from data given as input, also called the training dataset. The algorithm is then tested on a so-called test or validation dataset, which contains new, unseen data. ML, specifically, refers to algorithms that can ‘learn’ patterns from training data and then use the learned patterns to classify previously unseen data.12 DL algorithms have additional hidden layers, which allow them to automatically detect important features from the data, while in ML algorithms, the features need to be provided manually.4,13
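The distinction between a training dataset and unseen test data can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the feature vectors are hand-crafted hypothetical values (as in ML, where features are provided manually), and the nearest-centroid classifier is only a simple stand-in for a learning algorithm, not a method taken from the reviewed studies.

```python
from statistics import mean

# Hypothetical, manually provided feature vectors (two features per recording).
# Label 0 = normal, label 1 = pathological. Values are illustrative only.
samples = [
    ((1.0, 1.2), 0), ((0.9, 1.0), 0), ((1.1, 0.8), 0), ((1.2, 1.1), 0),
    ((3.0, 3.2), 1), ((2.9, 3.1), 1), ((3.2, 2.8), 1), ((3.1, 3.0), 1),
]


def split(data, every=4):
    """Hold out every `every`-th sample as unseen test data."""
    train = [s for i, s in enumerate(data) if i % every != every - 1]
    test = [s for i, s in enumerate(data) if i % every == every - 1]
    return train, test


def fit_centroids(train):
    """'Learn' one mean feature vector (centroid) per class from the training set."""
    centroids = {}
    for label in {lab for _, lab in train}:
        rows = [x for x, lab in train if lab == label]
        centroids[label] = tuple(mean(col) for col in zip(*rows))
    return centroids


def predict(centroids, x):
    """Assign x to the class whose centroid is closest (squared Euclidean distance)."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda lab: sq_dist(centroids[lab]))


train, test = split(samples)
centroids = fit_centroids(train)

# Evaluate only on the held-out samples, which the algorithm has not seen.
correct = sum(predict(centroids, x) == y for x, y in test)
test_accuracy = 100.0 * correct / len(test)
print(f"Test accuracy: {test_accuracy:.1f}%")
```

The essential point is that the centroids are fitted on the training samples only, while performance is reported on the held-out samples.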
Although several AI studies have been performed in patients with CHD over the past few years, data are limited compared to data in patients with acquired heart disease.4,8,9,13-16 Considering the propensity of CHD patients for developing arrhythmias and heart failure, the predictive abilities of the AI algorithms could prove to be lifesaving. Therefore, the aim of this review is to provide an overview of studies investigating the potential of AI algorithms with respect to the various imaging modalities in patients with CHD.
Medline® (Northfield, Illinois, USA) and EMBASE (Elsevier, Amsterdam, the Netherlands) were used to search for studies published up to 9th August 2019. The search was developed iteratively for synonyms of ‘congenital heart disease,’ ML, DL, and AI, both controlled vocabulary (Medical Subject Headings [MeSH]) and free-text words. Nonhuman studies, case reports, biomarker studies, and reviews were excluded. The reference list and cited articles were checked for additional references.
Selection of Studies
Studies were included if they applied AI algorithms for diagnostics (heart sound, echocardiography, MRI, CT, electrocardiogram analysis, and classification/prediction models, for example) in CHD patients. Since the terms AI, DL, and ML are used interchangeably, all three terms were included in this review. All potential articles were read in full by two authors (Ms Marinka D. Oudkerk Pool and Mr Dirkjan Kauw). Disagreements concerning eligibility were resolved by discussion.
Extraction of Data
The extracted data from each paper were author, publication year, total number of patients (both training and test set), patient population, data used for analysis (input data in the algorithm), primary outcome (goal of the study), the used AI algorithm, and accuracy of the proposed AI algorithm. For comparison between the different techniques the sensitivity (SE), specificity (SP), and accuracy were used.
Accuracy is defined as the proportion of results that are correctly classified (either positive or negative) relative to the ‘true’ value, as assessed by the gold standard technique.
A true positive (TP) is an actual positive that is correctly identified as such. A true negative (TN) is an actual negative that is correctly identified as such. A false positive (FP) is a negative value identified as a positive value, and a false negative (FN) is a positive value identified as a negative value. SE, SP, and accuracy can be defined using the following equations:17-20
SE = 100 x (TP / (TP + FN))
SP = 100 x (TN / (TN + FP))
Accuracy = 100 x ((TP + TN) / (TP + TN + FP + FN))
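These three measures can be computed directly from the four confusion-matrix counts. The following Python sketch shows one way to do so; the class and function names are hypothetical, and the example counts are invented for illustration and are not taken from any of the reviewed studies. Specificity is computed here from TN and FP, i.e., as the percentage of actual negatives that are correctly identified.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts of classification results compared with the gold standard."""
    tp: int  # true positives
    tn: int  # true negatives
    fp: int  # false positives
    fn: int  # false negatives


def sensitivity(c: ConfusionCounts) -> float:
    """Percentage of actual positives that are correctly identified."""
    return 100.0 * c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """Percentage of actual negatives that are correctly identified."""
    return 100.0 * c.tn / (c.tn + c.fp)


def accuracy(c: ConfusionCounts) -> float:
    """Percentage of all results that are correctly classified."""
    return 100.0 * (c.tp + c.tn) / (c.tp + c.tn + c.fp + c.fn)


# Hypothetical counts, for illustration only.
example = ConfusionCounts(tp=45, tn=40, fp=10, fn=5)
print(f"SE = {sensitivity(example):.1f}%")
print(f"SP = {specificity(example):.1f}%")
print(f"Accuracy = {accuracy(example):.1f}%")
```

The example also shows why all three measures are reported side by side: with these counts, SE (90%) and SP (80%) differ although accuracy lies in between (85%).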
In total, 63 articles were potentially eligible for this review after removing duplicates. Forty-eight articles were considered irrelevant because they focussed on biomarkers, genes, or mechanical AI; were not based on cardiology (but on neurology or mechanical ventilation); concerned diagnostics over the phone or medication; or were comparison studies in which two or more imaging modalities were compared. Twenty-seven articles were read in full, after which an additional 11 articles were excluded. Two additional articles were selected by going through the references. The final analysis consisted of 18 articles (Figure 1).
Table 1 shows an overview of each selected article found in this search. No articles on CT in CHD patients were found during this search.4,8,9,13-16,21-31
In total, 15,244 patients were analysed: 10,354 adults (35% male; 33% female; 32% no gender described; mean age of 33.30 ±13.00 years), 1,858 children (>2 years old; mean age of 9.22 ±1.09 years [42% of patients]; 58% no age described), and 4,099 infants (<2 years old; age not described). Diagnoses of CHD of varying complexity were made in 14,532 (95.33%) patients, and 712 (4.67%) patients were included as a healthy control group.
In the 18 analysed articles, 15 different AI algorithms were used. The most used technique was the artificial neural network (ANN) in four of the articles (22%). Table 2 gives an overview of all techniques used in this review. Nine out of 18 articles analysed the use of ML in heart sounds (50%). The other articles analysed echocardiography (n=3, 17%), MRI (n=1, 6%), ECG (n=1, 6%), as well as prediction or classification models (n=4, 22%).
Heart Sound Analysis
Nine articles aimed to distinguish between pathological and innocent murmurs using ML on sound recordings from an electronic stethoscope.
The study by DeGroff et al.24 aimed to distinguish pathological from innocent murmurs using spectral resolution and frequency range as input. Using an ANN, they found high SP and SE (both >90.0%). Sepehri et al.25 also found high accuracy (93.6%) using an ANN based on spectral and timing properties of heart sound recordings. The algorithm was trained on 60 normal and 60 pathological heart sound recordings.
To evaluate the algorithm, it was tested on 60 innocent or pathological murmurs to correctly identify the first and second heart sounds. Other articles used multiple algorithms to distinguish between pathological and innocent murmurs, namely linear discriminant analysis, support vector machine, a combination of hidden Markov model and support vector machine, ANN, and the Arash-band. ANN was the most frequently used algorithm (n=3, 33%), and yielded the highest accuracy, SE, and SP.
AI algorithms were used on echocardiographic data to distinguish between structurally normal or pathological hearts, or to determine cardiac cavity volumes and function. The algorithm can be trained to detect change in echogenicity in the collected data, which can be seen in the wall of the heart. In this manner, Diller et al.4 found accuracy of 98% in distinguishing between transposition of the great arteries (TGA) after an atrial switch operation, congenitally corrected-TGA, and normal controls using a convolutional neural network (CNN) algorithm. The endocardial border was marked by two researchers and compared to the border marked by a CNN algorithm. A knowledge-based article written by Neukamm et al.29 only looked at SE and SP and found that making a 3D model out of the 2D echocardiogram data is feasible in 97% of cases; however, results for assessing the ejection fraction (EF) were unsatisfactory and MRI remains the method of choice.
In one of the articles, by Nyns et al.,30 MRI was used as an input for knowledge-based reconstruction of the volume of the right ventricle after atrial switch operation in patients with a TGA. In a knowledge-based reconstruction, the input is compared to a database that contains information on the 3D model of the region of interest, and the reconstruction is based on this database.30 The knowledge-based reconstruction was compared to the gold standard, which is Simpson’s method. Simpson’s method is a geometric model in which the right ventricular volume is calculated as a sum of cylinders (from the base of the heart to the tricuspid valve).32 The accuracy of the end-diastolic volume (82%), end-systolic volume (93%), and EF (73%) were compared. Knowledge-based reconstruction is a feasible, accurate, and fast method compared to the gold standard for measuring right ventricle volumes in patients after atrial switch operation.30
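The idea behind a summation-based volume estimate can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the cited study: each imaging slice is treated as a thin cylinder (cross-sectional area times slice thickness), the slice areas and thickness are hypothetical values, and the EF is derived from the end-diastolic volume (EDV) and end-systolic volume (ESV) in the usual way.

```python
def disc_summation_volume(slice_areas_cm2, slice_thickness_cm):
    """Approximate a ventricular volume (mL) as a stack of thin cylinders.

    Each slice contributes area x thickness; 1 cm^3 equals 1 mL.
    """
    return sum(area * slice_thickness_cm for area in slice_areas_cm2)


def ejection_fraction(edv_ml, esv_ml):
    """Ejection fraction (%) from end-diastolic and end-systolic volumes."""
    return 100.0 * (edv_ml - esv_ml) / edv_ml


# Hypothetical contoured slice areas (cm^2) and slice thickness (cm).
end_diastolic_areas = [20.0, 18.0, 16.0, 12.0, 8.0, 6.0]
end_systolic_areas = [10.0, 9.0, 8.0, 6.0, 4.0, 3.0]
thickness = 1.0

edv = disc_summation_volume(end_diastolic_areas, thickness)
esv = disc_summation_volume(end_systolic_areas, thickness)
ef = ejection_fraction(edv, esv)
print(f"EDV = {edv:.1f} mL, ESV = {esv:.1f} mL, EF = {ef:.1f}%")
```

Because the EF is derived from two volume estimates, errors in either volume propagate into it, which may help explain why agreement for EF can be lower than for the individual volumes.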
One article was found using ML on the ECG of patients with CHD. In this article by Yang et al.,13 the authors aimed to distinguish the ECGs of patients with an atrial septal defect from those of patients without an atrial septal defect and of healthy controls. The QRS and T wave measurements from lead I, lead II, and all precordial leads were used as input. A SE of 91.4% and a SP of 91.7% were found, with an accuracy of 91.5%, using an ANN.
In the classification and prediction models, the ML algorithms were used to predict clinical deterioration, to classify surgical risk, or to classify the heart disease using patient characteristics. If the output of the network is continuous, the model is a prediction (regression) model; if the output takes discrete (categorical) values, the algorithm performs a classification of the data.33 Ruiz-Fernandez et al.8 found an accuracy of 99.9% in classifying the risk of mortality in paediatric surgery using the multilayer perceptron algorithm. The goal of this study was to develop a clinical decision support system to help cardiologists decide whether surgery was indicated. Ruiz et al.14 investigated early prediction of critical events in infants using a naïve Bayesian model. Thirty-four routinely collected data points, such as heart rate, CO2, and lactate, were used as input for the models. The model was able to detect future events up to 1 hour away with a SE of 84.0% and a SP of 81.0%. Diller et al.15 used DL techniques (statistical learning which extracts features from raw data) to categorise diagnostic group, disease complexity, and New York Heart Association (NYHA) class, with an accuracy of 90.2%. In addition, they estimated the prognosis of adults with all types of CHD and whether patients needed to be discussed in the multidisciplinary team. Lastly, Chiogna et al.31 used a decision tree algorithm to classify neonates with CHD into 27 disease classes, compared to an expert opinion. Input data consisted of routinely acquired clinical data at birth, such as ECG data, pO2, heart size based on the chest X-ray, partial pressure of CO2, and oligemic lung fields. An accuracy of 59.0% was achieved.
This review provides an overview of the possibilities of AI for patients with CHD. Although AI algorithms have been used for patients since 2001, relatively few articles have been published on this subject. However, AI algorithms are gaining popularity in healthcare and especially in cardiology.24 This is also demonstrated in this review since most articles are of relatively recent date (earliest dated 2015). In this review the authors found high SE and SP in most categories (echocardiographic data, ECG data, and in prediction/classification models), which means AI algorithms have great potential as an additional diagnostic tool in patients with CHD. However, the SE, SP, and accuracy are not yet high enough to be able to implement these algorithms safely in daily practice.
Most of the articles used ML on heart sounds, with high SP and SE. The highest accuracy (94%) was found using the ANN algorithm. Heart sound analysis is noninvasive, inexpensive to perform, and remains an important diagnostic tool in both adults and children. Overall, the techniques that were used distinguished between healthy and pathological sounds only, which might be useful as a primary screening tool. Heart sound analysis in patients with acquired heart disease also showed high SE, SP, and accuracy.34 Ari et al.35 managed to distinguish between aortic insufficiency, aortic stenosis, atrial septal defect, mitral regurgitation, mitral stenosis, or normal heart sound with an accuracy of 92%. These techniques can establish a diagnosis but do not yet determine the severity of the valve lesion. However, one could argue that heart sound analysis using ML should not be the main objective, as other noninvasive methods (echocardiography and cardiovascular MRI) are likely to be more informative if interpreted by ML techniques. The technique could be used as a screening tool by general practitioners to distinguish who should be sent to the hospital for further check-up.
Learning algorithms on noninvasive cardiac imaging (echocardiography and MRI) show a high accuracy when using a CNN algorithm, especially in the assessment of cardiac volumes.4 AI algorithms have been used for echocardiographic imaging since 2006, but only started gaining popularity in 2012. Asch et al.36 trained a ML algorithm to automatically estimate the left ventricular EF on a database of >50 echocardiographic studies, including the apical 2- and 4-chamber views, and compared the estimates to the left ventricular EF as assessed by the echocardiographer or cardiologist. The ML algorithm proved less sensitive (90% versus 93%) but more specific (92% versus 87%) and more accurate (92% versus 89%), which makes the algorithms highly feasible in daily practice. However, patients with CHD tend to develop problems in their right ventricle. Genovese et al.37 analysed 3D quantification of the right ventricle size and function in 56 patients who underwent both cardiac magnetic resonance and 3D echocardiography examinations on the same day. Echocardiographic volumes were analysed with a ML technique and compared with the cardiac MRI using Bland-Altman and linear regression analyses. The automated ML analysis was correct in 18 patients (32%) but needed corrections in the remaining 38 patients. Although an intraclass correlation coefficient of 97% could be reached for the end-diastolic volume, 98% for the end-systolic volume, and 95% for the EF, the accuracy of a ML algorithm remains strongly correlated with the image quality. It seems likely that with increasing image quality, ML algorithms for image interpretation will become more reliable.
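A Bland-Altman analysis of two measurement methods summarises their agreement as the mean difference (bias) and the 95% limits of agreement (bias ± 1.96 standard deviations of the differences). The following Python sketch illustrates the calculation; the paired volumes are hypothetical values chosen for illustration, not data from the cited study, and the function name is an assumption.

```python
from statistics import mean, stdev


def bland_altman(method_a, method_b):
    """Return (bias, lower limit, upper limit) for paired measurements.

    Bias is the mean of the paired differences (a - b); the 95% limits of
    agreement are bias +/- 1.96 x the sample standard deviation of the differences.
    """
    if len(method_a) != len(method_b):
        raise ValueError("Both methods need the same number of measurements")
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = mean(diffs)
    half_width = 1.96 * stdev(diffs)
    return bias, bias - half_width, bias + half_width


# Hypothetical paired right ventricular end-diastolic volumes (mL).
echo_volumes = [100.0, 120.0, 140.0, 160.0]
mri_volumes = [104.0, 122.0, 146.0, 164.0]

bias, lower, upper = bland_altman(echo_volumes, mri_volumes)
print(f"Bias = {bias:.1f} mL, limits of agreement = [{lower:.1f}, {upper:.1f}] mL")
```

A negative bias in this sketch would mean that the first method (here, echocardiography) yields systematically smaller volumes than the second. Unlike a correlation coefficient, the limits of agreement express the disagreement in the units of the measurement itself.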
As with the echocardiography, cardiac MRI combined with ML has been gaining popularity and is already being used in daily practice. Ruijsink et al.38 tried to analyse the cardiac magnetic resonance imaging using DL algorithms to automate ventricular function assessment for both ventricles, and reached SE of 95%, SP of 83%, and an accuracy of 89%. In the technique described by Nyns et al.,30 the right ventricle was automatically analysed, but the key anatomical landmarks needed to be selected beforehand. More research is needed to evaluate if the DL algorithm could also make this process quicker with comparable SE, SP, and accuracy.
Remarkably, the sole article on ECG in CHD patients dates from 2002, although a lot of ML is conducted on ECG data in the general cardiac population.33,39-43 Adult patients with CHD often experience arrhythmias, which makes them a suitable group in which to use ML techniques to predict events.44 However, the baseline ECG recordings of these patients often already have an abnormal appearance and differ between patients with the same congenital heart defect, further complicating the analysis of the ECG. In patients with acquired heart disease, the most analysed arrhythmia is atrial fibrillation. Using ML, ECG characteristics during sinus rhythm can be determined to establish the presence of an atrial fibrillation signature during sinus rhythm, with high SE (79%), SP (80%), and accuracy (79%).20 This algorithm could be used in patients with CHD as it seems suitable in left and right bundle branch block, premature ventricular contraction, atrial premature beat, and paced beat. Further research on ML ECG interpretation in patients with CHD is warranted and seems feasible.
ML can also be used to make prediction or classification models, which are used to determine in which group a specific outcome would fit. In patients with acquired heart disease the prediction models are mostly used to determine an outcome after surgery or to make a definitive risk model; however, these models perform poorly in predicting outcomes.45,46 The model by Ruiz et al.14 in 2016 gave a low accuracy when classifying different diagnoses of congenital heart defects based on a questionnaire. The model made by Ruiz-Fernandez et al.8 in 2019 could be used in clinical practice because of the high accuracy, but no SE or SP is given. If the SE, SP, or accuracy is low, more research must be carried out or alternative endpoints must be chosen. A solution could be found by comparing the results to human analysis; if the algorithm performs better than the current gold standard, it could be implemented in clinical practice. However, an accuracy, SE, and SP above 95% should be pursued before implementation.
The use of AI algorithms in cardiology has gained enormous interest in recent years and is predicted to grow even more in the upcoming years. In patients with acquired heart disease, ML and DL are already being used for imaging modalities and outcome prediction. In patients with CHD, on the other hand, the authors found only 18 articles on learning algorithms. These algorithms have high potential in the population of patients with CHD. Cardiac evaluations in the hospital with ECG and imaging techniques are frequent, and a great amount of data is generated from these patients. Moreover, patients with CHD are vulnerable to cardiac morbidity and mortality and often experience complications. Prediction of deterioration in these patients could save lives.
Many different AI algorithms exist, and none was superior to the others for its specific task. The authors found that most techniques gave comparable SE, SP, and accuracy. It is more important to choose the right patient features or input data and to preprocess the data to be used.47-49 Because a large number of different algorithms were used, it was difficult to compare them. It is important to avoid deviation in the different algorithms in order to keep track of the focus: implementation in the clinical setting. In an ideal situation, the authors would like to have an algorithm that would be able to calculate the same values (in case of calculating ventricle volumes) as an imaging physician would. Bigger and more diverse datasets are needed to train the algorithms better and to make them applicable to every patient in the clinic. A limitation of current algorithms is that even though the accuracy, SE, and SP can be high, mistakes can still occur. This raises questions regarding the extent to which physicians should trust the algorithm and what safeguards are in place if the conclusion of the algorithm is incorrect. However, these questions are beyond the scope of the current review.
AI algorithms are increasingly applied in healthcare and draw a lot of attention given the large potential the technology promises. Recent studies on AI algorithms in patients with CHD indeed show promising results, as the algorithms aid the analysis of ECG and cardiac imaging and help to predict outcomes. However, current data on AI algorithms in patients with CHD are still limited, and larger scale studies are warranted to provide algorithms with high SE, SP, and accuracy that could better assist physicians in the future.