1 Introduction
In cardiovascular magnetic resonance (CMR) imaging, static cardiac imaging techniques, such as late gadolinium enhancement (LGE) (Kellman and Arai, 2012; Kellman et al., 2002; Akçakaya et al., 2012; Basha et al., 2017), mapping (Kellman and Hansen, 2014; Messroghli et al., 2017; Aherne et al., 2020), or three-dimensional (3-D) whole heart coronary angiography (Munoz et al., 2020; Greil et al., 2017; Cruz et al., 2017; Forman et al., 2014) are increasingly being performed to qualitatively and quantitatively assess the cardiac anatomy and function. It is important to acquire the data during the phase of the cardiac cycle with least motion, called a resting phase (RP), especially mid- or end-diastolic (ED) phases (Kramer et al., 2020; Isma’eel et al., 2009; Kramer et al., 2013), or in patients with a fast heart rate during the end-systolic (ES) phase.
In standardized CMR protocols (Kramer et al., 2020, 2013), the guidelines recommend using the diastolic RP with a duration of less than 200 ms as the acquisition window for static cardiac imaging. In certain situations, e.g., high heart rate or patients with arrhythmias, especially in terms of mapping acquisition, the systolic RP is preferably chosen. As outlined in (Kim et al., 2001; Seifarth et al., 2007; Hofman et al., 1998), electrocardiogram-based heuristics enable the ED phase selection based on a trigger time at 75% of the RR interval for most patients, however it can be suboptimal due to the magnetohydrodynamic effect (Abi-Abdallah et al., 2007), and not generalizable, especially for patients with high or irregular heart rates. For advanced applications such as high-resolution angiography, accurate determination of RP is necessary. As different structures in the heart rest at different times of the cardiac cycle, ideally a targeted RP for the anatomy of interest should be determined. For coronary angiography, for example, it is suggested to accurately determine the RP of the right coronary artery (RCA) (Kramer et al., 2020, 2013; Shechter et al., 2005; Johnson et al., 2004; Wang et al., 2001).
The selection of the RPs is typically performed based on visual inspection on a CINE series acquired prior to the static imaging. In current clinical practice, an expert is required to select either the end-systolic, mid- or end-diastolic phase for acquisition. To tackle the complex and time-consuming manual task of RP selection for the static cardiac imaging, some previous studies have been introduced to perform the RP determination automatically. In previously conducted studies (Wang et al., 2001; Stuber et al., 1999), a calibration scan-based approach using navigator echoes has been presented, however this approach requires significant user experience and interaction in order to accurately plan the navigator position. A threshold-based clustering algorithm was proposed to track low-motion periods (Suever et al., 2011). However to track the RCA area, manual tracking is required. Another approach (Jahnke et al., 2005) was proposed using image based cross-correlation of CINE series for the automatic selection of RPs, proving to be advantageous in terms of image quality. An extension of the previous method (Ustun et al., 2007) calculates the myocardial displacement from the cross-correlation calculation. These methods, however, also require user interaction to position the region of interest (ROI) enclosing the heart. An alternative technique with automated RCA positioning (Sato et al., 2009) using template matching algorithm was proposed to automatically select RP based on image intensity differences. The intensity difference calculation as well as the template matching algorithm can be sensitive to artifacts. Further, a method attempts to determine the cardiac motion resolution-independently (Huang et al., 2014) using intensity standard deviation calculation, and additionally proposed the motion extraction based on deformable model-based registration. Another idea was proposed in which the feasibility of motion correction algorithm for quantifying the rest periods of the coronary arteries is shown (Shechter et al., 2005). However, the computed RP is based on the entire field-of-view and do not provide localized RPs for specific anatomies, and the detection is limited to two RPs due to the two local minima search. In a recent study (Asou et al., 2018), an automated RP selection algorithm was introduced based on a motion area map generated from the high-speed component of the motion within a CINE series, however the high-speed component is not necessarily related to the anatomical structures.
Deep Learning approaches have the potential to automate clinical workflows, and the state-of-the-art methods using convolutional neural networks (CNN) are currently used for image localization and segmentation (Krizhevsky et al., 2012; Szegedy et al., 2015; Simonyan and Zisserman, 2014; Ronneberger et al., 2015). The CNN-based models are particularly used for learning the optimal spatial features from input data, especially images, thus performing a specific task such as localization or segmentation. Beside the power of CNN models for learning the spatial information, 3-D convolutional operators are used to learn spatial and temporal information for capturing features in a 3-D input data (Ji et al., 2012; Tran et al., 2015). Registration approaches are commonly used when it comes to motion analysis, object tracking, etc. (Dyke et al., 2019; Chefd’Hotel et al., 2002; Szeliski and Shum, 1996; Spinei et al., 1998; Rueckert et al., 1999), by estimating a smooth correspondence function mapping between the coordinates from a reference image and those in a target image. These techniques can be used for calculating the motion of a target with the deformation fields within CINE series quantitatively.
In this work, we propose a fully automated prototype system combining the advantages of the 3-D CNN and registration algorithms for detecting localized RPs of the RCA from 4-chamber view (4CH) CINE series. As the CINE series are time-resolved images, a 3-D CNN based model is trained to perform landmark detection over the cardiac phases. The proposed system combines the deep neural network for landmark detection and a registration algorithm. The motion within a localized anatomy is quantified in order to automatically classifying the systolic and diastolic RPs of the RCA. The duration of the RPs is quantified in both cases. To test the robustness and feasibility, the proposed framework was integrated into the scanner software and validated on patient data acquired on 1.5T and 3T scanners at multiple centers and different CINE sequences.
2 Methods
2.1 System Overview

The proposed prototype system consists of three main steps that are executed consecutively (Figure 1). The first step is to localize the ROI (see details in section 2.2) from the input which is a 4CH CINE series. ROI can be chosen for any anatomical structures, that are of interest displayed in 4CH images, such as the RCA. The localization can be performed by neural networks trained for landmark detection tasks. In case of landmark detection, the output can be pixel coordinates that are used for cropping the image to the localized anatomy (see details in section 2.3). The cropped series containing the ROI are used for motion quantification (see details in section 2.4) by performing the elastic image registration (Chefd’Hotel et al., 2002). By taking the median of the magnitude of the deformation fields from consecutive frames, the motion values over time points are calculated. Finally, the RPs are classified by an absolute threshold, defined based on the correlations of the predicted RPs by varying the thresholds with the expert annotations (see details in section 2.5). The frames which have lower motion values than the threshold value, are selected as RPs. The detected RPs are then used to plan the subsequent static image acquisition.
2.2 Localization
In this work, the landmark detection neural network is built to automatically detect the location of RCA in a 4CH time-resolved series. The densely connected neural network (3-D DenseNet) architecture is trained to regress the x- and y- coordinates of the RCA over time and it is described in detail here. As a preprocessing step, the 4CH CINE series are interpolated to a fixed spatial and temporal size of to be independent from resolution. Further, the min-max pixel intensity normalization was applied to rescale the different intensity range in [0,1]. The 3-D DenseNet proposed in (Huang et al., 2017) was trained under supervised learning. The weights of the network were updated by using the Adam optimizer (Kingma and Ba, 2014) with and the mean-squared-error loss function (MSE) as follows:
(1) |
Layers | Output Size | 3-D DenseNet |
---|---|---|
Convolution | Conv, stride | |
Pooling | max-pool, stride | |
Dense block | ||
Transition block | Conv, avg-pool, stride | |
Dense block | ||
Transition block | Conv, avg-pool, stride | |
Dense block | ||
Transition block | Conv, avg-pool, stride | |
Dense block | ||
Classification block | 3-D adaptive avg-pool, Conv |
where is the predicted pixel coordinates at the time from dataset, the ground truth, the number of frames, and the number of the datasets. The ground truth is generated in a semi-supervised manner, where the RCA pixel coordinates in the first frame are manually annotated and propagated to the next frames using the deformation fields describing the displacement between and . The deformation field are generated by using the elastic image registration (Chefd’Hotel et al., 2002). Each frame was then corrected manually if the propagated coordinates were not accurate. The total number of convolutional layers (Conv) is 122, and before each Conv a 3-D batch normalization (BN) (Ioffe and Szegedy, 2015) and rectified linear unit (ReLU) activation functions (Nair and Hinton, 2010) are applied. After the initial 3-D Conv and max pooling operator, the feature maps were forwarded through 4 concatenated dense blocks (DB) and transition blocks (TB). The number of 4 concatenated DBs are set to 6, 12, 24, 16, in which after each DB, a TB is applied. In each DB, 2 Convs, each followed by 3-D BN and ReLU operators with the increase of the feature maps with 12 are applied. Each layer obtains additional inputs from all preceding layers and forwards the feature maps to all subsequent layers. In each TB, a Conv with BN and ReLU followed by 3-D average pooling operator is applied for spatial and temporal down-sampling. As the last step, a global average pooling and convolutional operator are used to regress the coordinates from the extracted features maps. The detailed architectural details can be found in the Table 1.
2.3 Cropping
From the output of the localization task, a ROI can be simply selected from the pixel coordinates. The detected pixel coordinates are transformed back to the original coordinate system, and then the cropping is performed. Given the predicted pixel coordinates of RCA by 3-D DenseNet, the bounding box is defined by taking the minimum and maximum x- and y-coordinates of the points in the coordinate plane from a time-resolved series and calculating the average of these x and y-coordinates. The size of the bounding box is selected based on prior knowledge about the size of the anatomy, chosen as .
2.4 Motion Quantification
The motion values are quantitatively determined using elastic image registration (Chefd’Hotel et al., 2002). Consecutive frames of the CINE series, and for all timepoints , are registered to obtain deformation fields such that minimizes the dissimilarity measure related to . The motion curve describing the amount of RCA motion is then computed as the median of the weighted magnitudes of the deformation fields as follows:
(2) |
where is a Gaussian weighting function centered at the midpoint of the detected location of the RCA between and at the time point t:
(3) |
while denotes the midpoint of the detected RCA position. This Gaussian weighting ensures that the motion curve represents mainly the motion of the RCA, while still being robust to slight imprecisions of the localization results. The standard deviation was empirically chosen as . Figure 2 shows the Gaussian weighting functions overlaid on the anatomical images at each time point, as well as the weighted deformation fields corresponding to subsequent image pairs. The quantification of motion can be considered in different ways, and the detailed analysis of obtaining the RCA motion values can be found in section 3.2.

2.5 Resting Phase Classification
After the motion curve is quantified, a classification is required to obtain the RP. This classification is performed within a time interval , i.e. the first and the last ms of a cardiac cycle are excluded. accounts for the time needed for preparation pulses before the data acquisition window can start. is a safety margin at the end of the cardiac cycle to account for heart rate variability, i.e. the detected resting phase should not be too late in the cardiac cycle in case subsequent cardiac cycles during the 3-D acquisition are shorter, which can lead to unstable measurements (Kim et al., 2001; Seifarth et al., 2007). Based on the ground truth annotation of RPs, the , and are empirically chosen to be 80 ms. The RPs can be determined with an absolute threshold from the motion values. The frames with motion values lower than the absolute threshold value are assigned as RPs which can be described as follows:
where is the absolute threshold value, and is the obtained motion value at trigger time . The threshold value is obtained based on sensitivity (true positive rate) and specificity (true negative rate) analysis from manual annotations. The optimal absolute threshold is chosen as the one achieved with best balanced accuracy.
2.6 Data
2.6.1 Training and Validation Dataset for the RCA Detection Network
Data used for training and evaluating the RCA detection network was acquired on 1.5T and 3T clinical MRI scanners (MAGNETOM Aera, Avanto, Prisma, Skyra, Trio TIM; Siemens Healthcare, Erlangen, Germany) at multiple centers (n=1000). The dataset was split into 70% training, 15% validation and 15% testing set for the RCA detection. Details about the datasets used for training, validating, testing the RCA detection network, and evaluating the classified resting phases are shown in Table 2. A medical expert with more than 10 years of cardiac MRI experience manually annotated the RPs on 76 datasets from the testing set for the analysis of the system.
2.6.2 Study Dataset
Data used for evaluating the proposed system was acquired on 1.5T and 3T clinical MRI scanners (MAGNETOM Aera, Avanto fit, Skyra, Skyra fit, Sola, Vida; Siemens Healthcare, Erlangen, Germany) at multiple centers (n=102). The proposed system was integrated into the scanner software and tested online. The dataset from the study was not used for training, testing the RCA network or the threshold analysis, and was not mixed with the RCA Detection Network dataset. The test datasets consist of non-pre-selected patient data with a minimum heart rate of 35 and a maximum heart rate of 97 who are clinically referred for coronary assessment. None of the patients have not received beta-blockers or a target heart rate was determined. Details about the study datasets are listed in Table 2.
RCA Detection Network Dataset (Resting Phase annotation) | Study Dataset | |
Number of patients | 1000 (76) | 102 |
Age | 55.0 19.0 (59.0 17.2) | 39.3 10.9 |
Gender | 64 % male (68% male) | 69% male |
Heart Rate [bpm] | 64.7 11.6 (68.0 13.2) | 70.1 12.3 |
Field Strength | , (, ) | , |
Spatial Resolution | 1.4 0.1 (1.5 0.2) | 1.7 0.1 |
Temporal Resolution | 33.8 9.9 (35.8 6.8) | 37.3 7.8 |
FOV [mm x mm] | 311.3 30.6 344.4 26.9 (312.2 31.7 357.4 23.6) | 283.5 9.2 345.9 8.9 |
Number of Frames | 25.8 2.3 (25.6 1.6) | 26.0 2.0 |
2.7 Experiments
2.7.1 Localization
The RCA detection is validated by calculating the mean and standard deviation of the Euclidean distance between the predicted pixel coordinates and the ground truth pixel coordinates.
(4) |
where denotes the number of annotated test dataset, the number of time frames in each CINE series and the predicted and the ground truth RCA position. The robustness and performance of the network was qualitatively validated on 12 oblique and diverse oriented 4CH CINE and 9 clinically acquired unseen study dataset with different field strength scanners. Further, the network was evaluated by a box plot showing the performance of the prediction at each frame.
2.7.2 Motion Quantification
In order to find the best approach to quantify motion values, several approaches were evaluated on the annotated datasets for quantifying RCA motion from a cropped CINE series. The first approach is to quantify motion based on the distance between detected pixel coordinate over each adjacent time point as follows:
(5) |
The second is to aggregate the magnitudes of the deformation fields within the ROI without the Gaussian weighting using percentile or mean:
(6) |
(7) |
where is the percentile.
Our last proposed approach is to aggregate the weighted deformation field magnitudes to calculate the motion values as described in the following:
(8) |
(9) |
where is the magnitudes of the deformation fields. The percentile analysis is performed to quantify motion from the deformation fields, which is based on calculating the balanced accuracy, sensitivity and specificity by varying from 10 to 100 by 10 percentile steps. Further, the mean value is calculated as well, and compared with the percentile analysis. In addition, the motion values of the above mentioned 9 clinically acquired dataset are extracted and qualitative validated.
2.7.3 Threahold Analysis
The analysis for finding the optimal threshold value to determine the RPs from the quantified motion value is performed by the binary classification task with varying the threshold from 0.01 to 1 by 0.01 steps by calculating the sensitivity and specificity. The analysis is performed separately for 1.5T and 3T and using the annotated datasets. The performance with the selected threshold is analyzed based on area under curve (AUC) from the receiver operating characteristic (ROC) curve (Swets, 1979), and confusion matrices are evaluated on testing and study datasets. For all different approaches of motion quantification, the threshold analysis is performed such that each approach can be fairly compared. Based on thresholding, the accuracy of classified RPs is evaluated as described in section 2.5.
2.7.4 RP Classification
To evaluate the performance of RP classification, the mean absolute error (MAE) and the standard deviation between the system predicted and annotated start and end time points of systolic and diastolic RP are calculated as follows:
The number of RP annotated dataset is denoted as , and the type can be the classified systolic (sys) or diastolic (dia) RP. The performance was validated by two different measures, firstly the difference of the time window and secondly, the number of images between the predicted and ground truth annotation. The time window specifies the accuracy in milliseconds, whereas the frame in number of images. The validation was performed on the testing datasets in which the RPs are manually annotated by a medical expert used for the threshold analysis. The performance of the system was analyzed based on Bland-Altmann analysis. The RPs was not counted when it was very short (<30 ms, n=10), i.e., not resolvable by the temporal resolution of the acquisition. Additionally, the results of the system validation were presented by sensitivity, specificity and accuracy. To overcome the imbalanced classes (RP, no RP), the balanced accuracy is calculated based on true positive rate, and false true negative rate as follows: Accuracy = , where TPR is true positive rate, and TNR true negative rate. Further, the ranges of each annotated RP type and predicted RP type were compared.
To evaluate the robustness of the proposed system, different CINE sequences (Cartesian segmented, Cartesian segmented with small field-of-view, Cartesian segmented Compressed Sensing Prototype (CS), Cartesian Real-time CS, Radial real-time) were acquired and the predicted RPs of each sequence were compared with each of the expert annotation. Further, the computation time of the proposed system was measured at the beginning and the end of the proposed system.
3 Results
3.1 Localization
The mean and standard deviation error between the prediction and ground truth of the fully convolutional 3-D DenseNet with 122 layers was 4.6 ± 1.8 mm. The box plot in Figure 5, shows the Distance Error in mm between the and the in each frame. Robustness results for unseen datasets are presented in two ways, first in Figure 3 which shows the performance on oblique and diverse oriented cases (n=12) and second in Figure 4, which visualized the worst cases. The quantitative localization results of a part of the study dataset (n=9) acquired with breath-hold and free-breathing CINE sequences are shown in Figure 6 above. The first frame of each CINE series and the corresponding RCA cropped series are shown. Each case was visualized with the first frame of the corresponding CINE series marked with a red box showing the position of ROI defined based on the network prediction and beside it with the cropped series enclosing the area with the outer edge of the cross-section of the RCA overlayed with the generated heatmap.





3.2 Motion Quantification
Metric | Percentile | Accuracy | Sensitivity | Specificity | Threshold [a.u.] |
---|---|---|---|---|---|
- | 71.4 | 50.5 | 92.3 | 0.50 | |
10 | 75.1 | 55.5 | 94.6 | 0.08 | |
20 | 81.5 | 67.3 | 95.8 | 0.13 | |
30 | 84.1 | 72.2 | 96.0 | 0.18 | |
40 | 86.9 | 77.6 | 96.2 | 0.23 | |
50 | 87.2 | 78.1 | 96.4 | 0.28 | |
60 | 87.7 | 80.1 | 95.3 | 0.34 | |
70 | 86.5 | 76.2 | 96.7 | 0.38 | |
80 | 83.2 | 69.0 | 97.5 | 0.42 | |
90 | 81.7 | 65.8 | 97.7 | 0.49 | |
100 | 78.3 | 61.6 | 95.1 | 0.68 | |
- | 89.3 | 84.7 | 93.9 | 0.34 | |
10 | 62.4 | 29.5 | 95.2 | 0.01 | |
20 | 72.8 | 51.9 | 93.6 | 0.03 | |
30 | 82.7 | 74.2 | 91.2 | 0.07 | |
40 | 87.4 | 80.8 | 94.0 | 0.12 | |
50 | 90.1 | 85.4 | 94.8 | 0.20 | |
60 | 88.0 | 78.9 | 97.2 | 0.28 | |
70 | 85.8 | 74.4 | 97.2 | 0.42 | |
80 | 85.1 | 73.4 | 97.0 | 0.62 | |
90 | 83.6 | 71.2 | 96.0 | 0.98 | |
100 | 66.4 | 33.2 | 99.5 | 1.0 | |
- | 86.2 | 76.6 | 95.8 | 0.36 |
The motion values obtained by calculating the distance between the predicted pixel coordinates in each adjacent time points achieved 61.1% accuracy for 1.5T, and 52.8% for 3T. As the motion quantification analysis in the Table 3 shows, the 50 percentile/median of Gaussian weighting achieved 90.1% accuracy, whereas the accuracy was 87.2% without weighting the deformation field. The median performed 91.0% accuracy for 1.5T and 88.9% for 3T. The best accuracy achieved by taking the mean metric was 89.3%, without Gaussian weighting. The motion values quantified based on the median approach with Gaussian weighting in 9 clinically acquired dataset are shown in Figure 6 bottom.
3.3 Threshold Analysis
For each percentile and mean analysis, the threshold selected based on binary classification task is listed in the right column in the Table 3. From the motion values obtained by taking the median, the selected was 0.2. The resulting ROC curve is plotted in Figure 7, and the accuracy over each threshold step is plotted in Figure 7 in the top row, in which the threshold is marked by an orange vertical line. On the below row in Figure 7 shows the confusion matrices of each annotated datasets. On the above one, the performance of the threshold on the testing dataset is shown, while on the below one the performance of the threshold on the study dataset is displayed.

3.4 RP Classification
Magnetic Field | Number datasets | Threshold | Start Systolic RP | End Systolic RP | Start Diastolic RP | End Diastolic RP |
---|---|---|---|---|---|---|
N=22 | 0.20 | 14.4 19.8 | 15.0 16.9 | 5.7 13.1 | 12.2 16.1 | |
N=80 | 0.20 | 19.7 22.4 | 12.2 16.6 | 15.8 21.3 | 8.9 15.2 | |
& | N=102 | 0.20 | 18.7 22.1 | 12.7 16.7 | 13.2 20.0 | 9.7 15.5 |
Magnetic Field | Number datasets | Threshold | Start Systolic RP | End Systolic RP | Start Diastolic RP | End Diastolic RP |
---|---|---|---|---|---|---|
N=22 | 0.20 | 0.46 0.63 | 0.54 0.63 | 0.16 0.36 | 0.37 0.48 | |
N=80 | 0.20 | 0.70 0.82 | 0.43 0.59 | 0.49 0.63 | 0.28 0.49 | |
& | N=102 | 0.20 | 0.66 0.80 | 0.45 0.60 | 0.40 0.59 | 0.31 0.49 |
Magnetic Field | Number datasets | Threshold | Accuracy | Sensitivity | Specificity |
---|---|---|---|---|---|
N=22 | 0.20 | 93.4 | 90.1 | 96.8 | |
N=80 | 0.20 | 92.6 | 90.7 | 94.5 | |
& | N=102 | 0.20 | 92.7 | 90.5 | 95.0 |
The detailed results about the performance of the predicted systolic and diastolic RP on the study datasets are listed in Table 4, Table 5. The Bland-Altmann plots showing the performance of start and end detected time point for each systolic and diastolic RP is shown in Figure 8. and for 1.5T datasets was 11.8 ± 16.5 ms (0.38 ± 0.53 frame) and 14.2 ± 18.9 ms (0.48 ± 0.63 frame) for 3T datasets. By using the selected , the proposed system resulted in 93.4% accuracy, sensitivity at 90.1% and specificity 96.8% for 1.5T and 92.6%, 90.7% and 94.5% for 3T datasets. and was 13.6 ± 18.6 ms (0.46 ± 0.62 frame) when using the datasets independent from field strength for analysis. The accuracy, sensitivity and specificity achieved by the defined absolute , <0.2, was 92.7%, 90.5% and 95.0%. The automatically classified RPs resulted in a mean max error of 30 ms, meaning that it deviates by roughly one frame.

The datasets with RPs with less than 30 ms were discarded (n=9). Further, there was no systolic RP annotated by the expert in 14 cases, and no diastolic RP in 12 cases. These phases were excluded from the analysis. The matched with the annotation, or was off one frame in 93.6%, the in 97.5%, the in 93.8% and in 96.3%, respectively. was detected earlier/later than the expert’s annotation in 27.8%/5% and in 20.9%/16.0%. was detected earlier/later than the expert’s annotation in 11.4%/29.1% and in 15.2%/14.8%. In average, the was selected earlier than the ground truth in 24.3% of the cases and later in 10.6%. In 10 cases, outliers were present, off by 2 or more frames. In 10 cases, there were outliers present for the , off by 2 or more frames. For the , it was off by 2 or more frames in 5 cases.
The range of annotated systolic RP was 61.1 ± 24.1 ms and the predicted range of systolic RP was 75.5 ± 32.9 ms. Further the range of annotated diastolic RP was 156.0 ± 102.1 ms and the predicted range was 158.2 ± 104.3 ms.

In Figure 9, the robustness of the proposed system is shown in which the system was tested in different sequences including a rescan from a single volunteer. The visualized different CINE outputs are acquired in the order from top to bottom and with 5min between the first and last acquisition. The predicted start and end systolic phases were matched with annotation in most cases, except in one CINE sequence, where the systolic RP was detected by the system but not by the expert, and in a repeat scan, the end time point was off one frame. The start diastolic RP was detected one frame earlier in 2 cases, and end diastolic RP was detected two frames off in real-time sequences. In an example case, the automatically detected RPs were used for the later 3-D static cardiac acquisition targeted to the RCA. The 3-D RCA visualization with the automatically classified RPs showed no residual motion artifacts (Figure 10). The computation time of the proposed system was averaged 1.5 seconds.

4 Discussion
The detection of the RCA ROI is successfully and robustly performed by the 3-D DenseNet on the testing dataset and on the study cases. The 3-D based Conv networks leverage the spatial and temporal information from the time-resolved input, rather than learning only spatial information per time point. The size of the fixed ROI (mm2) was sufficient in all cases for depicting the RCA at each time point. Further, the network was robustly performed on diverse oriented cases (Figure 3) and different CINE sequences (Figure 9) which allows the proposed system to be integrated into different clinical protocols.
In terms of quantifying motion values, the approach of taking the Euclidean distance from the predicted RCA pixel coordinate over cardiac phases highly relies on the performance of the network, furthermore, taking the pixel distance measurement for the displacement metric of the anatomy-of-interest between the consecutive frames was not accurate as shown in Table 3. The approaches deriving from the motion values by taking the deformation fields defined by the elastic image registration show a clear advantage, from which the highest accuracy was the one using the weighted deformation fields (see in Table 3). The approach with deformation fields is clearly more robust to slight inaccuracies of localization results. The Gaussian weighting further improves the performance of the system, as it allows to focus on the target-of-interest, and eliminates the area which is not of interest, such as the blood flow in the atrium contained in the RCA ROI.
The metric for assessing the motion values from the weighted deformation fields was reasonably chosen as 50 percentile by the accuracy analysis. The absolute threshold value is selected based on AUC-ROC analysis, evaluated on the testing datasets from the different 1.5T, 3T scanners and CINE sequences and no specific data selection, thus shows versatility in the results (Figure 7). The classification of the RP by the absolute threshold value is possible due to the quantitative outputs of the deformation field-based approach which enables the detection of the phases with minimal motion, which can be either end-systolic or mid-diastolic or end-systolic and mid-diastolic RPs. In Figure 7, the quantified motion values of the clinically acquired dataset with different CINE sequences and the corresponding predicted RPs were well matched with the expert’s annotation. As shown in Table 4, Table 5, the proposed framework performed robust in different field strengths as well. Interestingly, in the dataset visualized in the center, there was no RP found by a medical expert, and the proposed system was also able to classify the non-RP case demonstrating the advantage of the system. In such cases, the system gives the user the possibility to take the quantified curve as a reference and select the phase with minimum motion based on the motion curve.
Based on the Bland-Altmann plots (Figure 8), the systolic RP did not perform as well as the diastolic RP, especially the classification of the start systolic RP was challenging. The predicted systolic RP window was usually slightly longer than the experts’ annotations. However, the mean range difference was 15ms, which is negligible since the temporal resolution of CINE series is 30ms. In a recent study, the proposed system was evaluated in a clinical validation, in which the interobserver variability was extensively validated and revealed that the automatically detected RPs were consistent with those of the experts (Ogawa et al., 2022). As shown in Figure 10, the RCA was sharply visualized without severe residual motion artifacts that was acquired during the automatically detected RPs.
The study presents some limitations. First, the proposed method depends on two factors: the performance of the RCA localization network and the registration algorithm that may lead to inaccurate results in the presence of severe artifacts in reconstructed CINE series, even though the network performed well on difficult cases as shown in Figure 3 and Figure 4. Second, the absolute thresholding used for classification, can be further investigated, whether other algorithms can be used for binary classification. Third, in this study, the focus was highly on method evaluation on patient dataset, however the clinical feasibility and interobserver variability study using this method was performed in (Ogawa et al., 2022). In the following research, a further clinical study can be of interest to compare the image quality between once acquired with the automatic approach and the other with manually determined resting phases.
Previous methods have shown the feasibility to detect the imaging acquisition window automatically using the shim box volume positioning and cross-correlation calculation (Jahnke et al., 2005; Ustun et al., 2007). However, these semi-automated methods still require the careful positioning of the shim volume coverage. Further, the RP detection is targeted to the whole heart instead of a specific anatomy of interest. These methods were validated on healthy in vivo subjects on a single field strength. Moreover, approaches based on the standard deviation of pixel intensity (Huang et al., 2014) or difference of gradient magnitudes (Piccini et al., 2017) were introduced, allowing the RP detection in real time. This method however performs the RP detection globally based on the entire field-of-view, and the detected RPs are always two RPs as the search is done by two local minima. Several approaches have been proposed for the automated determination of targeted RP on regions such as RCA (Sato et al., 2009; Asou et al., 2018). A template matching was performed for finding the area with the outer edge of the cross-section of the coronary artery, however the template was defined based on randomly selected five datasets (Sato et al., 2009). In a more recent study, the regions were extracted from the high-speed component within CINE series by the frequency domain analysis, however by doing so, the cardiac anatomy structures can be disregarded. The authors stated that this method was validated on healthy volunteers, and uncertain the performance on large dataset especially with high heart rates. In Adam et al. (2020), the deep learning-based RP detection network is built by combining the CNN and Long short-term memory models taking the CINE series as input, and outputs the binary output which is either a RP or not a RP. Therefore, this method is not quantitative and unclear which cardiac structure is weighted, or whether the network tries to detect global RP.
5 Conclusions
To our knowledge, this proposed study is the first to present the fully automated localized RP detection framework from a CINE series that was validated with a large dataset with multiple 1.5T and 3T scanners acquired with different CINE protocols, such as with free-breathing or breath-held techniques. We investigated the robustness and feasibility of the proposed system for fully automated systolic and diastolic RP detection. The proposed system can improve the workflow efficiency, automation, and standardization of the static cardiac imaging that broaden the applicability towards any static cardiac imaging. The RP detection system can be applied in various applications, such as 2-D and 3-D LGE, mapping, 3-D coronary imaging, or any other applications in which the information of RP of heart can be useful. Future work will focus on clinical validations, improving the accuracy of RP classification and integration of automatic detection of other regions, such as the atria and ventricles.
Acknowledgments
This work was supported by Siemens Healthineers GmbH.
Ethical Standards
The local ethics committee approved the study, and a written informed consent was provided by all subjects. The declaration of consent to participate in the study was signed by all subjects.
Conflicts of Interest
The authors declare no conflict of interest and no financial disclosure in relation to this study. A. Maier is Associate Editor of the Journal of Machine Learning for Biomedical Imaging (MELBA). Seung Su Yoon, Elisabeth Preuhs, Michaela Schmidt, Christoph Forman, Teodora Chitiboi, Puneet Sharma, Juliano L. Fernandes, and Jens Wetzl are employees of Siemens Healthcare GmbH.
References
- Abi-Abdallah et al. (2007) Dima Abi-Abdallah, Vincent Robin, Agnès Drochon, and Odette Fokapu. Alterations in human ecg due to the magnetohydrodynamic effect: a method for accurate r peak detection in the presence of high mhd artifacts. In 2007 29th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, pages 1842–1845. IEEE, 2007.
- Adam et al. (2020) Naledi Adam, James Clough, Ronald Mooiweer, Phuoc Duong, Li Huang, Reza Razavi, Kuberan Pushparajah, Amedeo Chiribiri, Andrew King, and Sébastien Roujol. Fully automated detection of the quiescent phases of the cardiac cycle from cine images using deep learning. In Proceedings of the Joint Annual Meeting ISMRM-ESMRMB (28th Annual Meeting & Exhibition), IS for Magnetic Resonance in Medicine, Ed, 2020.
- Aherne et al. (2020) Emily Aherne, Kelvin Chow, and James Carr. Cardiac t1 mapping: techniques and applications. Journal of Magnetic Resonance Imaging, 51(5):1336–1356, 2020.
- Akçakaya et al. (2012) Mehmet Akçakaya, Hussein Rayatzadeh, Tamer A Basha, Susie N Hong, Raymond H Chan, Kraig V Kissinger, Thomas H Hauser, Mark E Josephson, Warren J Manning, and Reza Nezafat. Accelerated late gadolinium enhancement cardiac mr imaging with isotropic spatial resolution using compressed sensing: initial experience. Radiology, 264(3):691–699, 2012.
- Asou et al. (2018) Hiroya Asou, Naoyuki Imada, Yuichi Nishiyama, Tomoyasu Sato, and Katsuhiro Ichikawa. Automated determination of cardiac rest period on whole-heart coronary magnetic resonance angiography by extracting high-speed motion of coronary arteries. Clinical imaging, 52:183–188, 2018.
- Basha et al. (2017) Tamer A Basha, Mehmet Akcakaya, Charlene Liew, Connie W Tsao, Francesca N Delling, Gifty Addae, Long Ngo, Warren J Manning, and Reza Nezafat. Clinical performance of high-resolution late gadolinium enhancement imaging with compressed sensing. Journal of Magnetic Resonance Imaging, 46(6):1829–1838, 2017.
- Chefd’Hotel et al. (2002) Christophe Chefd’Hotel, Gerardo Hermosillo, and Olivier Faugeras. Flows of diffeomorphisms for multimodal image registration. In Proceedings IEEE International Symposium on Biomedical Imaging, pages 753–756. IEEE, 2002.
- Cruz et al. (2017) Gastao Cruz, David Atkinson, Markus Henningsson, Rene M Botnar, and Claudia Prieto. Highly efficient nonrigid motion-corrected 3d whole-heart coronary vessel wall imaging. Magnetic resonance in medicine, 77(5):1894–1908, 2017.
- Dyke et al. (2019) Roberto M Dyke, Yu-Kun Lai, Paul L Rosin, and Gary KL Tam. Non-rigid registration under anisotropic deformations. Computer Aided Geometric Design, 71:142–156, 2019.
- Forman et al. (2014) Christoph Forman, Davide Piccini, Robert Grimm, Jana Hutter, Joachim Hornegger, and Michael O. Zenge. Reduction of Respiratory Motion Artifacts for Free-Breathing Whole-Heart Coronary MRA by Weighted Iterative Reconstruction. Magnetic Resonance in Medicine, pages 1–11, 2014. doi: 10.1002/mrm.25321. URL https://www5.informatik.uni-erlangen.de/Forschung/Publikationen/2014/Forman14-ROR.pdf.
- Greil et al. (2017) Gerald Greil, Animesh Aashoo Tandon, Miguel Silva Vieira, and Tarique Hussain. 3d whole heart imaging for congenital heart disease. Frontiers in pediatrics, 5:36, 2017.
- Hofman et al. (1998) Mark BM Hofman, Samuel A Wickline, and Christine H Lorenz. Quantification of in-plane motion of the coronary arteries during the cardiac cycle: implications for acquisition window duration for mr flow quantification. Journal of Magnetic Resonance Imaging, 8(3):568–576, 1998.
- Huang et al. (2017) Gao Huang, Zhuang Liu, Laurens Van Der Maaten, and Kilian Q Weinberger. Densely connected convolutional networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 4700–4708, 2017.
- Huang et al. (2014) Teng-Yi Huang, Yu-Sheng Tseng, and Tzu-Chao Chuang. Automatic calibration of trigger delay time for cardiac mri. NMR in Biomedicine, 27(4):417–424, 2014.
- Ioffe and Szegedy (2015) Sergey Ioffe and Christian Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In International conference on machine learning, pages 448–456. PMLR, 2015.
- Isma’eel et al. (2009) Hussain Isma’eel, Yasmin S Hamirani, Ramona Mehrinfar, Songshuo Mao, Naser Ahmadi, Vahid Larijani, Subu Nair, and Matthew J Budoff. Optimal phase for coronary interpretations and correlation of ejection fraction using late-diastole and end-diastole imaging in cardiac computed tomography angiography: implications for prospective triggering. The international journal of cardiovascular imaging, 25(7):739–749, 2009.
- Jahnke et al. (2005) Cosima Jahnke, Ingo Paetsch, Kay Nehrke, Bernhard Schnackenburg, Axel Bornstedt, Rolf Gebker, Eckart Fleck, and Eike Nagel. A new approach for rapid assessment of the cardiac rest period for coronary mra. Journal of Cardiovascular Magnetic Resonance, 7(2):395–399, 2005.
- Ji et al. (2012) Shuiwang Ji, Wei Xu, Ming Yang, and Kai Yu. 3d convolutional neural networks for human action recognition. IEEE transactions on pattern analysis and machine intelligence, 35(1):221–231, 2012.
- Johnson et al. (2004) Kevin R Johnson, Salil J Patel, Amy Whigham, Alex Hakim, Roderic I Pettigrew, and John N Oshinski. Three-dimensional, time-resolved motion of the coronary arteries. Journal of Cardiovascular Magnetic Resonance, 6(3):663–673, 2004.
- Kellman and Arai (2012) Peter Kellman and Andrew E Arai. Cardiac imaging techniques for physicians: late enhancement. Journal of magnetic resonance imaging, 36(3):529–542, 2012.
- Kellman and Hansen (2014) Peter Kellman and Michael S Hansen. T1-mapping in the heart: accuracy and precision. Journal of cardiovascular magnetic resonance, 16(1):1–20, 2014.
- Kellman et al. (2002) Peter Kellman, Andrew E Arai, Elliot R McVeigh, and Anthony H Aletras. Phase-sensitive inversion recovery for detecting myocardial infarction using gadolinium-delayed hyperenhancement. Magnetic Resonance in Medicine: An Official Journal of the International Society for Magnetic Resonance in Medicine, 47(2):372–383, 2002.
- Kim et al. (2001) W Yong Kim, Matthias Stuber, Kraig V Kissinger, Niels Trolle Andersen, Warren J Manning, and René M Botnar. Impact of bulk cardiac motion on right coronary mr angiography and vessel wall imaging. Journal of Magnetic Resonance Imaging: An Official Journal of the International Society for Magnetic Resonance in Medicine, 14(4):383–390, 2001.
- Kingma and Ba (2014) Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
- Kramer et al. (2020) Christopher M Kramer, Jörg Barkhausen, Chiara Bucciarelli-Ducci, Scott D Flamm, Raymond J Kim, and Eike Nagel. Standardized cardiovascular magnetic resonance imaging (cmr) protocols: 2020 update. Journal of Cardiovascular Magnetic Resonance, 22(1):1–18, 2020.
- Kramer et al. (2013) CM Kramer, J Barkhausen, SD Flamm, RJ Kim, and E Nagel. Society for cardiovascular magnetic resonance board of trustees task force on standardized p. standardized cardiovascular magnetic resonance (cmr) protocols 2013 update. J Cardiovasc Magn Reson, 15(1):91, 2013.
- Krizhevsky et al. (2012) Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. Imagenet classification with deep convolutional neural networks. Advances in neural information processing systems, 25:1097–1105, 2012.
- Messroghli et al. (2017) Daniel R Messroghli, James C Moon, Vanessa M Ferreira, Lars Grosse-Wortmann, Taigang He, Peter Kellman, Julia Mascherbauer, Reza Nezafat, Michael Salerno, Erik B Schelbert, et al. Clinical recommendations for cardiovascular magnetic resonance mapping of t1, t2, t2* and extracellular volume: a consensus statement by the society for cardiovascular magnetic resonance (scmr) endorsed by the european association for cardiovascular imaging (eacvi). Journal of Cardiovascular Magnetic Resonance, 19(1):1–24, 2017.
- Munoz et al. (2020) Camila Munoz, Aurelien Bustin, Radhouene Neji, Karl P Kunze, Christoph Forman, Michaela Schmidt, Reza Hajhosseiny, Pier-Giorgio Masci, Martin Zeilinger, Wolfgang Wuest, et al. Motion-corrected 3d whole-heart water-fat high-resolution late gadolinium enhancement cardiovascular magnetic resonance imaging. Journal of Cardiovascular Magnetic Resonance, 22(1):1–13, 2020.
- Nair and Hinton (2010) Vinod Nair and Geoffrey E Hinton. Rectified linear units improve restricted boltzmann machines. In Icml, 2010.
- Ogawa et al. (2022) Ryo Ogawa, Tomoyuki Kido, Yasuhiro Shiraishi, Yuri Yagi, Seung Su Yoon, Jens Wetzl, Michaela Schmidt, and Teruhito Kido. Neural network–based fully automated cardiac resting phase detection algorithm compared with manual detection in patients. Acta Radiologica Open, 11(10):20584601221137772, 2022.
- Piccini et al. (2017) Davide Piccini, Robin Demesmaeker, Gabriella Vincenti, Tobias Kober, and Matthias Stuber. Automated cardiac resting phase detection in 2d cine mr images for acquisition window selection in high-resolution coronary mri. Prod. Intl. Soc. Mag. Reson. Med, 25:2862, 2017.
- Ronneberger et al. (2015) Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-net: Convolutional networks for biomedical image segmentation. In International Conference on Medical image computing and computer-assisted intervention, pages 234–241. Springer, 2015.
- Rueckert et al. (1999) Daniel Rueckert, Luke I Sonoda, Carmel Hayes, Derek LG Hill, Martin O Leach, and David J Hawkes. Nonrigid registration using free-form deformations: application to breast mr images. IEEE transactions on medical imaging, 18(8):712–721, 1999.
- Sato et al. (2009) Tetsuo Sato, Tomohisa Okada, Shigehide Kuhara, Kaori Togashi, and Kotaro Minato. An approach for automatic selecting of optimal data acquisition window for magnetic resonance coronary angiography. In Medical Imaging 2009: Image Processing, volume 7259, page 72592A. International Society for Optics and Photonics, 2009.
- Seifarth et al. (2007) Harald Seifarth, Susanne Wienbeck, Michael Puesken, Kai-Uwe Juergens, David Maintz, Christian Vahlhaus, Walter Heindel, and Roman Fischbach. Optimal systolic and diastolic reconstruction windows for coronary ct angiography using dual-source ct. American Journal of Roentgenology, 189(6):1317–1323, 2007.
- Shechter et al. (2005) Guy Shechter, Jon R Resar, and Elliot R McVeigh. Rest period duration of the coronary arteries: implications for magnetic resonance coronary angiography. Medical physics, 32(1):255–262, 2005.
- Simonyan and Zisserman (2014) Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
- Spinei et al. (1998) Adrian Spinei, Denis Pellerin, and Jeanny Hérault. Spatiotemporal energy-based method for velocity estimation. Signal processing, 65(3):347–362, 1998.
- Stuber et al. (1999) Matthias Stuber, René M Botnar, Peter G Danias, Kraig V Kissinger, and Warren J Manning. Submillimeter three-dimensional coronary mr angiography with real-time navigator correction: comparison of navigator locations. Radiology, 212(2):579–587, 1999.
- Suever et al. (2011) Jonathan D Suever, Pierre J Watson, Robert L Eisner, Stamatios Lerakis, Robert E O’Donnell, and John N Oshinski. Time-resolved analysis of coronary vein motion and cross-sectional area. Journal of Magnetic Resonance Imaging, 34(4):811–815, 2011.
- Swets (1979) John A Swets. Roc analysis applied to the evaluation of medical imaging techniques. Investigative radiology, 14(2):109–121, 1979.
- Szegedy et al. (2015) Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. Going deeper with convolutions. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1–9, 2015.
- Szeliski and Shum (1996) Richard Szeliski and Heung-Yeung Shum. Motion estimation with quadtree splines. IEEE Transactions on pattern analysis and machine intelligence, 18(12):1199–1210, 1996.
- Tran et al. (2015) Du Tran, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. Learning spatiotemporal features with 3d convolutional networks. In Proceedings of the IEEE international conference on computer vision, pages 4489–4497, 2015.
- Ustun et al. (2007) Ali Ustun, Milind Desai, Khaled Z Abd-Elmoniem, Michael Schar, and Matthias Stuber. Automated identification of minimal myocardial motion for improved image quality on mr angiography at 3 t. American Journal of Roentgenology, 188(3):W283–W290, 2007.
- Wang et al. (2001) YI Wang, Richard Watts, Ian R Mitchell, Thanh D Nguyen, Jeffrey W Bezanson, Geoffrey W Bergman, and Martin R Prince. Coronary mr angiography: selection of acquisition window of minimal cardiac motion with electrocardiography-triggered navigator cardiac motion prescanning—initial results. Radiology, 218(2):580–585, 2001.