FlowReg: Fast Deformable Unsupervised Medical Image Registration using Optical Flow

Sergiu Mocanu1, Alan R. Moody2, April Khademi1,3,4
1: Ryerson University, Toronto, 2: University of Toronto, 3: St. Michael’s Hospital, Unity Health Network, Toronto, 4: Institute for Biomedical Engineering, Science and Technology (iBEST)
Publication date: 2021/09/03
https://doi.org/10.59275/j.melba.2021-3581

Abstract

In this work we propose FlowReg, a deep learning-based framework that performs unsupervised image registration for neuroimaging applications. The system is composed of two architectures that are trained sequentially: FlowReg-A, which affinely corrects for gross differences between moving and fixed volumes in 3D, followed by FlowReg-O, which performs pixel-wise deformations on a slice-by-slice basis for fine tuning in 2D. FlowReg-A warps the moving volume using gross global parameters to align rotation, scale, shear, and translation to the fixed volume; a correlation loss that encourages global alignment between the moving and fixed volumes is employed to regress the affine parameters. The deformable network FlowReg-O operates on 2D image slices and is based on an optical flow CNN adapted to neuroimaging with three loss components: the photometric loss minimizes pixel intensity differences, the smoothness loss encourages similar magnitudes between neighbouring vectors, and the correlation loss maintains intensity similarity between fixed and moving image slices. The proposed method is compared to four open-source registration techniques, ANTs, Demons, SE, and Voxelmorph, for FLAIR MRI applications. In total, 4643 FLAIR MR imaging volumes (approximately 255,000 image slices) are used from dementia and vascular disease cohorts, acquired from over 60 international centres with varying acquisition parameters. To quantitatively assess the performance of the registration tools, a battery of novel validation metrics is proposed that focuses on the structural integrity of tissues, spatial alignment, and intensity similarity. Experimental results show FlowReg (FlowReg-A+O) performs better than iterative-based registration algorithms on intensity and spatial alignment metrics, with a Pixelwise Agreement (PWA) of 0.65, correlation coefficient (R) of 0.80, and Mutual Information (MI) of 0.29. Among the deep learning frameworks evaluated, FlowReg-A or FlowReg-A+O provided the highest performance on all but one of the metrics. Results show that FlowReg is able to obtain high intensity and spatial similarity between the moving and fixed volumes while maintaining the shape and structure of anatomy and pathology.

Keywords

registration validation · flair-mri · neuroimaging · cnns · deep learning · unsupervised · registration · flowreg




1 Introduction

Magnetic resonance imaging (MRI) offers non-invasive visualization of soft tissue that is ideal for imaging the brain. The etiology and pathogenesis of neurodegeneration, and the effects of treatment options, have been heavily investigated in T1, T2, Proton Density (PD), Diffusion Weighted (DW), and Fluid-Attenuated Inversion Recovery (FLAIR) MR sequences (Udaka et al., 2002; Hunt et al., 1989; Sarbu et al., 2016; Trip and Miller, 2005; Guimiot et al., 2008) for dementia (Oppedal et al., 2015) and Alzheimer's Disease (AD) (Kobayashi et al., 2002). As cerebrovascular disease (CVD) has been shown to be a leading cause of dementia, there is growing interest in examining cerebrovascular risk factors in the brain using neuroimaging. CVD markers such as white matter lesions (WML) predict cognitive decline, dementia, stroke, and death, and WML progression increases these risks (Debette and Markus, 2010; Alber et al., 2019). Research into the vascular contributions to dementia and neurodegeneration could be valuable for developing new therapies (Frey et al., 2019; Alber et al., 2019; Griffanti et al., 2018; Malloy et al., 2007). FLAIR MRI is preferred for WML analysis (Wardlaw et al., 2013; Badji and Westman, 2020) since the usual T2 high signal from cerebrospinal fluid (CSF) is suppressed, highlighting the high signal of white matter disease (WMD), which is due to increased water content secondary to ischemia and demyelination and is seen much more robustly in FLAIR than in T1 and T2 sequences (Lao et al., 2008; Khademi et al., 2011; Kim et al., 2008; Piguet et al., 2005).
One family of algorithms heavily relied upon in neuroimaging research is image registration: the process of aligning two images (one fixed and one moving) so they are in the same geometric space. Once registered, structures and changes between two images can be compared directly. Registration is used to align longitudinal scans of the same patient to assess disease change over time (Csapo et al., 2012; El-Gamal et al., 2016), to map patient images to an atlas for template-based segmentation (Iglesias and Sabuncu, 2015; Phellan et al., 2014), or to correct for artifacts such as patient motion and orientation (Mani and Arivazhagan, 2013). As medical images are composed of highly relevant anatomical and pathological structures such as brain tissue, ventricles, and WML, it is important that the shape and relative size of each structure is maintained in the registered output. MR images are highly complex, so obtaining correspondence while maintaining structural and anatomical integrity is a challenging task.
Traditionally, registration has been framed as an optimization problem that searches for the transform $T$ between a moving image $I_m$ and a fixed image $I_f$ by optimizing a similarity criterion between them: $T = \arg\max_T \mathcal{C}\left(I_f, T(I_m)\right)$. This optimization can be solved via gradient descent and terminates when maximum similarity is found or a maximum number of iterations is reached. The similarity between the fixed and (transformed) moving images is calculated via a cost function $\mathcal{C}$, such as mutual information (MI), cross-correlation (CC), or mean-squared error (MSE) (Maes et al., 1997; Avants et al., 2008). Registration can be done globally via an affine transformation (translation, rotation, shear, and scale) or on a per-pixel level through non-uniform deformation fields (each pixel in the moving image has a target movement vector).
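To make the classical formulation concrete, the following is a minimal sketch of iterative registration, restricted to a 2D translation and an MSE cost for brevity; the function names and the choice of Powell's method are illustrative, not those of any specific tool.

```python
# Minimal sketch of classical iterative registration: find the transform T
# (here only a 2D translation) that minimizes a cost C between the fixed
# image and the warped moving image. MSE cost and Powell optimizer are
# illustrative choices.
import numpy as np
from scipy.ndimage import shift
from scipy.optimize import minimize

def mse_cost(params, fixed, moving):
    # Warp the moving image with the candidate translation and compare
    warped = shift(moving, shift=params, order=1)
    return np.mean((fixed - warped) ** 2)

rng = np.random.default_rng(0)
fixed = rng.random((64, 64))
moving = shift(fixed, shift=(3.0, -2.0), order=1)  # known misalignment

# Optimization runs until convergence or an iteration cap; the solved
# parameters apply only to this image pair and are then discarded.
result = minimize(mse_cost, x0=[0.0, 0.0], args=(fixed, moving),
                  method="Powell", options={"maxiter": 100})
print("Recovered translation:", result.x)  # close to (-3.0, 2.0)
```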
Registration algorithms that use an iterative approach are computationally expensive, and the estimated transformation parameters are discarded after each run rather than reused for the next pair of images. On large multi-centre datasets this leads to long computation times or suboptimal transformations. To overcome the non-transferable nature of traditional image registration algorithms, machine learning models that learn transformation parameters between images have been gaining interest (Cao et al., 2017; Sokooti et al., 2017; Balakrishnan et al., 2018).
Recently, several convolutional neural network (CNN)-based medical image registration algorithms have emerged to address the non-transferable nature, lengthy execution, and high computational cost of classic iterative approaches (Balakrishnan et al., 2018; Uzunova et al., 2017). In 2017, researchers adapted an optical flow CNN, FlowNet (Dosovitskiy et al., 2015), to compute the deformation field between temporally spaced images (Uzunova et al., 2017). Although promising, this approach required ground truth deformation fields during training, which is intractable for large clinical datasets. To overcome these challenges, Voxelmorph was developed as a completely unsupervised CNN-based registration scheme that learns a transformation without labeled deformation fields (Balakrishnan et al., 2018), building on spatial transformer networks (Jaderberg et al., 2015). Further work by Fan et al. (2018) has shown that Generative Adversarial Networks (GANs) can be used to generate deformation fields; these fields are then used to warp the moving image until the discriminator is unable to distinguish between the registered and fixed images. Others have suggested a sequential affine and deformable 3D network for brain MRI registration (Zhu et al., 2020). In another work, Zhao et al. (2019) proposed a fully unsupervised CNN-based method that cascades affine and deformable networks to perform alignment in 3D within one framework. The proposed method is inspired by these pioneering works, but instead performs an affine alignment in 3D first, followed by 2D fine-tuning on a slice-by-slice basis.
Global differences such as head angle or brain size can vary significantly between patients, and these differences are mainly realized in 3D. Additionally, there are local, fine anatomical differences that are more visible on a per-slice basis. Therefore, to get maximal alignment between neuroimages, both global differences in 3D and local differences in 2D should be addressed. Another design consideration is avoiding dependence on ground truth data, which is impractical to obtain for large datasets. Lastly, and importantly, registration algorithms must maintain the structural integrity of important objects such as WML and ventricles.
To this end, this paper proposes a CNN-based registration method called FlowReg: Fast Deformable Unsupervised Image Registration using Optical Flow that addresses these design considerations in a unified framework. FlowReg is composed of an affine network, FlowReg-A, for alignment of gross head differences in 3D, and a secondary deformable registration network, FlowReg-O, based on optical flow CNNs (Ilg et al., 2017; Yu et al., 2016), for fine movements of pixels in 2D, thus managing both global and local differences. In contrast to previous works that perform affine and deformable registration strictly in 3D, we postulate that performing affine registration in 3D followed by 2D refinement will result in higher quality registrations for neuroimages. FlowReg is fully unsupervised, and ground truth deformations are not required. The loss functions are modified to ensure optimized performance for neuroimaging applications. Lastly, the framework does not require preprocessing such as brain extraction and is applied to the whole head of neuroimaging scans.
As an additional contribution, a battery of novel validation metrics is proposed to quantify registration performance from a clinically salient perspective. Validation of registration performance is not a simple task, and many methods have employed manually placed landmarks (Balakrishnan et al., 2018; Uzunova et al., 2017; de Vos et al., 2019). In medical image registration, it is of interest to maintain the structural integrity of anatomy and pathology in the moving images while obtaining maximum alignment with the fixed volume. To measure this, three groups of metrics are proposed: structural integrity, spatial alignment, and intensity similarity. Structural (tissue) integrity is measured via volumetric and structural analysis with the proportional volume (PV), volume ratio $\Delta V$, and surface-surface distance (SSD) metrics. Spatial alignment is measured via pixelwise agreement (PWA), head angle (HA), and Dice similarity coefficient (DSC). Intensity similarity is measured using the traditional metrics of mutual information (MI) and correlation coefficient (Pearson's R), as well as a new one, the mean absolute intensity difference (MAID).
The performance of FlowReg is compared to four established and openly available image registration methods: Demons (Thirion, 1995, 1998; Pennec et al., 1999), Elastix (Klein et al., 2010), ANTs (Avants et al., 2008), and Voxelmorph (Balakrishnan et al., 2018). Three are traditional iterative registration methods, while Voxelmorph is a CNN-based registration tool. Performance is measured over a large and diverse multi-institutional dataset collected from over 60 imaging centres worldwide, of subjects with dementia (ADNI) (Mueller et al., 2005) and vascular disease (CAIN) (Tardif et al., 2013). There are roughly 270,000 images and 4900 imaging volumes in these datasets, with a range of imaging acquisition parameters. The rest of the paper is structured as follows: Section 2.1 describes the FlowReg architecture and Section 2.2 outlines the validation metrics. The data used, experimental setup, and results are given in Section 3, followed by the discussion and conclusions in Sections 4 and 5.

2 Methods

In this section, we introduce FlowReg, Fast Deformable Unsupervised Image Registration using Optical Flow, with a focus on neuroimaging applications. Given a moving image volume $M(x,y,z)$, where $M$ is the intensity at voxel $(x,y,z)\in\mathbb{Z}^3$, and the fixed image volume $F(x,y,z)$, the system automatically learns the registration parameters from many image pairs and uses that knowledge to predict the transformation $T$ for new testing images. The section closes with a battery of novel registration validation metrics focused on structural integrity, intensity, and spatial alignment.

Figure 1: FlowReg consists of affine (FlowReg-A) and optical flow (FlowReg-O) networks.

2.1 FlowReg

FlowReg is based exclusively on CNN architectures, and the alignment is fully unsupervised, meaning that registration parameters are regressed without ground truth knowledge. Registration with FlowReg is completed with the two-phase approach shown in Fig. 1. FlowReg-A is a 3D affine registration network that corrects for global differences, and FlowReg-O refines the affine registration results on a per-slice basis using a 2D optical flow-based registration method from the video processing field (Dosovitskiy et al., 2015; Ilg et al., 2017; Garg et al., 2016; Yu et al., 2016). The affine component is trained first and the affine parameters are obtained for each volume. Once all the volumes are registered using the affine component, FlowReg-O is trained to obtain the deformation fields. The held-out test set is used to test the full FlowReg pipeline end-to-end.

2.1.1 FlowReg-A: Affine Registration in 3D

Figure 2: FlowReg-A model structure. A pair of 3D input volumes are concatenated; each yellow box represents the output of a 3D convolutional layer, and the numbers at the bottom of each box give the number of feature maps generated by the convolutional kernels. The last layer (purple) is a fully connected layer with 12 nodes and a linear activation function.

The proposed affine model FlowReg-A warps the moving volume using gross global parameters to align head rotation, scale, shear, and translation to the fixed volume. This is beneficial when imaging the brain in 3D, since the orientation of each subject's head can vary. Additionally, images are often of different physical dimensions depending on the scanner type and acquisition parameters. To normalize these global differences, we propose a completely unsupervised CNN-based 3D affine registration method (i.e., volume registration), where the transformation parameters are learned.
The CNN used to regress the affine matrix parameters is shown in Figure 2 and described in Table 6. The network architecture and hyperparameter selection are similar to the encoder arm of the FlowNet-Simple network, with changes made to the input size and the number of 3D convolution kernels. The affine model comprises six convolutional layers and one fully-connected layer, which is used to regress the flattened version of the three-dimensional affine matrix $A$:

$$A = \begin{bmatrix} a & b & c & d \\ e & f & g & h \\ i & j & k & l \end{bmatrix}, \qquad (1)$$

where $A$ contains the rotation, scale, shear, and translation parameters. Given this affine transformation matrix, the original image volume may be transformed by $M_w(x,y,z) = A \times M(x,y,z)$.
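As an illustration only (not the FlowReg implementation, which regresses $A$ with a CNN and warps via a spatial transformer), applying a $3\times4$ affine matrix to a volume can be sketched as follows; the matrix values are arbitrary.

```python
# Hedged sketch: warping a volume M with a 3x4 affine matrix A = [L | t],
# where L is the 3x3 rotation/scale/shear part and t the translation.
# Note scipy's affine_transform maps *output* coordinates back into the
# input volume (an inverse mapping), so in practice A or its inverse must
# be oriented accordingly; the values below are arbitrary.
import numpy as np
from scipy.ndimage import affine_transform

M = np.random.default_rng(1).random((55, 256, 256))   # moving volume

A = np.array([[1.02, 0.00, 0.01, -1.5],
              [0.00, 0.98, 0.00,  2.0],
              [0.00, 0.03, 1.00,  0.0]])

M_w = affine_transform(M, matrix=A[:, :3], offset=A[:, 3], order=1)
print(M_w.shape)  # (55, 256, 256): the warped moving volume M_w
```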
To solve for the transformation parameters $A$, a correlation loss was used to encourage an overall alignment of the mean intensities between the moving and the fixed volumes:

$$\ell_{\operatorname{corr}_{3D}}\left(F, M_w\right) = 1 - \frac{\sum_{i=1}^{N}\left(F_i - \overline{F}\right)\left(M_{w_i} - \overline{M_w}\right)}{\sqrt{\sum_{i=1}^{N}\left(F_i - \overline{F}\right)^2}\sqrt{\sum_{i=1}^{N}\left(M_{w_i} - \overline{M_w}\right)^2}} \qquad (2)$$

where $F$ is the fixed volume, $M_w$ is the moving volume warped with the calculated matrix, $N$ is the number of voxels in the volume, $F_i$ is the $i^{th}$ element in $F$, $M_{w_i}$ is the $i^{th}$ element in $M_w$, and $\overline{F}$, $\overline{M_w}$ are the mean intensities of the fixed and warped moving volumes, respectively. Using the correlation loss function, parameters are selected during training that ensure global alignment between the fixed and moving image volumes.
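A direct transcription of Eqn. 2 is sketched below in NumPy for clarity; in training, the same expression would be written with TensorFlow/Keras ops so gradients can flow to the affine parameters, and the epsilon guard is an implementation detail not present in the equation.

```python
import numpy as np

def correlation_loss(F, M_w, eps=1e-8):
    """1 minus the Pearson correlation between the fixed volume F and the
    warped moving volume M_w (Eqn. 2); 0 when perfectly correlated."""
    f = F.ravel() - F.mean()
    m = M_w.ravel() - M_w.mean()
    denom = np.sqrt(np.sum(f ** 2)) * np.sqrt(np.sum(m ** 2)) + eps
    return 1.0 - np.sum(f * m) / denom
```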

2.1.2 FlowReg-O: Optical Flow-based Registration in 2D

The optical flow component of the registration framework, FlowReg-O, is used to fine-tune the affine registration results on a slice-by-slice basis (i.e., in 2D). FlowReg-O is an adapted version of the original FlowNet optical flow architecture used in video processing. Optical flow is a vector field that quantifies the apparent displacement of a pixel between two temporally separated images. A video is composed of a number of frames per second, and optical flow measures the motion of objects and pixels across frames, which can be used to calculate the velocity of objects in a scene (Horn and Schunck, 1981). For the task of medical image registration, instead of aligning neighbouring frames, we align moving and fixed images. The same principles as the original optical flow framework are adopted here, where the displacement vector is found and used to warp pixels between moving and fixed images in 2D.
The proposed deformable registration network is identical to the FlowNet-Simple architecture in terms of convolutional layers and hyperparameter selection, but adapted to grayscale medical image sizes and content. See Fig. 3 for the FlowReg-O architecture, which is also described in Table 7. The original FlowNet was trained on the synthetically generated "Flying Chairs" dataset with known optical flow ground truths, so dissimilarity was calculated as a simple endpoint error (EPE) (Dosovitskiy et al., 2015; Ilg et al., 2017). Since then, unsupervised methods have been proposed that train optical flow regressor networks with a loss comparing the image warped by the regressed flow to its corresponding target image, using Spatial Transformer Networks (Jaderberg et al., 2015; Yu et al., 2016). In medical imaging, optical flow ground truths are impossible to obtain, so the same unsupervised principle is adopted here, but the losses have been adapted for medical images. In addition to the photometric and smoothness loss components used in the original work (Yu et al., 2016), FlowReg-O utilizes an additional correlation loss term, with each loss encouraging overall similarity between the fixed and moving images while maintaining small 2D movements in the displacement field.

Figure 3: FlowReg-O model structure. A pair of 2D input images are concatenated first, followed by 2D convolutional layers (yellow). Numbers below layers correspond to the number of feature maps. Skip connections from the encoder layers are concatenated (gray boxes) with the outputs of the upscaling decoder (blue) arm. The flow at seven resolutions is labeled with "flow" above the corresponding outputs. I is the input image resolution, $256\times256$.

The total loss function is a summation of three components: the photometric loss $\ell_{photo}$, which maintains photometric similarity through the Charbonnier function; the smoothness loss $\ell_{smooth}$, which ensures the deformation field is smooth (and limits sharp discontinuities in the vector field); and the correlation loss $\ell_{corr}$, which was added to enforce global similarity in the intensities between the moving and fixed images. The total loss for FlowReg-O is

$$\mathcal{L}(\mathbf{u},\mathbf{v}; F(x,y), M_w(x,y)) = \gamma\cdot\ell_{photo}(\mathbf{u},\mathbf{v}; F(x,y), M_w(x,y)) + \zeta\cdot\ell_{corr}(\mathbf{u},\mathbf{v}; F(x,y), M_w(x,y)) + \lambda\cdot\ell_{smooth}(\mathbf{u},\mathbf{v}) \qquad (3)$$

where $\mathbf{u},\mathbf{v}$ are the estimated horizontal and vertical vector fields, $F(x,y)$ is the fixed image, $M_w(x,y) = M(x+u, y+v)$ is the warped moving image, and $\gamma$, $\zeta$, and $\lambda$ are weighting hyperparameters.
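The warping $M_w(x,y) = M(x+u, y+v)$ amounts to backward sampling of the moving image at flow-displaced locations. A minimal NumPy/SciPy sketch is given below; during training this sampling would instead be performed by a differentiable spatial transformer layer.

```python
import numpy as np
from scipy.ndimage import map_coordinates

def warp_with_flow(M, u, v):
    """Backward-warp image M with flow fields (u, v): M_w(x,y) = M(x+u, y+v).
    u is the horizontal (column) and v the vertical (row) displacement;
    bilinear interpolation via map_coordinates (order=1)."""
    H, W = M.shape
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    coords = np.stack([ys + v, xs + u])   # (row, col) sample locations
    return map_coordinates(M, coords, order=1, mode="nearest")
```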
The photometric loss, adopted from Yu et al. (2016), is the difference between intensities of the fixed image and the warped moving image and evaluates to what degree the predicted optical flow is able to warp the moving image to match the intensities of the fixed image on a pixel-by-pixel basis:

$$\ell_{photo}(\mathbf{u},\mathbf{v}; F(x,y), M_w(x,y)) = \frac{1}{N}\sum_{i,j}\rho\left(F(i,j) - M_w(i,j)\right) \qquad (4)$$

where $N$ is the number of pixels and $\rho$ is the Charbonnier penalty function, which is used to reduce the contribution of outliers. The Charbonnier penalty is defined by:

$$\rho(x) = \left(x^2 + \epsilon^2\right)^{\alpha} \qquad (5)$$

where $x = (F - M_w)$, $\epsilon$ is a small value ($0.001$), and $\alpha$ regulates the difference in intensities between the moving and fixed images such that large differences are damped to keep the magnitudes of the deformation vectors within a reasonable limit. The effect of the $\alpha$ parameter on the Charbonnier function is shown in Fig. 4. For smaller $\alpha$ values, the Charbonnier function suppresses the output magnitude, which is used to regress finer movements in the displacement field.

Figure 4: Charbonnier function (Eqn. 5) for $\alpha = 0.5$, $0.4$, $0.3$, and $0.2$.
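A small sketch of the Charbonnier penalty (Eqn. 5) and the photometric loss (Eqn. 4) follows; the defaults mirror the values used in the paper ($\epsilon = 0.001$, $\alpha = 0.2$), and the NumPy form is for illustration only.

```python
import numpy as np

def charbonnier(x, alpha=0.2, eps=1e-3):
    """Charbonnier penalty rho(x) = (x^2 + eps^2)^alpha (Eqn. 5); smaller
    alpha damps large differences, discouraging overly large vectors."""
    return (x ** 2 + eps ** 2) ** alpha

def photometric_loss(F, M_w, alpha=0.2):
    """Mean Charbonnier-penalized intensity difference (Eqn. 4)."""
    return np.mean(charbonnier(F - M_w, alpha=alpha))
```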

The smoothness loss is implemented to regularize the flow field. The loss component encourages small differences between neighbouring flow vectors in the height and width directions and is defined by

$$\ell_{smooth}(\mathbf{u},\mathbf{v}) = \sum_{j}^{H}\sum_{i}^{W}\left[\rho(u_{i,j} - u_{i+1,j}) + \rho(u_{i,j} - u_{i,j+1}) + \rho(v_{i,j} - v_{i+1,j}) + \rho(v_{i,j} - v_{i,j+1})\right], \qquad (6)$$

where $H$ and $W$ are the number of rows and columns in the image, $u_{i,j}$ and $v_{i,j}$ are the displacement vectors for pixel $(i,j)$, and $\rho$ is the Charbonnier function. This loss measures the difference between local displacement vectors and minimizes the chance of optimizing to a large displacement between neighbouring pixels.
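Reusing the charbonnier helper from the previous sketch, Eqn. 6 penalizes first differences of the flow in both directions:

```python
import numpy as np

def smoothness_loss(u, v, alpha=0.2):
    """Charbonnier-penalized first differences of the flow field (Eqn. 6);
    reuses charbonnier() from the previous sketch."""
    total = 0.0
    for f in (u, v):
        total += np.sum(charbonnier(f[:, :-1] - f[:, 1:], alpha=alpha))  # horizontal neighbours
        total += np.sum(charbonnier(f[:-1, :] - f[1:, :], alpha=alpha))  # vertical neighbours
    return total
```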
Lastly, we add a correlation loss component to encourage an overall alignment of the mean intensities between the moving and fixed 2D images (similar to FlowReg-A):

$$\ell_{\operatorname{corr}_{2D}}\left(F, M_w\right) = 1 - \frac{\sum_{i=1}^{N}\left(F_i - \overline{F}\right)\left(M_{w_i} - \overline{M_w}\right)}{\sqrt{\sum_{i=1}^{N}\left(F_i - \overline{F}\right)^2}\sqrt{\sum_{i=1}^{N}\left(M_{w_i} - \overline{M_w}\right)^2}}, \qquad (7)$$

where $F$ is the fixed image, $M_w$ is the moving image warped with the calculated flow, $N$ is the number of pixels in the image, $F_i$ is the $i^{th}$ element in $F$, $M_{w_i}$ is the $i^{th}$ element in $M_w$, and $\overline{F}$, $\overline{M_w}$ are the mean intensities of the fixed and warped moving images, respectively. A summary of the loss components and how they are implemented in the FlowReg-O network is shown in Fig. 5.

Figure 5: Overview of loss components for the deformable registration network, FlowReg-O.
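Putting the pieces together, Eqn. 3 weighs the three components; the sketch below reuses the helper functions from the preceding snippets, with the weights set to the values reported in Section 3.2 ($\gamma = 1$, $\zeta = 1$, $\lambda = 0.5$).

```python
def flowreg_o_loss(F, M, u, v, gamma=1.0, zeta=1.0, lam=0.5):
    """Total FlowReg-O loss (Eqn. 3), combining the photometric,
    correlation, and smoothness terms sketched above."""
    M_w = warp_with_flow(M, u, v)          # M_w(x,y) = M(x+u, y+v)
    return (gamma * photometric_loss(F, M_w)
            + zeta * correlation_loss(F, M_w)
            + lam * smoothness_loss(u, v))
```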

2.2 Validation Metrics

Three categories of metrics are proposed, each measuring a clinically relevant aspect of registration accuracy: structural integrity, spatial alignment, and intensity similarity. The nine validation metrics are summarized in Table 1, and the flow diagram used to compute them is shown in Figure 21. A more detailed explanation of how to compute each metric is available in Appendix B; a brief illustrative sketch of two of these metrics is given after Table 1.

Table 1: Summary of validation metrics. $M(x,y,z)$, $F(x,y,z)$, and $A(x,y,z)$ are the moving, fixed, and generated atlas volumes, respectively, with spatial coordinates $(x,y,z)$. $vol_s$ is the volume of a structure $s$, $N$ is the number of images, $b_M$ and $b_F$ are the brain masks of the moving and fixed volumes, and $p$, $q$, and $r$ are distances between pixels.
| Category | Metric | Equation |
|---|---|---|
| Structural Integrity | Proportional Volume | $PV = \frac{vol_s}{vol_b}$ |
| | Volume Ratio | $\Delta V_s = \frac{vol_{orig}}{vol_{reg}}$ |
| | Surface-Surface Distance | $SSD = \frac{1}{N}\left[\sum_{i=1}^{N}\operatorname{argmin}_{(x_s,y_s,z_s)\in S}\left(\sqrt{p+q+r}\right)\right]$ |
| Spatial Alignment | Head Angle | $\theta$ (degrees) from the midsagittal plane |
| | Pixel-Wise Agreement | $PWA(z) = \frac{1}{N_j}\frac{1}{N_{xy}}\sum_{j\in J}\sum_{(x,y)}\left(M_j(x,y,z) - F(x,y,z)\right)^2$ |
| | Brain-DSC | $DSC = \frac{2\,\lvert b_M \cap b_F \rvert}{\lvert b_M \rvert + \lvert b_F \rvert}$ |
| Intensity Similarity | Mutual Information | $I(M;F) = \sum_{f\in F}\sum_{m\in M} p_{(M,F)}(m,f)\log\left(\frac{p_{(M,F)}(m,f)}{p_M(m)\,p_F(f)}\right)$ |
| | Correlation Coefficient | $r(M,F) = \frac{\sum_{i=1}^{n}(M_i-\overline{M})(F_i-\overline{F})}{\sqrt{\sum_{i=1}^{n}(M_i-\overline{M})^2}\sqrt{\sum_{i=1}^{n}(F_i-\overline{F})^2}}$ |
| | Mean Absolute Intensity Difference | $MAID(A,F) = \frac{1}{N_i}\sum_i \lvert p_A(i) - p_F(i) \rvert$ |
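As an illustration, two of the metrics in Table 1 are sketched below (the brain-mask DSC and the MAID); the histogram range assumes intensities normalized to $[0,1]$, which is an assumption of this sketch rather than a detail given in the paper.

```python
import numpy as np

def dice(b_m, b_f):
    """Dice similarity coefficient between two binary brain masks (Table 1)."""
    intersection = np.logical_and(b_m, b_f).sum()
    return 2.0 * intersection / (b_m.sum() + b_f.sum())

def maid(A, F, bins=256):
    """Mean absolute intensity difference between the intensity PMFs of a
    generated atlas A and the fixed atlas F (Table 1). Assumes intensities
    are normalized to [0, 1]."""
    p_a, _ = np.histogram(A, bins=bins, range=(0.0, 1.0))
    p_f, _ = np.histogram(F, bins=bins, range=(0.0, 1.0))
    p_a = p_a / p_a.sum()
    p_f = p_f / p_f.sum()
    return np.mean(np.abs(p_a - p_f))
```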

3 Experiments and Results

In this section the data and the experimental results are detailed.

3.1 Data

The performance of FlowReg is evaluated on a large and diverse FLAIR MRI data repository. Over 270,000 FLAIR MR images were retrieved from two datasets, which together comprise roughly 5000 imaging volumes from over 60 international imaging centres. To the best of our knowledge, this is one of the largest FLAIR MRI datasets in the literature to be processed automatically. The first dataset is from the Canadian Atherosclerosis Imaging Network (CAIN) (Tardif et al., 2013), a pan-Canadian study of vascular disease. The second dataset is from the Alzheimer's Disease Neuroimaging Initiative (ADNI) (Mueller et al., 2005), an international study of Alzheimer's and related dementia pathologies. The acquisition and demographic information is shown in Tables 5 and 4 in the Appendix. Based on image quality metrics supplied with the ADNI database, scans with large distortions, motion artifacts, or missing slices were excluded from the study; in total, 310 volumes were excluded based on these criteria. For training, validation, and testing, an 80/10/10 data split was employed and volumes were randomly sampled from CAIN and ADNI. The resulting splits were 3714 training volumes (204,270 images), 465 validation volumes (25,575 images), and 464 test volumes (25,520 images). See Figure 19 for example slices from several volumes of the ADNI and CAIN test set, exhibiting wide variability in intensity, contrast, anatomy, and pathology.
To measure the validation metrics proposed in Section 2.2, two sets of images are required. First, all volumes in the test set (464 volumes) are used to compute the intensity and spatial alignment metrics: HA, PWA, DSC, MI, R, and MAID. Second, to compute the structural integrity metrics (PV, volume ratio, and SSD), binary segmentation masks of the structures of interest are required, and 50 CAIN and 20 ADNI volumes were sampled randomly from the test set for this task. The ventricles and WML are selected as the objects of interest since they represent clinically relevant structures that characterize neurodegeneration and aging. Manual segmentations of the ventricular and WML regions were generated by a medical student trained by a radiologist. These objects are used to examine structural integrity before and after registration. To generate brain tissue masks, the automated brain extraction method from (Khademi et al., 2020) is utilized to segment cerebral tissue in FLAIR MRI. The atlas used in this work as the fixed volume $F(x,y,z)$ has dimensions of $256\times256\times55$ and is further detailed in (Winkler et al.). The moving image volumes $M(x,y,z)$ come from the image datasets described in Table 5, and no pre-processing was done to any of the volumes other than resizing $M(x,y,z)$ to the atlas resolution ($256\times256\times55$) through bilinear interpolation.

3.2 Experimental Setup

FlowReg-A and FlowReg-O are trained sequentially. First, FlowReg-A is trained using 3D volume pairs $M(x,y,z)$ and $F(x,y,z)$; using the optimized model parameters, the training volume set is affinely registered to the atlas using the found transformation (Dalca et al., 2018). Subsequently, the globally aligned volumes are used to train FlowReg-O on a slice-by-slice basis, using paired moving $M(x,y)$ and fixed $F(x,y)$ images to obtain the fine-tuning deformation fields in 2D. For both FlowReg-A and FlowReg-O, the Adam optimizer was used (Kingma and Ba, 2014) with $\beta_1 = 0.9$, $\beta_2 = 0.999$, and a learning rate of $10^{-4}$. FlowReg-A was trained for 100 epochs using a batch size of four pairs of volumes $M(x,y,z)$ and $F(x,y,z)$. FlowReg-O was trained on the globally aligned 2D images for 10 epochs using a batch size of 64 image pairs at seven two-dimensional resolutions: $256\times256$, $128\times128$, $64\times64$, $32\times32$, $16\times16$, $8\times8$, and $4\times4$. The loss hyperparameters were set to $\gamma = 1$, $\zeta = 1$, and $\lambda = 0.5$ as per the original optical flow work (Yu et al., 2016). During the testing phase, the deformation field at the last layer of the decoding arm is used to warp the moving test images, as this resolution provides per-pixel movements and is generated at the same resolution as the input image. Using the trained models for the complete FlowReg pipeline, the testing performance is compared to that of Voxelmorph, ANTs, Demons, and SimpleElastix.
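For reference, the optimizer settings above translate directly into Keras; the model and loss names below are hypothetical placeholders, as model construction is not shown here.

```python
# Hedged sketch of the optimizer configuration described above; the
# model and loss names are placeholders, not the released FlowReg code.
from tensorflow import keras

adam = keras.optimizers.Adam(learning_rate=1e-4, beta_1=0.9, beta_2=0.999)
# flowreg_a_model.compile(optimizer=adam, loss=correlation_loss)  # hypothetical
# flowreg_a_model.fit(volume_pairs, epochs=100, batch_size=4)     # hypothetical
```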
Training of the CNN models was performed on an NVIDIA GTX 1080Ti using Keras (Chollet et al., 2015) with TensorFlow (Abadi et al., 2015) as the backend for FlowReg-A, FlowReg-O, and Voxelmorph. ANTs registration was performed in Python using Symmetric Normalization (SyN) with default values (Avants et al., 2008). The Demons algorithm was implemented in Python using SimpleITK (Johnson et al., 2013). Similarly, the Python implementation of Elastix (Klein et al., 2010), SimpleElastix (Marstal et al., 2016), was employed as an add-on to SimpleITK. As a preprocessing step, prior to running Voxelmorph, the volumes were first affinely registered using ANTs-Affine. Voxelmorph was then trained for 37,000 iterations (to avoid observed overfitting) using the same training dataset utilized for FlowReg. Voxelmorph performs 3D convolutions, so resizing was necessary to keep the output of the decoder the same size as the pooling and upsampling layers: both the atlas and the moving volumes were resized to $256\times256\times64$. This resolution was chosen to be a power of two ($2^n$) to ensure that the decoder arm of the U-net style network is able to rebuild the learned feature maps to the original input size and warp the volumes accordingly.
To validate the registration methods, the metrics described in Section 2.2 and shown in Fig. 21 were used. The structural integrity metrics (PV, volume ratio, and SSD) used the binary brain, ventricle, and WML masks and the resultant deformation field or transformation from each registration method. The PV calculation includes the PV of the ventricles and WML with respect to the whole brain volume. The SSD is computed between the ventricle surface and the brain surface only; WML are not included for this metric since small lesion loads and irregular WML boundaries can create large differences in the SSD that may not be related to overall integrity. Finally, for all registration methods and all test data, the large-scale metrics HA, PWA, DSC, MI, R, and MAID were calculated between the registered volumes and the atlas. Warped masks were binarized with a threshold of 0.1 to avoid the non-integer values obtained from interpolation.

3.3 Results

In this section the experimental results are presented. First, the effect of $\alpha$ in the Charbonnier penalty function (Equation 5), used by the photometric and smoothness losses in FlowReg-O, was analyzed, since this parameter plays a major role in reducing over-fitting and overly large deformations. Using the held-out validation set, the value of $\alpha$ is selected based on the effect this parameter has on the images, which includes visual analysis of the distortion of the images, the magnitude of the optical flow field, and the average flow magnitude. The selected $\alpha$ is then used to train the final model for the remaining experiments.
The registration performance of FlowReg-O is studied for $\alpha$ from 0.10 to 0.45 in 0.05 increments by training the network, registering images from the holdout set, and visual inspection. See Figure 19 and Figure 20 in Appendix A for images and flow magnitudes at different $\alpha$. For $\alpha \geq 0.25$ there is more distortion and smearing within the brain, ventricle shapes are warped, and there is distortion of the WML. For $\alpha \leq 0.15$ there was little to no pixel movement. These findings are confirmed by computing the average flow magnitude per pixel in Figure 6, which shows low pixel movement for low $\alpha$ and large displacements for larger $\alpha$. Based on these findings, the "sweet spot" for $\alpha$ lies between 0.2 and 0.25, ensuring moderate pixel movement without distortion. To avoid overt movements and to be on the conservative side, we selected $\alpha = 0.2$ for FlowReg-O, and FlowReg-A and FlowReg-O with $\alpha = 0.2$ are used for the final model.

Figure 6: The average flow magnitude per pixel for FlowReg-O at various $\alpha$ values.
Figure 7: Registration results for seven slices from a single volume. The top row is the atlas used as the fixed volume $F(x,y,z)$ and the second row contains the moving volume $M(x,y,z)$. The remaining rows show registration results for ANTs, Demons, SimpleElastix, VoxelMorph, FlowReg-A, and FlowReg (FlowReg-A+O).

Using the finalized FlowReg model, test performance was compared to that of each of the other registration methods using the proposed validation metrics. All of the testing data (464 volumes) were registered using the models and algorithms described previously. Example registration results for all of the methods are shown in Figure 7. Bottom, middle, and top slices were chosen to show the spectrum of slices that need to be accurately warped and transformed from a 3D imaging perspective. In the first row, slices from the fixed (atlas) volume $F(x,y,z)$ are shown, followed by the corresponding slices from the moving volume $M(x,y,z)$. The first column contains images with the ocular orbits and cerebellum in both the moving and fixed volumes. In the middle slices of the volume, the ventricles are visible, as well as some periventricular WML. The top slice of the fixed volume is the top of the head and is included since it contains only small amounts of brain tissue. The remaining rows display the results of registering the moving images $M(x,y,z)$ to the fixed images $F(x,y,z)$ for each of the respective tools.
For ANTs registration with the affine and deformable components (SyN), there is good alignment in the middle slices, but the top slice has some pixels missing and the lower slices show ocular orbits in multiple slices (which are not present in the atlas for these slices), indicating poor alignment in 3D. Demons exhibits large deformations for all three slices and is therefore not ideal for clinical applications involving FLAIR MRI. SimpleElastix aligns the images over most spatial locations, except the lower slices, which contain ocular orbits in slices that do not anatomically align with the atlas. Voxelmorph exhibits similar trends, with good alignment in middle slices; the lower slices, however, contain ocular orbits in slices where they are not present in the atlas. The remaining rows show results for the proposed work. First, the result of only the affine component, FlowReg-A, is displayed. There is excellent alignment in the bottom and middle slices as well as in the top image slices, indicating high anatomical alignment with the atlas. When combining FlowReg-A with FlowReg-O in the last row, the overall similarity and alignment with the atlas are improved: the shape of the head is more similar to that of the fixed volume slices, and the top slice is more anatomically aligned.

Figure 8: Visual comparison between Voxelmorph and FlowReg-A+O. The top row contains the fixed volume $F(x,y,z)$ and the columns display sample registration results over four subjects for each method.

To examine anatomical alignment for the two CNN methods (VoxelMorph and the proposed FlowReg), see Figure 8. The top row shows the fixed volume $F(x,y,z)$ and the remaining columns contain sequential registration results for four subjects using VoxelMorph and FlowReg. FlowReg consistently provides a more accurate alignment of the slice data and is more consistent in the anatomy it represents across cases. This is especially evident in the bottom slices, where images registered with FlowReg contain ocular orbits only in the correct slices, and in the top slices, where the top of the head is more accurate in size, shape, and anatomy compared to Voxelmorph.
To quantitatively compare performance across the methods, the averages of the evaluation metrics over the testing set are shown in Table 2. The results for each category of validation metric are described next.

Table 2: Average registration performance (structural integrity, spatial alignment, and intensity-based metrics). Bold values are the best performance across methods, while italics mark the best among the deep learning methods. $\simeq X$ indicates a value closest to $X$ is best, $\downarrow$ lowest value is best, $\uparrow$ highest value is best. $\Delta PV_{vent}\times10^{-3}$, $\Delta PV_{wml}\times10^{-3}$, $MAID\times10^{-3}$, $MAID\text{-}zp\times10^{-3}$.
| Category | Metric | ANTs | Demons | SE | VM | FlowReg-A | FlowReg-A+O |
|---|---|---|---|---|---|---|---|
| Structural | $\Delta V_{brain} \simeq 1$ | 1.28 | 1.47 | 1.25 | **1.01** | 1.14 | 1.11 |
| | $\Delta V_{vent} \simeq 1$ | 1.27 | 1.31 | 1.60 | 0.64 | **0.96** | 0.86 |
| | $\Delta V_{wml} \simeq 1$ | 0.84 | **1.01** | 0.81 | 0.35 | *0.67* | 0.53 |
| | $\Delta PV_{vent} \simeq 0$ | **0.52** | -2.80 | 7.19 | -23.11 | *-7.31* | -11.52 |
| | $\Delta PV_{wml} \simeq 0$ | -5.17 | **-4.74** | -5.67 | -20.17 | *-7.14* | -11.20 |
| | $\Delta SSD \simeq 0$ | **-4.70** | -8.66 | -21.81 | 29.63 | 14.58 | *13.82* |
| Spatial | HA-$\varsigma$ $\downarrow$ | 6.21 | 5.16 | **2.98** | 110.83 | 6.89 | *3.91* |
| | PWA-$\Sigma$ $\downarrow$ | 1.24 | 2.40 | 1.92 | 1.47 | 1.15 | **0.65** |
| | Brain-DSC $\uparrow$ | **0.88** | 0.77 | 0.87 | 0.84 | *0.86* | 0.85 |
| Intensity | MI $\uparrow$ | 0.24 | 0.13 | 0.16 | 0.20 | 0.25 | **0.29** |
| | R $\uparrow$ | 0.64 | 0.41 | 0.39 | 0.60 | 0.65 | **0.80** |
| | MAID $\simeq 0$ | 5.24 | 6.26 | 5.99 | 5.54 | **5.08** | 5.33 |
| | MAID-zp $\simeq 0$ | **0.53** | 1.99 | 1.35 | 1.16 | 0.86 | *0.84* |

3.3.1 Structural Integrity

To ensure structures of interest are not distorted and integrity is maintained, the following structural integrity metrics are examined: the change in proportional volume ($\Delta PV$), the volume ratio ($\Delta V$), and the change in the surface-surface distance ($\Delta SSD$). The averages of these metrics are listed in Table 2 and the corresponding plots are shown in Figures 9, 10, and 11.
PV measures the proportional volumes of WML and ventricles before and after each registration. It is quantified as the PV change, $\Delta PV$; the results are shown in Figure 10 and the average measures in Table 2. Results nearest the zero line indicate the least change in PV compared to pre-registration and thus the least deformation. The PV metric shows that for all registration techniques the relative volumes of objects are mostly enlarged after deformation; the only structures decreased in size were the ventricles for the non-learning methods ANTs and SimpleElastix. The least distortion as quantified by the PV difference, for both ventricles and WML, is seen with the ANTs and Demons registration methods. The largest change in the WML and ventricles is seen with VoxelMorph. FlowReg-A and FlowReg-O slightly enlarge both ventricles and WML: FlowReg-A has a PV difference of $-7.3\times10^{-3}$ for the ventricles and $-7.1\times10^{-3}$ for WML. FlowReg-A+O, the combination of the affine and optical flow steps, shows approximately a $-11.5\times10^{-3}$ change in PV for the ventricles and a $-11.2\times10^{-3}$ change for the WML class. Since the two steps are performed sequentially, the difference between FlowReg-A and FlowReg-A+O gives the amount of PV change contributed by FlowReg-O, which was found to be $4.2\times10^{-3}$ and $4.1\times10^{-3}$ for ventricles and WML, respectively. These values are close to those of ANTs registration.
The volume ratio metrics quantify how much the structures of interest have decreased ($>1$) or increased ($<1$) in volume after registration. Ideally, structures would remain the same size ($\Delta V_s = 1$). As shown in Table 2 and Figure 9, the volume ratio of the brain is most unchanged through registration with VoxelMorph, followed closely by the proposed work (FlowReg-A and the total pipeline FlowReg-A+O). The ventricles are most similar to the original using FlowReg-A and FlowReg-A+O, where their size increased only slightly. In contrast to the traditional registration algorithms, where ventricles mostly decrease in size, both FlowReg and VoxelMorph increase the size of the ventricles, with VoxelMorph approximately doubling them. In terms of WML, Demons was the most favourable, as the WML volume was almost unchanged after registration; this may be because this registration scheme mainly warped the boundary surrounding the head. With respect to WML enlargement, the traditional registration methods are more favourable than the deep learning methods, with FlowReg-A providing the lowest volume increase among the deep learning methods.

Figure 9: Structural volumetric ratio for brain, ventricles and WML.
Figure 10: Proportional Volume (PV) difference of ventricles and WMLs.
Figure 11: Surface-to-surface distance (SSD) percent change of ventricles.

The third integrity metric considered is SSD, which measures the shape of an anatomic object (in this case the ventricles) with respect to the boundary of the brain. To measure the extent to which the shape of the ventricles changes before and after registration, the percent change in SSD ($\Delta SSD$) is measured over the testing dataset and reported in Figure 11 for each of the registration methods. The lowest values are observed after registration using ANTs and Demons, followed by FlowReg-A+O. FlowReg-A+O has a change of around 13.8% after registration; compared to FlowReg-A, the difference is about 1%, which is the assumed contribution from FlowReg-O alone. Since the majority of the warping is done in 2D and largely affects the outer region of the brain and head, this metric shows that FlowReg-O maintains the shape of the brain and ventricles. The highest structural change as measured by the SSD metric is observed with SimpleElastix and VoxelMorph.

3.3.2 Spatial Alignment

Figure 12: Head angle (HA) measure. The blue solid line indicates the average $\mu$ and the green line the spread, $\varsigma_{HA}$.

Figure 12 displays the head angle (HA) results computed over all testing volumes. Ideally, the HA would be 0 (i.e., aligned with the midsagittal plane), with minimal spread across the dataset to indicate consistency. The spread of the HA metric is denoted by $\varsigma_{HA}$ (average values can be found in Table 2) and is shown by the green line in Figure 12, which indicates three standard deviations from the mean. A tighter clustering around 0 degrees indicates less deviation from the midsagittal plane (or lower HA over the entire registered dataset). The lowest spread from the mean is seen for the SimpleElastix and FlowReg-A+O registration methods, indicating these methods produce the most consistent spatial alignment with the midsagittal plane. Comparing FlowReg-A+O to the affine-only FlowReg-A volumes shows a reduction in HA spread through the application of the optical flow algorithm, indicating that FlowReg-O improves overall alignment. The largest spread (highest variability of the HA) is obtained by Voxelmorph.
Pixelwise Agreement (PWA) is measured by calculating the per-slice mean-squared error (MSE) against the respective slices of the original atlas $F(x,y,z)$. A lower PWA indicates better intensity and spatial alignment across slices in a registered dataset. PWA is computed on a slice-by-slice basis for slice $z$ as $PWA(z)$, which is summed over all slices to get a volume-based PWA, $\sum_z PWA(z)$. The slice- and volume-based PWA for the registered testing dataset are reported in Fig. 13. The lowest PWA over all slices is achieved by FlowReg-A+O followed by FlowReg-A, indicating maximal intensity and spatial alignment across slices using the proposed work. The highest errors are seen for Demons, SE, and VoxelMorph.

Figure 13: Pixelwise agreement (PWA). Left: the slice-wise alignment error $PWA(z)$ as a function of slice number $z$ in the registered dataset. Right: average PWA over all slices, $\sum_z PWA(z)$.

The last spatial alignment measure investigated is the DSC between the registered brain masks from the moving volumes and the brain mask of the fixed atlas. Figure 14 and Table 2 contain the average DSC values for each registration method over the testing dataset. The largest agreement is for ANTs, SE, FlowReg-A, and FlowReg-A+O; the lowest spatial overlap comes from the Demons method. To visualize spatial alignment, a heatmap is generated for each method by averaging the binary masks of the same slice in the registered output. Figure 15 shows the heatmaps for a bottom, middle, and top slice for all methods. There is consistency in the lower slices for FlowReg, with minimal ghosting artifacts in the heatmap. With other methods, such as Demons or VM, there are many areas of inconsistency in the posterior regions (likely where the ocular orbits occur). In the middle slices, most methods have good alignment, and performance is somewhat comparable in the top slices.

Figure 14: Dice Similarity Coefficient (DSC) between fixed and registered brain masks.
Figure 15: The average brain mask generated by each registration method. Red areas indicate high agreement, and blue indicates poor agreement.

3.3.3 Intensity Alignment

The last set of evaluation metrics are the intensity alignment measures, averaged over the entire testing dataset in Table 2. The intensity alignment measures, mutual information (MI) and correlation (R), investigate how the probability mass functions of the registered volumes compare to the atlas's intensity distribution; intensity profiles in neuroimages are related to anatomy and pathology. Fig. 16 shows boxplots of the MI and R metrics over the entire testing dataset. For both metrics, the highest values are reported by FlowReg-A+O and FlowReg-A, followed by ANTs. Therefore, the proposed work maintains and matches the intensity histograms best among all competing methods.

Figure 16: Intensity alignment validation over 464 testing volumes. Box and whisker plots. Left: mutual information. Right: correlation coefficient.

If a registration method generates images that have a high degree of spatial alignment, regions of high correspondence will have the same intensity characteristics reflected in the average. If different tissue regions are being averaged, however, neighbouring tissues will be mixed and the intensity profile of the images will not be maintained. To measure this, registration-specific atlases are generated via synchronized averaging. The quality of these atlases $A(x,y,z)$ is quantitatively compared to the original template $F(x,y,z)$ by examining histogram differences via the Mean Absolute Intensity Difference (MAID). In FLAIR MRI, histogram peaks represent significant tissue regions such as brain matter and cerebrospinal fluid (CSF); these peaks should be aligned between the newly generated atlas $A(x,y,z)$ and the original atlas $F(x,y,z)$. Fig. 17 (left) shows the normalized intensity histograms of the atlases $A(x,y,z)$ compared to the histogram of the original fixed volume. The histograms of FlowReg-A+O and FlowReg-A are very similar to that of the atlas for the middle (major) peak, which corresponds to brain tissue. To quantitatively measure the similarity of histograms, the MAID is computed between the original and newly generated atlases in Figure 17 (middle), and the lowest error is found with FlowReg-A. We analyze a second set of results for a thresholded histogram that removes the background noise peak. This MAID, computed on the histogram without the noise peak, is called MAID-zp and is shown in Figure 17 (right). In these results, ANTs, FlowReg-A, and FlowReg-A+O provide the best performance, indicating good spatial and intensity alignment in the registered outputs for these methods. The performance of FlowReg-A+O and FlowReg-A is similar, indicating FlowReg-O does not distort the intensity histogram. The highest error is observed for Demons.

Figure 17: Left: intensity distribution histograms of the atlases $A(x,y,z)$ created by registering all test volumes per method. Middle: MAID computed between the intensity PMFs of the generated atlas and the original (fixed) atlas. Right: MAID for PMFs with the background nulled from bin 0 to 20 ($MAID_{zp}$).

4 Discussion

Table 2 contains a summary of all the validation metrics. In comparison to all methods (ANTs, Demons, SE, VM), the proposed FlowReg framework (FlowReg-A+O) achieves the highest performance on the spatial alignment metric PWA, which indicates excellent slice-to-slice correspondence between the registered datasets and the fixed volume. This can be attributed to the initial alignment of the volumes in 3D using the affine component, followed by slice-by-slice refinement using optical flow in 2D that performs fine pixel movements. FlowReg also achieves high intensity similarity based on the MI and R metrics, which indicates the histograms of the registered volumes and of the atlas are aligned. The correlation loss function likely contributes to this, since it enforces global intensity similarity between images. Since intensity distributions are related to tissue content, more aligned histograms in the registered images make subsequent analysis consistent and comparable across patients. FlowReg-A was also a top performer for several metrics, namely the volumetric ratio for the ventricles ($\Delta V_{vent}$) and the Mean Absolute Intensity Difference (MAID). The ventricular integrity metric indicates that the related shapes and volumes are maintained best using FlowReg-A; the intensity similarity can likewise be attributed to the correlation loss in the affine network. Among the deep learning frameworks, either FlowReg-A or FlowReg-A+O outperforms Voxelmorph in all metrics except the structural integrity metric for the brain. This may be due to the deformation field calculated by Voxelmorph, which was found to have lower vector magnitudes at the periphery of the head, indicating little displacement in these regions. Since Voxelmorph was originally trained and tested on other neuroimaging sequences (i.e., T1), perhaps this architecture is not suited to FLAIR neuroimages.
Overall, FlowReg maintains anatomical and structural features while obtaining high intensity similarity to the fixed volume and excellent spatial alignment across the testing datasets. This can be attributed to several factors. First, FlowReg-A performs the affine alignment in 3D, which globally deforms the volume to achieve maximal correspondence. FlowReg-O then calculates the 2D displacement field and refines the movement of pixels on a slice-by-slice basis. The optical flow model architecture has been adapted from the video processing field to medical images; the advantage of this approach is that it can calculate small differences and perform the refinements needed to obtain correspondence. The three loss components (photometric, smoothness, and correlation) play three separate but important roles. The photometric loss, a pixelwise difference, ensures that pixels are displaced to areas of similar intensities and acts as the refinement component. The smoothness loss encourages continuity in the calculated flow fields. The correlation loss operates on the overall histogram intensity alignment between the moving and fixed volumes. Finally, the Charbonnier penalty function was used to reduce the effect of gross outliers so that the structural integrity of anatomy and pathology is maintained. With the combination of these loss functions, FlowReg-A+O performs best on the intensity measures MI and R and on the spatial alignment measure PWA, likely due to the complementary nature of the loss components. As the deformation vector field is generated for each pixel in the moving image, the photometric loss specifically minimizes the difference between similar-intensity pixels in the moving and fixed images and can therefore regress accurate vectors for pixel displacement. Similarly, the correlation loss operates on the global intensity differences and contributes to accurate global flow regression.
To investigate the relative run-time of each algorithm, each method was evaluated on ten randomly sampled volumes from the testing set. The average computation times per method for these ten volumes are shown in Table 3. Note that the times reported for FlowReg are for the two components FlowReg-A and FlowReg-O separately, and for Voxelmorph only the test time of the CNN is shown. The fastest algorithm is FlowReg-A, with an average time of 1.72 seconds per 3D volume, followed by Voxelmorph at 4.31 s and FlowReg-O at 6.90 s. Given the testing times of FlowReg-A and FlowReg-O, the total time to register a volume (3D) and all of its slices (2D) is 8.62 s. Compared to the traditional approaches, the proposed method is therefore faster by 2.6-4.7×. Note that the time reported for Voxelmorph covers only the CNN testing; for optimal performance an affine transformation is required first using tools such as ANTs, which adds computation time. All deep learning methods outperform the traditional iterative methods by a large margin, indicating that the transferable nature of CNN-based registration tools is both efficient and effective. As with many CNN-based systems, the challenge is the training time: the average training time on the 3714 training volumes was about 18 hours for FlowReg-A and just over 5 days for FlowReg-O. As training can be completed offline, this is not a major concern for real-time applications.

Table 3: Average run-time over ten testing 3D volumes for each registration method.

| ANTs | Demons | SE | VM-only | FlowReg-A | FlowReg-O |
| --- | --- | --- | --- | --- | --- |
| 27.81 s | 22.77 s | 40.33 s | 4.31 s | 1.72 s | 6.90 s |

Another interesting observation is that for WML structural integrity, all CNN-based solutions perform poorly compared to classic iterative approaches. Hyperintense lesions are usually the brightest regions in a brain MRI, so when aligning to a template (such as an averaged atlas) these hyperintense regions present as areas of high dissimilarity. A network with a loss function that attempts to minimize pixel differences between the two volumes will over-correct these areas and displace the pixels in undesirable ways. This displacement changes the registered volume's WMLs from a structural and volumetric perspective. A possible solution to investigate in future work is lesion inpainting (Sdika and Pelletier, 2009) prior to registration; inpainting frameworks mask the lesions with the average intensities of the surrounding brain tissue. However, this approach would require either manual or automatic segmentation of WMLs, which is itself an active area of research. Future work could also incorporate skull-stripping as a preprocessing step to remove all non-cerebral tissues. This could improve performance since much of the warping occurs in regions of high pixel intensity that correspond to extra-cranial structures; if these areas are removed, more of the pixel movement could be focused on resolving areas of higher difference within the brain.
When considering all validation groups and metrics, the traditional iterative registration methods perform well on many of the structural integrity metrics such as $\Delta V_{wml}$, $\Delta PV_{vent}$, and $\Delta PV_{wml}$, indicating that these methods do not deform the WML and ventricles as much as the CNN-based methods. Although Demons performs best according to the $\Delta V$ and $\Delta PV$ metrics, visual examination of several volumes, as depicted in Figure 7, shows cerebral tissue warped in a manner that is uncharacteristic of FLAIR MRI; structural changes are visible around the edges of the sulci, and the WML themselves have been smeared and blended with the remaining gray matter of the brain. A limitation of the design is noted for FlowReg-A with respect to $\Delta V$ and $\Delta PV$. Since FlowReg-A applies an affine matrix, a global transformation that warps every voxel in the volume equally, the proportional and volumetric difference of a structure before and after registration should remain unchanged. In our experiments we did observe differences for FlowReg-A and reported them. This outcome is likely due to the slice thickness, limited pixel resolution, and the slice gap of approximately 3 mm during image acquisition. ANTs maintains moderate distortion over all structural integrity metrics and the best performance on the ventricular metrics. One reason could be its Symmetric Normalization, where both the moving and the fixed volumes are warped symmetrically to a similarity "mid-point" (Avants et al., 2008). SimpleElastix, another iterative registration method, performs well on the Head Angle metric, likely due to its two-step process of an affine alignment followed by a B-spline transform (Klein et al., 2010). One major downside of iterative methods for image registration is the lengthy computation for 3D neuroimaging volumes (as seen in Table 3) and their inability to transfer learned knowledge to new image pairs.
Medical image registration is a preprocessing tool that maps two images to the same geometric space. Once images are registered, direct spatial comparisons can be made between them to quantify disease progression, treatment efficacy, pathology changes, and age-specific anatomical changes. The proposed FlowReg model warps a moving image to a fixed image space in an unsupervised manner using a novel two-phase approach: gross alignment in 3D, followed by fine-tuning in 2D on an image-by-image basis. Alongside the registration framework, several clinically relevant validation metrics are proposed that we hope will be adopted by researchers in the future.

5 Conclusion

In this work we propose FlowReg, a deep learning-based framework that performs unsupervised image registration for multicentre FLAIR MRI. The system is composed of two architectures: FlowReg-A, which affinely corrects for gross differences between moving and fixed volumes in 3D, followed by FlowReg-O, which performs pixelwise deformations on a slice-by-slice basis for fine-tuning in 2D. Using 464 testing volumes, 70 of which have ground truth manual delineations for ventricles and lesions, the proposed method was compared to ANTs, Demons, SE, and Voxelmorph. To quantitatively assess the performance of the registration tools, several proposed validation metrics were used, focusing on structural integrity of tissues, spatial alignment, and intensity similarity. Tissue integrity was analyzed using volumetric and structural measures: volumetric ratio, proportional volume (PV), and surface-to-surface distance (SSD). Spatial alignment was analyzed with the Head Angle with respect to the sagittal plane, Pixelwise Agreement, and brain DSC. The intensity metrics measured the similarity of intensities and intensity distributions between the moving and fixed volumes with Mutual Information (MI), correlation (R), and the Mean Absolute Intensity Difference (MAID).

Experimental results show FlowReg (FlowReg-A+O) performs better than iterative registration algorithms on the intensity and spatial alignment metrics, indicating that FlowReg achieves high intensity and spatial correspondence between moving and fixed volumes. Among the deep learning frameworks evaluated, FlowReg-A or FlowReg-A+O provided the highest performance on all but one of the metrics. In terms of structural integrity, FlowReg provided moderate (or best) performance for the brain, ventricle, and WML objects. The success of the proposed work can be attributed to: 1) the two-step registration process consisting of affine alignment followed by optical flow deformation, and 2) the three-component loss function in the optical flow network that encourages global intensity similarity while minimizing large deformations. Finally, the novel validation metrics for assessing medical image registration provide the context necessary to compare registration methods.


Acknowledgments

We acknowledge the support of the Natural Sciences and Engineering Research Council of Canada (NSERC) through the NSERC Discovery Grant program.

We would also like to acknowledge the research mentorship received for this work from Dr. Konstantinos Derpanis (Ryerson University, Computer Science Dept.) and Jason J. Yu (York University, Computer Science Dept.) which included methodological input on the optical flow network and on the use of the penalty function, in addition to editing of the manuscript.

The Canadian Atherosclerosis Imaging Network (CAIN) was established through funding from a Canadian Institutes of Health Research Team Grant for Clinical Research Initiatives (CIHR-CRI 88057). Funding for the infrastructure was received from the Canada Foundation for Innovation (CFI-CAIN 20099), with matching funds provided by the governments of Alberta, Ontario, and Quebec.

Data collection and sharing for this project was partially funded by the Alzheimer’s Disease Neuroimaging Initiative (ADNI) (National Institutes of Health Grant U01 AG024904) and DOD ADNI (Department of Defence award number W81XWH-12-2-0012).

Ethical Standards

The work follows appropriate ethical standards in conducting research and writing the manuscript, following applicable laws and regulations.

Conflicts of Interest

There are no conflicts of interest.

References

  • Abadi et al. (2015) Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Jozefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dandelion Mané, Rajat Monga, Sherry Moore, Derek Murray, Chris Olah, Mike Schuster, Jonathon Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul Tucker, Vincent Vanhoucke, Vijay Vasudevan, Fernanda Viégas, Oriol Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. TensorFlow: Large-scale machine learning on heterogeneous systems, 2015. URL https://www.tensorflow.org/. Software available from tensorflow.org.
  • Alber et al. (2019) Jessica Alber, Suvarna Alladi, Hee-Joon Bae, David A Barton, Laurel A Beckett, Joanne M Bell, Sara E Berman, Geert Jan Biessels, Sandra E Black, Isabelle Bos, et al. White matter hyperintensities in vascular contributions to cognitive impairment and dementia (vcid): knowledge gaps and opportunities. Alzheimer’s & Dementia: Translational Research & Clinical Interventions, 5:107–117, 2019.
  • Avants et al. (2008) Brian B Avants, Charles L Epstein, Murray Grossman, and James C Gee. Symmetric diffeomorphic image registration with cross-correlation: evaluating automated labeling of elderly and neurodegenerative brain. Medical Image Analysis, 12(1):26–41, 2008.
  • Badji and Westman (2020) Atef Badji and Eric Westman. Cerebrovascular pathology in alzheimer’s disease: Hopes and gaps. Psychiatry Research: Neuroimaging, page 111184, 2020.
  • Balakrishnan et al. (2018) Guha Balakrishnan, Amy Zhao, Mert R Sabuncu, John Guttag, and Adrian V Dalca. An unsupervised learning model for deformable medical image registration. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 9252–9260, 2018.
  • Cao et al. (2017) Xiaohuan Cao, Jianhua Yang, Jun Zhang, Dong Nie, Minjeong Kim, Qian Wang, and Dinggang Shen. Deformable image registration based on similarity-steered cnn regression. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 300–308. Springer, 2017.
  • Chollet et al. (2015) François Chollet et al. Keras. https://keras.io, 2015.
  • Csapo et al. (2012) Istvan Csapo, Brad Davis, Yundi Shi, Mar Sanchez, Martin Styner, and Marc Niethammer. Longitudinal image registration with non-uniform appearance change. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 280–288. Springer, 2012.
  • Dalca et al. (2018) Adrian V Dalca, Guha Balakrishnan, John Guttag, and Mert R Sabuncu. Unsupervised learning for fast probabilistic diffeomorphic registration. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 729–738. Springer, 2018.
  • de Vos et al. (2019) Bob D de Vos, Floris F Berendsen, Max A Viergever, Hessam Sokooti, Marius Staring, and Ivana Išgum. A deep learning framework for unsupervised affine and deformable image registration. Medical Image Analysis, 52:128–143, 2019.
  • Debette and Markus (2010) Stéphanie Debette and HS Markus. The clinical importance of white matter hyperintensities on brain magnetic resonance imaging: systematic review and meta-analysis. BMJ, 341:c3666, 2010.
  • Dosovitskiy et al. (2015) Alexey Dosovitskiy, Philipp Fischer, Eddy Ilg, Philip Hausser, Caner Hazirbas, Vladimir Golkov, Patrick Van Der Smagt, Daniel Cremers, and Thomas Brox. Flownet: Learning optical flow with convolutional networks. In Proceedings of the IEEE International Conference on Computer Vision, pages 2758–2766, 2015.
  • El-Gamal et al. (2016) Fatma El-Zahraa Ahmed El-Gamal, Mohammed Elmogy, and Ahmed Atwan. Current trends in medical image registration and fusion. Egyptian Informatics Journal, 17(1):99–124, 2016.
  • Fan et al. (2018) Jingfan Fan, Xiaohuan Cao, Zhong Xue, Pew-Thian Yap, and Dinggang Shen. Adversarial similarity network for evaluating image alignment in deep learning based registration. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 739–746. Springer, 2018.
  • Frey et al. (2019) Benedikt M Frey, Marvin Petersen, Carola Mayer, Maximilian Schulz, Bastian Cheng, and Goetz Thomalla. Characterization of white matter hyperintensities in large-scale mri-studies. Frontiers in Neurology, 10:238, 2019.
  • Garg et al. (2016) Ravi Garg, Vijay Kumar Bg, Gustavo Carneiro, and Ian Reid. Unsupervised cnn for single view depth estimation: Geometry to the rescue. In European Conference on Computer Vision, pages 740–756. Springer, 2016.
  • Griffanti et al. (2018) Ludovica Griffanti, Mark Jenkinson, Sana Suri, Enikő Zsoldos, Abda Mahmood, Nicola Filippini, Claire E Sexton, Anya Topiwala, Charlotte Allan, Mika Kivimäki, et al. Classification and characterization of periventricular and deep white matter hyperintensities on mri: a study in older adults. Neuroimage, 170:174–181, 2018.
  • Guimiot et al. (2008) F Guimiot, C Garel, C Fallet-Bianco, F Menez, S Khung-Savatovsky, J-F Oury, G Sebag, and A-L Delezoide. Contribution of diffusion-weighted imaging in the evaluation of diffuse white matter ischemic lesions in fetuses: correlations with fetopathologic findings. American journal of neuroradiology, 29(1):110–115, 2008.
  • Horn and Schunck (1981) Berthold KP Horn and Brian G Schunck. Determining optical flow. Artificial Intelligence, 17(1-3):185–203, 1981.
  • Hunt et al. (1989) AL Hunt, WW Orrison, RA Yeo, KY Haaland, RL Rhyne, PJ Garry, and GA Rosenberg. Clinical significance of mri white matter lesions in the elderly. Neurology, 39(11):1470–1470, 1989.
  • Iglesias and Sabuncu (2015) Juan Eugenio Iglesias and Mert R Sabuncu. Multi-atlas segmentation of biomedical images: a survey. Medical image analysis, 24(1):205–219, 2015.
  • Ilg et al. (2017) Eddy Ilg, Nikolaus Mayer, Tonmoy Saikia, Margret Keuper, Alexey Dosovitskiy, and Thomas Brox. Flownet 2.0: Evolution of optical flow estimation with deep networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2462–2470, 2017.
  • Jaderberg et al. (2015) Max Jaderberg, Karen Simonyan, Andrew Zisserman, et al. Spatial transformer networks. In Advances in Neural Information Processing Systems, pages 2017–2025, 2015.
  • Johnson et al. (2013) Hans J. Johnson, M. McCormick, L. Ibáñez, and The Insight Software Consortium. The ITK Software Guide. Kitware, Inc., third edition, 2013. URL http://www.itk.org/ItkSoftwareGuide.pdf. In press.
  • Khademi et al. (2011) April Khademi, Anastasios Venetsanopoulos, and Alan R Moody. Robust white matter lesion segmentation in flair mri. IEEE Transactions on Biomedical Engineering, 59(3):860–871, 2011.
  • Khademi et al. (2020) April Khademi, Brittany Reiche, Justin DiGregorio, Giordano Arezza, and Alan R Moody. Whole volume brain extraction for multi-centre, multi-disease flair mri datasets. Magnetic Resonance Imaging, 66:116–130, 2020.
  • Kim et al. (2008) Ki Woong Kim, James R MacFall, and Martha E Payne. Classification of white matter lesions on magnetic resonance imaging in elderly persons. Biological psychiatry, 64(4):273–280, 2008.
  • Kingma and Ba (2014) Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
  • Klein et al. (2010) Stefan Klein, Marius Staring, Keelin Murphy, Max A Viergever, Josien PW Pluim, et al. Elastix: a toolbox for intensity-based medical image registration. IEEE Transactions on Medical Imaging, 29(1):196–205, 2010.
  • Kobayashi et al. (2002) K Kobayashi, M Hayashi, H Nakano, Y Fukutani, K Sasaki, M Shimazaki, and Y Koshino. Apoptosis of astrocytes with enhanced lysosomal activity and oligodendrocytes in white matter lesions in alzheimer’s disease. Neuropathology and applied neurobiology, 28(3):238–251, 2002.
  • Lao et al. (2008) Zhiqiang Lao, Dinggang Shen, Dengfeng Liu, Abbas F Jawad, Elias R Melhem, Lenore J Launer, R Nick Bryan, and Christos Davatzikos. Computer-assisted segmentation of white matter lesions in 3d mr images using support vector machine. Academic radiology, 15(3):300–313, 2008.
  • Liu et al. (2001) Yanxi Liu, Robert T Collins, and William E Rothfus. Robust midsagittal plane extraction from normal and pathological 3-d neuroradiology images. IEEE Transactions on Medical Imaging, 20(3):175–192, 2001.
  • Maes et al. (1997) Frederik Maes, Andre Collignon, Dirk Vandermeulen, Guy Marchal, and Paul Suetens. Multimodality image registration by maximization of mutual information. IEEE transactions on Medical Imaging, 16(2):187–198, 1997.
  • Malloy et al. (2007) Paul Malloy, Stephen Correia, Glenn Stebbins, and David H Laidlaw. Neuroimaging of white matter in aging and dementia. The Clinical Neuropsychologist, 21(1):73–109, 2007.
  • Mani and Arivazhagan (2013) VRS Mani and S Arivazhagan. Survey of medical image registration. Journal of Biomedical Engineering and Technology, 1(2):8–25, 2013.
  • Marstal et al. (2016) Kasper Marstal, Floris Berendsen, Marius Staring, and Stefan Klein. Simpleelastix: A user-friendly, multi-lingual library for medical image registration. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pages 134–142, 2016.
  • Mueller et al. (2005) Susanne G Mueller, Michael W Weiner, Leon J Thal, Ronald C Petersen, Clifford Jack, William Jagust, John Q Trojanowski, Arthur W Toga, and Laurel Beckett. The alzheimer’s disease neuroimaging initiative. Neuroimaging Clinics, 15(4):869–877, 2005.
  • Oppedal et al. (2015) Ketil Oppedal, Trygve Eftestøl, Kjersti Engan, Mona K Beyer, and Dag Aarsland. Classifying dementia using local binary patterns from different regions in magnetic resonance images. Journal of Biomedical Imaging, 2015:1–14, 2015.
  • Pennec et al. (1999) Xavier Pennec, Pascal Cachier, and Nicholas Ayache. Understanding the “demon’s algorithm”: 3d non-rigid registration by gradient descent. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 597–605. Springer, 1999.
  • Phellan et al. (2014) Renzo Phellan, Alexandre X Falcao, and Jayaram Udupa. Improving atlas-based medical image segmentation with a relaxed object search. In International Symposium Computational Modeling of Objects Represented in Images, pages 152–163. Springer, 2014.
  • Piguet et al. (2005) Olivier Piguet, LJ Ridley, DA Grayson, HP Bennett, H Creasey, TC Lye, and G Anthony Broe. Comparing white matter lesions on t2 and flair mri in the sydney older persons study. European journal of neurology, 12(5):399–402, 2005.
  • Rehman and Lee (2018) Hafiz Rehman and Sungon Lee. An efficient automatic midsagittal plane extraction in brain mri. Applied Sciences, 8(11):2203, 2018.
  • Sarbu et al. (2016) Nicolae Sarbu, Robert Y Shih, Robert V Jones, Iren Horkayne-Szakaly, Laura Oleaga, and James G Smirniotopoulos. White matter diseases with radiologic-pathologic correlation. Radiographics, 36(5):1426–1447, 2016.
  • Sdika and Pelletier (2009) Michaël Sdika and Daniel Pelletier. Nonrigid registration of multiple sclerosis brain images using lesion inpainting for morphometry or lesion mapping. Human Brain Mapping, 30(4):1060–1067, 2009.
  • Sokooti et al. (2017) Hessam Sokooti, Bob De Vos, Floris Berendsen, Boudewijn PF Lelieveldt, Ivana Išgum, and Marius Staring. Nonrigid image registration using multi-scale 3d convolutional neural networks. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 232–239. Springer, 2017.
  • Tardif et al. (2013) Jean-Claude Tardif, J David Spence, Therese M Heinonen, Alan Moody, Josephine Pressacco, Richard Frayne, Philippe L’Allier, Benjamin JW Chow, Matthias Friedrich, Sandra E Black, et al. Atherosclerosis imaging and the canadian atherosclerosis imaging network. Canadian Journal of Cardiology, 29(3):297–303, 2013.
  • Thirion (1998) J-P Thirion. Image matching as a diffusion process: an analogy with maxwell’s demons. Medical Image Analysis, 2(3):243–260, 1998.
  • Thirion (1995) Jean-Philippe Thirion. Fast non-rigid matching of 3d medical images. 1995.
  • Trip and Miller (2005) S A Trip and D H Miller. Imaging in multiple sclerosis. Journal of Neurology, Neurosurgery & Psychiatry, 76(suppl 3):11–18, 2005. ISSN 0022-3050. doi: 10.1136/jnnp.2005.073213. URL https://jnnp.bmj.com/content/76/suppl_3/iii11.
  • Udaka et al. (2002) Fukashi Udaka, Hideyuki Sawada, and Masakuni Kameyama. White matter lesions and dementia mri-pathological correlation. Annals of the New York Academy of Sciences, 977(1):411–415, 2002.
  • Uzunova et al. (2017) Hristina Uzunova, Matthias Wilms, Heinz Handels, and Jan Ehrhardt. Training cnns for image registration from few samples with model-based data augmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 223–231. Springer, 2017.
  • Wardlaw et al. (2013) Joanna M Wardlaw, Eric E Smith, Geert J Biessels, Charlotte Cordonnier, Franz Fazekas, Richard Frayne, Richard I Lindley, John T O’Brien, Frederik Barkhof, Oscar R Benavente, et al. Neuroimaging standards for research into small vessel disease and its contribution to ageing and neurodegeneration. The Lancet Neurology, 12(8):822–838, 2013.
  • Winkler et al. AM Winkler, P Kochunov, and DC Glahn. FLAIR templates. Brainder.
  • Yu et al. (2016) Jason J Yu, Adam W Harley, and Konstantinos G Derpanis. Back to basics: Unsupervised learning of optical flow via brightness constancy and motion smoothness. In European Conference on Computer Vision, pages 3–10. Springer, 2016.
  • Zhao et al. (2019) Shengyu Zhao, Tingfung Lau, Ji Luo, I Eric, Chao Chang, and Yan Xu. Unsupervised 3d end-to-end medical image registration with volume tweening network. IEEE journal of biomedical and health informatics, 24(5):1394–1404, 2019.
  • Zhu et al. (2020) Zhenyu Zhu, Yiqin Cao, Chenchen Qin, Yi Rao, Dong Ni, and Yi Wang. Unsupervised 3d end-to-end deformable network for brain mri registration. In 2020 42nd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), pages 1355–1359. IEEE, 2020.

Appendix A

Imaging Dataset Details

A summary of the datasets used and demographic information is shown in Table 4. The ADNI database consisted of images from three MR scanner manufacturers: General Electric ($n=1075$), Philips Medical Systems ($n=848$), and Siemens ($n=2076$), with 18 different models in total. In CAIN there are five different models across the same three vendors: General Electric ($n=181$), Philips Medical Systems ($n=230$), and Siemens ($n=289$). The number of cases per scanner model and vendor is shown in Table 5, along with the ranges of the acquisition parameters. As can be seen, this is a diverse multicentre dataset with varying scanners, diseases, voxel resolutions, and acquisition parameters. It therefore gives insight into how well each registration method generalizes to multicentre data.

Table 4: Experimental datasets used in this work (CAIN and ADNI).

| Dataset | # Subjects | # Volumes | # Slices | # Centers | Age | Sex F/M (%) |
| --- | --- | --- | --- | --- | --- | --- |
| CAIN | 400 | 700 | 31,500 | 9 | 73.87 ± 8.29 | 38.0/58.6 |
| ADNI | 900 | 4263 | 250,006 | 60 | 73.48 ± 7.37 | 46.5/53.5 |
Table 5: Detailed acquisition parameters and information for each dataset.

| Dataset | Vendor | Model | TR (ms) | TE (ms) | TI (ms) | Magnetic Field (T) | Pixel Size (mm²) | Slice Thickness (mm) | N |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ADNI | GE Medical Systems | Discovery MR750/w | 11000 | 149.42-153.13 | 2250 | 3 | 0.7386 | 5 | 614 |
| ADNI | GE Medical Systems | Signa HDxt | 10002-11002 | 149.10-192.65 | 2200-2250 | 1.5-3 | 0.7386-0.8789 | 5-6 | 461 |
| ADNI | Philips Medical Systems | Ingenia | 9000 | 90 | 2500 | 3 | 0.7386 | 5 | 83 |
| ADNI | Philips Medical Systems | Achieva | 9000 | 90 | 2500 | 3 | 0.6104-0.7386 | 5 | 520 |
| ADNI | Philips Medical Systems | Gemini | 9000 | 90 | 2500 | 3 | 0.7386 | 5 | 35 |
| ADNI | Philips Medical Systems | Ingenuity | 9000 | 90 | 2500 | 3 | 0.7386 | 5 | 19 |
| ADNI | Philips Medical Systems | Intera | 6000-9000 | 90-140 | 2000-2500 | 1.5-3 | 0.7386-0.8789 | 5 | 191 |
| ADNI | Siemens | Biograph mMR | 9000 | 91 | 2500 | 3 | 0.7386 | 5 | 13 |
| ADNI | Siemens | Prisma | 9000 | 91 | 2500 | 3 | 0.7386 | 5 | 5 |
| ADNI | Siemens | Skyra | 9000 | 91 | 2500 | 3 | 0.7386 | 5 | 213 |
| ADNI | Siemens | Symphony Tim | 10000 | 125 | 2200 | 1.5 | 0.8789 | 5 | 2 |
| ADNI | Siemens | Trio Tim | 9000-11000 | 90-149 | 2250-2800 | 3 | 0.7386-1 | 2-5 | 1332 |
| ADNI | Siemens | Verio | 9000 | 91 | 2500 | 3 | 0.7386-1.0315 | 5 | 511 |
| CAIN | GE Medical Systems | Discovery MR750 | 8000-9995 | 140.84-150.24 | 2200-2390 | 3 | 0.7386-0.8789 | 3 | 181 |
| CAIN | Philips Medical Systems | Achieva | 9000-11000 | 125 | 2800 | 3 | 0.1837 | 3 | 230 |
| CAIN | Siemens | Intera MR | 9000 | 119 | 2500 | 3 | 1 | 3 | 14 |
| CAIN | Siemens | Skyra | 9000 | 119 | 2500 | 3 | 1 | 3 | 162 |
| CAIN | Siemens | Trio Tim | 9000 | 117-122 | 2500 | 3 | 1 | 3 | 113 |


Supplemental Tables and Figures

Table 6: FlowReg-A model, details of architecture in Figure 2.

| Layer | Filters | Kernel | Stride | Activation |
| --- | --- | --- | --- | --- |
| fixedInput | - | - | - | - |
| movingInput | - | - | - | - |
| concatenate | - | - | - | - |
| conv3D | 16 | 7x7x7 | 2, 2, 1 | ReLU |
| conv3D | 32 | 5x5x5 | 2, 2, 1 | ReLU |
| conv3D | 64 | 3x3x3 | 2, 2, 2 | ReLU |
| conv3D | 128 | 3x3x3 | 2, 2, 2 | ReLU |
| conv3D | 256 | 3x3x3 | 2, 2, 2 | ReLU |
| conv3D | 512 | 3x3x3 | 2, 2, 2 | ReLU |
| flatten | - | - | - | - |
| dense | 12 | - | - | Linear |
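For readers who prefer code, the following is a minimal Keras sketch of the FlowReg-A regressor as transcribed in Table 6. It is illustrative only: the input volume shape (here 256×256×64 with one channel), the `padding="same"` setting, and the mapping of the table's stride triplets onto Keras' dimension ordering are assumptions not specified by the table.

```python
# A minimal, illustrative Keras sketch of the FlowReg-A regressor (Table 6).
# Input shape, padding, and stride-axis ordering are assumptions.
from tensorflow.keras import layers, models

def build_flowreg_a(vol_shape=(256, 256, 64, 1)):  # assumed input size
    fixed = layers.Input(shape=vol_shape, name="fixedInput")
    moving = layers.Input(shape=vol_shape, name="movingInput")
    x = layers.Concatenate(axis=-1)([fixed, moving])
    # (filters, kernel, strides) transcribed from Table 6
    for filters, k, s in [(16, 7, (2, 2, 1)), (32, 5, (2, 2, 1)),
                          (64, 3, (2, 2, 2)), (128, 3, (2, 2, 2)),
                          (256, 3, (2, 2, 2)), (512, 3, (2, 2, 2))]:
        x = layers.Conv3D(filters, k, strides=s, padding="same",
                          activation="relu")(x)
    x = layers.Flatten()(x)
    # 12 linear outputs: the 3x4 affine transformation parameters
    affine = layers.Dense(12, activation="linear", name="affine")(x)
    return models.Model(inputs=[fixed, moving], outputs=affine)
```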
Table 7: FlowReg-O model, details of architecture in Figure 3.

| Layer | Filters | Kernel | Strides | Activation |
| --- | --- | --- | --- | --- |
| fixedInput | - | - | - | - |
| movingInput | - | - | - | - |
| concatenate | - | - | - | - |
| conv2D | 64 | 7x7 | 2, 2 | L-ReLU |
| conv2D | 128 | 5x5 | 2, 2 | L-ReLU |
| conv2D | 256 | 5x5 | 2, 2 | L-ReLU |
| conv2D | 256 | 3x3 | 1, 1 | L-ReLU |
| conv2D | 512 | 3x3 | 2, 2 | L-ReLU |
| conv2D | 512 | 3x3 | 1, 1 | L-ReLU |
| conv2D | 512 | 3x3 | 2, 2 | L-ReLU |
| conv2D | 512 | 3x3 | 1, 1 | L-ReLU |
| conv2D | 1024 | 3x3 | 2, 2 | L-ReLU |
| conv2D | 1024 | 3x3 | 1, 1 | L-ReLU |
| conv2D | 2 | 3x3 | 1, 1 | - |
| upconv2D | 2 | 4x4 | 2, 2 | - |
| upconv2D | 512 | 4x4 | 2, 2 | L-ReLU |
| conv2D | 2 | 3x3 | 1, 1 | - |
| upconv2D | 2 | 4x4 | 2, 2 | - |
| upconv2D | 256 | 4x4 | 2, 2 | L-ReLU |
| conv2D | 2 | 3x3 | 1, 1 | - |
| upconv2D | 2 | 4x4 | 2, 2 | - |
| upconv2D | 128 | 4x4 | 2, 2 | L-ReLU |
| conv2D | 2 | 3x3 | 1, 1 | - |
| upconv2D | 2 | 4x4 | 2, 2 | - |
| upconv2D | 64 | 4x4 | 2, 2 | L-ReLU |
| conv2D | 2 | 3x3 | 1, 1 | - |
| upconv2D | 2 | 4x4 | 2, 2 | - |
| upconv2D | 32 | 4x4 | 2, 2 | L-ReLU |
| conv2D | 2 | 3x3 | 1, 1 | - |
| upconv2D | 2 | 4x4 | 2, 2 | - |
| upconv2D | 16 | 4x4 | 2, 2 | L-ReLU |
| conv2D | 2 | 3x3 | 2, 2 | - |
| resampler | - | - | - | - |
Figure 18: The total loss values during training for FlowReg-O at different $\alpha$ values in the Charbonnier penalty function (Eqn. 5).

Figure 19: Single slices from five volumes registered using FlowReg-O at various $\alpha$ values for the Charbonnier penalty.

Figure 20: Flow magnitudes of deformation fields from FlowReg-O at various $\alpha$ values. Images correspond to the slices in Figure 19. Blue indicates areas of low flow vector magnitude and red indicates larger vector magnitude.

Appendix B

Here we describe in detail the calculations of each registration metric. The block diagram to compute the metrics is shown in Figure 21.

Figure 21: Registration metrics extraction process. HA = Head Angle, MI = Mutual Information, R = Correlation Coefficient, PV = Proportional Volume, SSD = Surface to Surface Distance.

Structural Integrity: Proportional Volume — PV

If a registration scheme maintains the structural integrity of anatomical and pathological objects, the relative volume of objects should remain approximately the same after registration. Using binary masks of anatomical or pathological objects of interest, the proportional volume (PV) is proposed to measure the volume of a structure ($vol_s$) relative to the total brain volume ($vol_b$), as in:

$$PV = \frac{vol_s}{vol_b}. \qquad (8)$$

The volume of an object in physical dimensions is found by multiplying its number of voxels by the voxel size:

$$vol = V_x \times V_y \times V_z \times n_p, \qquad (9)$$

where $vol$ is the volume in mm³, $V_x$ and $V_y$ are the pixel width and height, $V_z$ is the slice thickness, and $n_p$ is the number of voxels in the object.
The difference in PV before and after registration can be examined to analyze how registration changes the proportion of each structure with respect to the brain:

$$\Delta PV = PV_{orig} - PV_{reg}, \qquad (10)$$

where $PV_{orig}$ and $PV_{reg}$ are the PV computed before and after registration, respectively. In this work, two structures of interest are examined for $vol_s$: white matter lesions (WML) and the ventricles, since these objects are important for disease evaluation. Ideally, the ratio of object volume to total brain volume stays the same before and after registration.

Structural Integrity: Volume Ratio — $\Delta V_s$

In addition to $\Delta PV$, which examines proportional changes in volume before and after registration, the volume ratio $\Delta V_s$ is defined on a per-object basis. The volume ratio compares the volume of an object before and after registration, as in

$$\Delta V_s = \frac{vol_{orig}}{vol_{reg}}, \qquad (11)$$

where $vol_{orig}$ and $vol_{reg}$ are the volumes of object $s$ before and after registration, respectively. The volume ratio is computed for three objects $s$: the brain, WML, and ventricles. This metric quantifies the overall change in volume caused by registration (the ideal ratio is 1).
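As a concrete illustration, the following is a minimal NumPy sketch of Eqns. (8)-(11), assuming binary masks stored as NumPy arrays and voxel spacing $(V_x, V_y, V_z)$ in mm; the function names are illustrative, not from the FlowReg codebase.

```python
import numpy as np

def volume_mm3(mask, spacing):
    """Eqn. (9): non-zero voxel count n_p times the voxel volume Vx*Vy*Vz."""
    return np.count_nonzero(mask) * float(np.prod(spacing))

def proportional_volume(struct_mask, brain_mask, spacing):
    """Eqn. (8): structure volume relative to the total brain volume."""
    return volume_mm3(struct_mask, spacing) / volume_mm3(brain_mask, spacing)

# Eqns. (10) and (11) for hypothetical masks before/after registration:
# delta_pv = proportional_volume(s_orig, b_orig, sp) - proportional_volume(s_reg, b_reg, sp)
# delta_v  = volume_mm3(s_orig, sp) / volume_mm3(s_reg, sp)   # ideal value: 1
```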

Structural Integrity: Surface to Surface Distance — SSD

A third structural integrity measure, the SSD, is proposed to quantify how the overall shape of a structure changes relative to the brain after registration. To compute the SSD, binary masks of the brain $B(x,y,z)$ and of the structure of interest $S(x,y,z)$ are obtained, and edge maps $E_B(x,y,z)$ and $E_S(x,y,z)$ are extracted from them, respectively. For every non-zero pixel coordinate $(x,y,z)$ on the structure's boundary, i.e. $E_S(x,y,z)=1$, the minimum Euclidean distance from the structure's boundary to the brain edge is found. This closest surface-to-surface distance is then averaged over the entire structure of interest to compute the average SSD:

$$SSD = \frac{1}{N}\left[\sum_{i=1}^{N} \min_{(x_s,y_s,z_s)\in S}\left(\sqrt{p+q+r}\right)\right], \qquad (12)$$

where $p=(x_s-x_b)^2$, $q=(y_s-y_b)^2$, and $r=(z_s-z_b)^2$ are the squared coordinate differences between points in the edge maps $E_S$ and $E_B$, $(x_s,y_s,z_s)$ and $(x_b,y_b,z_b)$ are coordinate triplets in the binary edge maps of the structure of interest and the brain, respectively, and $N$ is the number of points on the structure's boundary. The overall structural integrity of objects should be maintained with respect to the brain after registration, i.e. the distance between an object and the brain surface should not be significantly altered. The extent to which anatomical shapes are maintained or distorted by registration can therefore be investigated through the difference in SSD before and after registration:

$$\Delta SSD = \frac{SSD_{orig} - SSD_{reg}}{SSD_{orig}}, \qquad (13)$$

where $SSD_{orig}$ and $SSD_{reg}$ are the SSD before and after registration, respectively.
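A minimal SciPy sketch of Eqn. (12) is given below; it extracts boundary voxels by subtracting a morphological erosion and queries nearest distances with a k-d tree. Distances are in voxel units here (the equation does not include voxel spacing), and the function names are illustrative.

```python
import numpy as np
from scipy.ndimage import binary_erosion
from scipy.spatial import cKDTree

def surface_points(mask):
    """Boundary voxels of a binary mask: the mask minus its erosion."""
    mask = mask.astype(bool)
    edge = mask & ~binary_erosion(mask)
    return np.argwhere(edge)                # (N, 3) voxel coordinates

def ssd(struct_mask, brain_mask):
    """Eqn. (12): mean nearest Euclidean distance from the structure's
    boundary points to the brain boundary."""
    s_pts = surface_points(struct_mask)
    b_pts = surface_points(brain_mask)
    dists, _ = cKDTree(b_pts).query(s_pts)  # nearest brain-edge point per structure point
    return dists.mean()
```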

Spatial Alignment: Head Angle — HA

Head Angle (HA) is an orientation metric that measures the extent to which the head is rotated with respect to the midsagittal plane. For properly registered data (especially data aligned to an atlas), the HA should be close to zero, with the head aligned along the midsagittal plane. To measure the HA, a combination of Principal Component Analysis (PCA) and an angle sweep, as described in Rehman and Lee (2018); Liu et al. (2001), is adopted to find the head orientation of registered and unregistered data. The MRI volume is first binarized using a combination of adaptive thresholding and morphological opening and closing to approximately segment the head. The coordinates of each non-zero pixel in this 2D mask are stored in two vectors (one per coordinate), and the eigenvectors of these arrays are found through PCA. The directions of the eigenvectors specify the orientation of the major axes of the ellipse approximating the head region in a slice with respect to the vertical sagittal plane, and the eigenvalues give the magnitudes (lengths) of those axes. The eigenvector with the largest eigenvalue dictates the direction of the longest axis, which approximates the head angle $\theta_1$. To improve robustness to outliers and the precision of the estimated angles, a secondary refinement step computes a refined angle $\theta_2$. Every second slice from the middle (axial) slice to the top of the head is used, and the three smallest angles over all slices are taken as candidates for further refinement; the lowest angles are selected as they are most representative of head orientation. Each selected slice is mirrored and rotated according to an angle sweep over $-2\theta_1 < \theta_2 < 2\theta_1$ in 0.5° increments. At every increment of the rotation, the cross-correlation between the mirrored rotating image and the original is calculated and a score is recorded; the angle with the highest score is selected as the optimized value $\theta_2$. The final HA is obtained by summing the coarse and fine estimates, i.e. $\theta = \theta_1 + \theta_2$.
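The coarse PCA step can be sketched as follows; the mirror-and-sweep refinement for $\theta_2$ is omitted for brevity, and the function name and mask input are illustrative assumptions.

```python
import numpy as np

def coarse_head_angle(head_mask_2d):
    """Coarse estimate theta_1: orientation of the major PCA axis of the
    binarized head mask, measured from the vertical (sagittal) direction."""
    ys, xs = np.nonzero(head_mask_2d)
    coords = np.stack([xs, ys])                    # 2 x N coordinate matrix
    evals, evecs = np.linalg.eigh(np.cov(coords))  # eigen-decomposition of covariance
    major = evecs[:, np.argmax(evals)]             # axis with the largest eigenvalue
    return np.degrees(np.arctan2(major[0], major[1]))  # angle w.r.t. vertical axis
```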

Spatial Alignment: Pixelwise Agreement — PWA

Physical alignment in 3D means that within a registered dataset each subject should have high correspondence between slices, i.e. the same slice from each subject should depict the same anatomy across patients. To measure this effect, a metric called Pixelwise Agreement (PWA) is proposed. It considers the same slice across all registered volumes in a dataset and compares it to the corresponding slice of an atlas template (the fixed volume) through the mean-squared error, yielding a slice-wise estimate of the difference between each slice of the atlas and the same slice of every registered volume:

$$PWA(z) = \frac{1}{N_j}\frac{1}{N_{xy}} \sum_{j \in J} \sum_{(x,y)} \left(M_j(x,y,z) - F(x,y,z)\right)^2 \qquad (14)$$

where $z$ is the slice number for which PWA is computed, $M_j(x,y,z)$ is moving test volume $j$ of a total of $N_j$ volumes in the dataset, $N_{xy}$ is the number of pixels in slice $z$, and $F(x,y,z)$ is the atlas. Thus, at each slice, over the entire dataset, the PWA compares every slice to the same slice of the atlas. A low PWA indicates a high degree of correspondence between all slices of the registered dataset and those of the atlas, and it considers both spatial and intensity alignment: if spatial alignment is poor, intensity alignment will also be poor since different tissues will overlap during averaging. The slice-based PWA may also be averaged to obtain a total volume PWA, i.e. $\frac{1}{N_z}\sum_z PWA(z)$, where $N_z$ is the number of slices.
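A minimal NumPy sketch of Eqn. (14), assuming the registered volumes all share the atlas grid and are passed as a list of arrays with axes $(x, y, z)$; names are illustrative.

```python
import numpy as np

def pwa_per_slice(registered_vols, atlas):
    """Eqn. (14): per-slice mean-squared error against the atlas,
    averaged over all registered volumes in the dataset."""
    stack = np.stack(registered_vols)       # (N_j, x, y, z)
    sq_err = (stack - atlas[None]) ** 2     # squared differences to the atlas
    return sq_err.mean(axis=(0, 1, 2))      # one PWA value per slice z

# Whole-volume PWA is the mean over slices: pwa_per_slice(vols, atlas).mean()
```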

Spatial Alignment: Dice Similarity Coefficient — DSC

To further examine spatial alignment, manually delineated brain masks from the moving volumes $M(x,y,z)$ were warped with the calculated deformation and compared to the brain mask of the atlas template $F(x,y,z)$ through the Dice Similarity Coefficient (DSC):

$$DSC = \frac{2|b_M \cap b_F|}{|b_M| + |b_F|}, \qquad (15)$$

where $b_M$ is the registered moving brain mask and $b_F$ is the brain mask of the atlas template. The DSC is higher when there is a high degree of overlap between the brain regions of the atlas and the moving volume. For visual inspection of overlap, all registered brain masks were averaged to summarize alignment accuracy as a heatmap.
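Eqn. (15) reduces to a few lines of NumPy; this sketch assumes binary masks as arrays.

```python
import numpy as np

def dice(b_moving, b_fixed):
    """Eqn. (15): Dice Similarity Coefficient between two binary masks."""
    b_moving, b_fixed = b_moving.astype(bool), b_fixed.astype(bool)
    intersection = np.logical_and(b_moving, b_fixed).sum()
    return 2.0 * intersection / (b_moving.sum() + b_fixed.sum())
```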

Intensity Similarity: Mutual Information — MI

The first intensity-based metric used to investigate registration performance is the widely adopted Mutual Information (MI), which describes the statistical dependence between two random variables. If there is excellent alignment between the moving and fixed images, there will be tight clustering in the joint probability mass function. The MI of two volumes $M(x,y,z)$ and $F(x,y,z)$ with PMFs $p_M(m)$ and $p_F(f)$ is calculated as follows:

$$I(M;F) = \sum_{f \in F} \sum_{m \in M} p_{(M,F)}(m,f) \log\left(\frac{p_{(M,F)}(m,f)}{p_M(m)\,p_F(f)}\right) \qquad (16)$$

where $p_{(M,F)}(m,f)$ is the joint probability mass function of the intensities of the moving and fixed volumes, $p_M(m)$ is the marginal probability of the moving volume intensities, and $p_F(f)$ is the marginal probability for the fixed volume.
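A minimal histogram-based sketch of Eqn. (16); the bin count of 256 is an assumption, and the function name is illustrative.

```python
import numpy as np

def mutual_information(moving, fixed, bins=256):
    """Eqn. (16): MI estimated from the joint intensity histogram."""
    joint, _, _ = np.histogram2d(moving.ravel(), fixed.ravel(), bins=bins)
    p_mf = joint / joint.sum()             # joint PMF p_(M,F)(m, f)
    p_m = p_mf.sum(axis=1, keepdims=True)  # marginal p_M(m)
    p_f = p_mf.sum(axis=0, keepdims=True)  # marginal p_F(f)
    nz = p_mf > 0                          # skip empty bins to avoid log(0)
    return float((p_mf[nz] * np.log(p_mf[nz] / (p_m @ p_f)[nz])).sum())
```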

Intensity Similarity: Pearson Correlation Coefficient — r𝑟r

The Pearson Correlation Coefficient, $r$, is used as the second intensity measure; it quantifies the correlation between the intensities of the moving volume $M(x,y,z)$ and the fixed volume $F(x,y,z)$:

$$r(M,F) = \frac{\sum_{i=1}^{n}(M_i-\overline{M})(F_i-\overline{F})}{\sqrt{\sum_{i=1}^{n}(M_i-\overline{M})^2}\sqrt{\sum_{i=1}^{n}(F_i-\overline{F})^2}} \qquad (17)$$

where $n$ is the number of voxels in a volume, $M_i$ and $F_i$ are the voxels of the moving and fixed volumes, and $\overline{M}$ and $\overline{F}$ are the respective volume mean intensities.
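Eqn. (17) is the standard Pearson correlation over flattened volumes, e.g.:

```python
import numpy as np

def correlation(moving, fixed):
    """Eqn. (17): Pearson correlation between voxel intensities."""
    return float(np.corrcoef(moving.ravel(), fixed.ravel())[0, 1])
```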

Intensity Similarity: Mean Absolute Intensity Difference — MAID

The last registration performance metric considered is the Mean Absolute Intensity Difference (MAID), which measures the quality of a newly generated atlas $A(x,y,z)$ compared to the original atlas (the fixed volume). To create the new atlas, the moving volumes $M(x,y,z)$ of a dataset are registered to the original atlas $F(x,y,z)$, and corresponding slices are averaged across the registered dataset to generate the atlas $A(x,y,z)$. The intensity histograms of the original atlas $F(x,y,z)$ and the newly generated atlas $A(x,y,z)$ are compared through the mean absolute error to obtain the MAID, as in

$$MAID(A,F) = \frac{1}{N_i}\sum_i |p_A(i) - p_F(i)| \qquad (18)$$

where $p_A$ and $p_F$ are the intensity probability distributions of the new atlas and the fixed volume (original atlas), respectively, and $N_i$ is the number of intensity bins. Changes to the intensity distribution of registered images can arise from poor slice alignment; the higher the similarity between the new atlas and the original, the lower the error between the two.
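A minimal sketch of Eqn. (18); the bin count of 256 and the optional background cutoff (bins 0 to 20 nulled for $MAID_{zp}$, per Figure 17) are assumptions for illustration.

```python
import numpy as np

def maid(new_atlas, fixed_atlas, bins=256, zero_peak_cutoff=None):
    """Eqn. (18): mean absolute difference between the intensity PMFs of
    the generated atlas A and the original (fixed) atlas F."""
    rng = (0.0, float(max(new_atlas.max(), fixed_atlas.max())))
    h_a, _ = np.histogram(new_atlas, bins=bins, range=rng)
    h_f, _ = np.histogram(fixed_atlas, bins=bins, range=rng)
    p_a, p_f = h_a / h_a.sum(), h_f / h_f.sum()  # normalize counts to PMFs
    if zero_peak_cutoff is not None:             # drop the background noise peak
        p_a, p_f = p_a[zero_peak_cutoff:], p_f[zero_peak_cutoff:]
    return float(np.abs(p_a - p_f).mean())
```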