SITReg: Multi-resolution architecture for symmetric, inverse consistent, and topology preserving image registration

Joel Honkamaa1, Pekka Marttinen1
1: Department of Computer Science, Aalto University, Finland
Publication date: 2024/11/27
https://doi.org/10.59275/j.melba.2024-276b

Abstract

Deep learning has emerged as a strong alternative for classical iterative methods for deformable medical image registration, where the goal is to find a mapping between the coordinate systems of two images. Popular classical image registration methods enforce the useful inductive biases of symmetricity, inverse consistency, and topology preservation by construction. However, while many deep learning registration methods encourage these properties via loss functions, no earlier methods enforce all of them by construction. Here, we propose a novel registration architecture based on extracting multi-resolution feature representations which is by construction symmetric, inverse consistent, and topology preserving. We also develop an implicit layer for memory efficient inversion of the deformation fields. Our method achieves state-of-the-art registration accuracy on three datasets. The code is available at https://github.com/honkamj/SITReg

Keywords

Machine Learning · Image Registration

Bibtex:
@article{melba:2024:026:honkamaa,
    title = "SITReg: Multi-resolution architecture for symmetric, inverse consistent, and topology preserving image registration",
    author = "Honkamaa, Joel and Marttinen, Pekka",
    journal = "Machine Learning for Biomedical Imaging",
    volume = "2",
    issue = "November 2024 issue",
    year = "2024",
    pages = "2148--2194",
    issn = "2766-905X",
    doi = "https://doi.org/10.59275/j.melba.2024-276b",
    url = "https://melba-journal.org/2024:026"
}

RIS:
TY  - JOUR
AU  - Honkamaa, Joel
AU  - Marttinen, Pekka
PY  - 2024
TI  - SITReg: Multi-resolution architecture for symmetric, inverse consistent, and topology preserving image registration
T2  - Machine Learning for Biomedical Imaging
VL  - 2
IS  - November 2024 issue
SP  - 2148
EP  - 2194
SN  - 2766-905X
DO  - https://doi.org/10.59275/j.melba.2024-276b
UR  - https://melba-journal.org/2024:026
ER  -



1 Introduction

Deformable medical image registration aims at finding a mapping between the coordinate systems of two images, called a deformation, to align them anatomically. Deep learning can be used to train a registration network which takes as input two images and outputs a deformation. We focus on unsupervised intra-modality registration, where the images share the same modality and no ground-truth deformation is available, applicable, e.g., when deforming brain MRI images from different patients to an atlas or analyzing a patient's breathing cycle from multiple images.

The success of the most popular deep learning architectures is often attributed to the desirable inductive biases imposed by suitable geometric priors (such as translation equivariance for CNNs) (Bronstein et al., 2017, 2021). In image registration, the priors of inverse consistency, symmetry, and topology preservation are widely considered useful (Sotiras et al., 2013). While some of the most popular classical methods enforce all of these properties by construction (Ashburner, 2007; Avants et al., 2008), no prior deep learning methods do, and we address this gap (see a detailed literature review in Appendix F). To clearly state our contributions, we start by defining the properties (further clarifications in Appendix J).

We define a registration method as a function $f$ that takes two images, say $x_A$ and $x_B$, and produces a deformation. In general one can predict the deformation with any method in either direction by varying the input order, but some methods predict both forward and inverse deformations directly, and we use subscripts to indicate the direction for such methods. For example, $f_{1\to 2}$ produces a deformation that aligns the image of the first argument to the image of the second argument. Note that even if $f_{1\to 2}(x_A, x_B)$ and $f_{2\to 1}(x_B, x_A)$ both denote a deformation for aligning image $x_A$ to $x_B$, they can in general be different functions. As a result, a registration method may predict up to four different deformations for any given input pair: $f_{1\to 2}(x_A, x_B)$, $f_{2\to 1}(x_A, x_B)$, $f_{1\to 2}(x_B, x_A)$, and $f_{2\to 1}(x_B, x_A)$. If a method predicts a deformation only in one direction for a given input order, we might omit the subscript.

Inverse consistent registration methods ensure that $f_{1\to 2}(x_A, x_B)$ is an accurate inverse of $f_{2\to 1}(x_A, x_B)$, which we quantify using the inverse consistency error: $\|f_{1\to 2}(x_A, x_B) \circ f_{2\to 1}(x_A, x_B) - \mathcal{I}\|^2$, where $\circ$ is the composition operator and $\mathcal{I}$ is the identity deformation. Inverse consistency has been enforced using a loss or by construction. However, due to the limited spatial resolution of the predicted deformations, the inverse consistency error is not exactly zero even for the by construction inverse consistent methods (Ashburner, 2007).
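To make the error concrete, the following is a minimal 1-D numpy sketch (an illustration, not the paper's implementation; the grid and deformation are made up) that composes a forward deformation with its numerical inverse and measures the deviation from identity:

```python
import numpy as np

def compose(d1, d2, grid):
    # (d1 ∘ d2)(p) = d1(d2(p)); d1 is sampled on `grid` with linear interpolation
    return np.interp(d2, grid, d1)

grid = np.linspace(0.0, 1.0, 257)
fwd = grid + 0.05 * np.sin(2 * np.pi * grid)  # monotone coordinate mapping -> invertible
inv = np.interp(grid, fwd, grid)              # numerical inverse: swap roles of p and fwd(p)

# inverse consistency error ||fwd ∘ inv - I||^2 (mean over the grid)
icc_error = np.mean((compose(fwd, inv, grid) - grid) ** 2)
```

Because the deformations live on a finite grid, `icc_error` is small but nonzero, mirroring the observation above that even by construction inverse consistent methods have a nonzero inverse consistency error.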

Figure 1: Example deformation from the method. Left: Forward deformation. Middle: Inverse deformation. Right: Composition of the forward and inverse deformations. Only one 2D slice is shown of the 3D deformation. The deformation is from the LPBA40 experiment. For more detailed visualization of a predicted deformation, see Figure 10 in Appendix H.

In symmetric registration, the registration outcome does not depend on the order of the inputs, i.e., $f_{1\to 2}(x_A, x_B)$ equals $f_{2\to 1}(x_B, x_A)$. Unlike with inverse consistency, $f_{1\to 2}(x_A, x_B)$ can equal $f_{2\to 1}(x_B, x_A)$ exactly (Avants et al., 2008; Estienne et al., 2021), which we call symmetric by construction. A related property, cycle consistency, can be assessed using the cycle consistency error $\|f(x_A, x_B) \circ f(x_B, x_A) - \mathcal{I}\|^2$. It can be computed for any method since it does not require the method to predict deformations in both directions. If the method is symmetric by construction, the inverse consistency error equals the cycle consistency error.

We define topology preservation of predicted deformations similarly to Christensen et al. (1995). From the real-world point of view this means the preservation of anatomical structures, preventing non-smooth changes. Mathematically we want the deformations to be homeomorphisms, i.e., continuous bijections with a continuous inverse. In the registration literature it is common to talk about diffeomorphisms, which are additionally differentiable. In practice we want a deformation not to fold on top of itself, which we measure by estimating the local Jacobian determinants of the predicted deformations and checking whether they are positive.
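The folding check can be sketched in 2-D numpy as follows (illustrative only; unit pixel spacing and a hand-made displacement field are assumptions):

```python
import numpy as np

def jacobian_determinants(disp):
    # disp: displacement field of shape (2, H, W); the mapping is phi(p) = p + disp(p)
    d0_d0, d0_d1 = np.gradient(disp[0])  # partial derivatives of the first component
    d1_d0, d1_d1 = np.gradient(disp[1])  # partial derivatives of the second component
    # Jacobian of phi is I + (gradient of disp); expand the 2x2 determinant
    return (1 + d0_d0) * (1 + d1_d1) - d0_d1 * d1_d0

H = W = 64
rows, cols = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
# a smooth, small displacement: expected to preserve topology
disp = 0.5 * np.stack([np.sin(2 * np.pi * rows / H), np.cos(2 * np.pi * cols / W)])

dets = jacobian_determinants(disp)
num_folding = int(np.sum(dets <= 0))  # pixels where the deformation folds
```

A deformation is counted as topology preserving when `num_folding` is zero, i.e., the estimated determinant is positive everywhere.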

With these definitions at hand, we summarize our main contributions as follows:

  • We propose a novel multi-resolution deep learning registration architecture which is by construction inverse consistent, symmetric, and topology preserving. The properties are fulfilled for the whole multi-resolution pipeline, not just separately for each resolution. Apart from the parallel works by Greer et al. (2023) and Zhang et al. (2023), we are not aware of other multi-resolution deep learning registration methods which are by construction both symmetric and inverse consistent. For motivation of the multi-resolution approach, see Section 2.4.

  • As a component in our architecture, we propose an implicit neural network layer, which we call deformation inversion layer, based on a well-known fixed point iteration formula (Chen et al., 2008) and recent advances in Deep Equilibrium models (Bai et al., 2019; Duvenaud et al., 2020). The layer allows memory efficient inversion of deformation fields.

  • We show that the method achieves state-of-the-art results on two brain subject-to-subject registration datasets, and state-of-the-art results for deep learning methods on inspiration-exhale registration of lung CT scans.

We name the method SITReg after its symmetricity, inverse consistency and topology preservation properties.

2 Background and preliminaries

2.1 Topology preserving registration

The diffeomorphic LDDMM framework (Cao et al., 2005) was the first approach suggested for by construction topology preserving registration. The core idea was to generate diffeomorphisms through integration of time-varying velocity fields constrained by certain differential equations. Arsigny et al. (2006) suggested a more constrained but computationally cheaper stationary velocity field (SVF) formulation, which was later adopted by popular registration algorithms (Ashburner, 2007; Vercauteren et al., 2009). While some unsupervised deep learning methods do use the LDDMM approach for generating topology preserving deformations (Shen et al., 2019b; Ramon et al., 2022; Wang et al., 2023), most commonly (Chen et al., 2023) topology preservation in unsupervised deep learning is achieved using the more efficient SVF formulation, e.g. by Krebs et al. (2018, 2019); Niethammer et al. (2019); Shen et al. (2019a, b); Mok and Chung (2020a).

Another classical method by Choi and Lee (2000); Rueckert et al. (2006) generates topology preserving deformations by constraining each deformation to be diffeomorphic but small, and forming the final deformation as a composition of multiple small deformations. Since diffeomorphisms form a group under composition, the final deformation is diffeomorphic. The principle is close to a practical implementation of the SVF, where the velocity field is integrated by first scaling it down by a power of two and interpreting the result as a small deformation, which is then repeatedly composed with itself. The idea is hence similar: a composition of small deformations.
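The scaling-and-squaring idea can be sketched in 1-D numpy (a hedged illustration, not any of the cited implementations; the logistic velocity field below is an assumption chosen because its exact flow is known in closed form):

```python
import numpy as np

grid = np.linspace(0.0, 1.0, 513)

def compose_disp(d1, d2):
    # displacement field of phi1 ∘ phi2, where phi_i(p) = p + d_i(p)
    return d2 + np.interp(grid + d2, grid, d1)

def scaling_and_squaring(velocity, num_steps=6):
    # scale the velocity down by 2**num_steps so it is a small deformation,
    # then compose it with itself num_steps times
    d = velocity / 2 ** num_steps
    for _ in range(num_steps):
        d = compose_disp(d, d)
    return d

velocity = 0.2 * grid * (1 - grid)  # logistic velocity, zero at the boundaries
d = scaling_and_squaring(velocity)

# exact time-1 flow of dx/dt = 0.2 x (1 - x), as a displacement field
exact = grid * np.exp(0.2) / (1 + grid * (np.exp(0.2) - 1)) - grid
max_error = np.max(np.abs(d - exact))
```

With `num_steps=6` the initial deformation is the velocity scaled by $2^{-6}$, and six self-compositions recover the full integration while each intermediate deformation stays small and invertible.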

In this work we build topology preserving deformations using the same strategy, as a composition of small topology preserving deformations.

2.2 Inverse consistent registration

Originally inverse consistency was achieved via variational losses (Christensen et al., 1995) but later LDDMM and SVF frameworks allowed for inverse consistent by construction methods since the inverse can be obtained by integrating the velocity field in the opposite direction. Both approaches are found among the deep learning methods as well, as some enforce inverse consistency via a penalty (Zhang, 2018; Kim et al., 2019; Estienne et al., 2021), and many use the SVF formulation as mentioned in Section 2.1.

Compared to the earlier deep learning approaches, we take a methodologically slightly different approach of using the proposed deformation inversion layer for building an inverse consistent architecture.

2.3 Symmetric registration

Symmetric registration methods consider both images equally: swapping the input order should not change the registration outcome. Developing symmetric registration methods has a long history (Sotiras et al., 2013), but one particularly relevant for this work is a method called symmetric normalization (SyN, Avants et al., 2008) which learns two separate transformations: one for deforming the first image half-way toward the second image and the other for deforming the second image half-way toward the first image. The images are matched in the intermediate coordinates and the full deformation is obtained as a composition of the half-way deformations (one of which is inverted). The same idea was applied in a deep learning setting by SYMNet (Mok and Chung, 2020a). However, SYMNet does not guarantee symmetricity by construction during inference (see Figure 7 in Appendix H). Some existing deep learning registration methods enforce cycle consistency via a penalty (Mahapatra and Ge, 2019; Gu et al., 2020; Zheng et al., 2021), and the method by Estienne et al. (2021) is symmetric by construction but only for a single component of their multi-step formulation, and not inverse consistent by construction. Recently, in parallel with and independently of us, Iglesias (2023); Greer et al. (2023); Zhang et al. (2023) have proposed by construction symmetric and inverse consistent registration methods within the SVF framework, in a different way from us.

We use the idea of deforming the images half-way towards each other to achieve symmetry throughout our multi-resolution architecture.

2.4 Multi-resolution registration

Multi-resolution registration methods learn the deformation by first estimating it in a low resolution and then incrementally improving it while increasing the resolution. For each resolution one feeds the input images deformed with the deformation learned thus far, and incrementally composes the full deformation. Since its introduction a few decades ago (Rueckert et al., 1999; Oliveira and Tavares, 2014), the approach has been used in the top-performing classical and deep learning registration methods (Avants et al., 2008; Klein et al., 2009; Mok and Chung, 2020b, 2021; Hering et al., 2022).

In this work we propose the first multi-resolution deep learning registration architecture that is by construction symmetric, inverse consistent, and topology preserving.

2.5 Deep equilibrium networks

Deep equilibrium networks use implicit fixed point iteration layers, which have emerged as an alternative to the common explicit layers (Bai et al., 2019, 2020; Duvenaud et al., 2020). Unlike explicit layers, which produce output via an exact sequence of operations, the output of an implicit layer is defined indirectly as a solution to a fixed point equation, which is specified using a fixed point mapping. In the simplest case the fixed point mapping takes two arguments, one of which is the input. For example, let $g: A \times B \to B$ be a fixed point mapping defining an implicit layer. Then, for a given input $a$, the output of the layer is the solution $z$ to the equation

$z = g(z, a)$. (1)

This equation is called a fixed point equation and the solution is called a fixed point solution. If $g$ has suitable properties, the equation can be solved iteratively by starting with an initial guess and repeatedly feeding the output as the next input to $g$. More advanced iteration methods have also been developed for solving fixed point equations, such as Anderson acceleration (Walker and Ni, 2011).
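As a toy illustration (a made-up scalar mapping with Lipschitz constant $0.5$ in $z$, solved with plain iteration rather than Anderson acceleration):

```python
import math

def solve_fixed_point(g, a, z_init=0.0, tol=1e-12, max_iter=200):
    # plain fixed point iteration: feed the output back as the next input;
    # converges because g is a contraction in its first argument
    z = z_init
    for _ in range(max_iter):
        z_next = g(z, a)
        if abs(z_next - z) < tol:
            break
        z = z_next
    return z_next

g = lambda z, a: 0.5 * math.cos(z) + a  # |dg/dz| <= 0.5 < 1
z_star = solve_fixed_point(g, a=1.0)
residual = abs(z_star - g(z_star, 1.0))  # ~0 at the fixed point
```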

The main mathematical innovation related to deep equilibrium networks is that the derivative of an implicit layer w.r.t. its inputs can be calculated based solely on a fixed point solution, i.e., no intermediate iteration values need to be stored for back-propagation. Now, given some solution $(a_0, z_0)$ such that $z_0 = g(z_0, a_0)$, and assuming certain local invertibility properties for $g$, the implicit function theorem says that there exists a solution mapping in the neighborhood of $(a_0, z_0)$ which maps other inputs to their corresponding solutions. Let us denote the solution mapping as $z^*$. The solution mapping can be seen as the theoretical explicit layer corresponding to the implicit layer. To find the derivatives of the implicit layer we need to find the Jacobian of $z^*$ at point $a_0$, which can be obtained using implicit differentiation as

$\partial z^*(a_0) = \left[I - \partial_1 g(z_0, a_0)\right]^{-1} \partial_2 g(z_0, a_0)$,

where $\partial_1$ and $\partial_2$ denote the partial derivatives w.r.t. the first argument $z$ and the second argument $a$, respectively.

The vector-Jacobian product of $z^*$ needed for back-propagation can be calculated using another fixed point equation without fully computing the Jacobians, see, e.g., Duvenaud et al. (2020). Hence, both forward and backward passes of the implicit layer can be computed as a fixed point iteration.
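A scalar sanity check of the implicit differentiation formula (a toy mapping, not the registration layer): for $g(z, a) = 0.5\sin(z) + a$ the formula gives $\partial z^*(a_0) = [1 - 0.5\cos(z_0)]^{-1} \cdot 1$, which can be compared against a finite difference of the iterated solution:

```python
import math

def solve(a, num_iter=200):
    # solve z = 0.5*sin(z) + a by fixed point iteration (contraction, factor 0.5)
    z = 0.0
    for _ in range(num_iter):
        z = 0.5 * math.sin(z) + a
    return z

a0 = 0.3
z0 = solve(a0)

# implicit function theorem: dz*/da = [1 - dg/dz]^{-1} * dg/da,
# with dg/dz = 0.5*cos(z0) and dg/da = 1 at the solution
implicit_grad = 1.0 / (1.0 - 0.5 * math.cos(z0))

# central finite difference of the solution mapping for comparison
eps = 1e-6
fd_grad = (solve(a0 + eps) - solve(a0 - eps)) / (2 * eps)
```

The two gradients agree, even though `implicit_grad` uses only the converged solution `z0` and never differentiates through the iteration.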

We use these ideas to develop a neural network layer for inverting deformations based on the fixed point equation, following Chen et al. (2008). The layer is very memory efficient as only the fixed point solution needs to be stored for the backward pass.

3 Methods

Let $n$ denote the dimensionality of the image, e.g., $n = 3$ for 3D medical images, and $k$ the number of channels, e.g., $k = 3$ for an RGB image. The goal in deformable image registration is to find a mapping from $\mathbb{R}^n$ to $\mathbb{R}^n$, connecting the coordinate systems of two non-aligned images $x_A, x_B : \mathbb{R}^n \to \mathbb{R}^k$, called a deformation. Application of a deformation to an image can be mathematically represented as a (function) composition of the image and the deformation, denoted by $\circ$. Furthermore, in practice linear interpolation is used to represent images (and deformations) in continuous coordinates.

In this work the deformations are in practice stored as displacement fields with the same resolution as the registered images, that is, each pixel or voxel is associated with a displacement vector describing the coordinate difference between the original image and the deformed image (e.g., if $n = 3$, the displacement field is a tensor of shape $3 \times H \times W \times D$, where $H \times W \times D$ is the shape of the image). In our notation we equate the displacement fields with the corresponding coordinate mappings, and always use $\circ$ to denote the deformation operation (sometimes called warping).
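In 1-D numpy, applying a displacement field to an "image" then looks as follows (an illustrative sketch; real implementations operate on 3-D volumes):

```python
import numpy as np

def warp(image, disp):
    # (image ∘ phi)(p) = image(p + disp[p]), with linear interpolation;
    # positions outside the grid are clamped to the border values
    grid = np.arange(image.shape[0], dtype=float)
    return np.interp(grid + disp, grid, image)

image = np.sin(np.linspace(0.0, 2.0 * np.pi, 100))
disp = np.full(100, 3.0)  # displace every sample by exactly 3 pixels
warped = warp(image, disp)
```

An integer shift lands exactly on grid points, so `warped[p]` equals `image[p + 3]` wherever `p + 3` stays inside the grid.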

In deep learning based image registration, we aim at learning a neural network $f$ that takes two images as input and outputs a mapping between the image coordinates. Specifically, in the medical context $f$ should be such that $x_A \circ f(x_A, x_B)$ matches anatomically with $x_B$.

3.1 Symmetric formulation

As discussed in Section 1, we want our method to be cycle consistent. That is, in the ideal case where $f$ finds the correct coordinate mapping between any given inputs $x_A$ and $x_B$, $f(x_A, x_B)$ and $f(x_B, x_A)$ should be inverses of each other, and hence enforcing such a property by construction should provide a useful constraint on the optimization space. To achieve this, we define the network $f$ using another auxiliary network $u$ which also predicts deformations:

$f(x_A, x_B) := u(x_A, x_B) \circ u(x_B, x_A)^{-1}$. (2)

By defining $f$ this way, it holds by construction that $f(x_A, x_B) = f(x_B, x_A)^{-1}$, apart from small numerical errors introduced by the composition and inversion. An additional benefit is that $f(x_A, x_A)$ equals the identity transformation, again apart from numerical inaccuracies, which is a natural requirement for a registration method. Applying the formulation in Equation 2 naively would double the computational cost. To avoid this we encode features from the inputs separately before feeding them to the deformation extraction network in Equation 2. A similar approach has been used in recent registration methods (Estienne et al., 2021; Young et al., 2022). Denoting the feature extraction network by $h$, the modified formulation is

$f(x_A, x_B) := u(h(x_A), h(x_B)) \circ u(h(x_B), h(x_A))^{-1}$. (3)
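The construction of Equation 2 can be demonstrated in 1-D numpy. Here `u` is a stand-in function (an assumption for illustration, not the trained network): any `u` that outputs a small smooth displacement yields an `f` whose two directions are inverses of each other and for which $f(x_A, x_A)$ is the identity:

```python
import numpy as np

grid = np.linspace(0.0, 1.0, 257)

def compose(d1, d2):
    # displacement field of phi1 ∘ phi2, where phi_i(p) = p + d_i(p)
    return d2 + np.interp(grid + d2, grid, d1)

def invert(d):
    # numerical inverse of the monotone mapping p -> p + d(p), as a displacement
    return np.interp(grid, grid + d, grid) - grid

def u(a, b):
    # stand-in "network": a small smooth displacement computed from the inputs
    return 0.05 * np.tanh(a - b) * np.sin(np.pi * grid)

def f(a, b):
    # Equation 2: f(x_A, x_B) := u(x_A, x_B) ∘ u(x_B, x_A)^{-1}
    return compose(u(a, b), invert(u(b, a)))

x_a, x_b = np.cos(4 * grid), np.sin(4 * grid)
identity_error = np.max(np.abs(f(x_a, x_a)))                     # f(x_A, x_A) ≈ identity
cycle_error = np.max(np.abs(compose(f(x_a, x_b), f(x_b, x_a))))  # the two directions compose to ≈ identity
```

Both errors vanish up to interpolation accuracy, without any constraint on `u` itself: the symmetry comes purely from the composition structure.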

3.2 Multi-resolution architecture

Figure 2: Overview of the proposed architecture. Multi-resolution features are first extracted from the inputs $x_A$ and $x_B$ using the convolutional encoder $h$. Output deformations $f_{1\to 2}(x_A, x_B)$ and $f_{2\to 1}(x_A, x_B)$ are built recursively from the multi-resolution features using the symmetric deformation updates described in Section 3.2 and visualized in Figure 3. The architecture is symmetric and inverse consistent with respect to the inputs and the final deformation is obtained in both directions. The brain images are from the OASIS dataset (Marcus et al., 2007).

As the overarching architecture, we propose a novel symmetric and inverse consistent multi-resolution coarse-to-fine approach. For motivation, see Section 2.4. Overview of the architecture is shown in Figure 2, and the prediction process is demonstrated visually in Figure 10 (Appendix H).

First, we extract image feature representations $h^{(k)}(x_A), h^{(k)}(x_B)$ at different resolutions $k \in \{0, \dots, K-1\}$. Index $k = 0$ corresponds to the original resolution and increasing $k$ by one halves the spatial resolution. In practice $h$ is a ResNet (He et al., 2016) style convolutional network and features at each resolution are extracted sequentially from the previous features. Starting from the lowest resolution $k = K - 1$, we recursively build the final deformation between the inputs using the extracted representations. To ensure symmetry, we build two deformations: one deforming the first image half-way towards the second image, and the other deforming the second image half-way towards the first image (see Section 2.3). The full deformation is composed of these at the final stage. Let us denote the half-way deformations extracted at resolution $k$ as $d_{1\to 1.5}^{(k)}$ and $d_{2\to 1.5}^{(k)}$. Initially, at level $k = K$, these are identity deformations. Then, at each $k = K-1, \ldots, 0$, the half-way deformations are updated by composing them with a predicted update deformation. In detail, the update at level $k$ consists of three steps (visualized in Figure 3):

  1. Deform the feature representations $h^{(k)}(x_A), h^{(k)}(x_B)$ of level $k$ towards each other by the half-way deformations from the previous level $k+1$:

     $z_1^{(k)} := h^{(k)}(x_A) \circ d_{1\to 1.5}^{(k+1)} \quad \text{and} \quad z_2^{(k)} := h^{(k)}(x_B) \circ d_{2\to 1.5}^{(k+1)}$. (4)

     Note that the deformations have a higher (or the same, when $k = 0$) spatial resolution than the deformed feature volumes, and in practice we first downsample (with linear interpolation) the deformation to the resolution of the feature volume, and then resample the feature volume based on the downsampled deformation.

  2. Define an update deformation $\delta^{(k)}$, using the idea from Equation 3 and the half-way deformed feature representations $z_1^{(k)}$ and $z_2^{(k)}$:

     $\delta^{(k)} := u^{(k)}(z_1^{(k)}, z_2^{(k)}) \circ u^{(k)}(z_2^{(k)}, z_1^{(k)})^{-1}$. (5)

     Here, $u^{(k)}$ is a trainable convolutional neural network predicting an invertible auxiliary deformation (details in Section 3.4). The intuition here is that the symmetrically predicted update deformation $\delta^{(k)}$ should learn to adjust for whatever differences in the image features remain after deforming them half-way towards each other in Step 1 with the deformations $d^{(k+1)}$ from the previous resolution.

  3. Obtain the updated half-way deformation $d_{1\to 1.5}^{(k)}$ by composing the earlier half-way deformation of level $k+1$ with the update deformation $\delta^{(k)}$:

     $d_{1\to 1.5}^{(k)} := d_{1\to 1.5}^{(k+1)} \circ \delta^{(k)}$. (6)

     For the other direction $d_{2\to 1.5}^{(k)}$, we use the inverse of the deformation update, $\left(\delta^{(k)}\right)^{-1}$, which can be obtained simply by reversing $z_1^{(k)}$ and $z_2^{(k)}$ in Equation 5 (see Figure 3):

     $d_{2\to 1.5}^{(k)} = d_{2\to 1.5}^{(k+1)} \circ \left(\delta^{(k)}\right)^{-1}$. (7)

     The inverses $\left(d_{1\to 1.5}^{(k)}\right)^{-1}$ and $\left(d_{2\to 1.5}^{(k)}\right)^{-1}$ are updated similarly.

The full registration architecture is then defined by the functions $f_{1\to 2}$ and $f_{2\to 1}$, which compose the half-way deformations from stage $k = 0$:

$f_{1\to 2}(x_A, x_B) := d_{1\to 1.5}^{(0)} \circ \left(d_{2\to 1.5}^{(0)}\right)^{-1} \quad \text{and} \quad f_{2\to 1}(x_A, x_B) := d_{2\to 1.5}^{(0)} \circ \left(d_{1\to 1.5}^{(0)}\right)^{-1}$. (8)

Note that $d_{1\to 1.5}^{(0)}$, $d_{2\to 1.5}^{(0)}$, and their inverses are functions of $x_A$ and $x_B$ through the features $h^{(k)}(x_A), h^{(k)}(x_B)$ in Equation 4, but the dependence is suppressed in the notation for clarity.
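The recursion of Equations 4-8 can be sketched end-to-end in 1-D numpy. Everything below is a stand-in (the feature extractor, the update "network", the grid, and the constants are assumptions for illustration, not the SITReg implementation), but the control flow mirrors the three steps above and preserves symmetry: registering $(x_B, x_A)$ yields the deformations of $(x_A, x_B)$ with directions swapped, up to interpolation error:

```python
import numpy as np

grid = np.linspace(0.0, 1.0, 257)

def warp(field, d):
    # resample a field at the deformed positions p + d(p) (Eq. 4)
    return np.interp(grid + d, grid, field)

def compose(d1, d2):
    # displacement field of phi1 ∘ phi2, where phi_i(p) = p + d_i(p)
    return d2 + np.interp(grid + d2, grid, d1)

def invert(d):
    # numerical inverse of the monotone mapping p -> p + d(p)
    return np.interp(grid, grid + d, grid) - grid

def features(image, k):
    # stand-in for h^(k): a box-filtered copy, coarser for larger k
    width = 2 ** (k + 1) + 1
    return np.convolve(image, np.ones(width) / width, mode="same")

def update_net(z1, z2, k):
    # stand-in for u^(k): a small smooth displacement from the two feature maps
    return 0.02 / (k + 1) * np.tanh(z1 - z2) * np.sin(np.pi * grid)

def register(x_a, x_b, num_levels=3):
    d1 = np.zeros_like(grid)  # half-way deformation d_{1->1.5}
    d2 = np.zeros_like(grid)  # half-way deformation d_{2->1.5}
    for k in reversed(range(num_levels)):
        # Step 1 (Eq. 4): deform level-k features half-way towards each other
        z1, z2 = warp(features(x_a, k), d1), warp(features(x_b, k), d2)
        # Step 2 (Eq. 5): symmetric update deformation
        delta = compose(update_net(z1, z2, k), invert(update_net(z2, z1, k)))
        # Step 3 (Eqs. 6-7): compose the update into both half-way deformations
        d1, d2 = compose(d1, delta), compose(d2, invert(delta))
    # Eq. 8: full deformations in both directions
    return compose(d1, invert(d2)), compose(d2, invert(d1))

x_a, x_b = np.cos(3 * grid), np.sin(3 * grid)
fwd_ab, bwd_ab = register(x_a, x_b)
fwd_ba, bwd_ba = register(x_b, x_a)

symmetry_error = np.max(np.abs(fwd_ab - bwd_ba))           # f_{1->2}(A,B) vs f_{2->1}(B,A)
inverse_error = np.max(np.abs(compose(fwd_ab, bwd_ab)))    # f_{1->2} ∘ f_{2->1} vs identity
```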

Figure 3: Recursive multi-resolution deformation update. The deformation update at resolution $k$, described in Section 3.2, takes as input the half-way deformations $d_{1\to 1.5}^{(k+1)}$ and $d_{2\to 1.5}^{(k+1)}$ from the previous resolution, and updates them through a composition with an update deformation $\delta^{(k)}$. The update deformation $\delta^{(k)}$ is calculated symmetrically from the image features $z_1^{(k)}$ and $z_2^{(k)}$ (deformed mid-way towards each other with the previous half-way deformations) using a neural network $u^{(k)}$ according to Equation 5. The deformation inversion layer for inverting the auxiliary deformations predicted by $u^{(k)}$ is described in Section 3.3.

By using half-way deformations at each stage, we avoid a problem inherent to full deformations: one would have to select one of the two image coordinate systems into which to deform the feature representations of the next stage, breaking the symmetry of the architecture. Instead, we can deform the feature representations of both inputs by the symmetrically predicted half-way deformations, which ensures that the updated deformations after each stage are separately invariant to the input order.

3.3 Implicit deformation inversion layer

Implementing the architecture requires inverting the deformations predicted by $u^{(k)}$ in Equation 5. This could be done, e.g., with the SVF framework, but we propose an approach which requires storing $\approx 5$ times less data for the backward pass than the standard SVF. The memory saving is significant due to the high memory consumption of volumetric data, allowing larger images to be registered. During each forward pass $2 \times (K-1)$ inversions are required. More details are provided in Appendix E.

As shown by Chen et al. (2008), deformations can be inverted in certain cases by a fixed point iteration. Consequently, we propose to use the deep equilibrium network framework from Section 2.5 for inverting deformations, and label the resulting layer the deformation inversion layer. The fixed point equation proposed by Chen et al. (2008) is

$g(z, a) := -(a - \mathcal{I}) \circ z + \mathcal{I}$,

where $a$ is the deformation to be inverted, $z$ is the candidate for the inverse of $a$, and $\mathcal{I}$ is the identity deformation. It is easy to see that substituting $a^{-1}$ for $z$ yields $a^{-1}$ as output. We use Anderson acceleration (Walker and Ni, 2011) for solving the fixed point equation and use the memory-efficient back-propagation strategy (Bai et al., 2019; Duvenaud et al., 2020) discussed in Section 2.5.

A Lipschitz condition on the displacement field is sufficient for the fixed point algorithm to converge (Chen et al., 2008), and we ensure that the predicted deformations fulfill the condition (see Section 3.4). The iteration also converges well in practice, as shown in Appendix G.
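In 1-D numpy the iteration looks as follows (a sketch with plain iteration instead of Anderson acceleration; the displacement field with Lipschitz constant $\approx 0.31 < 1$ is an assumption made so that the iteration contracts):

```python
import numpy as np

grid = np.linspace(0.0, 1.0, 513)
d = 0.05 * np.sin(2 * np.pi * grid)  # displacement of a; Lipschitz constant ~0.31

def invert_by_fixed_point(d, num_iter=50):
    # fixed point iteration z <- -(a - I) ∘ z + I of Chen et al. (2008),
    # written for displacement fields: inv(p) <- -d(p + inv(p))
    inv = np.zeros_like(d)
    for _ in range(num_iter):
        inv = -np.interp(grid + inv, grid, d)
    return inv

inv = invert_by_fixed_point(d)
# residual of the fixed point equation: inv(p) + d(p + inv(p)) ≈ 0,
# i.e. a ∘ a^{-1} is (numerically) the identity
residual = np.max(np.abs(inv + np.interp(grid + inv, grid, d)))
```

Only the converged `inv` needs to be kept for the backward pass, which is where the memory saving over unrolled explicit iterations comes from.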

3.4 Topology preserving deformation prediction networks

Each $u^{(k)}$ predicts a deformation based on the features $z_1^{(k)}$ and $z_2^{(k)}$, and we define the networks $u^{(k)}$ as CNNs predicting a cubic spline control point grid in the resolution of the features $z_1^{(k)}$ and $z_2^{(k)}$. The cubic spline control point grid can then be used to generate the displacement field representing the deformation. The use of a cubic spline control point grid for representing deformations is a well-known strategy in image registration, see, e.g., Rueckert et al. (2006); De Vos et al. (2019).

The deformations generated by $u^{(k)}$ have to be invertible to ensure the topology preservation property, and in particular invertible by the deformation inversion layer. To ensure this, we limit the absolute values of the control points below a hard constraint $\gamma^{(k)}$ using a scaled $\operatorname{Tanh}$ function.

In more detail, each u(k)superscript𝑢𝑘u^{(k)} consists of the following sequential steps:

  1. Concatenation of the two inputs, $z_1^{(k)}$ and $z_2^{(k)}$, along the channel dimension. Before concatenation we reparametrize the features as $z_1^{(k)} - z_2^{(k)}$ and $z_1^{(k)} + z_2^{(k)}$, as suggested by Young et al. (2022).

  2. Any number of spatial resolution preserving ($\text{stride} = 1$ and $\text{padding} = \text{same}$) convolutions, with an activation after each convolution except the final one. The number of output channels of the final convolution equals the dimensionality ($3$ in our experiments).

  3. A scaled $\gamma^{(k)} \times \operatorname{Tanh}$ function.

  4. Cubic spline upsampling to the image resolution, interpreting the output of step 3 as a cubic spline control point grid, similarly to e.g. De Vos et al. (2019). Cubic spline upsampling can be efficiently implemented as one-dimensional transposed convolutions along each axis.
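The cubic spline upsampling of step 4 can be sketched in one dimension. The following is a minimal NumPy illustration by direct spline evaluation (the function names are hypothetical); along one axis this is equivalent to a strided transposed convolution whose kernel is the B-spline sampled at the sub-grid offsets:

```python
import numpy as np

def cubic_bspline(t):
    """Centered cardinal cubic B-spline B(t), with support (-2, 2)."""
    t = np.abs(np.asarray(t, dtype=float))
    out = np.zeros_like(t)
    near, far = t < 1, (t >= 1) & (t < 2)
    out[near] = 2 / 3 - t[near] ** 2 + t[near] ** 3 / 2
    out[far] = (2 - t[far]) ** 3 / 6
    return out

def spline_upsample_1d(control, factor):
    """Evaluate the cubic spline defined by `control` on a grid `factor`
    times denser; matches a stride-`factor` transposed convolution whose
    kernel is B sampled at the sub-grid offsets."""
    x = np.arange(len(control) * factor) / factor  # positions in control-point units
    j = np.arange(len(control))
    return cubic_bspline(x[:, None] - j[None, :]) @ control

dense = spline_upsample_1d(np.ones(8), 4)  # constant control points
```

Since cardinal B-splines form a partition of unity, constant control points reproduce a constant displacement away from the boundaries, which is one sanity check for an implementation.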

As shown in Appendix D, the optimal upper bound for $\gamma^{(k)}$, ensuring invertibility by the deformation inversion layer but also in general, can be obtained from the formula $\gamma^{(k)} < \frac{1}{K_n^{(k)}}$, where

$$K_n^{(k)} := \max_{x \in X} \sum_{\alpha \in \mathbb{Z}^n} \left| \sum_{j=1}^{n} \frac{B\left(x_j + \frac{1}{2^k} - \alpha_j\right) - B(x_j - \alpha_j)}{1/2^k} \prod_{i \in N \setminus \{j\}} B(x_i - \alpha_i) \right|, \quad (9)$$

$n$ is the dimensionality of the images (in this work $n = 3$), $N := \{1, \dots, n\}$, $X := \{\frac{1}{2} + \frac{1}{2^{k+1}} + \frac{i}{2^k} \mid i \in \mathbb{Z}\}^n \cap [0, 1]^n$ is the set of relative sampling positions used in cubic spline upsampling (an implementation detail), and $B$ is the centered cardinal cubic B-spline (a symmetric function with finite support). In practice we define $\gamma^{(k)} := 0.99 \times \frac{1}{K_n^{(k)}}$.

The formula can be evaluated exactly for dimensions $n = 2, 3$ for any practical number of resolution levels (for concrete values, see Table 16 in Appendix D). Note that the outer sum over $\mathbb{Z}^n$ is finite since $B$ has finite support, and hence only a finite number of terms are non-zero.
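For illustration, the quantity in Equation (9) can be evaluated by brute force. The sketch below (plain Python, hypothetical function names, not the paper's exact routine behind Table 16) truncates the sum over $\alpha$ to the support of $B$:

```python
from itertools import product

def bspline(t):
    """Centered cardinal cubic B-spline (scalar, support (-2, 2))."""
    t = abs(t)
    if t < 1:
        return 2 / 3 - t ** 2 + t ** 3 / 2
    if t < 2:
        return (2 - t) ** 3 / 6
    return 0.0

def k_bound(n, k):
    """Brute-force evaluation of K_n^(k) from Eq. (9)."""
    h = 1.0 / 2 ** k
    # relative sampling positions X restricted to [0, 1]
    points = [0.5 + h / 2 + i * h for i in range(-(2 ** k), 2 ** k + 1)
              if 0.0 <= 0.5 + h / 2 + i * h <= 1.0]
    # B(x - a) can only be non-zero for a near [0, 1]; wider alphas contribute zero
    alphas = range(-2, 4)
    best = 0.0
    for x in product(points, repeat=n):
        total = 0.0
        for alpha in product(alphas, repeat=n):
            inner = 0.0
            for j in range(n):
                term = (bspline(x[j] + h - alpha[j]) - bspline(x[j] - alpha[j])) / h
                for i in range(n):
                    if i != j:
                        term *= bspline(x[i] - alpha[i])
                inner += term
            total += abs(inner)
        best = max(best, total)
    return best

gamma = 0.99 / k_bound(1, 3)  # bound for a 1D toy setting
```

The cost grows exponentially in $n$, but for $n \leq 3$ and practical numbers of resolution levels the evaluation is cheap, consistent with the statement above.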

The strategy of limiting the absolute values of cubic spline control points to ensure invertibility is similar to that taken by Rueckert et al. (2006), based on the proof by Choi and Lee (2000). However, our mathematical proof additionally covers convergence of the deformation inversion layer, and provides an exact bound even in our discretized setting (see Appendix D for more details).

3.5 Theoretical properties

Theorem 1

The proposed architecture is inverse consistent by construction.

Theorem 2

The proposed architecture is symmetric by construction.

Theorem 3

The proposed architecture is topology preserving.

Proofs are provided in Appendix C, which also discusses numerical errors caused by limited sampling resolution.

3.6 Training and implementation

We train the model in an unsupervised end-to-end manner, similarly to most other unsupervised registration methods, using similarity and deformation regularization losses. The similarity loss encourages deformed images to be similar to the target images, and the regularity loss encourages desirable properties, such as smoothness, in the predicted deformations. For similarity we use local normalized cross-correlation with window width 7, and for regularization we use an $L^2$ penalty on the gradients of the displacement fields, identically to VoxelMorph (Balakrishnan et al., 2019). We apply the losses in both directions to maintain symmetry. One could also apply the losses in the intermediate coordinates and avoid building the full deformations during training. The final loss is:

$$\mathcal{L} = \operatorname{NCC}(x_A \circ d_{1 \to 2},\, x_B) + \operatorname{NCC}(x_A,\, x_B \circ d_{2 \to 1}) + \lambda \times \left[ \operatorname{Grad}(d_{1 \to 2}) + \operatorname{Grad}(d_{2 \to 1}) \right], \quad (10)$$

where $d_{1 \to 2} := f_{1 \to 2}(x_A, x_B)$, $d_{2 \to 1} := f_{2 \to 1}(x_A, x_B)$, $\operatorname{NCC}$ is the local normalized cross-correlation loss, $\operatorname{Grad}$ is the $L^2$ penalty on the gradients of the displacement fields, and $\lambda$ is the regularization weight. For details on hyperparameter selection, see Appendix A. Our implementation is in PyTorch (Paszke et al., 2019) and is available at https://github.com/honkamj/SITReg, including the evaluation methods and the preprocessing done by us (see Section 4).
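The loss terms can be sketched in one dimension. The following is a simplified NumPy illustration with hypothetical helper names (`local_ncc_1d`, `grad_l2_1d`), assuming the similarity enters the loss negated since it is maximized; the paper's actual implementation uses 3D windows of width 7:

```python
import numpy as np

def local_ncc_1d(a, b, window=7, eps=1e-5):
    """Windowed (squared) normalized cross-correlation, 1D sketch
    of the similarity term."""
    box = lambda x: np.convolve(x, np.ones(window), mode="valid")
    n = float(window)
    cross = box(a * b) - box(a) * box(b) / n
    var_a = box(a * a) - box(a) ** 2 / n
    var_b = box(b * b) - box(b) ** 2 / n
    return float(np.mean(cross ** 2 / (var_a * var_b + eps)))

def grad_l2_1d(d):
    """L2 penalty on the spatial gradient of a displacement field."""
    return float(np.mean(np.diff(d) ** 2))

def loss(warped_ab, x_b, x_a, warped_ba, d_ab, d_ba, lam=1.0):
    """Symmetric loss of Eq. (10): similarity in both directions
    (negated, as it is maximized) plus gradient regularization."""
    return (-local_ncc_1d(warped_ab, x_b) - local_ncc_1d(x_a, warped_ba)
            + lam * (grad_l2_1d(d_ab) + grad_l2_1d(d_ba)))
```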

3.7 Inference

We consider two variants. Standard: the final deformation is formed by resampling iteratively at each image resolution (the common approach). Complete: all individual deformations (outputs of $u^{(k)}$) are stored in memory and the final deformation is their true composition. The latter is included only to demonstrate that the deformation is invertible everywhere (no negative determinants) without numerical sampling errors; the former is used unless stated otherwise, and perfect invertibility is not necessary in practice. Due to limited sampling resolution, even existing "diffeomorphic" registration frameworks such as SVF do not usually achieve perfect invertibility.
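The distinction can be illustrated with 1D displacement fields. In this minimal NumPy sketch (hypothetical function name, not the paper's implementation), `compose_1d` performs the resampling-based composition used by the standard variant:

```python
import numpy as np

def compose_1d(outer, inner, xs):
    """Displacement of (id + outer) o (id + inner) on the grid xs,
    resampling `outer` linearly at the points moved by `inner`."""
    return inner + np.interp(xs + inner, xs, outer)

xs = np.linspace(0.0, 1.0, 101)
u_coarse = 0.02 * np.sin(2 * np.pi * xs)
u_fine = 0.01 * np.sin(4 * np.pi * xs)
# Standard variant: accumulate the running deformation by resampling at each stage
standard = compose_1d(u_fine, u_coarse, xs)
# The complete variant would instead keep u_coarse and u_fine in memory and
# evaluate their exact composition, avoiding the resampling error.
```

The resampling step introduces a small interpolation error at every stage, which is why only the complete variant is guaranteed to be free of negative Jacobian determinants.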

4 Experimental setup

We evaluate our method on subject-to-subject registration of brain MRI images and on inspiration-exhale registration of lung CT scans.

4.1 Datasets

We evaluate our method on two tasks: subject-to-subject registration of brain images and inspiration-exhale registration of lung CT scans. The latter is considered challenging for deep learning methods, and classical optimization-based methods remain state-of-the-art (Falta et al., 2024).

We use two subject-to-subject registration datasets and evaluate on both of them separately: the OASIS brains dataset with 414 T1-weighted brain MRI images (Marcus et al., 2007), as pre-processed for the Learn2Reg challenge (Hoopes et al., 2021; Hering et al., 2022) (https://www.oasis-brains.org/#access), and the LPBA40 dataset from the University of California Laboratory of Neuro Imaging (USC LONI) with 40 brain MRI images (Shattuck et al., 2008) (https://resource.loni.usc.edu/resources/atlases/license-agreement/).

Pre-processing for both brain subject-to-subject datasets includes bias field correction, normalization, and cropping. For the OASIS dataset we use affinely pre-aligned images, and for the LPBA40 dataset rigidly pre-aligned images. Additionally, we train the models without any pre-alignment on the OASIS data (OASIS raw) to compare the methods under larger initial displacements. We crop the images in the affinely pre-aligned OASIS dataset to resolution $144 \times 192 \times 160$ and in the LPBA40 dataset to $160 \times 192 \times 160$. Images in the raw OASIS dataset have resolution $256 \times 256 \times 256$ and we do not crop them. Voxel sizes of the affinely aligned and raw datasets are the same. We split the OASIS dataset into 255, 20, and 139 images for training, validation, and testing. The split differs from the Learn2Reg challenge, whose test set is not available, but the sizes correspond to the splits used by Mok and Chung (2020a, b, 2021). We used all image pairs for testing and validation, yielding 9591 test and 190 validation pairs. The LPBA40 dataset is much smaller, and we split it into 25, 5, and 10 images for training, validation, and testing. This leaves us with 10 pairs for validation and 45 for testing.

For inspiration-exhale registration of lung CT scans we use the recently published Lung250M-4B dataset by Falta et al. (2024). The dataset is based on multiple earlier datasets: DIR-LAB COPDgene (Castillo et al., 2013), EMPIRE10 (Murphy et al., 2011), the L2R-LungCT dataset from the Learn2Reg challenge (Hering et al., 2022), the National Lung Screening Trial (NLST) dataset (NLST Research Team, 2011, 2013; Clark et al., 2013), the TCIA-Ventilation dataset (Clark et al., 2013; Eslick et al., 2018, 2022), and the TCIA-NSCLC dataset (Clark et al., 2013; Hugo et al., 2016, 2017; Balik et al., 2013; Roman et al., 2012). In total the dataset has 97, 18, and 9 pairs for training, testing, and validation. However, we cannot use 12 image pairs from the EMPIRE10 dataset due to its terms of use; 10 of these are part of the training set and 2 of the validation set. The test set is not affected. For evaluation, the dataset includes manually placed landmarks on the validation and test image pairs. It also contains automatically generated landmarks for the training pairs, but as we want to stay in the unsupervised setting, we do not use those for training. We use lung-masked images for training; we downsample them to isotropic 2 mm resolution, normalize and clip them to the range $[-1024, 0]$, and crop the volumes to shape $160 \times 112 \times 160$.

4.2 Evaluation metrics

We evaluate brain subject-to-subject registration accuracy using segmentations of brain structures included in the datasets (35 structures for OASIS and 56 for LPBA40) and two metrics: the Dice score (Dice) and the 95% quantile of the Hausdorff distance (HD95), similarly to the Learn2Reg challenge (Hering et al., 2022). The Dice score measures the overlap between the segmentations of the source images deformed by the method and the segmentations of the target images, and HD95 measures the distance between the surfaces of the segmentations.

For evaluating the inspiration-exhale registration of lung CT scans we use the mean distance between landmark pairs after registration, denoted target registration error (TRE). However, comparing methods only based on the overlap of anatomical regions or landmarks is insufficient (Pluim et al., 2016; Rohlfing, 2011), and hence deformation regularity should also be measured, for which we use conventional metrics based on the local Jacobian determinants at $10^6$ sampled locations in each volume. The local derivatives were estimated via small perturbations of $10^{-7}$ voxels. We measure topology preservation as the proportion of locations with a negative determinant (% of $|J_\phi|_{\leq 0}$), and deformation smoothness as the standard deviation of the determinant (std$(|J_\phi|)$). Additionally, we report inverse and cycle consistency errors; see Section 1.
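The Jacobian-determinant metrics can be sketched as follows. This is a 2D grid-based NumPy illustration with a hypothetical function name; the paper's evaluation instead samples 3D locations and estimates derivatives from $10^{-7}$-voxel perturbations:

```python
import numpy as np

def jacobian_stats_2d(disp):
    """Deformation-regularity metrics from forward-difference Jacobians.

    `disp` has shape (2, H, W): the displacement d of phi(x) = x + d(x).
    Returns the fraction of non-positive determinants and their
    standard deviation.
    """
    d0, d1 = disp
    j00 = 1.0 + np.diff(d0, axis=0)[:, :-1]  # d phi_0 / d x_0
    j01 = np.diff(d0, axis=1)[:-1, :]        # d phi_0 / d x_1
    j10 = np.diff(d1, axis=0)[:, :-1]        # d phi_1 / d x_0
    j11 = 1.0 + np.diff(d1, axis=1)[:-1, :]  # d phi_1 / d x_1
    det = j00 * j11 - j01 * j10
    return float((det <= 0).mean()), float(det.std())
```

For the identity deformation (zero displacement) every determinant equals one, giving a folding fraction of zero and zero standard deviation.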

4.3 Baseline methods

We compare against VoxelMorph (Balakrishnan et al., 2019), SYMNet (Mok and Chung, 2020a), conditional LapIRN (cLapIRN) (Mok and Chung, 2020b, 2021), and ByConstructionICON (Greer et al., 2023). VoxelMorph is a standard baseline in deep learning based unsupervised registration. With SYMNet we are interested in how well our method preserves topology and how accurate the generated inverse deformations are compared to SVF based methods. Additionally, since SYMNet is symmetric from the loss point of view, it is interesting to see how symmetric its predictions are in practice. cLapIRN was the best method on the OASIS dataset in the Learn2Reg 2021 challenge (Hering et al., 2022). ByConstructionICON is a work parallel to ours, developing a multi-step inverse consistent, symmetric, and topology preserving deep learning registration method based on the SVF formulation. We used the official implementations (https://github.com/voxelmorph/voxelmorph, https://github.com/cwmok/Fast-Symmetric-Diffeomorphic-Image-Registration-with-Convolutional-Neural-Networks, https://github.com/cwmok/Conditional_LapIRN/, https://github.com/uncbiag/ByConstructionICON/) adjusted to our datasets. SYMNet uses an anti-folding loss to penalize negative determinants. Since this loss is a separate component that could easily be used with any method, we also train SYMNet without it, denoted SYMNet (simple), for two of our four datasets. This provides a comparison of how well the vanilla SVF framework can generate invertible deformations compared to our method. For details on hyperparameter selection for the baseline models, see Appendix B.

4.4 Ablation study

To study the usefulness of the symmetric formulation introduced in Section 3.1, we also conduct the experiments without it while keeping the architecture otherwise identical. In more detail, we change Equation 5 into the following form:

$$\delta^{(k)} := u^{(k)}(z_1^{(k)}, z_2^{(k)}) \circ u^{(k)}(z_1^{(k)}, z_2^{(k)}). \quad (11)$$

For the inverse update (no longer necessarily an inverse, despite the notation) we then simply use

$$\left( \delta^{(k)} \right)^{-1} := u^{(k)}(z_2^{(k)}, z_1^{(k)}) \circ u^{(k)}(z_2^{(k)}, z_1^{(k)}). \quad (12)$$

The resulting architecture is still topology preserving but no longer inverse or cycle consistent by construction. We refer to this architecture as SITReg (non-sym).

5 Results

Evaluation results for the affinely pre-aligned OASIS dataset are shown in Table 1, for the OASIS raw dataset in Table 2, for the LPBA40 dataset in Table 3, and for the Lung250M-4B dataset in Table 4. Figure 4 visualizes differences in deformation regularity on the OASIS dataset, and additional visualizations are available in Appendix H. A comparison of the methods' inference-time efficiency on the OASIS dataset is shown in Table 5.

Table 1: Results, OASIS dataset. Mean and standard deviation of each metric are computed on the test set. Columns measure accuracy (Dice, HD95), deformation regularity (% of $|J_\phi|_{\leq 0}$, std$(|J_\phi|)$), and consistency (cycle, inverse). For SITReg, the percentage of folding voxels (% of $|J_\phi|_{\leq 0}$) is reported for the standard variant, followed after the slash by the complete variant (see Section 3.7); all other results use the standard variant. VoxelMorph and cLapIRN do not predict inverse deformations, hence the inverse consistency error is not shown.
| Model | Dice ↑ | HD95 ↓ | % of $|J_\phi|_{\leq 0}$ ↓ | std$(|J_\phi|)$ ↓ | Cycle ↓ | Inverse ↓ |
|---|---|---|---|---|---|---|
| SYMNet (original) | 0.788 (0.029) | 2.15 (0.57) | **1.5e-3** (4.3e-4) | **0.44** (0.039) | 3.0e-1 (2.9e-2) | **3.5e-3** (4.2e-4) |
| SYMNet (simple) | 0.787 (0.029) | 2.17 (0.58) | 1.5e-2 (3.1e-3) | 0.46 (0.045) | 2.8e-1 (2.8e-2) | 5.2e-3 (8.4e-4) |
| VoxelMorph | 0.803 (0.031) | 2.08 (0.57) | 1.4e-1 (9.4e-2) | 0.49 (0.032) | 4.5e-1 (5.3e-2) | - |
| cLapIRN | 0.812 (0.027) | 1.93 (0.50) | 1.1e0 (2.1e-1) | 0.55 (0.032) | 1.2e0 (1.6e-1) | - |
| ByConstructionICON | 0.813 (0.022) | 1.83 (0.42) | 2.3e-2 (6.4e-3) | 0.48 (0.069) | **5.3e-3** (1.5e-3) | 5.3e-3 (1.5e-3) |
| SITReg | 0.818 (0.025)* | 1.84 (0.45) | 8.1e-3 (1.6e-3) / **0** (0) | 0.45 (0.038) | 5.5e-3 (6.9e-4) | 5.5e-3 (6.9e-4) |
| SITReg (non-sym) | **0.819** (0.024)* | **1.82** (0.44) | 1.7e-2 (4.5e-3) | 0.50 (0.037) | 1.2e-1 (1.4e-2) | 1.2e-1 (1.4e-2) |
* Statistically significant ($p < 0.05$) improvement compared to the baselines; for details see Appendix I.
Table 2: Results, OASIS raw dataset. The results are interpreted similarly to Table 1. SYMNet and VoxelMorph did not converge to anything meaningful due to the large initial displacements.
| Model | Dice ↑ | HD95 ↓ | % of $|J_\phi|_{\leq 0}$ ↓ | std$(|J_\phi|)$ ↓ | Cycle ↓ | Inverse ↓ |
|---|---|---|---|---|---|---|
| SYMNet | 0.176 (0.14) | 19.7 (10.) | **1.8e-4** (1.8e-4) | 0.24 (0.031) | 3.6e-1 (1.4e-1) | 7.2e-4 (2.1e-4) |
| VoxelMorph | 0.230 (0.19) | 19.4 (11.) | 9.2e-2 (4.5e-2) | 0.26 (0.019) | 4.0e0 (2.3e0) | - |
| cLapIRN | 0.744 (0.073) | 3.14 (1.9) | 5.8e-1 (1.4e-1) | 0.38 (0.045) | 3.0e0 (2.1e0) | - |
| ByConstructionICON | 0.803 (0.023) | 1.83 (0.55) | 3.3e-3 (2.0e-3) | 0.21 (0.045) | **1.1e-3** (6.1e-4) | **1.1e-3** (6.1e-4) |
| SITReg | **0.813** (0.023)* | **1.80** (0.52)* | 1.0e-3 (3.9e-4) / **0** (0) | **0.20** (0.031) | 1.3e-3 (3.2e-4) | 1.3e-3 (3.2e-4) |
* Statistically significant ($p < 0.05$) improvement compared to the baselines; for details see Appendix I.
Table 3: Results, LPBA40 dataset. The results are interpreted similarly to Table 1.
| Model | Dice ↑ | HD95 ↓ | % of $|J_\phi|_{\leq 0}$ ↓ | std$(|J_\phi|)$ ↓ | Cycle ↓ | Inverse ↓ |
|---|---|---|---|---|---|---|
| SYMNet (original) | 0.669 (0.033) | 6.79 (0.70) | **1.1e-3** (4.6e-4) | 0.35 (0.050) | 2.7e-1 (6.1e-2) | 2.1e-3 (4.3e-4) |
| SYMNet (simple) | 0.664 (0.034) | 6.88 (0.73) | 4.7e-3 (1.6e-3) | 0.37 (0.053) | 2.8e-1 (5.8e-2) | 2.9e-3 (6.7e-4) |
| VoxelMorph | 0.676 (0.032) | 6.72 (0.68) | 2.2e-1 (2.1e-1) | 0.35 (0.040) | 3.1e-1 (1.1e-1) | - |
| cLapIRN | 0.714 (0.019) | 5.93 (0.43) | 8.4e-2 (2.9e-2) | **0.27** (0.020) | 5.6e-1 (1.8e-1) | - |
| ByConstructionICON | 0.674 (0.031) | 6.60 (0.71) | 4.7e-3 (2.9e-3) | 0.33 (0.41) | **1.6e-3** (6.6e-4) | **1.6e-3** (6.6e-4) |
| SITReg | **0.720** (0.017)* | **5.88** (0.43) | 2.4e-3 (6.4e-4) / **0** (0) | 0.31 (0.032) | 2.6e-3 (4.2e-4) | 2.6e-3 (4.2e-4) |
| SITReg (non-sym) | 0.697 (0.024) | 6.29 (0.57) | 1.2e-3 (5.6e-4) | 0.34 (0.033) | 2.2e-1 (3.2e-2) | 2.2e-1 (3.2e-2) |
* Statistically significant ($p < 0.05$) improvement compared to the baselines; for details see Appendix I.
Table 4: Results, Lung250M-4B dataset. The results are interpreted similarly to Table 1. Note that TRE is computed in the inhale coordinates, in contrast to the exhale coordinates used in the paper by Falta et al. (2024); in the exhale coordinates our method obtains TRE = 2.27.
| Model | TRE ↓ | % of $|J_\phi|_{\leq 0}$ ↓ | std$(|J_\phi|)$ ↓ | Cycle ↓ | Inverse ↓ |
|---|---|---|---|---|---|
| SYMNet | 8.25 (2.1) | **0.0e0** (0.0e0) | 0.32 (0.093) | 5.3e-1 (2.2e-1) | 3.6e-4 (1.7e-4) |
| VoxelMorph | 6.66 (1.9) | 1.2e0 (5.9e-1) | 0.39 (0.090) | 1.1e1 (6.5e0) | - |
| cLapIRN | 5.34 (1.9) | 4.9e-3 (6.5e-3) | 0.25 (0.071) | 5.9e0 (3.7e0) | - |
| ByConstructionICON | 8.63 (3.1) | **0.0e0** (0.0e0) | **0.19** (0.054) | **4.8e-5** (2.8e-5) | **4.8e-5** (2.8e-5) |
| SITReg | **2.71** (0.93)* | 4.0e-5 (6.6e-5) / **0** (0) | 0.30 (0.084) | 1.2e-3 (4.5e-4) | 1.2e-3 (4.5e-4) |
| SITReg (non-sym) | 4.98 (2.3) | **0.0e0** (0.0e0) | 0.32 (0.088) | 4.6e-1 (2.8e-1) | 4.6e-1 (2.8e-1) |
* Statistically significant ($p < 0.05$) improvement compared to the baselines; for details see Appendix I.
Table 5: Computational efficiency, OASIS dataset. Mean and standard deviation are shown. Inference time and memory usage were measured on NVIDIA GeForce RTX 3090.
| Model | Inference time (s) ↓ | Inference memory (GB) ↓ | # parameters (M) ↓ |
|---|---|---|---|
| SYMNet | 0.095 (0.00052) | **1.9** | **0.9** |
| VoxelMorph | 0.16 (0.0010) | 5.6 | 1.3 |
| cLapIRN | **0.10** (0.00052) | 4.1 | 1.2 |
| ByConstructionICON | 0.32 (0.0022) | 2.4 | 45.1 |
| SITReg | 0.37 (0.0057) | 3.4 | 1.2 |
Figure 4: Visual deformation regularity comparison. Local Jacobian determinants are visualized for each model for a single predicted deformation in the OASIS experiment. Folding voxels (determinant below zero) are marked in black. Only one axial slice of the predicted 3D deformation is shown.

6 Discussion

Evaluation of registration algorithms is difficult due to the lack of ground truth (Pluim et al., 2016). In particular, the assessment of registration performance should not be based only on a tissue overlap score, since a clearly unrealistic and wrong deformation can still achieve a good tissue overlap (an extreme example is presented by Rohlfing (2011)). For that reason, deformation regularity is usually also measured. However, this introduces further difficulties for evaluation due to the trade-off, controlled by the regularization hyperparameter (in our case $\lambda$), between tissue overlap and deformation regularity, which can change the ranking of the methods for different metrics. One should therefore look at the overall performance across the regularity and accuracy metrics (see, e.g., the Learn2Reg challenge (Hering et al., 2022)). We further evaluate using the consistency metrics, as is often done in the literature (Holden et al., 2000; Pluim et al., 2016).

In such an overall comparison across all four datasets, our method clearly performs best. In more detail:

  • VoxelMorph: Our method outperforms it on every metric in every experiment.

  • SYMNet: While SYMNet (original) has in general slightly better deformation regularity (compared to our standard inference variant), our method has a significantly better Dice score or TRE. By increasing regularization, one could make our model have better regularity while still maintaining a significantly better Dice score or TRE than SYMNet. This is demonstrated on validation set results in Tables 6, 7, and 8 in Appendix A. In other words, our method has significantly better overall performance.

  • cLapIRN: Our method outperforms cLapIRN on all four datasets. On the OASIS raw and Lung250M-4B datasets our method outperforms cLapIRN clearly. On the pre-aligned OASIS and LPBA40 datasets our method has only slightly better tissue overlap performance than cLapIRN, but clearly superior deformation regularity in terms of folding voxels.

  • ByConstructionICON: On the OASIS datasets ByConstructionICON performs similarly to our method, although slightly worse on the OASIS raw dataset. However, on the smaller LPBA40 and Lung250M-4B datasets the performance of our method is clearly better, suggesting that our method generalizes better from limited data than ByConstructionICON.

For the Lung250M-4B dataset the test set is standardized, and we can compare our results to the methods benchmarked in the paper (Falta et al., 2024). However, our metrics are computed in the coordinates of the inhale image, whereas the paper uses the coordinates of the exhale image. In the exhale coordinates our method obtains TRE = 2.27, which is very competitive, clearly outperforming the best deep learning method, VoxelMorph++ without instance optimization (TRE = 4.47), which, unlike our unsupervised method, was trained with additional landmark supervision. VoxelMorph++ with instance optimization obtains TRE = 2.26.

6.1 Ablation study

Based on the ablation study, the symmetric formulation helps especially on the smaller LPBA40 and Lung250M-4B datasets. On the larger OASIS dataset the performance on tissue overlap metrics is similar, but even there a slight improvement in deformation regularity can be seen. The result is in line with the common machine learning wisdom that incorporating inductive bias into models matters more when the training set is small.

6.2 Computational performance

The inference time of our method is slightly longer than that of the compared methods, but unlike VoxelMorph and cLapIRN, it produces deformations in both directions immediately. Moreover, a runtime of half a second is still very fast and restrictive only in the most time-critical use cases. In terms of inference memory usage our method is competitive.

7 Conclusions

We proposed a novel image registration architecture with the desirable inductive biases of symmetry, inverse consistency, and topology preservation built in by construction. The multi-resolution formulation is capable of accurately registering images even with large initial misalignments. As part of our method, we developed a new neural network component, the deformation inversion layer. The model is easily trainable end-to-end and does not require tedious multi-stage training strategies. In the experiments the method demonstrated state-of-the-art registration performance. The main limitation is a somewhat heavier computational cost than that of the other methods.


Acknowledgments

This work was supported by the Academy of Finland (Flagship programme: Finnish Center for Artificial Intelligence FCAI, and grants 336033, 352986) and EU (H2020 grant 101016775 and NextGenerationEU). We also acknowledge the computational resources provided by the Aalto Science-IT Project.


Ethical Standards

The work follows appropriate ethical standards in conducting research and writing the manuscript, following all applicable laws and regulations regarding treatment of animals or human subjects.


Conflicts of Interest

We declare that we have no conflicts of interest.


Data availability

All data used in the experiments is publicly available. The exact OASIS dataset used can be accessed from the website of the Learn2Reg challenge, https://learn2reg.grand-challenge.org/, and the LPBA40 dataset is available for download at https://www.loni.usc.edu/research/atlas_downloads. The Lung250M-4B dataset can be obtained following the instructions at https://github.com/multimodallearning/Lung250M-4B. The provided codebase contains implementations of all additional data pre-processing that we performed.

References

  • Arsigny et al. (2006) Vincent Arsigny, Olivier Commowick, Xavier Pennec, and Nicholas Ayache. A log-Euclidean framework for statistics on diffeomorphisms. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 924–931. Springer, 2006.
  • Ashburner (2007) John Ashburner. A fast diffeomorphic image registration algorithm. Neuroimage, 38(1):95–113, 2007.
  • Avants et al. (2008) Brian B Avants, Charles L Epstein, Murray Grossman, and James C Gee. Symmetric diffeomorphic image registration with cross-correlation: evaluating automated labeling of elderly and neurodegenerative brain. Medical image analysis, 12(1):26–41, 2008.
  • Avants et al. (2009) Brian B Avants, Nick Tustison, Gang Song, et al. Advanced normalization tools (ANTs). Insight Journal, 2(365):1–35, 2009.
  • Bai et al. (2019) Shaojie Bai, J Zico Kolter, and Vladlen Koltun. Deep equilibrium models. Advances in Neural Information Processing Systems, 32, 2019.
  • Bai et al. (2020) Shaojie Bai, Vladlen Koltun, and J Zico Kolter. Multiscale deep equilibrium models. Advances in Neural Information Processing Systems, 33:5238–5250, 2020.
  • Balakrishnan et al. (2019) Guha Balakrishnan, Amy Zhao, Mert R Sabuncu, John Guttag, and Adrian V Dalca. VoxelMorph: A learning framework for deformable medical image registration. IEEE transactions on medical imaging, 38(8):1788–1800, 2019.
  • Balik et al. (2013) Salim Balik, Elisabeth Weiss, Nuzhat Jan, Nicholas Roman, William C. Sleeman, Mirek Fatyga, Gary E. Christensen, Cheng Zhang, Martin J. Murphy, Jun Lu, Paul Keall, Jeffrey F. Williamson, and Geoffrey D. Hugo. Evaluation of 4-dimensional computed tomography to 4-dimensional cone-beam computed tomography deformable image registration for lung cancer adaptive radiation therapy. International Journal of Radiation Oncology*Biology*Physics, 86(2):372–379, 2013. ISSN 0360-3016. URL https://www.sciencedirect.com/science/article/pii/S0360301613000023.
  • Beg et al. (2005) M Faisal Beg, Michael I Miller, Alain Trouvé, and Laurent Younes. Computing large deformation metric mappings via geodesic flows of diffeomorphisms. International journal of computer vision, 61:139–157, 2005.
  • Bronstein et al. (2017) Michael M Bronstein, Joan Bruna, Yann LeCun, Arthur Szlam, and Pierre Vandergheynst. Geometric deep learning: going beyond euclidean data. IEEE Signal Processing Magazine, 34(4):18–42, 2017.
  • Bronstein et al. (2021) Michael M Bronstein, Joan Bruna, Taco Cohen, and Petar Veličković. Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. arXiv preprint arXiv:2104.13478, 2021.
  • Cao et al. (2005) Yan Cao, Michael I Miller, Raimond L Winslow, and Laurent Younes. Large deformation diffeomorphic metric mapping of vector fields. IEEE transactions on medical imaging, 24(9):1216–1230, 2005.
  • Castillo et al. (2013) Richard Castillo, Edward Castillo, David Fuentes, Moiz Ahmad, Abbie M Wood, Michelle S Ludwig, and Thomas Guerrero. A reference dataset for deformable image registration spatial accuracy evaluation using the COPDgene study archive. Physics in Medicine & Biology, 58(9):2861, 2013.
  • Chen et al. (2023) Junyu Chen, Yihao Liu, Shuwen Wei, Zhangxing Bian, Shalini Subramanian, Aaron Carass, Jerry L Prince, and Yong Du. A survey on deep learning in medical image registration: New technologies, uncertainty, evaluation metrics, and beyond. arXiv preprint arXiv:2307.15615, 2023.
  • Chen et al. (2008) Mingli Chen, Weiguo Lu, Quan Chen, Kenneth J Ruchala, and Gustavo H Olivera. A simple fixed-point approach to invert a deformation field. Medical physics, 35(1):81–88, 2008.
  • Choi and Lee (2000) Yongchoel Choi and Seungyong Lee. Injectivity conditions of 2D and 3D uniform cubic B-spline functions. Graphical models, 62(6):411–427, 2000.
  • Christensen et al. (1995) Gary E Christensen, Richard D Rabbitt, Michael I Miller, Sarang C Joshi, Ulf Grenander, Thomas A Coogan, and David C Van Essen. Topological properties of smooth anatomic maps. In Information processing in medical imaging, volume 3, page 101. Springer Science & Business Media, 1995.
  • Clark et al. (2013) Kenneth Clark, Bruce Vendt, Kirk Smith, John Freymann, Justin Kirby, Paul Koppel, Stephen Moore, Stanley Phillips, David Maffitt, Michael Pringle, Lawrence Tarbox, and Fred Prior. The Cancer Imaging Archive (TCIA): Maintaining and operating a public information repository. Journal of Digital Imaging, 26(6):1045–1057, Dec 2013. ISSN 1618-727X. URL https://doi.org/10.1007/s10278-013-9622-7.
  • Dalca et al. (2018) Adrian V Dalca, Guha Balakrishnan, John Guttag, and Mert R Sabuncu. Unsupervised learning for fast probabilistic diffeomorphic registration. In Medical Image Computing and Computer Assisted Intervention–MICCAI 2018: 21st International Conference, Granada, Spain, September 16-20, 2018, Proceedings, Part I, pages 729–738. Springer, 2018.
  • De Vos et al. (2019) Bob D De Vos, Floris F Berendsen, Max A Viergever, Hessam Sokooti, Marius Staring, and Ivana Išgum. A deep learning framework for unsupervised affine and deformable image registration. Medical image analysis, 52:128–143, 2019.
  • Duvenaud et al. (2020) David Duvenaud, J Zico Kolter, and Matthew Johnson. Deep implicit layers tutorial: neural ODEs, deep equilibrium models, and beyond. Neural Information Processing Systems Tutorial, 2020.
  • Eslick et al. (2018) Enid M. Eslick, John Kipritidis, Denis Gradinscak, Mark J. Stevens, Dale L. Bailey, Benjamin Harris, Jeremy T. Booth, and Paul J. Keall. CT ventilation imaging derived from breath hold CT exhibits good regional accuracy with Galligas PET. Radiotherapy and Oncology, 127(2):267–273, 2018. ISSN 0167-8140. URL https://www.sciencedirect.com/science/article/pii/S0167814017327652.
  • Eslick et al. (2022) Enid M Eslick, John Kipritidis, Denis Gradinscak, Mark J Stevens, Dale L Bailey, Benjamin Harris, Jeremy T Booth, and Paul J Keall. CT ventilation as a functional imaging modality for lung cancer radiotherapy (CT-vs-PET-Ventilation-Imaging), 2022.
  • Estienne et al. (2021) Théo Estienne, Maria Vakalopoulou, Enzo Battistella, Theophraste Henry, Marvin Lerousseau, Amaury Leroy, Nikos Paragios, and Eric Deutsch. MICS: Multi-steps, inverse consistency and symmetric deep learning registration network. arXiv preprint arXiv:2111.12123, 2021.
  • Falta et al. (2024) Fenja Falta, Christoph Großbröhmer, Alessa Hering, Alexander Bigalke, and Mattias Heinrich. Lung250M-4B: a combined 3D dataset for CT- and point cloud-based intra-patient lung registration. Advances in Neural Information Processing Systems, 36, 2024.
  • Greer et al. (2023) Hastings Greer, Lin Tian, Francois-Xavier Vialard, Roland Kwitt, Sylvain Bouix, Raul San Jose Estepar, Richard Rushmore, and Marc Niethammer. Inverse consistency by construction for multistep deep registration. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 688–698. Springer, 2023.
  • Gu et al. (2020) Dongdong Gu, Xiaohuan Cao, Shanshan Ma, Lei Chen, Guocai Liu, Dinggang Shen, and Zhong Xue. Pair-wise and group-wise deformation consistency in deep registration network. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 171–180. Springer, 2020.
  • He et al. (2016) Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016.
  • Hering et al. (2022) Alessa Hering, Lasse Hansen, Tony CW Mok, Albert CS Chung, Hanna Siebert, Stephanie Häger, Annkristin Lange, Sven Kuckertz, Stefan Heldmann, Wei Shao, et al. Learn2Reg: Comprehensive multi-task medical image registration challenge, dataset and evaluation in the era of deep learning. IEEE Transactions on Medical Imaging, 2022.
  • Holden et al. (2000) Mark Holden, Derek LG Hill, Erika RE Denton, Jo M Jarosz, Tim CS Cox, Torsten Rohlfing, Joanne Goodey, and David J Hawkes. Voxel similarity measures for 3-D serial MR brain image registration. IEEE transactions on medical imaging, 19(2):94–102, 2000.
  • Hoopes et al. (2021) Andrew Hoopes, Malte Hoffmann, Bruce Fischl, John Guttag, and Adrian V Dalca. Hypermorph: Amortized hyperparameter learning for image registration. In International Conference on Information Processing in Medical Imaging, pages 3–17. Springer, 2021.
  • Hugo et al. (2016) Geoffrey D Hugo, Elisabeth Weiss, William C Sleeman, Salim Balik, Paul J Keall, Jun Lu, and Jeffrey F Williamson. Data from 4D lung imaging of NSCLC patients, 2016.
  • Hugo et al. (2017) Geoffrey D. Hugo, Elisabeth Weiss, William C. Sleeman, Salim Balik, Paul J. Keall, Jun Lu, and Jeffrey F. Williamson. A longitudinal four-dimensional computed tomography and cone beam computed tomography dataset for image-guided radiation therapy research in lung cancer. Medical Physics, 44(2):762–771, 2017. URL https://aapm.onlinelibrary.wiley.com/doi/abs/10.1002/mp.12059.
  • Iglesias (2023) Juan Eugenio Iglesias. A ready-to-use machine learning tool for symmetric multi-modality registration of brain MRI. Scientific Reports, 13(1):6657, 2023.
  • Joshi and Hong (2023) Ankita Joshi and Yi Hong. R2net: Efficient and flexible diffeomorphic image registration using lipschitz continuous residual networks. Medical Image Analysis, 89:102917, 2023.
  • Kim et al. (2019) Boah Kim, Jieun Kim, June-Goo Lee, Dong Hwan Kim, Seong Ho Park, and Jong Chul Ye. Unsupervised deformable image registration using cycle-consistent CNN. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 166–174. Springer, 2019.
  • Klein et al. (2009) Arno Klein, Jesper Andersson, Babak A Ardekani, John Ashburner, Brian Avants, Ming-Chang Chiang, Gary E Christensen, D Louis Collins, James Gee, Pierre Hellier, et al. Evaluation of 14 nonlinear deformation algorithms applied to human brain MRI registration. Neuroimage, 46(3):786–802, 2009.
  • Krebs et al. (2018) Julian Krebs, Tommaso Mansi, Boris Mailhé, Nicholas Ayache, and Hervé Delingette. Unsupervised probabilistic deformation modeling for robust diffeomorphic registration. In Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support, pages 101–109. Springer, 2018.
  • Krebs et al. (2019) Julian Krebs, Hervé Delingette, Boris Mailhé, Nicholas Ayache, and Tommaso Mansi. Learning a probabilistic model for diffeomorphic registration. IEEE transactions on medical imaging, 38(9):2165–2176, 2019.
  • Mahapatra and Ge (2019) Dwarikanath Mahapatra and Zongyuan Ge. Training data independent image registration with GANs using transfer learning and segmentation information. In 2019 IEEE 16th International Symposium on Biomedical Imaging (ISBI 2019), pages 709–713. IEEE, 2019.
  • Marcus et al. (2007) Daniel S Marcus, Tracy H Wang, Jamie Parker, John G Csernansky, John C Morris, and Randy L Buckner. Open Access Series of Imaging Studies (OASIS): Cross-sectional MRI data in young, middle aged, nondemented, and demented older adults. Journal of cognitive neuroscience, 19(9):1498–1507, 2007.
  • Mok and Chung (2020a) Tony CW Mok and Albert Chung. Fast symmetric diffeomorphic image registration with convolutional neural networks. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 4644–4653, 2020a.
  • Mok and Chung (2020b) Tony CW Mok and Albert Chung. Large deformation diffeomorphic image registration with laplacian pyramid networks. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 211–221. Springer, 2020b.
  • Mok and Chung (2021) Tony CW Mok and Albert Chung. Conditional deformable image registration with convolutional neural network. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 35–45. Springer, 2021.
  • Murphy et al. (2011) Keelin Murphy, Bram Van Ginneken, Joseph M Reinhardt, Sven Kabus, Kai Ding, Xiang Deng, Kunlin Cao, Kaifang Du, Gary E Christensen, Vincent Garcia, et al. Evaluation of registration methods on thoracic CT: the EMPIRE10 challenge. IEEE transactions on medical imaging, 30(11):1901–1920, 2011.
  • Niethammer et al. (2019) Marc Niethammer, Roland Kwitt, and Francois-Xavier Vialard. Metric learning for image registration. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8463–8472, 2019.
  • NLST Research Team (2011) NLST Research Team. Reduced lung-cancer mortality with low-dose computed tomographic screening. New England Journal of Medicine, 365(5):395–409, 2011. URL https://www.nejm.org/doi/full/10.1056/NEJMoa1102873.
  • NLST Research Team (2013) NLST Research Team. Data from the national lung screening trial (NLST). https://doi.org/10.7937/TCIA.HMQ8-J677, 2013. The Cancer Imaging Archive.
  • Oliveira and Tavares (2014) Francisco PM Oliveira and Joao Manuel RS Tavares. Medical image registration: a review. Computer methods in biomechanics and biomedical engineering, 17(2):73–93, 2014.
  • Paszke et al. (2019) Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Kopf, Edward Yang, Zachary DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala. Pytorch: An imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems 32, pages 8024–8035. Curran Associates, Inc., 2019. URL http://papers.neurips.cc/paper/9015-pytorch-an-imperative-style-high-performance-deep-learning-library.pdf.
  • Pluim et al. (2016) Josien PW Pluim, Sascha EA Muenzing, Koen AJ Eppenhof, and Keelin Murphy. The truth is hard to make: Validation of medical image registration. In 2016 23rd International Conference on Pattern Recognition (ICPR), pages 2294–2300. IEEE, 2016.
  • Ramon et al. (2022) Ubaldo Ramon, Monica Hernandez, and Elvira Mayordomo. Lddmm meets gans: Generative adversarial networks for diffeomorphic registration. In International Workshop on Biomedical Image Registration, pages 18–28. Springer, 2022.
  • Rohlfing (2011) Torsten Rohlfing. Image similarity and tissue overlaps as surrogates for image registration accuracy: widely used but unreliable. IEEE transactions on medical imaging, 31(2):153–163, 2011.
  • Roman et al. (2012) Nicholas O. Roman, Wes Shepherd, Nitai Mukhopadhyay, Geoffrey D. Hugo, and Elisabeth Weiss. Interfractional positional variability of fiducial markers and primary tumors in locally advanced non-small-cell lung cancer during audiovisual biofeedback radiotherapy. International Journal of Radiation Oncology*Biology*Physics, 83(5):1566–1572, 2012. ISSN 0360-3016. URL https://www.sciencedirect.com/science/article/pii/S0360301611034596.
  • Rueckert et al. (1999) Daniel Rueckert, Luke I Sonoda, Carmel Hayes, Derek LG Hill, Martin O Leach, and David J Hawkes. Nonrigid registration using free-form deformations: Application to breast MR images. IEEE Transactions on Medical Imaging, 18(8):712–721, 1999.
  • Rueckert et al. (2006) Daniel Rueckert, Paul Aljabar, Rolf A Heckemann, Joseph V Hajnal, and Alexander Hammers. Diffeomorphic registration using B-splines. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 702–709. Springer, 2006.
  • Shattuck et al. (2008) David W Shattuck, Mubeena Mirza, Vitria Adisetiyo, Cornelius Hojatkashani, Georges Salamon, Katherine L Narr, Russell A Poldrack, Robert M Bilder, and Arthur W Toga. Construction of a 3D probabilistic atlas of human cortical structures. Neuroimage, 39(3):1064–1080, 2008.
  • Shen et al. (2019a) Zhengyang Shen, Xu Han, Zhenlin Xu, and Marc Niethammer. Networks for joint affine and non-parametric image registration. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4224–4233, 2019a.
  • Shen et al. (2019b) Zhengyang Shen, François-Xavier Vialard, and Marc Niethammer. Region-specific diffeomorphic metric mapping. Advances in Neural Information Processing Systems, 32, 2019b.
  • Sotiras et al. (2013) Aristeidis Sotiras, Christos Davatzikos, and Nikos Paragios. Deformable medical image registration: A survey. IEEE transactions on medical imaging, 32(7):1153–1190, 2013.
  • Vercauteren et al. (2009) Tom Vercauteren, Xavier Pennec, Aymeric Perchant, and Nicholas Ayache. Diffeomorphic demons: Efficient non-parametric image registration. NeuroImage, 45(1):S61–S72, 2009.
  • Walker and Ni (2011) Homer F Walker and Peng Ni. Anderson acceleration for fixed-point iterations. SIAM Journal on Numerical Analysis, 49(4):1715–1735, 2011.
  • Wang and Zhang (2020) Jian Wang and Miaomiao Zhang. Deepflash: An efficient network for learning-based medical image registration. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 4444–4452, 2020.
  • Wang et al. (2023) Jian Wang, Jiarui Xing, Jason Druzgal, William M Wells III, and Miaomiao Zhang. Metamorph: learning metamorphic image transformation with appearance changes. In International Conference on Information Processing in Medical Imaging, pages 576–587. Springer, 2023.
  • Wu et al. (2022) Yifan Wu, Tom Z Jiahao, Jiancong Wang, Paul A Yushkevich, M Ani Hsieh, and James C Gee. Nodeo: A neural ordinary differential equation based optimization framework for deformable image registration. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 20804–20813, 2022.
  • Young et al. (2022) Sean I Young, Yaël Balbastre, Adrian V Dalca, William M Wells, Juan Eugenio Iglesias, and Bruce Fischl. SuperWarp: Supervised learning and warping on U-Net for invariant subvoxel-precise registration. arXiv preprint arXiv:2205.07399, 2022.
  • Zhang (2018) Jun Zhang. Inverse-consistent deep networks for unsupervised deformable image registration. arXiv preprint arXiv:1809.03443, 2018.
  • Zhang et al. (2023) Liutong Zhang, Guochen Ning, Lei Zhou, and Hongen Liao. Symmetric pyramid network for medical image inverse consistent diffeomorphic registration. Computerized Medical Imaging and Graphics, 104:102184, 2023.
  • Zheng et al. (2021) Yuanjie Zheng, Xiaodan Sui, Yanyun Jiang, Tongtong Che, Shaoting Zhang, Jie Yang, and Hongsheng Li. Symreg-gan: symmetric image registration with generative adversarial networks. IEEE transactions on pattern analysis and machine intelligence, 44(9):5631–5646, 2021.

A Hyperparameter selection details

We experimented with different hyperparameters on the validation sets during development. While the final test-set results are computed only for one chosen configuration, the validation-set results may still be of interest to the reader. The results of these experiments are shown in Table 6 for the pre-aligned OASIS dataset, in Table 7 for the LPBA40 dataset, and in Table 8 for the Lung250M-4B dataset.

For the OASIS raw dataset without pre-alignment we used 6 resolution levels, together with an affine transformation prediction stage before the other deformation updates. We omitted the predicted affine transformation from the deformation regularization. The same regularization weight was used as for the pre-aligned OASIS dataset.

Table 6: Hyperparameter optimization results for our method calculated on the OASIS validation set. The chosen configuration was λ = 1.0 and K = 4. The HD95 metric is not included due to its relatively high computational cost.

| λ | K | Dice ↑ | % of \|J_ϕ\| ≤ 0 ↓ | std(\|J_ϕ\|) ↓ | Cycle ↓ | Inverse ↓ |
|---|---|---|---|---|---|---|
| 1.0 | 5 | 0.822 (0.035) | 9.1e-3 (1.7e-3) | 0.45 (0.027) | 5.7e-3 (6.0e-4) | 5.7e-3 (6.0e-4) |
| 1.5 | 5 | 0.818 (0.034) | 1.9e-3 (5.1e-4) | 0.40 (0.023) | 3.7e-3 (3.4e-4) | 3.7e-3 (3.4e-4) |
| 2.0 | 5 | 0.815 (0.035) | 3.7e-4 (2.0e-4) | 0.37 (0.021) | 2.6e-3 (2.1e-4) | 2.6e-3 (2.1e-4) |
| 1.0 | 4 | 0.822 (0.034) | 8.2e-3 (1.5e-3) | 0.44 (0.028) | 5.5e-3 (5.6e-4) | 5.5e-3 (5.6e-4) |
| 1.5 | 4 | 0.819 (0.035) | 2.1e-3 (5.8e-4) | 0.40 (0.023) | 3.4e-3 (3.3e-4) | 3.4e-3 (3.3e-4) |
| 2.0 | 4 | 0.815 (0.036) | 3.6e-4 (2.1e-4) | 0.37 (0.020) | 2.6e-3 (2.2e-4) | 2.6e-3 (2.2e-4) |
Table 7: Hyperparameter optimization results for our method calculated on the LPBA40 validation set. The chosen configuration was λ = 1.0, K = 7, and Affine = No.

| λ | K | Affine | Dice ↑ | HD95 ↓ | % of \|J_ϕ\| ≤ 0 ↓ | std(\|J_ϕ\|) ↓ | Cycle ↓ | Inverse ↓ |
|---|---|---|---|---|---|---|---|---|
| 1.0 | 4 | No | 0.710 (0.015) | 6.10 (0.46) | 2.5e-3 (8.7e-4) | 0.31 (0.020) | 2.5e-3 (3.5e-4) | 2.5e-3 (3.5e-4) |
| 1.0 | 5 | No | 0.720 (0.014) | 5.83 (0.36) | 1.7e-3 (5.7e-4) | 0.30 (0.019) | 2.3e-3 (3.1e-4) | 2.3e-3 (3.1e-4) |
| 1.0 | 6 | No | 0.725 (0.012) | 5.70 (0.31) | 2.1e-3 (4.8e-4) | 0.29 (0.019) | 2.3e-3 (3.0e-4) | 2.3e-3 (3.0e-4) |
| 1.0 | 7 | No | 0.726 (0.011) | 5.69 (0.30) | 1.9e-3 (5.3e-4) | 0.29 (0.019) | 2.3e-3 (3.1e-4) | 2.3e-3 (3.1e-4) |
| 1.0 | 5 | Yes | 0.719 (0.014) | 5.86 (0.35) | 2.2e-3 (7.2e-4) | 0.30 (0.019) | 2.4e-3 (3.3e-4) | 2.4e-3 (3.3e-4) |
| 1.0 | 6 | Yes | 0.721 (0.015) | 5.78 (0.37) | 2.6e-3 (5.8e-4) | 0.30 (0.019) | 2.4e-3 (3.2e-4) | 2.4e-3 (3.2e-4) |
| 2.0 | 4 | No | 0.703 (0.018) | 6.20 (0.50) | 5.0e-5 (8.1e-5) | 0.25 (0.017) | 1.2e-3 (1.5e-4) | 1.2e-3 (1.5e-4) |
| 2.0 | 5 | No | 0.718 (0.014) | 5.84 (0.35) | 6.5e-5 (7.9e-5) | 0.25 (0.016) | 1.1e-3 (1.5e-4) | 1.1e-3 (1.5e-4) |
| 2.0 | 6 | No | 0.722 (0.012) | 5.76 (0.30) | 5.0e-5 (7.4e-5) | 0.24 (0.016) | 1.1e-3 (1.5e-4) | 1.1e-3 (1.5e-4) |
| 2.0 | 7 | No | 0.721 (0.012) | 5.77 (0.30) | 4.0e-5 (5.8e-5) | 0.25 (0.016) | 1.2e-3 (1.5e-4) | 1.2e-3 (1.5e-4) |
| 2.0 | 5 | Yes | 0.715 (0.014) | 5.95 (0.40) | 4.0e-5 (8.6e-5) | 0.24 (0.016) | 1.1e-3 (1.4e-4) | 1.1e-3 (1.4e-4) |
| 2.0 | 6 | Yes | 0.721 (0.014) | 5.77 (0.34) | 5.0e-5 (7.4e-5) | 0.25 (0.017) | 1.1e-3 (1.4e-4) | 1.1e-3 (1.4e-4) |
Table 8: Hyperparameter optimization results for our method calculated on the Lung250M-4B validation set. The chosen configuration was λ = 1.0 and K = 6.

| λ | K | TRE ↓ | % of \|J_ϕ\| ≤ 0 ↓ | std(\|J_ϕ\|) ↓ | Cycle ↓ | Inverse ↓ |
|---|---|---|---|---|---|---|
| 1.0 | 5 | 3.05 (0.84) | 1.2e-5 (4.1e-5) | 0.21 (0.11) | 1.1e-2 (4.1e-3) | 1.1e-2 (4.1e-3) |
| 1.0 | 6 | 2.77 (0.65) | 6.2e-6 (2.4e-5) | 0.20 (0.11) | 9.0e-3 (3.1e-3) | 9.0e-3 (3.1e-3) |
| 2.0 | 6 | 2.84 (0.68) | 0.0e0 (0.0e0) | 0.18 (0.088) | 7.0e-3 (1.9e-3) | 7.0e-3 (1.9e-3) |

B Hyperparameter selection details for baselines

For the cLapIRN baseline we used the regularization parameter value λ̄ = 0.05 for the OASIS datasets, λ̄ = 0.1 for the LPBA40 dataset, and λ̄ = 0.01 for the Lung250M-4B dataset, where λ̄ is used as in the paper presenting the method (Mok and Chung, 2021). The values were chosen based on the validation-set results shown in Tables 9, 10, 11, and 12.

For the ByConstructionICON baseline we used the regularization parameter λ = 0.5 for the OASIS datasets, λ = 1.0 for the LPBA40 dataset, and λ = 5.0 for the Lung250M-4B dataset. The values were chosen based on the validation-set results shown in Tables 13, 14, and 15.

We trained VoxelMorph with losses and a regularization weight identical to our method, and for SYMNet we used the hyperparameters directly provided by Mok and Chung (2020a). We used the default number of convolution features for the baselines, except for VoxelMorph, for which we doubled the number of features as suggested in the original paper (Balakrishnan et al., 2019).
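The cycle-consistency metric reported in the tables of this appendix measures how far the composition of the forward and backward deformations is from the identity mapping. A rough NumPy/SciPy sketch for dense 2D displacement fields (voxel units, linear interpolation) is shown below; the function names and the 2D setting are our simplified illustration, not the evaluation code of the paper.

```python
import numpy as np
from scipy.ndimage import map_coordinates

def compose(disp_a, disp_b):
    """Displacement field of phi_a ∘ phi_b for dense 2D fields (H, W, 2).

    Since (phi_a ∘ phi_b)(x) = x + disp_b(x) + disp_a(x + disp_b(x)),
    we warp disp_a by phi_b with linear interpolation and add disp_b.
    Out-of-bounds samples are clamped to the nearest edge value.
    """
    h, w = disp_b.shape[:2]
    grid = np.stack(np.meshgrid(np.arange(h), np.arange(w), indexing="ij"), axis=-1)
    coords = (grid + disp_b).transpose(2, 0, 1)  # sample points for disp_a
    warped_a = np.stack(
        [map_coordinates(disp_a[..., c], coords, order=1, mode="nearest")
         for c in range(2)], axis=-1)
    return disp_b + warped_a

def consistency_error(disp_fwd, disp_bwd):
    """Mean displacement magnitude of phi_fwd ∘ phi_bwd versus identity."""
    residual = compose(disp_fwd, disp_bwd)
    return float(np.mean(np.linalg.norm(residual, axis=-1)))
```

For a perfectly inverse-consistent pair the composed field is the zero displacement and the error vanishes, e.g. for a translation and its exact negation.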

Table 9: Regularization parameter optimization results for cLapIRN calculated on the pre-aligned OASIS validation set. Here λ̄ refers to the normalized regularization weight of the gradient loss of cLapIRN and should be in the range [0, 1]. The value λ̄ = 0.05 was chosen. The HD95 metric is not included due to its relatively high computational cost.

| λ̄ | Dice ↑ | % of \|J_ϕ\| ≤ 0 ↓ | std(\|J_ϕ\|) ↓ | Cycle ↓ |
|---|---|---|---|---|
| 0.01 | 0.812 (0.034) | 2.5e0 (2.9e-1) | 0.82 (0.048) | 1.7e0 (1.5e-1) |
| 0.05 | 0.817 (0.034) | 1.1e0 (1.8e-1) | 0.56 (0.029) | 1.2e0 (1.3e-1) |
| 0.1 | 0.812 (0.035) | 4.2e-1 (1.1e-1) | 0.43 (0.020) | 8.9e-1 (1.1e-1) |
| 0.2 | 0.798 (0.038) | 7.2e-2 (3.9e-2) | 0.30 (0.013) | 6.0e-1 (8.3e-2) |
| 0.4 | 0.769 (0.042) | 1.4e-3 (1.7e-3) | 0.18 (0.0087) | 3.5e-1 (4.4e-2) |
| 0.8 | 0.727 (0.049) | 3.4e-6 (2.2e-5) | 0.10 (0.0050) | 2.5e-1 (3.8e-2) |
| 1.0 | 0.711 (0.052) | 1.3e-6 (1.7e-5) | 0.082 (0.0042) | 2.3e-1 (3.8e-2) |
Table 10: Regularization parameter optimization results for cLapIRN calculated on the OASIS raw validation set. The table is interpreted similarly to Table 9. The value λ̄ = 0.05 was chosen since it resulted in clearly the highest Dice score. The HD95 metric is not included due to its relatively high computational cost.

| λ̄ | Dice ↑ | % of \|J_ϕ\| ≤ 0 ↓ | std(\|J_ϕ\|) ↓ | Cycle ↓ |
|---|---|---|---|---|
| 0.01 | 0.736 (0.11) | 4.9e-1 (1.3e-1) | 0.36 (0.040) | 3.1e0 (1.9e0) |
| 0.02 | 0.738 (0.11) | 5.1e-1 (1.3e-1) | 0.36 (0.038) | 3.2e0 (2.2e0) |
| 0.05 | 0.740 (0.11) | 2.9e-1 (8.0e-2) | 0.28 (0.028) | 2.9e0 (2.1e0) |
| 0.1 | 0.733 (0.12) | 9.7e-2 (3.4e-2) | 0.21 (0.019) | 2.6e0 (2.1e0) |
Table 11: Regularization parameter optimization results for cLapIRN calculated on the LPBA40 validation set. The table is interpreted similarly to Table 9. The value λ̄ = 0.1 was chosen due to the best overall performance.

| λ̄ | Dice ↑ | % of \|J_ϕ\| ≤ 0 ↓ | std(\|J_ϕ\|) ↓ | Cycle ↓ |
|---|---|---|---|---|
| 0.01 | 0.714 (0.014) | 9.9e-1 (1.5e-1) | 0.45 (0.029) | 9.9e-1 (2.2e-1) |
| 0.05 | 0.715 (0.014) | 3.2e-1 (6.8e-2) | 0.33 (0.018) | 8.0e-1 (2.1e-1) |
| 0.1 | 0.714 (0.014) | 7.4e-2 (2.4e-2) | 0.25 (0.012) | 6.6e-1 (2.1e-1) |
| 0.2 | 0.709 (0.015) | 4.4e-3 (2.4e-3) | 0.19 (0.0090) | 4.9e-1 (1.9e-1) |
| 0.4 | 0.698 (0.017) | 3.5e-5 (5.7e-5) | 0.13 (0.0071) | 3.6e-1 (1.9e-1) |
| 0.8 | 0.678 (0.019) | 5.0e-6 (2.2e-5) | 0.085 (0.0062) | 3.0e-1 (1.9e-1) |
| 1.0 | 0.671 (0.021) | 5.0e-6 (2.2e-5) | 0.074 (0.0061) | 3.0e-1 (1.9e-1) |
Table 12: Regularization parameter optimization results for cLapIRN calculated on the Lung250M-4B validation set. The table is interpreted similarly to Table 9. The value λ̄ = 0.01 was chosen due to the best overall performance.

| λ̄ | TRE ↓ | % of \|J_ϕ\| ≤ 0 ↓ | std(\|J_ϕ\|) ↓ | Cycle ↓ |
|---|---|---|---|---|
| 0.01 | 4.33 (1.5) | 1.7e-3 (2.1e-3) | 0.20 (0.078) | 5.8e0 (6.0e0) |
| 0.05 | 4.35 (1.6) | 1.4e-3 (1.7e-3) | 0.19 (0.073) | 5.8e0 (6.3e0) |
| 0.1 | 4.41 (1.7) | 8.9e-4 (1.2e-3) | 0.18 (0.066) | 5.9e0 (6.8e0) |
| 0.2 | 4.67 (2.1) | 3.9e-4 (5.9e-4) | 0.17 (0.055) | 6.1e0 (8.1e0) |
| 0.4 | 5.51 (3.6) | 1.5e-4 (2.9e-4) | 0.14 (0.039) | 6.9e0 (1.1e1) |
| 0.8 | 7.00 (5.7) | 1.1e-4 (2.1e-4) | 0.13 (0.025) | 3.7e0 (3.6e0) |
| 1.0 | 7.39 (5.9) | 7.3e-5 (2.0e-4) | 0.12 (0.022) | 3.0e0 (1.7e0) |
Table 13: Regularization parameter optimization results for ByConstructionICON calculated on the pre-aligned OASIS validation set. Here $\lambda$ refers to the weight of the regularizing bending energy loss used by the method. Value $\lambda = 0.5$ was chosen due to the best performance (the paper (Greer et al., 2023) used $\lambda = 5.0$ for the OASIS dataset, which we found to be suboptimal). The HD95 metric is not included due to its relatively high computational cost.
λ      Dice ↑          |J_φ|≤0 ↓         std(|J_φ|) ↓   Cycle ↓           Inverse ↓
0.5    0.818 (0.031)   2.5e-2 (6.1e-3)   0.48 (0.045)   5.4e-3 (1.0e-3)   5.4e-3 (1.0e-3)
1.0    0.815 (0.031)   6.6e-3 (2.3e-3)   0.43 (0.036)   2.8e-3 (4.9e-4)   2.8e-3 (4.9e-4)
5.0    0.796 (0.033)   3.9e-6 (2.1e-5)   0.29 (0.021)   4.1e-4 (5.0e-5)   4.1e-4 (5.0e-5)
Table 14: Regularization parameter optimization results for ByConstructionICON calculated on the LPBA40 validation set. The table is interpreted similarly to Table 13. Value $\lambda = 1.0$ was chosen due to the best overall performance.
λ      Dice ↑          HD95 ↓        |J_φ|≤0 ↓         std(|J_φ|) ↓   Cycle ↓           Inverse ↓
0.5    0.686 (0.020)   6.23 (0.46)   2.8e-2 (1.2e-2)   0.34 (0.042)   3.9e-3 (1.1e-3)   3.9e-3 (1.1e-3)
1.0    0.684 (0.018)   6.30 (0.48)   3.0e-3 (2.0e-3)   0.27 (0.025)   1.2e-3 (2.5e-4)   1.2e-3 (2.5e-4)
Table 15: Regularization parameter optimization results for ByConstructionICON calculated on the Lung250M-4B validation set. The table is interpreted similarly to Table 13. Value $\lambda = 5.0$ was chosen due to the best performance.
λ      TRE ↓        |J_φ|≤0 ↓         std(|J_φ|) ↓   Cycle ↓           Inverse ↓
0.5    8.76 (4.5)   2.6e-3 (3.5e-3)   0.32 (0.15)    1.2e-3 (9.7e-4)   1.2e-3 (9.7e-4)
1.0    7.63 (4.0)   3.3e-4 (9.9e-4)   0.25 (0.12)    4.8e-4 (4.7e-4)   4.8e-4 (4.7e-4)
3.0    6.38 (3.9)   0.0e0 (0.0e0)     0.17 (0.10)    1.1e-4 (8.4e-5)   1.1e-4 (8.4e-5)
5.0    6.42 (4.1)   0.0e0 (0.0e0)     0.15 (0.089)   6.8e-5 (5.1e-5)   6.8e-5 (5.1e-5)

C Proof of theoretical properties

While in the main text the dependence of the intermediate outputs $d_{1\to 1.5}^{(k)}$, $d_{2\to 1.5}^{(k)}$, $z_1^{(k)}$, $z_2^{(k)}$, and $\delta^{(k)}$ on the input images $x_A, x_B$ is not written out explicitly, throughout this proof we include the dependence in the notation since it is relevant for the proof.

C.1 Inverse consistent by construction (Theorem 1)

Proof  Inverse consistency by construction follows directly from Equation 8:

$$\begin{aligned}
f_{1\to 2}(x_A, x_B) &= d_{1\to 1.5}^{(0)}(x_A, x_B) \circ d_{2\to 1.5}^{(0)}(x_A, x_B)^{-1} \\
&= \left(d_{2\to 1.5}^{(0)}(x_A, x_B) \circ d_{1\to 1.5}^{(0)}(x_A, x_B)^{-1}\right)^{-1} \\
&= f_{2\to 1}(x_A, x_B)^{-1}
\end{aligned}$$
∎
Note that due to limited sampling resolution the inverse consistency error is not exactly zero despite the proof. The same is true for earlier inverse consistent by construction registration methods, as discussed in Section 1.

To be more specific, sampling resolution puts a limit on the accuracy of the inverses obtained using the deformation inversion layer, and also limits the accuracy of compositions if deformations are resampled to their original resolution as part of the composition operation (see Section 3.7). Another possible source of error, imperfect convergence of the fixed point iteration in the deformation inversion layer, can be shown to be insignificant. As shown in Appendix D, the fixed point iteration is guaranteed to converge, and the error caused by incomplete convergence can hence be controlled by the stopping criterion. In our experiments the stopping criterion required the maximum inversion error over all sampling locations to fall below one hundredth of a voxel, which is very small.
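For illustration, the fixed point iteration and its stopping criterion can be sketched in one dimension (pure-Python toy code, not the paper's PyTorch implementation; the function `u` here is an arbitrary contractive displacement):

```python
# Minimal 1D sketch of deformation inversion by fixed point iteration.
# The deformation is phi(x) = x + u(x) with a contractive displacement u
# (Lipschitz constant < 1); the inverse displacement v solves
# v(x) = -u(x + v(x)), which the iteration below converges to.
import math

def u(x):
    # Example contractive displacement field (Lipschitz constant 0.3 < 1).
    return 0.3 * math.sin(x)

def invert_displacement(x, tol=0.01, max_iter=100):
    """Return v such that (x + v) + u(x + v) == x approximately."""
    v = 0.0
    for _ in range(max_iter):
        v_next = -u(x + v)
        if abs(v_next - v) < tol:  # stopping criterion, e.g. 1/100 voxel
            return v_next
        v = v_next
    return v

x = 1.2
v = invert_displacement(x, tol=1e-8)
# Applying the deformation to the inverted point recovers x:
assert abs((x + v) + u(x + v) - x) < 1e-6
```

The tolerance plays the same role as the one-hundredth-of-a-voxel criterion described above: it bounds the remaining inversion error.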

C.2 Symmetric by construction (Theorem 2)

Proof  We use induction. Assume that for any $x_A$ and $x_B$ the following holds at level $k+1$: $d^{(k+1)}_{1\to 1.5}(x_A, x_B) = d^{(k+1)}_{2\to 1.5}(x_B, x_A)$. For level $K$ this holds trivially since $d^{(K)}_{1\to 1.5}(x_A, x_B)$ and $d^{(K)}_{2\to 1.5}(x_A, x_B)$ are defined as identity deformations. Using the induction assumption we have at level $k$:

$$z_1^{(k)}(x_A, x_B) = h^{(k)}(x_A) \circ d^{(k+1)}_{1\to 1.5}(x_A, x_B) = h^{(k)}(x_A) \circ d^{(k+1)}_{2\to 1.5}(x_B, x_A) = z_2^{(k)}(x_B, x_A)$$

Then also:

$$\begin{aligned}
\delta^{(k)}(x_A, x_B) &= u^{(k)}(z_1^{(k)}(x_A, x_B), z_2^{(k)}(x_A, x_B)) \circ u^{(k)}(z_2^{(k)}(x_A, x_B), z_1^{(k)}(x_A, x_B))^{-1} \\
&= u^{(k)}(z_2^{(k)}(x_B, x_A), z_1^{(k)}(x_B, x_A)) \circ u^{(k)}(z_1^{(k)}(x_B, x_A), z_2^{(k)}(x_B, x_A))^{-1} \\
&= \left[u^{(k)}(z_1^{(k)}(x_B, x_A), z_2^{(k)}(x_B, x_A)) \circ u^{(k)}(z_2^{(k)}(x_B, x_A), z_1^{(k)}(x_B, x_A))^{-1}\right]^{-1} \\
&= \delta^{(k)}(x_B, x_A)^{-1}
\end{aligned}$$

Then we can complete the induction step:

$$\begin{aligned}
d^{(k)}_{1\to 1.5}(x_A, x_B) &= d^{(k+1)}_{1\to 1.5}(x_A, x_B) \circ \delta^{(k)}(x_A, x_B) \\
&= d^{(k+1)}_{2\to 1.5}(x_B, x_A) \circ \delta^{(k)}(x_B, x_A)^{-1} = d^{(k)}_{2\to 1.5}(x_B, x_A)
\end{aligned}$$

From this it follows that the method is symmetric by construction:

$$\begin{aligned}
f_{1\to 2}(x_A, x_B) &= d_{1\to 1.5}^{(0)}(x_A, x_B) \circ d_{2\to 1.5}^{(0)}(x_A, x_B)^{-1} \\
&= d_{2\to 1.5}^{(0)}(x_B, x_A) \circ d_{1\to 1.5}^{(0)}(x_B, x_A)^{-1} = f_{2\to 1}(x_B, x_A)
\end{aligned}$$
∎

Note that, unlike inverse consistency, the proven symmetry relation also holds exactly in practice.
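The induction above can be checked on a toy model in which deformations are 1D translations, so that composition is addition and inversion is negation (all names here are illustrative stand-ins, not the SITReg code):

```python
# Toy check of symmetry by construction: deformations are modeled as pure
# translations, so composition is addition and inversion is negation.
# u(z1, z2) stands in for the (asymmetric) update prediction network and
# features() for the feature extractor h^{(k)}.

def u(z1, z2):
    # Arbitrary non-symmetric "network": any function works here.
    return 0.7 * z1 - 0.2 * z2 + 0.1 * z1 * z2

def features(x, level):
    # Stand-in for the multi-resolution feature extractor.
    return x + 0.5 * level

def register(xA, xB, num_levels=3):
    # d1, d2 play the roles of d_{1->1.5} and d_{2->1.5}; identity at level K.
    d1 = d2 = 0.0
    for k in reversed(range(num_levels)):
        z1 = features(xA, k) + d1  # "resample" features by current deformation
        z2 = features(xB, k) + d2
        delta = u(z1, z2) - u(z2, z1)  # delta(A, B) = -delta(B, A)
        d1 = d1 + delta                # d1 <- d1 composed with delta
        d2 = d2 - delta                # d2 <- d2 composed with delta^{-1}
    return d1 - d2  # f_{1->2} = d_{1->1.5} o d_{2->1.5}^{-1}

xA, xB = 0.3, -1.1
# Symmetric by construction: f_{1->2}(xA, xB) == f_{2->1}(xB, xA),
# which in this translation model equals -register(xB, xA).
assert abs(register(xA, xB) + register(xB, xA)) < 1e-12
```

Because swapping the inputs performs exactly the mirrored floating point operations, the relation holds to machine precision, mirroring the exactness claim above.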

C.3 Topology preserving (Theorem 3)

Proof  As shown in Appendix D, each $u^{(k)}$ produces topology preserving deformations, i.e., deformations with everywhere positive Jacobian determinants (the architecture of $u^{(k)}$ is described in Section 3.4). Since the overall deformation is a composition of multiple outputs of $u^{(k)}$ and their inverses, the whole deformation also has everywhere positive Jacobian determinants, because the Jacobian determinant at each point equals the product of the Jacobian determinants of the composed deformations. ∎

The invertibility is not perfect if the compositions of the $u^{(k)}$ outputs and their inverses are resampled to the input image resolution, as is common practice in image registration. However, invertibility everywhere can be achieved by storing all the individual deformations and evaluating the composed deformation as their true composition (see Section 3.7 on inference variants and the results in Section 5).
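The product argument in the proof can be verified numerically on a toy 2D example (our own illustrative code, not from the paper's codebase): for a composition f∘g the chain rule gives det J_{f∘g}(x) = det J_f(g(x)) · det J_g(x), so positive factors yield a positive composed determinant.

```python
# Numerical check that Jacobian determinants multiply under composition:
# det J_{f o g}(x) = det J_f(g(x)) * det J_g(x). If both factors are
# positive everywhere, the composition preserves topology as well.
import math

def jac_det(func, p, h=1e-6):
    """2x2 Jacobian determinant of func at p via central differences."""
    x, y = p
    fx1, fx0 = func((x + h, y)), func((x - h, y))
    fy1, fy0 = func((x, y + h)), func((x, y - h))
    a = (fx1[0] - fx0[0]) / (2 * h)  # d f_1 / dx
    b = (fy1[0] - fy0[0]) / (2 * h)  # d f_1 / dy
    c = (fx1[1] - fx0[1]) / (2 * h)  # d f_2 / dx
    d = (fy1[1] - fy0[1]) / (2 * h)  # d f_2 / dy
    return a * d - b * c

def f(p):  # small smooth deformation with positive Jacobian determinant
    return (p[0] + 0.2 * math.sin(p[1]), p[1] + 0.2 * math.sin(p[0]))

def g(p):  # another small smooth deformation
    return (p[0] + 0.3 * math.cos(p[0] + p[1]), p[1] - 0.1 * math.sin(p[0]))

def f_of_g(p):
    return f(g(p))

p = (0.4, -0.7)
lhs = jac_det(f_of_g, p)
rhs = jac_det(f, g(p)) * jac_det(g, p)
assert abs(lhs - rhs) < 1e-4
assert lhs > 0  # both factors positive -> composition has positive Jacobian
```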

D Deriving the optimal bound for control points

D.1 Proof summary

As discussed in Section 3.4, we limit the absolute values of the predicted cubic spline control points defining the displacement field by a hard constraint $\gamma^{(k)}$ for each resolution level $k \in \{0, \dots, K-1\}$. We want to find the optimal $\gamma^{(k)}$ that ensures invertibility of the individual deformations and convergence of the fixed point iteration in the deformation inversion layer. Note that our derivation also yields a significantly shorter and more general proof of Theorems 1 and 4 in (Choi and Lee, 2000).

We start by deriving the optimal bound in the continuous case (infinite resolution), and then extend it to our discrete case, where values between the B-spline samples are defined by linear interpolation.

The derived continuous bound $\gamma$ equals the reciprocal of the maximum possible matrix $\infty$-norm of the local Jacobian matrices over all generated displacement fields. The matrix norm of a local Jacobian matrix equals the local Lipschitz constant, and hence the bound ensures that the Lipschitz constant with respect to the $\infty$-norm stays below one, which by (Chen et al., 2008) guarantees both invertibility and convergence of the fixed point iteration. However, a straightforward formula for the bound is computationally intractable in three dimensions. We simplify it by showing that, due to symmetries, the absolute value around the derivatives can be removed from the function being maximized, yielding a formula which can be evaluated exactly. We further show with a counterexample that the proposed bound is tight.

The bound equals the one obtained by (Choi and Lee, 2000) using a very different approach, further validating the result (their proof does not cover the convergence of the fixed point iteration and is less general).

Note that the bound ensures positivity of the Jacobian determinants, not only invertibility, since we are limiting displacements (and the zero-displacement deformation trivially has positive Jacobian determinants).

D.2 Proof of the continuous case

By (Chen et al., 2008), a deformation field is invertible by the proposed deformation inversion layer (and hence invertible in general) if its displacement field is a contractive mapping with respect to some norm (the convergence is then also with respect to that norm). In finite dimensions convergence in any p-norm is equivalent, and hence we should choose the norm which gives the loosest bound.

Let us choose the $\|\cdot\|_\infty$ norm for our analysis, which, as it turns out, gives the loosest possible bound. The Lipschitz constant of a displacement field is then the maximum $\|\cdot\|_\infty$ operator norm of its local Jacobian matrices.

Since for matrices the $\|\cdot\|_\infty$ norm corresponds to the maximum absolute row sum, it is enough to consider one component of the displacement field.

Let $B: \mathbb{R} \to \mathbb{R}$ be a centered cardinal B-spline of some degree (in fact, any continuous, almost everywhere differentiable function with finite support works) and let us consider an infinite grid of control points $\phi: \mathbb{Z}^n \to \mathbb{R}$, where $n$ is the dimensionality of the displacement field. For notational convenience, let us define the set $N := \{1, \dots, n\}$.

Now let $f_\phi$ be the $n$-dimensional displacement field (or one component of it) defined by the control point grid:

$$f_\phi(x) = \sum_{\alpha \in \mathbb{Z}^n} \phi(\alpha) \prod_{i \in N} B(x_i - \alpha_i) \tag{13}$$

Note that since the function $B$ has finite support, the sum over $\mathbb{Z}^n$ reduces to a finite sum for any $x$ and is hence well-defined. Also, without loss of generality, it is enough to consider the region $x \in [0,1]^n$ due to the unit spacing of the control point grid.

For the partial derivatives $\frac{\partial f_\phi}{\partial x_j}: \mathbb{R}^n \to \mathbb{R}$ we have

$$\frac{\partial f_\phi}{\partial x_j}(x) := \sum_{\alpha \in \mathbb{Z}^n} \phi(\alpha)\, B'(x_j - \alpha_j) \prod_{i \in N \setminus \{j\}} B(x_i - \alpha_i) = \sum_{\alpha \in \mathbb{Z}^n} \phi(\alpha)\, D^j(x - \alpha) \tag{14}$$

where $D^j(x - \alpha) := B'(x_j - \alpha_j) \prod_{i \in N \setminus \{j\}} B(x_i - \alpha_i)$.
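For concreteness, here is a 1D (n = 1) sketch of Equations 13 and 14 with the cubic cardinal B-spline, checking the analytic derivative against a finite difference (the control values are arbitrary illustrative numbers, not from the paper):

```python
# 1D sketch of Eqs. 13-14: f_phi(x) = sum_a phi(a) B(x - a) and its
# derivative sum_a phi(a) B'(x - a), with B the cubic cardinal B-spline
# (support [-2, 2]). Control values phi are arbitrary illustrative numbers.

def B(x):
    x = abs(x)
    if x < 1:
        return 2 / 3 - x * x + x ** 3 / 2
    if x < 2:
        return (2 - x) ** 3 / 6
    return 0.0

def dB(x):
    s = -1.0 if x < 0 else 1.0  # B is even, so B' is odd
    x = abs(x)
    if x < 1:
        return s * (-2 * x + 1.5 * x * x)
    if x < 2:
        return s * (-((2 - x) ** 2) / 2)
    return 0.0

phi = {-2: 0.3, -1: -0.5, 0: 0.2, 1: 0.4, 2: -0.1}  # finite control grid

def f(x):
    return sum(phi[a] * B(x - a) for a in phi)

def df(x):
    return sum(phi[a] * dB(x - a) for a in phi)

x, h = 0.37, 1e-6
fd = (f(x + h) - f(x - h)) / (2 * h)
assert abs(fd - df(x)) < 1e-6  # analytic derivative matches finite difference
```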

Following the power set notation, let us denote control point grids with values restricted to a set $S \subset \mathbb{R}$ by $S^{\mathbb{Z}^n}$. That is, if $\phi \in S^{\mathbb{Z}^n}$, then $\phi(\alpha) \in S$ for all $\alpha \in \mathbb{Z}^n$.

Lemma 4

For all $\phi \in\ \left]-1/\tilde{K}_n,\, 1/\tilde{K}_n\right[^{\mathbb{Z}^n}$, $f_\phi$ is a contractive mapping with respect to the $\|\cdot\|_\infty$ norm, where

$$\tilde{K}_n := \max_{x \in [0,1]^n,\ \tilde{\phi} \in [-1,1]^{\mathbb{Z}^n}}\ \sum_{j \in N} \left|\frac{\partial f_{\tilde{\phi}}}{\partial x_j}(x)\right|. \tag{15}$$

Proof  For all $x \in [0,1]^n$ and $\phi \in\ \left]-1/\tilde{K}_n,\, 1/\tilde{K}_n\right[^{\mathbb{Z}^n}$:

$$\begin{aligned}
\sum_{j \in N} \left|\frac{\partial f_\phi}{\partial x_j}(x)\right| &< \max_{\tilde{x} \in [0,1]^n,\ \tilde{\phi} \in [-1/\tilde{K}_n, 1/\tilde{K}_n]^{\mathbb{Z}^n}}\ \sum_{j \in N} \left|\frac{\partial f_{\tilde{\phi}}}{\partial \tilde{x}_j}(\tilde{x})\right| \\
&= \max_{\tilde{x} \in [0,1]^n,\ \tilde{\phi} \in [-1,1]^{\mathbb{Z}^n}}\ \sum_{j \in N} \left|\frac{\partial f_{\tilde{\phi}/\tilde{K}_n}}{\partial \tilde{x}_j}(\tilde{x})\right| \\
&= \frac{1}{\tilde{K}_n}\ \max_{\tilde{x} \in [0,1]^n,\ \tilde{\phi} \in [-1,1]^{\mathbb{Z}^n}}\ \sum_{j \in N} \left|\frac{\partial f_{\tilde{\phi}}}{\partial \tilde{x}_j}(\tilde{x})\right| = \frac{\tilde{K}_n}{\tilde{K}_n} = 1.
\end{aligned} \tag{16}$$

The sums of the absolute values of the partial derivatives are exactly the $\|\cdot\|_\infty$ operator norms of the local Jacobian matrices of $f_\phi$, hence $f_\phi$ is a contraction. ∎

Lemma 5

For any $k \in N$, $x \in [0,1]^n$, and $\phi \in [-1,1]^{\mathbb{Z}^n}$, we can find $\tilde{x} \in [0,1]^n$ and $\tilde{\phi} \in [-1,1]^{\mathbb{Z}^n}$ such that

$$\frac{\partial f_{\tilde{\phi}}}{\partial x_j}(\tilde{x}) = \begin{cases} -\frac{\partial f_\phi}{\partial x_j}(x) & \text{for } j = k \\ \frac{\partial f_\phi}{\partial x_j}(x) & \text{for } j \in N \setminus \{k\}. \end{cases} \tag{17}$$

Proof  The B-splines are symmetric around the origin:

$$B(x) = B(-x) \implies B'(x) = -B'(-x) \tag{18}$$

Let us propose

$$\tilde{x}_i := \begin{cases} 1 - x_i & \text{when } i \in N \setminus \{k\} \\ x_i & \text{when } i = k \end{cases} \tag{19}$$

and $\tilde{\phi}: \mathbb{Z}^n \to \mathbb{R}$ as $\tilde{\phi}(\alpha) := -\phi(g(\alpha))$, where $g: \mathbb{Z}^n \to \mathbb{Z}^n$ is a bijection defined as follows:

$$g(\alpha)_i := \begin{cases} 1 - \alpha_i & \text{when } i \in N \setminus \{k\} \\ \alpha_i & \text{when } i = k. \end{cases} \tag{20}$$

Then for all $\alpha \in \mathbb{Z}^n$:

$$\begin{aligned}
D^k(\tilde{x} - \alpha) &= B'(\tilde{x}_k - \alpha_k) \prod_{i \in N \setminus \{k\}} B(\tilde{x}_i - \alpha_i) \\
&= B'(x_k - g(\alpha)_k) \prod_{i \in N \setminus \{k\}} B(-(x_i - g(\alpha)_i)) \\
&= D^k(x - g(\alpha))
\end{aligned} \tag{21}$$

which gives

$$\begin{aligned}
\frac{\partial f_{\tilde{\phi}}}{\partial \tilde{x}_k}(\tilde{x}) &= \sum_{\alpha \in \mathbb{Z}^n} \tilde{\phi}(\alpha)\, D^k(\tilde{x} - \alpha) \\
&= \sum_{\alpha \in \mathbb{Z}^n} -\phi(g(\alpha))\, D^k(x - g(\alpha)) \qquad (g \text{ is bijective}) \\
&= -\frac{\partial f_\phi}{\partial x_k}(x).
\end{aligned} \tag{22}$$

And for all $j \in N \setminus \{k\}$ and $\alpha \in \mathbb{Z}^n$:

$$\begin{aligned}
D^j(\tilde{x} - \alpha) &= B'(\tilde{x}_j - \alpha_j) \prod_{i \in N \setminus \{j\}} B(\tilde{x}_i - \alpha_i) \\
&= B'(\tilde{x}_j - \alpha_j)\, B(\tilde{x}_k - \alpha_k) \prod_{i \in N \setminus \{j,k\}} B(\tilde{x}_i - \alpha_i) \\
&= B'(-(x_j - g(\alpha)_j))\, B(x_k - g(\alpha)_k) \prod_{i \in N \setminus \{j,k\}} B(-(x_i - g(\alpha)_i)) \\
&= -B'(x_j - g(\alpha)_j)\, B(x_k - g(\alpha)_k) \prod_{i \in N \setminus \{j,k\}} B(x_i - g(\alpha)_i) \\
&= -B'(x_j - g(\alpha)_j) \prod_{i \in N \setminus \{j\}} B(x_i - g(\alpha)_i) \\
&= -D^j(x - g(\alpha))
\end{aligned} \tag{23}$$

which gives for all $j \in N \setminus \{k\}$:

$$\begin{aligned}
\frac{\partial f_{\tilde{\phi}}}{\partial \tilde{x}_j}(\tilde{x}) &= \sum_{\alpha \in \mathbb{Z}^n} \tilde{\phi}(\alpha)\, D^j(\tilde{x} - \alpha) \\
&= \sum_{\alpha \in \mathbb{Z}^n} \left(-\phi(g(\alpha))\right)\left(-D^j(x - g(\alpha))\right) \qquad (g \text{ is bijective}) \\
&= \frac{\partial f_\phi}{\partial x_j}(x).
\end{aligned} \tag{24}$$
∎

Theorem 6

For all $\phi \in\ \left]-1/K_n,\, 1/K_n\right[^{\mathbb{Z}^n}$, $f_\phi$ is a contractive mapping with respect to the $\|\cdot\|_\infty$ norm, where

$$K_n := \max_{x \in [0,1]^n}\ \sum_{\alpha \in \mathbb{Z}^n} \left|\sum_{j \in N} D^j(x - \alpha)\right|. \tag{25}$$

Proof  Let us show that $K_n = \tilde{K}_n$:

$$\begin{aligned}
\tilde{K}_n &= \max_{x \in [0,1]^n,\ \phi \in [-1,1]^{\mathbb{Z}^n}}\ \sum_{j \in N} \left|\frac{\partial f_\phi}{\partial x_j}(x)\right| \\
&= \max_{x \in [0,1]^n,\ \phi \in [-1,1]^{\mathbb{Z}^n}}\ \sum_{j \in N} \frac{\partial f_\phi}{\partial x_j}(x) \qquad \text{(Lemma 5)} \\
&= \max_{x \in [0,1]^n,\ \phi \in [-1,1]^{\mathbb{Z}^n}}\ \sum_{j \in N} \sum_{\alpha \in \mathbb{Z}^n} \phi(\alpha)\, D^j(x - \alpha) \\
&= \max_{x \in [0,1]^n,\ \phi \in [-1,1]^{\mathbb{Z}^n}}\ \sum_{\alpha \in \mathbb{Z}^n} \phi(\alpha) \sum_{j \in N} D^j(x - \alpha) \\
&= \max_{x \in [0,1]^n}\ \sum_{\alpha \in \mathbb{Z}^n} \left|\sum_{j \in N} D^j(x - \alpha)\right| = K_n
\end{aligned} \tag{26}$$

The last step follows from the fact that the sum is maximized by choosing each $\phi(\alpha)$ to be either $1$ or $-1$ according to the sign of the inner sum $\sum_{j \in N} D^j(x - \alpha)$.

By Lemma 4, $f_\phi$ is then a contractive mapping with respect to the $\|\cdot\|_\infty$ norm. ∎

Theorem 6 shows that if we limit the control point absolute values to be less than $1/K_n$, the resulting deformation is invertible by the fixed point iteration. Moreover, $K_n$ can be evaluated accurately at least for $n \leq 3$. The subset of $\mathbb{Z}^n$ over which the sum needs to be taken depends on the support of the function $B$, which in turn depends on the degree of the B-splines used.
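To make the bound concrete, the following sketch evaluates it for the simplest case n = 1 with cubic B-splines, where the maximization reduces to max_x Σ_α |B′(x − α)|. The resulting value 1.5 is our own computation for this 1D case (the paper tabulates only K_2 and K_3):

```python
# Numeric evaluation of the bound of Theorem 6 for n = 1 with cubic
# B-splines: K_1 = max_{x in [0,1]} sum_a |B'(x - a)|. The value 1.5 is
# our own computed 1D result, not a number from the paper.

def dB(x):
    s = -1.0 if x < 0 else 1.0  # derivative of the cubic cardinal B-spline
    x = abs(x)
    if x < 1:
        return s * (-2 * x + 1.5 * x * x)
    if x < 2:
        return s * (-((2 - x) ** 2) / 2)
    return 0.0

def K1(num_samples=1000):
    best = 0.0
    for i in range(num_samples + 1):
        x = i / num_samples
        # Support of the cubic B-spline is [-2, 2], so alpha in {-1, 0, 1, 2}.
        best = max(best, sum(abs(dB(x - a)) for a in (-1, 0, 1, 2)))
    return best

k1 = K1()
assert abs(k1 - 1.5) < 1e-9  # maximum attained at x = 0.5
# Control point magnitudes below 1/K1 = 2/3 then guarantee invertibility in 1D.
```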

Next we show that the obtained bound is also a tight bound for the invertibility of the deformation. This then also shows that the $\|\cdot\|_\infty$ norm gives the loosest possible bound.

Since $f_\phi$ corresponds to only one component of a displacement field, let us consider a fully defined displacement field formed by stacking $n$ copies of $f_\phi$. Let us define

$$g_\phi(x) := (f_\phi(x))_{i \in N}. \tag{27}$$
Theorem 7

There exist $\phi \in [-1/K_n, 1/K_n]^{\mathbb{Z}^n}$ and $x \in [0,1]^n$ such that $\det\left(\frac{\partial g_\phi}{\partial x} + I\right)(x) = 0$, where $\frac{\partial g_\phi}{\partial x}$ is the Jacobian matrix of $g_\phi$ and $I$ is the identity matrix.

Proof  By Lemma 5 and Theorem 6 there exist $x \in [0,1]^n$ and $\tilde{\phi} \in [-1,1]^{\mathbb{Z}^n}$ such that

$$\sum_{j \in N} \frac{\partial f_{\tilde{\phi}}}{\partial x_j}(x) = -K_n \tag{28}$$

where each $\frac{\partial f_{\tilde{\phi}}}{\partial x_j}(x) < 0$.

Let us define $\phi := \tilde{\phi}/K_n \in [-1/K_n, 1/K_n]^{\mathbb{Z}^n}$. Then

$$\sum_{j \in N} \frac{\partial f_\phi}{\partial x_j}(x) = -1. \tag{29}$$

Now let $y \in \mathbb{R}^n$ be the vector consisting only of ones, that is, $y := (1)_{i \in N}$. Then

$$\begin{aligned}
\left(\frac{\partial g_\phi}{\partial x} + I\right)(x)\, y &= \left(\frac{\partial g_\phi}{\partial x}(x)\right) y + y \\
&= \left(\sum_{j \in N} \frac{\partial f_\phi}{\partial x_j}(x)\right)_{i \in N} + (1)_{i \in N} \\
&= (-1)_{i \in N} + (1)_{i \in N} = 0.
\end{aligned} \tag{30}$$

In other words, $y$ is an eigenvector of $\left(\frac{\partial g_\phi}{\partial x} + I\right)(x)$ with eigenvalue $0$, meaning that the determinant of $\left(\frac{\partial g_\phi}{\partial x} + I\right)(x)$ is also $0$. ∎

The proposed bound is hence the loosest possible: at the bound the deformation can have a zero Jacobian determinant, meaning it is not invertible.
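The tightness argument can likewise be illustrated in 1D (our own toy computation, using the 1D bound value 1.5 computed independently of the paper): choosing control points of magnitude exactly 1/K_1 = 2/3 with signs opposing B′(1/2 − α) makes the derivative of x ↦ x + f_φ(x) vanish at x = 1/2.

```python
# 1D illustration of the tightness of the bound (Theorem 7): with control
# points of magnitude exactly 1/K_1 = 2/3 chosen against the signs of
# B'(1/2 - a), the deformation x + f_phi(x) has zero derivative at x = 1/2.

def B(x):
    x = abs(x)
    if x < 1:
        return 2 / 3 - x * x + x ** 3 / 2
    if x < 2:
        return (2 - x) ** 3 / 6
    return 0.0

def dB(x):
    s = -1.0 if x < 0 else 1.0
    x = abs(x)
    if x < 1:
        return s * (-2 * x + 1.5 * x * x)
    if x < 2:
        return s * (-((2 - x) ** 2) / 2)
    return 0.0

K1 = 1.5  # continuous-case bound for n = 1 (our own computed value)
alphas = (-1, 0, 1, 2)
phi = {a: (-1.0 if dB(0.5 - a) > 0 else 1.0) / K1 for a in alphas}

def warp(x):  # the deformation x -> x + f_phi(x)
    return x + sum(phi[a] * B(x - a) for a in alphas)

h = 1e-5
deriv = (warp(0.5 + h) - warp(0.5 - h)) / (2 * h)
assert abs(deriv) < 1e-8  # the Jacobian (derivative) vanishes at the bound
```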

D.3 Sampling-based case (Equation 9)

The bound used in practice, given in Equation 9, is slightly different from the bound proven in Theorem 6. The reason is that, for computational efficiency, we do not use the cubic B-spline representation of the displacement field directly, but instead take samples of the displacement field at the full image resolution (see Section 3.4) and use efficient bi- or trilinear interpolation to define the intermediate values. As a result, the continuous-case bound no longer applies.

However, finding the exact bound for our approximation amounts to evaluating the maximum in Theorem 6 over a finite set of sampling locations, with $D^j(x)$ replaced by finite difference derivatives. The mathematical argument proceeds almost identically and is not repeated here. To justify the use of finite difference derivatives, we need the following two observations:

  • When a displacement field is defined by a grid of values and bi- or trilinear interpolation, the highest $\|\cdot\|_\infty$ operator norm is attained at a corner of an interpolation patch.

  • Due to symmetry, it is enough to check the derivative at only one of the $2^n$ corners of each bi- or trilinear interpolation patch when computing the maximum (corresponding to a finite difference derivative in only one direction along each dimension).

The maximum is evaluated over the sampling locations relative to the resolution of the control point grid (which matches the resolution of the features $z_1^{(k)}$ and $z_2^{(k)}$). The exact sampling grid depends on how the sampling is implemented (an implementation detail); in our case we used the locations $X := \{1/2 + \frac{1}{2^{k+1}} + \frac{i}{2^k} \mid i \in \mathbb{Z}\}^n \cap [0,1]^n$, again limited to the unit cube without loss of generality.

No additional insights are required to show that Equation 9 gives the optimal bound. For concrete values of $K_n^{(k)}$, see Table 16.

Table 16: Values of $K_2$ and $K_3$ for different sampling rates with respect to the control point grid. The bound for sampling rate $= \infty$ is from (Choi and Lee, 2000). For each resolution level we define the maximum control point absolute value $\gamma^{(k)}$ as $0.99 \times \frac{1}{K_n^{(k)}}$ (in our experiments we have $n = 3$ dimensional data). The codebase contains an implementation for computing the value for other $k$.

| $k$ | Sampling rate | $K_2^{(k)}$ | $K_3^{(k)}$ |
| --- | --- | --- | --- |
| 0 | 1 | 2.222222222 | 2.777777778 |
| 1 | 2 | 2.031168620 | 2.594390728 |
| 2 | 4 | 2.084187826 | 2.512366240 |
| 3 | 8 | 2.063570023 | 2.495476474 |
| 4 | 16 | 2.057074951 | 2.489089713 |
| 5 | 32 | 2.052177394 | 2.484247818 |
| 6 | 64 | 2.049330491 | 2.481890143 |
| 7 | 128 | 2.047871477 | 2.480726430 |
| 8 | 256 | 2.047136380 | 2.480102049 |
| $\infty$ | $\infty$ | 2.046392675 | 2.479472335 |
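Given the tabulated constants, the per-level bounds are a one-line computation; a minimal sketch (values copied from the $K_3^{(k)}$ column of Table 16, for the $n = 3$ case used in our experiments):

```python
# Maximum control point absolute values gamma^(k) = 0.99 / K_n^(k),
# using the K_3^(k) constants from Table 16 (n = 3 dimensional data).
K3 = {0: 2.777777778, 1: 2.594390728, 2: 2.512366240, 3: 2.495476474}

def gamma(k: int, safety: float = 0.99) -> float:
    """Maximum allowed control point absolute value at resolution level k."""
    return safety / K3[k]

for k in sorted(K3):
    print(f"k={k}: gamma={gamma(k):.6f}")
```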

E Deformation inversion layer memory usage

We conducted an experiment on the memory usage of the deformation inversion layer compared to the stationary velocity field (SVF) framework (Arsigny et al., 2006), since the SVF framework could also be used to implement the suggested architecture in practice.

With the SVF framework one could slightly simplify the deformation update in Equation 5 to the form

$U^{(k)} := \exp(u^{(k)}(z_1^{(k)}, z_2^{(k)}) - u^{(k)}(z_2^{(k)}, z_1^{(k)}))$ (31)

where $\exp$ is the SVF integration (corresponding to Lie algebra exponentiation), and $u^{(k)}$ now predicts an auxiliary velocity field. We compared the memory usage of this variant to our implementation, using the implementation by Dalca et al. (2018) for the SVF integration.

The results are shown in Table 17. Our version implemented using the deformation inversion layer requires 5 times less data to be stored in memory for the backward pass compared to the SVF integration. The peak memory usage during the inversion is also slightly lower. The memory saving is due to the memory-efficient back-propagation through the fixed point iteration layers, which requires only the final inverted volume to be stored for the backward pass. Since our architecture requires two such operations for each resolution level ($U^{(k)}$ and its inverse), the memory saved during training is significant.

Table 17: Memory usage comparison between the deformation inversion layer and stationary velocity field (SVF) based implementations. The comparison is between executing Equation 5 using deformation inversion layers and executing Equation 31 using the SVF integration implementation by Dalca et al. (2018). Between passes memory usage refers to the amount of memory needed for storing values between the forward and backward passes, and peak memory usage refers to the peak amount of memory needed during the forward and backward passes. A volume of shape $(256, 256, 256)$ with 32-bit precision was used. We used 7 scalings and squarings for the SVF integration.

| Method | Between passes memory usage (GB) ↓ | Peak memory usage (GB) ↓ |
| --- | --- | --- |
| Deformation inversion layer | **0.5625** | **3.9375** |
| SVF integration | 2.8125 | 4.125 |
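The between-passes figures in Table 17 can be put in perspective by expressing them in units of one dense displacement field; a quick sanity-check sketch (assuming the stored tensors are float32 fields of shape (256, 256, 256, 3); the "volumes stored" reading is a back-of-the-envelope interpretation, not a profiler measurement):

```python
# Size of one dense displacement field of shape (256, 256, 256, 3) in float32.
voxels = 256 ** 3
bytes_per_volume = voxels * 3 * 4           # 3 channels, 4 bytes each
gb_per_volume = bytes_per_volume / 2 ** 30  # = 0.1875 GB

# Between-passes memory from Table 17, expressed in such volumes:
inversion_layer_volumes = 0.5625 / gb_per_volume  # 3 volumes
svf_volumes = 2.8125 / gb_per_volume              # 15 volumes
print(gb_per_volume, inversion_layer_volumes, svf_volumes)
```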

F Extended related work

In this appendix we aim to provide a more thorough analysis of the related work, by introducing in more detail the works that we find closest to our method, and explaining how our method differs from those.

F.1 Classical registration methods

Classical registration methods, as opposed to learning based methods, optimize the deformation independently for each given image pair. For this reason they are sometimes called optimization based methods.

DARTEL by Ashburner (2007) is a classical optimization based registration method built on top of the stationary velocity field (SVF) framework (Arsigny et al., 2006), offering symmetric by construction, inverse consistent, and topology preserving registration. To our knowledge it is the first registration method that is symmetric by construction, inverse consistent, and topology preserving.

SyN by Avants et al. (2008) is another classical registration method that is symmetric by construction, inverse consistent, and topology preserving. The properties are achieved via the Large Deformation Diffeomorphic Metric Mapping (LDDMM) framework (Beg et al., 2005), in which diffeomorphisms are generated from time-varying velocity fields (as opposed to the stationary ones in the SVF framework). The LDDMM framework has not been used much in unsupervised deep learning for generating diffeomorphisms due to its computational cost, but some works do exist (Shen et al., 2019b; Ramon et al., 2022; Wang et al., 2023), and others are inspired by the framework but make significant modifications, and as a result lose the by construction topology preserving properties (Wang and Zhang, 2020; Wu et al., 2022; Joshi and Hong, 2023). SyN is to our knowledge the first work suggesting matching the images in the intermediate coordinates to achieve symmetry, an idea which is also used in our work. However, the usual implementation of SyN in ANTs (Avants et al., 2009) is not symmetric as a whole, since the affine registration is not applied in a symmetric manner.

SyN has performed well in evaluation studies comparing different classical registration methods, e.g., (Klein et al., 2009). However, it is significantly slower than the strong baselines included in our study and has already been compared with them earlier (Balakrishnan et al., 2019; Mok and Chung, 2020a, b), and hence was not included in our study.

F.2 Deep learning methods (earlier work)

Unlike the optimization based methods above, deep learning methods train a neural network that, for two given input images, outputs a deformation directly. The benefits of this class of methods include a significant speed improvement and more robust performance (avoiding local optima) (De Vos et al., 2019). Our model belongs to this class of methods.

SYMNet by Mok and Chung (2020a) uses a single forward pass of a U-Net style neural network to predict two stationary velocity fields, $v_{1\to 1.5}$ and $v_{2\to 1.5}$ (in practice two 3-channeled outputs are extracted from the last layer features using separate convolutions). The stationary velocity fields are integrated into two half-way deformations (and their inverses). The training loss matches the images both in the intermediate coordinates and in the original coordinate spaces (using the composed full deformations). While the network is a priori symmetric with respect to the input images, changing the input order of the images (concatenating the inputs in the opposite order for the U-Net) can in principle result in any two $v_{1\to 1.5}$ and $v_{2\to 1.5}$ (instead of swapping them), meaning that the method is not symmetric by construction as defined in Section 1 (this is confirmed by the cycle consistency experiments).

Our method does not use the stationary velocity field (SVF) framework to invert the deformations, but instead uses the novel deformation inversion layer. Also, SYMNet does not employ the multi-resolution strategy. The use of intermediate coordinates is similar to our work.

MICS by Estienne et al. (2021) uses a shared convolutional encoder $E$ to encode both input images into feature representations $E(x_A)$ and $E(x_B)$. A convolutional decoder network $D$ is then used to extract gradient volumes (constrained to contain only positive values) of both the forward and inverse deformations with the formulas

$\nabla f(x_A, x_B)_{1\to 2} = D(E(x_A) - E(x_B)) \text{ and } \nabla f(x_A, x_B)_{2\to 1} = D(E(x_B) - E(x_A)).$ (32)

The final deformations are obtained from the gradient volumes by a cumulative sum operation. Since the gradients are constrained to be positive, the resulting deformation will be roughly invertible. However, as stated in their work, this only puts a positivity constraint on the diagonal of the Jacobians, not on their determinant, unlike our work which guarantees positive determinants (Theorem 3).
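In one dimension the positive-gradient construction is easy to picture; a toy sketch (not the MICS implementation; the random increments are hypothetical) showing that positive increments accumulate, via cumulative sum, into a strictly increasing and hence invertible 1D mapping:

```python
import numpy as np

# A 1D "gradient volume" constrained to positive values, as in MICS.
rng = np.random.default_rng(0)
grad = np.exp(rng.normal(size=50) * 0.1)  # strictly positive increments

# The cumulative sum turns positive increments into a strictly
# increasing map, i.e. an invertible 1D deformation.
mapping = np.cumsum(grad)
assert np.all(np.diff(mapping) > 0)
```

In higher dimensions this argument applies only along each axis separately, which is exactly why positivity of the Jacobian diagonal does not guarantee a positive determinant.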

While MICS is symmetric by construction in the sense that swapping $x_A$ and $x_B$ results in swapping the predicted forward and inverse deformations, this symmetry is achieved by subtraction (Equation 32) instead of a mathematical inverse operation (as in our work, Equation 2). As a result, the predicted "forward" and "inverse" deformations are not actually constrained by construction to be inverses of each other; MICS uses a loss to encourage this. Also, while MICS employs a multi-step approach, the symmetric by construction property is lost over the whole architecture due to not using the intermediate coordinate approach employed by our work.

Additional baselines: In addition to SYMNet, cLapIRN (Mok and Chung, 2021) was chosen as a baseline because it was the best method on the OASIS dataset in the Learn2Reg challenge (Hering et al., 2022). It employs a standard and straightforward multi-resolution approach, and is not topology preserving, inverse consistent, or symmetric. Apart from the multi-resolution approach, it is not methodologically close to our method. VoxelMorph (Balakrishnan et al., 2019) is a standard baseline in deep learning based unsupervised registration, and is based on a straightforward application of the U-Net architecture to image registration.

F.3 Deep learning methods (recent methods parallel with our work)

These methods are very recent deep learning based registration methods which have been developed independently of, and in parallel with, our work. A comparison between these methods and our model is therefore a fruitful topic for future research.

Iglesias (2023) uses an approach similar to ours to achieve symmetry, but within the stationary velocity field (SVF) framework. In the SVF framework, given some velocity field $v$, we have the property $\exp(v) = \exp(-v)^{-1}$, where $\exp$ represents the integration operation (Arsigny et al., 2006) generating the deformation from the velocity field. Hence, to obtain a symmetric by construction method, one can modify Equation 2 to the form

$f(x_A, x_B) := \exp(u(x_A, x_B) - u(x_B, x_A)),$ (33)

which results in a method that is symmetric by construction, inverse consistent, and topology preserving. We measure memory usage against this formulation in Appendix E and show that our formulation using the novel deformation inversion layer requires storing 5 times less memory for the backward pass. Their method includes only a single registration step, not a robust multi-resolution architecture like ours.

Greer et al. (2023) extend the approach by Iglesias (2023) to a multi-step formulation in a way very similar to how we construct our multi-resolution architecture using the intermediate coordinates (square roots of deformations). They also employ the SVF framework, as opposed to our work which uses the deformation inversion layers; as shown in Appendix E, our approach requires storing 5 times less memory for the backward pass, which is significant for being able to train a multi-step network with many steps on large images (such as our raw OASIS data). Also, their paper treats the separate steps of the multi-step architecture as independent, whereas we develop a very efficient multi-resolution formulation based on first extracting the multi-resolution features using a ResNet-style encoder.

Zhang et al. (2023) propose a symmetric by construction multi-resolution architecture similar to ours. Differences include again using the SVF framework, as well as applying losses at different resolutions, as opposed to us applying them only at the final level.

G Deformation inversion layer practical convergence

We conducted an experiment on the fixed point iteration convergence in the deformation inversion layers with the model trained on the OASIS dataset. The results can be seen in Figure 5. The main result was that in the whole OASIS test set of 9591 pairs not a single deformation required more than 8 iterations to converge. Deformations requiring 8 iterations made up only 0.05% of all the deformations, and a significant majority of the deformations (96%) required 2 to 5 iterations. In all the experiments, including this one, the stopping criterion for the iterations was the maximum displacement error within the whole volume falling below one hundredth of a voxel, which is a very small error.
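The iteration and its stopping criterion can be illustrated in one dimension; a minimal sketch (not the paper's implementation; the sinusoidal displacement field is a hypothetical example) of the fixed point iteration $v \leftarrow -u(x + v)$ that solves for the inverse displacement of $f(x) = x + u(x)$:

```python
import numpy as np

# 1D sketch of the fixed point iteration used by the deformation
# inversion layer: the inverse displacement v solves v(x) = -u(x + v(x)).
n = 64
x = np.arange(n, dtype=float)
u = 0.3 * np.sin(2 * np.pi * x / n)  # small, contractive displacement

def sample(field, pts):
    # Linear interpolation with edge clamping.
    return np.interp(np.clip(pts, 0, n - 1), x, field)

v = np.zeros(n)
for iterations in range(1, 100):
    v_new = -sample(u, x + v)
    # Stop once the maximum update falls below 0.01 voxels,
    # mirroring the stopping criterion used in the experiments.
    converged = np.max(np.abs(v_new - v)) < 1e-2
    v = v_new
    if converged:
        break

# f composed with its approximate inverse should be close to identity.
roundtrip = (x + v) + sample(u, x + v)
print(iterations, np.max(np.abs(roundtrip - x)))
```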

Refer to caption
Figure 5: Number of fixed point iterations required for convergence in deformation inversion layers with the model trained on OASIS dataset. The stopping criterion for the fixed point iteration was maximum displacement error within the whole volume reaching below one hundredth of a voxel. All deformation inversions for the whole OASIS test set are included.

H Additional visualizations

Figures 6 and 7 visualize the differences in inverse consistency and cycle consistency, respectively.

Figures 8 and 9 visualize dice scores for individual anatomical regions for both OASIS and LPBA40 datasets. VoxelMorph and SYMNet perform systematically worse than our method, while cLapIRN and our method perform very similarly on most regions.

Figure 10 visualizes how the deformation is being gradually updated during the multi-resolution architecture.

Refer to caption
Figure 6: Visual inverse consistency comparison. The deformation $f(x_A, x_B)_{1\to 2} \circ f(x_A, x_B)_{2\to 1}$ is visualized for the SITReg and SYMNet models for a single image pair in the LPBA40 experiment. Since cLapIRN and VoxelMorph do not generate explicit inverses, they are not included in the figure. Ideally, $f(x_A, x_B)_{1\to 2} \circ f(x_A, x_B)_{2\to 1}$ should equal the identity mapping, and as can be seen, the property is well fulfilled by the shown methods. Only one axial slice of the predicted 3D deformation is shown.
Refer to caption
Figure 7: Visual cycle consistency comparison. The deformation composition $f(x_A, x_B) \circ f(x_B, x_A)$ is visualized for each model for a single image pair in the LPBA40 experiment. Ideally, changing the order of the input images should result in the same coordinate mapping but in the inverse direction, since anatomical correspondence does not depend on the input order. In other words, the deformation composition $f(x_A, x_B) \circ f(x_B, x_A)$ should equal the identity deformation. As can be seen, the property is fulfilled (up to small sampling errors) only by our method and ByConstructionICON. Only one axial slice of the predicted 3D deformation is shown.
Refer to caption
Figure 8: Individual brain structure dice scores for the OASIS experiment. Boxplot shows performance of each of the compared methods on each of the brain structures in the OASIS dataset. Algorithms from left to right in each group: SITReg, ByConstructionICON, cLapIRN, VoxelMorph, SYMNet (original)
Refer to caption
Figure 9: Individual brain structure dice scores for the LPBA40 experiment. Boxplot shows performance of each of the compared methods on each of the brain structures in the LPBA40 dataset. Algorithms from left to right in each group: SITReg, cLapIRN, VoxelMorph, ByConstructionICON, SYMNet (original)
Refer to caption
Figure 10: Visualization of the deformation being gradually updated. Each $d_{1\to 1.5}^{(k)} \circ \left(d_{2\to 1.5}^{(k)}\right)^{-1}$ corresponds to the full deformation after resolution level $k$. The example is from the OASIS experiment.

I Details on statistical significance

We computed the statistical significance of the results by comparing the test set predictions of the trained models with each other. We measured statistical significance using a permutation test, in practice sampling 10000 permutations.
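A minimal sketch of one common paired permutation test variant (the sign-flip form; the data below are synthetic, and this is not the paper's evaluation code): within each test pair, the two methods' scores are randomly swapped, and the resulting mean difference is compared to the observed one.

```python
import numpy as np

def permutation_test(scores_a, scores_b, n_permutations=10000, seed=0):
    """Two-sided paired permutation (sign-flip) test on the mean difference."""
    rng = np.random.default_rng(seed)
    diff = scores_a - scores_b
    observed = abs(diff.mean())
    count = 0
    for _ in range(n_permutations):
        # Randomly swapping a pair's two scores flips the sign of its difference.
        signs = rng.choice([-1.0, 1.0], size=diff.shape)
        if abs((signs * diff).mean()) >= observed:
            count += 1
    return count / n_permutations  # two-sided p-value

rng = np.random.default_rng(1)
a = rng.normal(0.80, 0.02, size=100)           # e.g. Dice scores of method A
b = a - 0.01 + rng.normal(0, 0.005, size=100)  # method B slightly worse
print(permutation_test(a, b))
```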

To establish with certainty the relative performance of the methods with respect to the tight metrics, one should train multiple models per method with different random seeds. However, our claim is not that the developed method improves the results with respect to a single tight metric, but rather that the overall performance is better by a clear margin (see Section 6).

J Clarifications on symmetry, inverse consistency, and topology preservation

Here we provide examples of symmetry, inverse consistency and lack of topology preservation to further clarify how the terms are used in the paper.

Since symmetry and inverse consistency are quite similar properties, their exact difference might remain unclear. Examples of registration methods that are inverse consistent by construction but not symmetric are the many deep learning frameworks applying the stationary velocity field approach (Arsigny et al., 2006), e.g., (Dalca et al., 2018; Krebs et al., 2018, 2019; Mok and Chung, 2020a). All of them use a neural network to predict a velocity field for an ordered pair of input images. The final deformation is then produced via Lie algebra exponentiation of the velocity field, that is, by integrating the velocity field over itself over unit time. The details of the exponentiation are not important here, but the operation has an interesting property: negating the velocity field to be exponentiated results in the inverse deformation. Denoting the Lie algebra exponential by $\exp$, and using the notation from Section 1, we can define such methods as

$\begin{cases} f_{1\to 2}(x_A, x_B) := \exp(g(x_A, x_B)) \\ f_{2\to 1}(x_A, x_B) := \exp(-g(x_A, x_B)) \end{cases}$ (34)

where $g$ is the learned neural network predicting the velocity field. As a result, the methods are inverse consistent by construction since $\exp(g(x_A, x_B)) = \exp(-g(x_A, x_B))^{-1}$ (accuracy is limited by the spatial sampling resolution). However, when changing the order of the inputs to $g$, there is no guarantee that $g(x_A, x_B) = -g(x_B, x_A)$, and hence such methods are not symmetric by construction.
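The inverse consistency property $\exp(v) = \exp(-v)^{-1}$ can be checked numerically with a 1D scaling-and-squaring sketch (not the implementation of the cited works; the velocity field is a hypothetical example): integrating $v$ and $-v$ and composing the results should give approximately the identity.

```python
import numpy as np

n = 128
x = np.arange(n, dtype=float)
v = 1.5 * np.sin(2 * np.pi * x / n)  # stationary velocity field

def compose(disp_a, disp_b):
    # Displacement of (id + a) o (id + b): apply b, then sample a there.
    return disp_b + np.interp(np.clip(x + disp_b, 0, n - 1), x, disp_a)

def svf_exp(vel, squarings=7):
    # Scaling and squaring: halve the field `squarings` times, then
    # repeatedly compose the small deformation with itself.
    disp = vel / 2 ** squarings
    for _ in range(squarings):
        disp = compose(disp, disp)
    return disp

forward = svf_exp(v)
backward = svf_exp(-v)
# exp(v) o exp(-v) should be close to the identity (zero displacement).
roundtrip_error = np.max(np.abs(compose(forward, backward)))
print(roundtrip_error)
```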

MICS (Estienne et al., 2021) is an example of a method which is symmetric by construction but not inverse consistent. MICS is composed of two components: an encoder, say $E$, and a decoder, say $D$, both of which are learned. The method can be defined as

$\begin{cases} f_{1\to 2}(x_A, x_B) := D(E(x_A, x_B) - E(x_B, x_A)) \\ f_{2\to 1}(x_A, x_B) := D(E(x_B, x_A) - E(x_A, x_B)). \end{cases}$ (35)

As a result, the method is symmetric by construction since $f_{1\to 2}(x_A, x_B) = f_{2\to 1}(x_B, x_A)$ holds exactly. However, there is no architectural guarantee that $f_{1\to 2}(x_A, x_B)$ and $f_{2\to 1}(x_A, x_B)$ are inverses of each other, and the paper proposes to encourage that using a loss function. The paper uses such components in a multi-step manner, and as a result the overall architecture is no longer symmetric.
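The exact symmetry in Equation 35 holds for any encoder and decoder, which can be checked with stand-in networks; a toy sketch (the weights and the tanh "encoder" are hypothetical, not MICS code):

```python
import numpy as np

rng = np.random.default_rng(0)
W_e = rng.normal(size=(4, 8))  # stand-in "encoder" weights (hypothetical)
W_d = rng.normal(size=(4, 4))  # stand-in "decoder" weights (hypothetical)

def E(p, q):
    # Toy encoder taking an ordered pair of "images" (here: 4-vectors).
    return np.tanh(W_e @ np.concatenate([p, q]))

def D(feat):
    return W_d @ feat

def f12(a, b):
    return D(E(a, b) - E(b, a))

def f21(a, b):
    return D(E(b, a) - E(a, b))

a, b = rng.normal(size=4), rng.normal(size=4)
# Swapping the inputs exactly swaps the "forward" and "inverse" outputs:
assert np.allclose(f12(a, b), f21(b, a))
# ...but nothing forces f12(a, b) and f21(a, b) to be actual inverse
# deformations of each other; MICS encourages that with a loss.
```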

Lack of topology preservation means in practice that the predicted deformation folds on top of itself. An example of such deformation is shown in Figure 11.

Refer to caption
Figure 11: Visualization of a 2D deformation which is not topology preserving. The deformation can be seen folding on top of itself.