Weakly Supervised Lymph Nodes Segmentation Based on Partial Instance Annotations with Pre-trained Dual-branch Network and Pseudo Label Learning

Litingyu Wang1, Yijie Qu1, Xiangde Luo1,2, Wenjun Liao1,3, Shichuan Zhang1,3, Guotai Wang1,2
1: University of Electronic Science and Technology of China, Chengdu, China, 2: Shanghai AI Laboratory, Shanghai, China, 3: Department of Radiation Oncology, Sichuan Cancer Hospital & Institute, Sichuan Cancer Center, Chengdu, China
Publication date: 2024/08/05
https://doi.org/10.59275/j.melba.2024-489g

Abstract

Assessing the presence of potentially malignant lymph nodes aids in estimating cancer progression, and identifying surrounding benign lymph nodes can assist in determining potential metastatic pathways for cancer. For quantitative analysis, automatic segmentation of lymph nodes is crucial. However, due to the labor-intensive and time-consuming manual annotation process required for a large number of lymph nodes, it is more practical to annotate only a subset of the lymph node instances to reduce annotation costs. In this study, we propose a pre-trained Dual-Branch network with Dynamically Mixed Pseudo label (DBDMP) to learn from partial instance annotations for lymph nodes segmentation. To obtain reliable pseudo labels for lymph nodes that are not annotated, we employ a dual-decoder network to generate different outputs that are then dynamically mixed. We integrate the original weak partial annotations with the mixed pseudo labels to supervise the network. To further leverage the extensive amount of unannotated voxels, we apply a self-supervised pre-training strategy to enhance the model’s feature extraction capability. Experiments on the mediastinal Lymph Node Quantification (LNQ) dataset demonstrate that our method, compared to directly learning from partial instance annotations, significantly improves the Dice Similarity Coefficient (DSC) from 11.04% to 54.10% and reduces the Average Symmetric Surface Distance (ASSD) from 20.83 mm to 8.72 mm. The code is available at https://github.com/WltyBY/LNQ2023_training_code

Keywords

Lymph Nodes Segmentation · Label-efficient Learning · Pseudo Labels




1 Introduction

Lymph node segmentation is essential in various medical applications, particularly in the diagnosis, staging and treatment planning of diseases such as cancer (Bouget et al., 2023). By segmenting lymph nodes and monitoring factors, such as their size and shape, clinicians can track disease progression and formulate treatment plans (Li and Xia, 2020). Deep learning methods have shown promising results in medical image segmentation (Chen et al., 2021; Isensee et al., 2021). However, obtaining the necessary pixel-level annotations is extremely time-consuming and labor-intensive (Pathak et al., 2015). Therefore, leveraging weak annotations to train a deep learning model is highly desirable for reducing annotation costs.

Weakly supervised learning can be highly beneficial for training lymph node segmentation models. Lymph nodes are distributed throughout the human body and are often small, making it impractical to label all lymph nodes in a given region. In traditional weakly supervised tasks, the labels used to train segmentation models typically include image-level (Fu et al., 2023), box-level (Oh et al., 2021), point-level (Zhai et al., 2023) and scribble-level (Luo et al., 2022a) annotations. Image-level annotations merely indicate the presence of the target and lack detailed information about its shape, intensity, and location, potentially leading to subpar performance. Due to the subtle contrast between lymph nodes and surrounding tissues in Computed Tomography (CT) scans, providing supervision around the boundary is challenging for box-level and point/scribble-level annotations. In contrast, this study adopts a different weak annotation strategy called partial instance annotation, where only a small subset of the lymph nodes in a volume is annotated. This annotation method provides the model with more information about the target's size, shape and boundary than the other weak annotations, while the annotation cost remains significantly lower than that of fully supervised segmentation datasets.

Some researchers (Bouget et al., 2023; Feulner et al., 2013) have proposed annotating only a subset of lymph nodes for training, for example, only lymph nodes with a high probability of disease based on size (minimum diameter larger than 10 mm). However, some works still use fully supervised training procedures on datasets labeled in this manner. Oda et al. (2018) utilized Fully Convolutional Networks (FCNs) for mediastinal lymph node detection and segmentation, trained on annotated lymph nodes and other anatomical structures to address data imbalance. Bouget et al. (2023) introduced anatomical prior knowledge during training to help the model distinguish lymph nodes from similar surrounding structures. However, segmenting only diseased lymph nodes is inadequate for clinical use, as both diseased and normal lymph nodes are essential for diagnosis and treatment. Diseased lymph nodes offer insights into affected areas, while nearby normal lymph nodes can indicate potential metastatic pathways of cancer. Therefore, it is imperative to segment both diseased and normal lymph nodes, even if only a subset of instances is annotated for training purposes.

In this work, we propose a novel framework named pre-trained Dual-Branch network with Dynamically Mixed Pseudo labels (DBDMP), which integrates self- and weakly supervised learning concepts with noisy label learning to train a segmentation model from partial instance annotations. To improve the feature extraction capability of the model, we employ a self-supervised pre-training method, Model Genesis (Zhou et al., 2021), based on an image reconstruction task. Within the weakly supervised learning framework, we utilize a noise-robust loss to enhance learning from partial instance annotations. Furthermore, to effectively leverage unlabeled voxels during training, we introduce a real-time pseudo label learning strategy: we dynamically mix the outputs of two decoders to obtain soft pseudo labels, which are more robust to noise than hard pseudo labels (Müller et al., 2019). We then merge the original partial annotations with the mixed predictions, leveraging the complementary information between the two kinds of labels for robust learning. The main contributions of this work are summarized as follows:

  • We propose a novel pseudo label generation strategy for learning from partial instance annotations for lymph node segmentation. By assigning pseudo labels to unannotated lymph nodes instead of directly treating them as background, our approach effectively enhances the segmentation model's recall and reduces false negatives. Furthermore, soft pseudo labels are more noise-tolerant than hard pseudo labels, making the training process more robust.

  • During the pseudo label learning stage, a consensus-aware Cross-Entropy loss is proposed. The weight of each voxel is determined by the consistency between the two predictions derived from the weakly supervised learning framework. This facilitates the gradual learning of newly predicted foreground voxels while mitigating the risk of being misled by incorrect ones.

  • We adopt Model Genesis to initialize model parameters, enhancing the model's capability to extract superior features and edge information through the reconstruction of corrupted images.

Our method was validated on the Mediastinal Lymph Node Quantification (LNQ) dataset and achieved promising results. In the LNQ challenge held at MICCAI 2023, we secured the 4th position without utilizing any additional datasets for training, while other participants used extra training sets. Our final method attained a Dice Similarity Coefficient (DSC) of 54.10% on the validation set and 55.44% on the test set, corresponding to improvements of 43.06 and 36.40 percentage points, respectively, over supervised learning from the annotated instances only. It is worth noting that our best DSC on the test set is 57.36%.

2 Related Works

Lymph Nodes Segmentation

Numerous efforts based on traditional vision methods have been dedicated to lymph node detection and segmentation, including Marginal Space Learning (MSL) (Barbu et al., 2011) and atlas-based segmentation (Stapleford et al., 2010). However, traditional methods may face challenges such as suboptimal performance or excessive computation time (Zhao et al., 2020). In recent years, deep learning has been applied to lymph node segmentation due to its outstanding performance in tasks such as image classification and segmentation. Nogues et al. (2016) presented a method for automatic segmentation of lymph node clusters in CT images using holistically-nested neural networks and structured optimization. Bouget et al. (2019) proposed a 2D pipeline that integrates the outputs of U-Net (Ronneberger et al., 2015) and Mask R-CNN (He et al., 2017) for segmentation and improves performance with instance detection. Xu et al. (2021) introduced a Cosine-Sine loss function and a multi-scale Atrous Spatial Pyramid Pooling (ASPP) module into the SegNet (Badrinarayanan et al., 2017) architecture to address voxel class imbalance and enhance multi-scale information. Although these methods have achieved success in lymph node segmentation, they all rely on fully annotated training datasets with high annotation costs.

Label-efficient Learning

The objective of label-efficient learning is to reduce the cost and time of the labeling process while achieving performance comparable to fully supervised methods, especially for image segmentation tasks where dense annotations are expensive and time-consuming to obtain (Shen et al., 2023). Label-efficient learning techniques encompass semi-supervised learning (Luo et al., 2022b), active learning (Settles, 2009), weakly supervised learning (Luo et al., 2022a) and noisy label learning (Wang et al., 2020b), among others. For example, Lin et al. (2016) utilized a graphical model that jointly propagates information from scribbles to unlabeled pixels based on superpixels (Ren and Malik, 2003). Luo et al. (2022a) employed an auxiliary branch to generate pseudo labels in real time and used a specific loss function to expand the scribbled regions. To ensure robustness against inaccurate annotations in segmentation tasks, Liu et al. (2022) enforced multi-scale cross-view consistency, and Wang et al. (2020a) introduced a noise-robust Dice loss. Compared with the inaccurate annotations handled by existing noise-robust methods, partial instance annotations contain a much larger degree of label noise, because most instances are erroneously treated as background. Furthermore, as the annotation type differs from the above weak annotations, existing weakly supervised segmentation methods cannot be directly applied to learning from partial instance annotations.

Self-supervised Learning

Self-supervised learning serves as a mechanism for models to learn rich feature representations from unlabeled data, thereby reducing reliance on large labeled datasets. This is commonly achieved by designing a pretext task. Gidaris et al. (2018) designed a classification-based pretext task to predict discretized rotation angles of an input image. Solving jigsaw puzzles has also been used as a pretext task. Zhou et al. (2021) introduced Model Genesis, which reconstructs a corrupted input to its original state; designs like Model Genesis allow the model to extract universal image features effectively. Lei et al. (2021) proposed a novel contrastive learning approach, estimating the relative 3D offset between any pair of patches within the same volume, which performs well with just one-shot fine-tuning, while most other methods require fully supervised fine-tuning in the downstream task. However, in existing works, models trained by self-supervised learning are mainly fine-tuned with a small set of fully annotated images in downstream tasks, and applying them to weakly supervised learning has rarely been investigated.

Figure 1: An overview of the proposed DBDMP, which utilizes a dual-branch network with one shared encoder and two decoders. (a) In the self-supervised learning stage, Model Genesis is employed for pre-training. (b) In the downstream learning stage, a mixture of the outputs from the two decoders is combined with the original partial annotation to obtain a pseudo label. We also use a consensus-aware loss $\mathcal{L}_{KLCE}$ to avoid over-fitting to noise in the pseudo labels.

3 Methods

Fig. 1 illustrates the proposed partial instance annotation learning framework, DBDMP, which consists of a self-supervised pre-training stage and a pseudo label learning stage to deal with partial instance annotations. To achieve more stable predictions, we introduce a network with one encoder and two decoders to generate pseudo labels for unannotated instances. In the self-supervised pre-training stage, as shown in Fig. 1(a), the dual-branch model is trained to improve feature extraction capabilities by reconstructing corrupted images, with each decoder supervised separately. In Fig. 1(b), the outputs of the two decoders are mixed to obtain the pseudo label, aiming to leverage the prediction from the auxiliary branch to complement that from the main branch and supplement the weak annotation information. To robustly learn from the pseudo label, we introduce a consensus-aware loss function that assigns higher weights to voxels with more reliable pseudo labels. Additionally, to deal with the extreme imbalance between foreground and background voxels, we prioritize learning confident foreground voxels and give lower weight to background voxels, with the aim of mining more unannotated lymph nodes.

3.1 Dual-branch Network

As shown in Fig. 1, the dual-branch network, which extends the VNet architecture (Milletari et al., 2016), comprises a shared encoder and two decoders, inspired by Luo et al. (2022a). Let us denote the input of the encoder as $x$ and the output features as $f$, which include features from the bottleneck layer and skip connections at different resolutions. The two decoders share the same structure but have different inputs and parameters: the main decoder directly takes the features $f$ from the encoder as input, while the auxiliary decoder takes $f$ perturbed by dropout as input. We define the mappings from $f$ to the outputs of the main and auxiliary decoders as $g_m(\cdot)$ and $g_a(\cdot)$, respectively. In detail, each convolution block contains two convolution layers and a residual connection with Instance Normalization (IN) and the Leaky ReLU activation function. Both the encoder and decoders are symmetric structures with five different resolutions.
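
A minimal PyTorch sketch of this design is given below; `Encoder`, `Decoder`, and the dropout rate are placeholders and assumptions, not the exact VNet configuration used in the paper.

```python
import torch
import torch.nn as nn

class DualBranchNet(nn.Module):
    """Shared encoder with a main decoder and a dropout-perturbed auxiliary decoder.

    `encoder` is expected to return a list of multi-scale feature maps f
    (bottleneck plus skip connections); each decoder maps f to a probability map."""
    def __init__(self, encoder: nn.Module, main_decoder: nn.Module,
                 aux_decoder: nn.Module, p_drop: float = 0.5):
        super().__init__()
        self.encoder = encoder
        self.main_decoder = main_decoder        # g_m(.)
        self.aux_decoder = aux_decoder          # g_a(.)
        self.dropout = nn.Dropout3d(p=p_drop)   # feature perturbation (rate is an assumption)

    def forward(self, x: torch.Tensor):
        f = self.encoder(x)                          # multi-scale features f
        p_main = self.main_decoder(f)                # p^m = g_m(f)
        f_perturbed = [self.dropout(fi) for fi in f]
        p_aux = self.aux_decoder(f_perturbed)        # p^a = g_a(dropout(f))
        return p_main, p_aux
```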

Figure 2: The transformations used to generate the input for self-supervised training: I. Non-linear transformation, II. Local pixel shuffling, III. In-painting, IV. Out-painting. RCT (Randomly Composed Transformation) means that the basic transformations are composed, each applied with a certain probability. Note that in-painting and out-painting are never performed together.

3.2 Self-supervised Pre-training

An appropriate pretext task can empower neural networks to learn low-level and high-level features that are conducive to downstream tasks (Jing and Tian, 2020). Zhou et al. (2021) introduced a self-supervised method called Model Genesis, which performs an image reconstruction process, and has shown promising results for downstream supervised segmentation tasks. Therefore, we use Model Genesis to pre-train the dual-branch network. Unlike the original Model Genesis that only trains one decoder, we extend it by training two decoders for the reconstruction process during pre-training.

Model Genesis (Zhou et al., 2021) employs three types of transformations on the original images, as detailed in Fig. 2: 1) Non-linear transformation uses a Bézier curve (Mortenson, 1999) to map each intensity value to a unique, deterministic new value, encouraging the model to focus on image appearance and intensity distribution. 2) Local pixel shuffling samples a window smaller than the model's receptive field within the patch and rearranges its internal pixels, encouraging the model to learn local texture and boundaries. 3) Out-painting or in-painting: out-painting sets pixels outside a shape to random values while the inner pixels retain their original intensities; in-painting does the opposite. The network learns visual features by reconstructing the original images from their corrupted versions. In the self-supervised pre-training stage shown in Fig. 1(a), the outputs of the main and auxiliary decoders are each compared with the original image $y$ using the Mean Squared Error (MSE) loss $\mathcal{L}_{MSE}$. The reconstruction loss is defined as:

$$\mathcal{L}_{rec} = \mathcal{L}_{MSE}(p^{m}, y) + \mathcal{L}_{MSE}(p^{a}, y) \qquad (1)$$

where $p^m = g_m(f)$ and $p^a = g_a(f)$ are the predictions of the main and auxiliary decoders, respectively, and $y$ denotes the reconstruction target, i.e., the original input image.
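
Below is a hedged sketch of two of the corruption transformations (the non-linear Bézier intensity mapping and local pixel shuffling) together with the dual-decoder reconstruction loss of Eq. (1); the window count, window sizes, and the sorting-based monotonic Bézier mapping are simplifications and assumptions rather than the exact Model Genesis implementation.

```python
import numpy as np
import torch
import torch.nn.functional as F

def bezier_intensity_transform(x: np.ndarray) -> np.ndarray:
    """Non-linear intensity mapping via a random Bezier curve.
    Assumes the input intensities are normalized to [0, 1]."""
    p0, p3 = np.array([0.0, 0.0]), np.array([1.0, 1.0])
    p1, p2 = np.random.rand(2), np.random.rand(2)    # random control points
    t = np.linspace(0.0, 1.0, 1000)[:, None]
    curve = ((1 - t) ** 3) * p0 + 3 * ((1 - t) ** 2) * t * p1 \
        + 3 * (1 - t) * (t ** 2) * p2 + (t ** 3) * p3
    order = np.argsort(curve[:, 0])                  # enforce monotonic x for interp
    return np.interp(x.ravel(), curve[order, 0], curve[order, 1]).reshape(x.shape)

def local_pixel_shuffle(x: np.ndarray, n_windows: int = 1000,
                        max_size: int = 8) -> np.ndarray:
    """Shuffle voxels inside small random windows to corrupt local texture."""
    out = x.copy()
    d, h, w = x.shape
    for _ in range(n_windows):
        sd, sh, sw = np.random.randint(1, max_size + 1, size=3)
        z = np.random.randint(0, d - sd + 1)
        y = np.random.randint(0, h - sh + 1)
        c = np.random.randint(0, w - sw + 1)
        block = out[z:z + sd, y:y + sh, c:c + sw].ravel()
        np.random.shuffle(block)
        out[z:z + sd, y:y + sh, c:c + sw] = block.reshape(sd, sh, sw)
    return out

def reconstruction_loss(p_main: torch.Tensor, p_aux: torch.Tensor,
                        target: torch.Tensor) -> torch.Tensor:
    """Eq. (1): both decoders reconstruct the original (uncorrupted) image."""
    return F.mse_loss(p_main, target) + F.mse_loss(p_aux, target)
```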

3.3 Supervised Loss for Partial Instance Annotations

Partial instance annotations can be considered a noisy label learning problem, where some foreground regions are incorrectly labeled as background. However, compared to conventional noisy label learning scenarios, the noise in the background of partial instance annotations can be excessive, with a large number of false negatives. Therefore, we integrate weakly supervised learning and noisy label learning methods to mitigate the impact of this excessive background noise and produce more reliable supervision signals for the learning process.

Firstly, to obtain more reliable supervision signals, we employ a noisy label learning technique to learn from the partial instance annotations. Given the large number of false negatives in the labels, we utilize the Symmetric Cross-Entropy (SCE) loss proposed by Wang et al. (2019) to balance the confidence between the partial instance annotation and the model's prediction:

$$\mathcal{L}_{SCE}(p, y) = \gamma \times \mathcal{L}_{CE}(p, y) + \mathcal{L}_{CE}(y, p) \qquad (2)$$

where $\mathcal{L}_{CE}$ is the widely used Cross-Entropy loss, and $p$ and $y$ are the predicted probability map of a given branch and the partial instance annotation, respectively. Whether $\gamma$ is greater or less than 1 indicates which of the label and the prediction is considered more trustworthy. In this work, we set $\gamma$ to 0.8 because the partial instance annotation is less credible than the model's prediction.
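
A possible voxel-wise implementation of Eq. (2) is sketched below; the label-clamping constant follows common SCE implementations and is an assumption.

```python
import torch

def sce_loss(p: torch.Tensor, y: torch.Tensor, gamma: float = 0.8,
             eps: float = 1e-4) -> torch.Tensor:
    """Symmetric Cross-Entropy of Eq. (2) on voxel-wise class probabilities.

    p: softmax probabilities of shape (N, C, D, H, W);
    y: one-hot partial instance labels of the same shape."""
    p = p.clamp(min=eps)
    y_safe = y.clamp(min=eps)                         # keep log(0) finite on the label side
    ce = -(y * torch.log(p)).sum(dim=1).mean()        # CE(p, y): trust the labels
    rce = -(p * torch.log(y_safe)).sum(dim=1).mean()  # CE(y, p): trust the prediction
    return gamma * ce + rce
```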

Secondly, we introduce the Partial Cross-Entropy (PCE) loss (Lee and Jeong, 2020) to ensure that the foreground in the partial instance label is reliably learned. Unlike the PCE loss used with scribble-level annotations (Luo et al., 2022a), which supervises all labeled voxels (all target categories as well as the background) and ignores unlabeled voxels, for partial instance annotations we compute the cross-entropy only on the foreground voxels:

$$\mathcal{L}_{PCE}(p, y) = -\sum_{i \in \Omega} y_i \log p_i \qquad (3)$$

where $\Omega$ is the set of foreground voxels in the partial instance annotation, and $p_i$ and $y_i$ denote the predicted probability and partial instance annotation of voxel $i$.
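
The following sketch implements Eq. (3); averaging over $|\Omega|$ rather than summing is an assumption made for scale stability.

```python
import torch

def pce_loss(p_fg: torch.Tensor, y_fg: torch.Tensor) -> torch.Tensor:
    """Partial Cross-Entropy of Eq. (3): cross-entropy restricted to the
    annotated foreground voxels Omega.

    p_fg: predicted foreground probabilities, shape (N, D, H, W);
    y_fg: binary partial instance annotation (1 = annotated lymph node)."""
    mask = y_fg > 0
    if not mask.any():
        return p_fg.sum() * 0.0    # patch without any annotated voxel
    return -torch.log(p_fg[mask].clamp(min=1e-8)).mean()
```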

Finally, although $\mathcal{L}_{PCE}$ helps to improve the recall of lymph nodes, it increases the risk of false positives. To deal with this problem, we additionally introduce a Tversky loss (Salehi et al., 2017) for supervision. Unlike the Dice loss, which treats False Positives (FPs) and False Negatives (FNs) equally, the Tversky loss can weight them differently while mitigating class imbalance:

$$\mathcal{L}_{Tversky}(p, y) = 1 - \frac{TP}{TP + \alpha \times FP + (1 - \alpha) \times FN} \qquad (4)$$

where $N$ is the number of voxels, $TP = \sum_{i=1}^{N} p_i y_i$, $FP = \sum_{i=1}^{N} p_i (1 - y_i)$ and $FN = \sum_{i=1}^{N} (1 - p_i) y_i$. By adjusting the hyper-parameter $\alpha$, we can control the trade-off between FPs and FNs. To encourage predicting more foreground voxels (false positives relative to the partial instance annotation), $\alpha$ is set to 0.4 based on experiments.
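
Eq. (4) can be sketched as follows; the smoothing constant is an assumption added for numerical stability.

```python
import torch

def tversky_loss(p_fg: torch.Tensor, y_fg: torch.Tensor, alpha: float = 0.4,
                 smooth: float = 1e-5) -> torch.Tensor:
    """Tversky loss of Eq. (4); alpha < 0.5 penalizes false positives less,
    encouraging the model to predict more foreground."""
    tp = (p_fg * y_fg).sum()
    fp = (p_fg * (1.0 - y_fg)).sum()
    fn = ((1.0 - p_fg) * y_fg).sum()
    return 1.0 - (tp + smooth) / (tp + alpha * fp + (1.0 - alpha) * fn + smooth)
```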

For partial instance annotation, the supervised loss for each decoder is a combination of $\mathcal{L}_{PCE}$, $\mathcal{L}_{SCE}$ and $\mathcal{L}_{Tversky}$:

$$\mathcal{L}_{sup}(p, y) = \mathcal{L}_{SCE}(p, y) + \mathcal{L}_{PCE}(p, y) + \mathcal{L}_{Tversky}(p, y) \qquad (5)$$

3.4 Online Pseudo Label Learning

Due to the presence of incorrectly labeled background voxels in partial instance annotations, it is unreliable to directly extract supervisory signals from them. Inspired by pseudo label learning for scribble annotations (Luo et al., 2022a), we first dynamically mix the predictions from the two decoders:

$$\tilde{y}_{mix} = \theta \times p^{m} + (1.0 - \theta) \times p^{a} \qquad (6)$$

where $\theta$ is randomly drawn from a uniform distribution between 0 and 1 at each iteration, enhancing the diversity of the pseudo label and compelling the model to continually update its predictions (Huo et al., 2021).

Then, we apply a sharpening function to adjust the entropy of the label distribution. The predicted probability of class $k$ is defined as:

$$\tilde{y}^{k} = \frac{e^{\tilde{y}_{mix}^{k} / \tau}}{\sum_{j \in C} e^{\tilde{y}_{mix}^{j} / \tau}} \qquad (7)$$

where $\tilde{y}_{mix}^{k}$ is the mixed output for class $k$ and $C$ is the set of all categories. $\tau$ is a temperature that is normally set to 1 (Hinton et al., 2015). When $\tau > 1$, the labels become smoother and their entropy increases, so the information carried by negative labels is relatively amplified and training pays more attention to them. Conversely, when $\tau < 1$, the labels become sharper. Properly sharpening the labels enhances their robustness to noise while maintaining the differences between classes. We set $\tau$ to 0.3 in our implementation.

Finally, we integrate the mixed pseudo label with the partial instance annotation to obtain the final pseudo label, ensuring that the pseudo label complements the partial annotation. The final pseudo label is denoted as $\hat{y}$, and its $i^{th}$ element is defined as $\hat{y}_i = y_i + (1.0 - y_i)\tilde{y}_i$, i.e., the zero regions in $y$ are replaced by the corresponding values from $\tilde{y}$.
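
The pseudo label construction of Eqs. (6)-(7) and the merge rule can be sketched as below; the two-channel (background/foreground) layout and the final `detach` are assumptions.

```python
import torch

def make_pseudo_label(p_main: torch.Tensor, p_aux: torch.Tensor,
                      y_fg: torch.Tensor, tau: float = 0.3) -> torch.Tensor:
    """Dynamic mixing (Eq. 6), sharpening (Eq. 7), and merging with the
    partial annotation.

    p_main, p_aux: (N, 2, D, H, W) probabilities, channel 0 = background,
    channel 1 = foreground (layout is an assumption);
    y_fg: (N, 1, D, H, W) binary partial instance annotation."""
    theta = torch.rand(1, device=p_main.device)     # theta ~ U(0, 1), redrawn per iteration
    y_mix = theta * p_main + (1.0 - theta) * p_aux  # Eq. (6)
    y_sharp = torch.softmax(y_mix / tau, dim=1)     # Eq. (7): tau < 1 sharpens
    fg = y_fg + (1.0 - y_fg) * y_sharp[:, 1:2]      # annotated voxels stay foreground
    return torch.cat([1.0 - fg, fg], dim=1).detach()  # soft pseudo label y_hat
```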

The perturbation introduced in the auxiliary decoder may have uncontrollable effects. Ideally, predictions for background voxels near the classification boundary should shift towards the foreground; however, foreground voxels in the partial instance annotations may also be predicted as background, misleading the model's training. To mitigate such adverse effects, we only learn from voxels with minor discrepancies between the two outputs, ensuring a smooth and gradual learning process. We utilize the Kullback-Leibler (KL) divergence to estimate the consistency of the two outputs and use it to generate voxel-wise weights for the Cross-Entropy loss. Following Zheng and Yang (2021), the weight for voxel $i$ is defined as:

$$\mathcal{W}_{i} = e^{-KL(p^{m}_{i}, p^{a}_{i})} \qquad (8)$$

where $KL(p^{m}_{i}, p^{a}_{i}) = p^{a}_{i} \log(p^{a}_{i} / p^{m}_{i})$ is the KL divergence calculated at the $i^{th}$ voxel of the two probability maps $p^{m}$ and $p^{a}$. When the predictions for a voxel from the main and auxiliary decoders are highly dissimilar, Eq. (8) yields a lower value of $\mathcal{W}_{i}$. Based on this, the pseudo label learning loss for the main branch can be formulated as:

$$\mathcal{L}_{p}(p^{m}, p^{a}, \hat{y}) = \frac{1}{W} \sum_{i} \left[ \mathcal{W}_{i} \left( -\hat{y}_{i} \log p^{m}_{i} \right) + KL(p^{m}_{i}, p^{a}_{i}) \right] \qquad (9)$$

where $W$ is the sum of $\mathcal{W}_{i}$ over all voxels. The KL term prevents excessive discrepancies between the predictions of the two decoders.
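
A sketch of Eqs. (8)-(9) for the main branch is given below; the epsilon terms are assumptions added for numerical stability.

```python
import torch

def consensus_ce_loss(p_main: torch.Tensor, p_aux: torch.Tensor,
                      y_hat: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Consensus-aware loss L_p of Eq. (9) for the main branch.

    p_main, p_aux: class probabilities of shape (N, C, D, H, W);
    y_hat: soft pseudo label of the same shape."""
    kl = (p_aux * torch.log((p_aux + eps) / (p_main + eps))).sum(dim=1)  # voxel-wise KL
    w = torch.exp(-kl)                       # Eq. (8): low weight where the decoders disagree
    ce = -(y_hat * torch.log(p_main + eps)).sum(dim=1)
    return ((w * ce + kl).sum()) / (w.sum() + eps)
```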

The proposed DBDMP framework learns from both partial instance annotation and pseudo label by minimizing the following combined objective function:

$$\mathcal{L}_{total} = \mathcal{L}_{sup}(p^{m}, y) + \mathcal{L}_{sup}(p^{a}, y) + \lambda_{p} \left[ \mathcal{L}_{p}(p^{m}, p^{a}, \hat{y}) + \mathcal{L}_{p}(p^{a}, p^{m}, \hat{y}) \right] \qquad (10)$$

where $y$ and $\hat{y}$ are the partial instance annotation and the generated pseudo label, respectively. $\lambda_{p}$ is a trade-off weight scheduled with an epoch-dependent sigmoid-like ramp-up function over the first 100 epochs, since pseudo labels in the early training stage can be of poor quality:

$$\lambda_{p} = \lambda \times e^{-5 \times (1 - t / t_{max})^{2}} \qquad (11)$$

where $\lambda$ is a hyper-parameter representing the final value of $\lambda_{p}$, $t$ is the current epoch, and $t_{max}$ is set to 99, the last epoch of the ramp-up.
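
Eq. (11) amounts to the following ramp-up schedule:

```python
import math

def pseudo_label_weight(epoch: int, lam: float = 2.0, t_max: int = 99) -> float:
    """Sigmoid-like ramp-up of Eq. (11): lambda_p grows from ~0.007*lam at
    epoch 0 to lam at epoch t_max, then stays constant."""
    t = min(epoch, t_max)
    return lam * math.exp(-5.0 * (1.0 - t / t_max) ** 2)
```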

4 Experiments

4.1 Dataset

LNQ2023 Challenge Dataset

The Mediastinal Lymph Node Quantification (LNQ): Segmentation of Heterogeneous CT Data Challenge dataset includes 513 CT volumes. Each volume contains 48 to 656 slices, with slice thickness ranging from 2.0 to 5.5 mm and a pixel size of 1.0 mm × 1.0 mm. The matrix size in the axial plane is 512×512. The images were split at the patient level into 393, 20, and 100 volumes for training, validation, and testing, respectively. The training cases are partially annotated, meaning only one or several positive lymph nodes per volume are labeled, while all diseased lymph nodes in the validation and test sets are fully annotated.

Figure 3: Sensitivity analysis of the hyper-parameters $\tau$, $\gamma$, $\alpha$ and $\lambda$.

4.2 Implementation Details

Our method was implemented in nnUNet (Isensee et al., 2021), a PyTorch-based (Paszke et al., 2019) toolkit for deep learning-based image computing. The implementation was carried out on a single NVIDIA 2080Ti GPU with 11 GB VRAM. We utilized a VNet-like (Milletari et al., 2016) network as the backbone for all experiments and extended it to two decoders, as detailed in Section 3.1.

For preprocessing, we first cropped the volumes to the lung region based on intensity. Subsequently, we resampled each volume to a resolution of 3.0 mm × 0.8 mm × 0.8 mm. Finally, we normalized each volume to zero mean and unit variance. Our networks were trained using a patch-based approach with a patch size of 224×128×64 and a batch size of 2. We employed a polynomial schedule to decay the learning rate in each epoch.
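
A hedged SimpleITK/NumPy sketch of the resampling and normalization steps is shown below; the B-spline interpolation order is an assumption, and the intensity-based lung cropping is omitted for brevity.

```python
import numpy as np
import SimpleITK as sitk

def preprocess(image: sitk.Image) -> np.ndarray:
    """Resample to 3.0 x 0.8 x 0.8 mm (z, y, x) and z-score normalize."""
    new_spacing = (0.8, 0.8, 3.0)                     # SimpleITK order: (x, y, z) in mm
    old_spacing, old_size = image.GetSpacing(), image.GetSize()
    new_size = [int(round(osz * osp / nsp))
                for osz, osp, nsp in zip(old_size, old_spacing, new_spacing)]
    resampled = sitk.Resample(image, new_size, sitk.Transform(),
                              sitk.sitkBSpline, image.GetOrigin(),
                              new_spacing, image.GetDirection(), 0.0,
                              image.GetPixelID())
    arr = sitk.GetArrayFromImage(resampled).astype(np.float32)
    return (arr - arr.mean()) / (arr.std() + 1e-8)    # zero mean, unit variance
```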

For self-supervised pre-training in Fig. 1(a), we used the Stochastic Gradient Descent (SGD) optimizer with a momentum of 0.99, an initial learning rate of 0.01, and a weight decay of $3\times10^{-5}$ to minimize the reconstruction loss in Eq. (1). The training process lasted 1000 epochs with 250 iterations per epoch. During weakly supervised training, the segmentation model was initialized with the weights obtained from self-supervised pre-training. We minimized the loss function in Eq. (10) using the SGD optimizer with a momentum of 0.9, keeping the other parameters the same as in the pre-training stage, and trained for 300 epochs with 250 iterations per epoch.

During the inference stage, we loaded the weights from the final epoch and only used predictions from the main decoder as the final outputs. All inference was conducted with a sliding-window strategy. We also applied a specific post-processing step that removes predicted lymph node regions at the image boundaries and eliminates a portion of the predicted lymph nodes based on voxel intensity and physical volume. For quantitative evaluation, we calculated the Dice Similarity Coefficient (DSC) and the Average Symmetric Surface Distance (ASSD) between a segmentation result and the ground truth. For samples with failed (empty) predictions, the ASSD is undefined, so we fill these cases with the maximum value of the successfully computed ASSDs before averaging, which may inflate the mean ASSD.
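
The post-processing step can be sketched as below with `scipy.ndimage`; the volume and intensity thresholds are illustrative assumptions, as the paper does not report the exact values used.

```python
import numpy as np
from scipy import ndimage

def postprocess(pred: np.ndarray, image: np.ndarray, spacing: tuple,
                min_volume_mm3: float = 50.0,
                hu_range: tuple = (-100.0, 200.0)) -> np.ndarray:
    """Remove predicted components touching the image border and components
    that are implausible by physical volume or mean intensity."""
    labeled, n = ndimage.label(pred > 0)
    voxel_vol = float(np.prod(spacing))
    keep = np.zeros_like(pred, dtype=np.uint8)
    for i in range(1, n + 1):
        comp = labeled == i
        zs, ys, xs = np.where(comp)
        touches_border = (zs.min() == 0 or ys.min() == 0 or xs.min() == 0 or
                          zs.max() == pred.shape[0] - 1 or
                          ys.max() == pred.shape[1] - 1 or
                          xs.max() == pred.shape[2] - 1)
        plausible = (comp.sum() * voxel_vol >= min_volume_mm3 and
                     hu_range[0] <= image[comp].mean() <= hu_range[1])
        if plausible and not touches_border:
            keep[comp] = 1
    return keep
```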

Table 1: Ablation study of our proposed method on the LNQ2023 Challenge Dataset for both the validation set and the test set. DBN is the Dual-branch Network detailed in Section 3.1. DBN† means using the pre-trained weights generated by Model Genesis (Zhou et al., 2021) as described in Section 3.2. $\mathcal{L}_{Dice}$ is the widely used Dice loss and $\mathcal{L}_{KLCE}$ is detailed in Eq. (9). Training a VNet with one encoder and one decoder using $\mathcal{L}_{CE}$ serves as the baseline method.
|          | Network | $\mathcal{L}_{sup}$ | $\mathcal{L}_{p}$ | Validation DSC↑ (%) | Validation ASSD↓ (mm) | Test DSC↑ (%) | Test ASSD↓ (mm) |
|----------|---------|---------------------|-------------------|---------------------|-----------------------|---------------|-----------------|
| (a)      | DBN     | $\mathcal{L}_{CE}$ | $\mathcal{L}_{Dice}$ | 15.12±14.75 | 26.94±11.78 | 23.22±17.50 | 25.41±15.33 |
| (b)      | DBN     | $\mathcal{L}_{CE}+\mathcal{L}_{Tversky}$ | $\mathcal{L}_{Dice}$ | 34.99±25.82 | 16.39±12.73 | 45.10±21.58 | 13.54±12.05 |
| (c)      | DBN     | $\mathcal{L}_{CE}+\mathcal{L}_{Tversky}$ | $\mathcal{L}_{KLCE}$ | 32.10±24.29 | 15.88±9.09 | 40.26±22.39 | 15.19±10.94 |
| (d)      | DBN     | $\mathcal{L}_{SCE}+\mathcal{L}_{Tversky}$ | $\mathcal{L}_{KLCE}$ | 31.48±25.68 | 17.63±11.84 | 39.34±22.58 | 16.54±15.75 |
| (e)      | DBN     | $\mathcal{L}_{PCE}+\mathcal{L}_{Tversky}$ | $\mathcal{L}_{KLCE}$ | 52.53±22.29 | 8.39±6.84 | 57.36±17.09 | 9.85±13.25 |
| (f)      | DBN     | $\mathcal{L}_{PCE}+\mathcal{L}_{SCE}+\mathcal{L}_{Tversky}$ | $\mathcal{L}_{KLCE}$ | 53.31±20.40 | 8.07±6.55 | 56.10±17.58 | 10.28±12.75 |
| (g)      | DBN†    | $\mathcal{L}_{PCE}+\mathcal{L}_{SCE}+\mathcal{L}_{Tversky}$ | $\mathcal{L}_{KLCE}$ | 54.10±21.92 | 8.72±7.71 | 55.44±18.98 | 9.35±7.69 |
| Baseline | VNet    | $\mathcal{L}_{CE}$ | – | 11.04±17.86 | 20.83±8.64 | 19.04±19.40 | 18.23±10.10 |
Figure 4: Visualization of segmentation results of the ablation study in Table 1.

4.3 Results

Sensitivity Analysis of Some Hyper-parameters

We conducted experiments to evaluate the sensitivity of the hyper-parameters $\tau$ in Eq. (7), $\gamma$ in Eq. (2), $\alpha$ in Eq. (4) and $\lambda$ in Eq. (11). Fig. 3 presents the results obtained on the validation set.

The hyper-parameter $\tau$ governs the extent of sharpening applied to the soft pseudo labels. We investigated the segmentation performance of the proposed framework with $\tau$ set to 0.1, 0.2, 0.3, and 0.4. As illustrated in Fig. 3(a), the DSC score improves as $\tau$ increases from 0.1 to 0.3, but decreases when $\tau$ reaches 0.4, indicating that the best value of $\tau$ is 0.3.

The hyper-parameter $\gamma$ controls the relative confidence between the model's predictions and the partial instance annotations. The results in Fig. 3(b) indicate that, considering both DSC and ASSD, the model achieved the best result when $\gamma = 0.8$, suggesting that the model deems its own predictions more reliable than the partial instance annotations.

The hyper-parameter $\alpha$ in Eq. (4) balances the penalty imposed on False Positives (FPs) and False Negatives (FNs), and was tested with values in {0.3, 0.4, 0.5, 0.6}. A smaller $\alpha$ imposes less penalty on FPs, encouraging the model to predict more positives than are present in the partial instance annotations; however, too weak a penalty on FPs may lead to over-prediction of foreground voxels. In Fig. 3(c), $\alpha = 0.4$ and $\alpha = 0.5$ achieved very close DSC scores, with $\alpha = 0.5$ yielding a lower ASSD. When $\alpha = 0.3$, the performance was much lower; the increased ASSD may be because the model predicted some foreground voxels far from the actual lymph nodes.

The value of $\lambda$ represents the confidence in the quality of the generated pseudo labels during training. We tested the set {1.4, 1.6, 1.8, 2.0, 2.2}. As illustrated in Fig. 3(d), the result with $\lambda = 2.0$ is notably superior to the other settings.

Ablation Study

We conducted additional experiments to validate the effectiveness of the Dual-Branch Network (DBN) and the modifications made to adapt the work of Luo et al. (2022a) to learning from partial instance annotations. The quantitative results on the validation set and test set are presented in Table 1, where the baseline method takes the partial annotations as full annotations to train a VNet with the cross-entropy loss.

Table 1 shows that the baseline method only achieved an average Dice of 11.04% on the validation set, indicating insufficient supervision from the partial annotations. By leveraging pseudo labels from the dual-branch network with $\mathcal{L}_{Dice}$, this improved to 15.12%. Additionally introducing the Tversky loss for the original partial annotations raised the average Dice to 34.99%. Combining $\mathcal{L}_{PCE}+\mathcal{L}_{SCE}+\mathcal{L}_{Tversky}$ for the partial instance annotations with $\mathcal{L}_{KLCE}$ for the pseudo labels yielded an average Dice of 53.31%, and leveraging a pre-trained model based on Model Genesis further improved it to 54.10%, showing the effectiveness of the loss design and pre-training strategy of our method.

On the test set, the comparison between (c) and (e) in Table 1 shows that replacing $\mathcal{L}_{CE}$ with $\mathcal{L}_{PCE}$ substantially improved the average Dice from 40.26% to 57.36%, demonstrating the benefit of reducing the contribution of background voxels to the loss when many lymph nodes are incorrectly labeled as background in partial instance annotations. The proposed method achieved the lowest ASSD of 9.35 mm. Although using $\mathcal{L}_{PCE}+\mathcal{L}_{SCE}+\mathcal{L}_{Tversky}$ for the partial instance annotations achieved a lower DSC than $\mathcal{L}_{PCE}+\mathcal{L}_{Tversky}$, the gap is relatively small considering the performance of the other methods. The performance difference between the validation and test sets mainly stems from the data distribution shift between the two subsets.

Fig. 4 shows a visual comparison of the methods listed in Table 1. The baseline method exhibits obvious under-segmentation, as false negatives in the partial annotation are taken as background. The naive pseudo label learning method (a) produces considerable over-segmentation, because the pseudo labels contain many false positives. By using $\mathcal{L}_{KLCE}$ for the pseudo labels and $\mathcal{L}_{PCE}$, $\mathcal{L}_{SCE}$, and $\mathcal{L}_{Tversky}$ for the partial annotations, the performance progressively improves.

5 Discussion and Conclusion

In this study, we explored a weakly supervised learning framework based on partial instance annotations for lymph node segmentation. For such annotations, the key is to identify trustworthy background regions and provide strong foreground signals during training to ensure robust learning of foreground voxels. Our method addresses this problem by generating pseudo labels to mine more potential lymph nodes. As pseudo labels from a single prediction branch may be biased, we propose a dynamic mixture of the predictions from two branches, leading to more stable pseudo labels and better uncertainty estimation based on the divergence between the branches. The loss functions are also carefully designed to highlight the foreground class while reducing the effect of noise in the pseudo labels.

This work also has some limitations that could be addressed in the future. First, the segmentation model only learns from the LNQ dataset, and the performance may be further improved by leveraging other existing fully supervised datasets; we believe our approach is adaptable enough for mixed datasets containing both partially and fully annotated cases. Second, only large lymph nodes are labeled in the LNQ training set, so the distributions of the labeled nodes for training and those for testing may differ, making it more challenging to obtain robust performance during testing, especially for small lymph nodes that were not annotated in the training set. Improving the diversity of the labeled cases under the same annotation budget, e.g., making the labeled cases contain lymph nodes with different scales, positions, and shapes, is a potential solution. In addition, the loss function has several hyper-parameters that are currently tuned manually; automating their determination would be an interesting direction for future work.

In conclusion, we proposed a partial instance annotation learning framework that combines weakly supervised learning and noisy label learning for lymph node segmentation. By introducing a dual-branch network, we dynamically mixed the outputs from the two decoders and fused them with the partial instance annotations to obtain reliable pseudo labels. The introduction of multiple loss functions not only provides more reliable foreground and background supervision signals but also facilitates the segmentation of potential lymph nodes that are not labeled. We conducted experiments on the Mediastinal Lymph Node Quantification Challenge dataset without using any other datasets for pre-training or training, and achieved an average DSC of 54.10% and 55.44%, and an average ASSD of 8.72 mm and 9.35 mm, on the validation and test sets, respectively. In the future, it would be of interest to leverage other labeled or unlabeled datasets to assist the learning process, such as using unannotated datasets for self-supervised pre-training or leveraging a small number of fully labeled images to boost segmentation performance.


Acknowledgments

This work was supported by the National Natural Science Foundation of China under grant 62271115, the Radiation Oncology Key Laboratory of Sichuan Province Open Fund under grant 2022ROKF04, and the Science and Technology Department of Sichuan Province under grant 2022YFSY0055.


Ethical Standards

The work follows appropriate ethical standards in conducting research and writing the manuscript, following all applicable laws and regulations regarding treatment of animals or human subjects.


Conflicts of Interest

We declare we do not have conflicts of interest.


Data availability

All data used in the paper are from the Mediastinal Lymph Node Quantification (LNQ): Segmentation of Heterogeneous CT Data competition. The training set data and annotations, as well as the validation set data, can be obtained from the competition website https://lnq2023.grand-challenge.org/. However, annotations for the validation set and both the data and annotations for the test set are not available on the website.

References

  • Badrinarayanan et al. (2017) Vijay Badrinarayanan, Alex Kendall, and Roberto Cipolla. Segnet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE transactions on pattern analysis and machine intelligence, 39(12):2481–2495, 2017.
  • Barbu et al. (2011) Adrian Barbu, Michael Suehling, Xun Xu, David Liu, S Kevin Zhou, and Dorin Comaniciu. Automatic detection and segmentation of lymph nodes from ct data. IEEE Transactions on Medical Imaging, 31(2):240–250, 2011.
  • Bouget et al. (2019) David Bouget, Arve Jørgensen, Gabriel Kiss, Haakon Olav Leira, and Thomas Langø. Semantic segmentation and detection of mediastinal lymph nodes and anatomical structures in ct data for lung cancer staging. International journal of computer assisted radiology and surgery, 14:977–986, 2019.
  • Bouget et al. (2023) David Bouget, André Pedersen, Johanna Vanel, Haakon O Leira, and Thomas Langø. Mediastinal lymph nodes segmentation using 3d convolutional neural network ensembles and anatomical priors guiding. Computer Methods in Biomechanics and Biomedical Engineering: Imaging & Visualization, 11(1):44–58, 2023.
  • Chen et al. (2021) Jieneng Chen, Yongyi Lu, Qihang Yu, Xiangde Luo, Ehsan Adeli, Yan Wang, Le Lu, Alan L Yuille, and Yuyin Zhou. Transunet: Transformers make strong encoders for medical image segmentation. arXiv preprint arXiv:2102.04306, 2021.
  • Feulner et al. (2013) Johannes Feulner, S Kevin Zhou, Matthias Hammon, Joachim Hornegger, and Dorin Comaniciu. Lymph node detection and segmentation in chest ct data using discriminative learning and a spatial prior. Medical image analysis, 17(2):254–270, 2013.
  • Fu et al. (2023) Jia Fu, Tao Lu, Shaoting Zhang, and Guotai Wang. UM-CAM: Uncertainty-weighted multi-resolution class activation maps for weakly-supervised fetal brain segmentation. In MICCAI, pages 315–324, 2023.
  • Gidaris et al. (2018) Spyros Gidaris, Praveer Singh, and Nikos Komodakis. Unsupervised representation learning by predicting image rotations. arXiv preprint arXiv:1803.07728, 2018.
  • He et al. (2017) Kaiming He, Georgia Gkioxari, Piotr Dollár, and Ross Girshick. Mask r-cnn. In Proceedings of the IEEE international conference on computer vision, pages 2961–2969, 2017.
  • Hinton et al. (2015) Geoffrey Hinton, Oriol Vinyals, and Jeff Dean. Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531, 2015.
  • Huo et al. (2021) Xinyue Huo, Lingxi Xie, Jianzhong He, Zijie Yang, Wengang Zhou, Houqiang Li, and Qi Tian. Atso: Asynchronous teacher-student optimization for semi-supervised image segmentation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 1235–1244, 2021.
  • Isensee et al. (2021) Fabian Isensee, Paul F Jaeger, Simon AA Kohl, Jens Petersen, and Klaus H Maier-Hein. nnu-net: a self-configuring method for deep learning-based biomedical image segmentation. Nature methods, 18(2):203–211, 2021.
  • Jing and Tian (2020) Longlong Jing and Yingli Tian. Self-supervised visual feature learning with deep neural networks: A survey. IEEE transactions on pattern analysis and machine intelligence, 43(11):4037–4058, 2020.
  • Lee and Jeong (2020) Hyeonsoo Lee and Won-Ki Jeong. Scribble2label: Scribble-supervised cell segmentation via self-generating pseudo-labels with consistency. In Medical Image Computing and Computer Assisted Intervention–MICCAI 2020: 23rd International Conference, Lima, Peru, October 4–8, 2020, Proceedings, Part I 23, pages 14–23. Springer, 2020.
  • Lei et al. (2021) Wenhui Lei, Wei Xu, Ran Gu, Hao Fu, Shaoting Zhang, Shichuan Zhang, and Guotai Wang. Contrastive learning of relative position regression for one-shot object localization in 3d medical images. In Medical Image Computing and Computer Assisted Intervention–MICCAI 2021: 24th International Conference, Strasbourg, France, September 27–October 1, 2021, Proceedings, Part II 24, pages 155–165. Springer, 2021.
  • Li and Xia (2020) Zhe Li and Yong Xia. Deep reinforcement learning for weakly-supervised lymph node segmentation in ct images. IEEE Journal of Biomedical and Health Informatics, 25(3):774–783, 2020.
  • Lin et al. (2016) Di Lin, Jifeng Dai, Jiaya Jia, Kaiming He, and Jian Sun. Scribblesup: Scribble-supervised convolutional networks for semantic segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 3159–3167, 2016.
  • Liu et al. (2022) Sheng Liu, Kangning Liu, Weicheng Zhu, Yiqiu Shen, and Carlos Fernandez-Granda. Adaptive early-learning correction for segmentation from noisy annotations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2606–2616, 2022.
  • Luo et al. (2022a) Xiangde Luo, Minhao Hu, Wenjun Liao, Shuwei Zhai, Tao Song, Guotai Wang, and Shaoting Zhang. Scribble-supervised medical image segmentation via dual-branch network and dynamically mixed pseudo labels supervision. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 528–538. Springer, 2022a.
  • Luo et al. (2022b) Xiangde Luo, Guotai Wang, Wenjun Liao, Jieneng Chen, Tao Song, Yinan Chen, Shichuan Zhang, Dimitris N Metaxas, and Shaoting Zhang. Semi-supervised medical image segmentation via uncertainty rectified pyramid consistency. Medical Image Analysis, 80:102517, 2022b.
  • Milletari et al. (2016) Fausto Milletari, Nassir Navab, and Seyed-Ahmad Ahmadi. V-net: Fully convolutional neural networks for volumetric medical image segmentation. In 2016 Fourth International Conference on 3D Vision (3DV), pages 565–571. IEEE, 2016.
  • Mortenson (1999) Michael E Mortenson. Mathematics for computer graphics applications. Industrial Press Inc., 1999.
  • Müller et al. (2019) Rafael Müller, Simon Kornblith, and Geoffrey E Hinton. When does label smoothing help? Advances in neural information processing systems, 32, 2019.
  • Nogues et al. (2016) Isabella Nogues, Le Lu, Xiaosong Wang, Holger Roth, Gedas Bertasius, Nathan Lay, Jianbo Shi, Yohannes Tsehay, and Ronald M Summers. Automatic lymph node cluster segmentation using holistically-nested neural networks and structured optimization in ct images. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 388–397. Springer, 2016.
  • Oda et al. (2018) Hirohisa Oda, Holger R Roth, Kanwal K Bhatia, Masahiro Oda, Takayuki Kitasaka, Shingo Iwano, Hirotoshi Homma, Hirotsugu Takabatake, Masaki Mori, Hiroshi Natori, et al. Dense volumetric detection and segmentation of mediastinal lymph nodes in chest ct images. In Medical Imaging 2018: Computer-Aided Diagnosis, volume 10575, page 1057502. SPIE, 2018.
  • Oh et al. (2021) Youngmin Oh, Beomjun Kim, and Bumsub Ham. Background-aware pooling and noise-aware loss for weakly-supervised semantic segmentation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 6913–6922, 2021.
  • Paszke et al. (2019) Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et al. Pytorch: An imperative style, high-performance deep learning library. NeurIPS, 32, 2019.
  • Pathak et al. (2015) Deepak Pathak, Philipp Krahenbuhl, and Trevor Darrell. Constrained convolutional neural networks for weakly supervised segmentation. In Proceedings of the IEEE international conference on computer vision, pages 1796–1804, 2015.
  • Ren and Malik (2003) Xiaofeng Ren and Jitendra Malik. Learning a classification model for segmentation. In Proceedings Ninth IEEE International Conference on Computer Vision, pages 10–17. IEEE, 2003.
  • Ronneberger et al. (2015) Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-net: Convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, October 5-9, 2015, Proceedings, Part III 18, pages 234–241. Springer, 2015.
  • Salehi et al. (2017) Seyed Sadegh Mohseni Salehi, Deniz Erdogmus, and Ali Gholipour. Tversky loss function for image segmentation using 3d fully convolutional deep networks. In International workshop on machine learning in medical imaging, pages 379–387. Springer, 2017.
  • Settles (2009) Burr Settles. Active learning literature survey. 2009.
  • Shen et al. (2023) Wei Shen, Zelin Peng, Xuehui Wang, Huayu Wang, Jiazhong Cen, Dongsheng Jiang, Lingxi Xie, Xiaokang Yang, and Q Tian. A survey on label-efficient deep image segmentation: Bridging the gap between weak supervision and dense prediction. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023.
  • Stapleford et al. (2010) Liza J Stapleford, Joshua D Lawson, Charles Perkins, Scott Edelman, Lawrence Davis, Mark W McDonald, Anthony Waller, Eduard Schreibmann, and Tim Fox. Evaluation of automatic atlas-based lymph node segmentation for head-and-neck cancer. International Journal of Radiation Oncology* Biology* Physics, 77(3):959–966, 2010.
  • Wang et al. (2020a) Guotai Wang, Xinglong Liu, Chaoping Li, Zhiyong Xu, Jiugen Ruan, Haifeng Zhu, Tao Meng, Kang Li, Ning Huang, and Shaoting Zhang. A noise-robust framework for automatic segmentation of covid-19 pneumonia lesions from ct images. IEEE Transactions on Medical Imaging, 39(8):2653–2663, 2020a.
  • Wang et al. (2020b) Lu Wang, Dong Guo, Guotai Wang, and Shaoting Zhang. Annotation-efficient learning for medical image segmentation based on noisy pseudo labels and adversarial learning. IEEE Transactions on Medical Imaging, 40(10):2795–2807, 2020b.
  • Wang et al. (2019) Yisen Wang, Xingjun Ma, Zaiyi Chen, Yuan Luo, Jinfeng Yi, and James Bailey. Symmetric cross entropy for robust learning with noisy labels. In Proceedings of the IEEE/CVF international conference on computer vision, pages 322–330, 2019.
  • Xu et al. (2021) Guoping Xu, Hanqiang Cao, Jayaram K Udupa, Yubing Tong, and Drew A Torigian. Disegnet: A deep dilated convolutional encoder-decoder architecture for lymph node segmentation on pet/ct images. Computerized Medical Imaging and Graphics, 88:101851, 2021.
  • Zhai et al. (2023) Shuwei Zhai, Guotai Wang, Xiangde Luo, Qiang Yue, Kang Li, and Shaoting Zhang. Pa-seg: learning from point annotations for 3d medical image segmentation using contextual regularization and cross knowledge distillation. IEEE Transactions on Medical Imaging, 2023.
  • Zhao et al. (2020) Xingyu Zhao, Peiyi Xie, Mengmeng Wang, Wenru Li, Perry J Pickhardt, Wei Xia, Fei Xiong, Rui Zhang, Yao Xie, Junming Jian, et al. Deep learning–based fully automated detection and segmentation of lymph nodes on multiparametric-mri for rectal cancer: A multicentre study. EBioMedicine, 56, 2020.
  • Zheng and Yang (2021) Zhedong Zheng and Yi Yang. Rectifying pseudo label learning via uncertainty estimation for domain adaptive semantic segmentation. International Journal of Computer Vision, 129(4):1106–1120, 2021.
  • Zhou et al. (2021) Zongwei Zhou, Vatsal Sodha, Jiaxuan Pang, Michael B Gotway, and Jianming Liang. Models genesis. Medical image analysis, 67:101840, 2021.