|Year : 2017 | Volume
| Issue : 4 | Page : 186-192
Evaluation of different atlas selection strategies for multi-atlas segmentation of low-dose computed tomographic images of whole-body positron emission tomography/computed tomography
Hongkai Wang1, Nan Zhang1, Li Huo2, Bin Zhang1
1 Department of Biomedical Engineering, Dalian University of Technology, Dalian, Liaoning, China
2 Department of Nuclear Medicine, Peking Union Medical College Hospital, Beijing, China
Date of Web Publication: 26-Mar-2018
Department of Biomedical Engineering, Faculty of Electronic Information and Electrical Engineering, Dalian University of Technology, No. 2 Linggong Road, Dalian, Liaoning 116024
Source of Support: None, Conflict of Interest: None
Background and Objectives: The increasing clinical use of torso positron emission tomography/computed tomography (PET/CT) demands automated segmentation of torso organs from PET/CT images. We use the multi-atlas segmentation approach to segment torso organs from the low-dose CT images of PET/CT. Since atlas selection is a prerequisite step for multi-atlas segmentation, this study evaluates the performance of different atlas selection strategies for torso organ segmentation. Methods: We evaluated two criteria for atlas selection: image similarity and body mass index (BMI) difference between the atlas and the target image. With each criterion, ten atlases are selected and registered to the target image, followed by a label fusion step to obtain the final segmentation. Results: The BMI criterion yields segmentation accuracy comparable to that of the image similarity criterion but with much less computation time. All the evaluated atlas selection methods have Dice >0.9 for the lungs, heart, and liver and Dice <0.85 for the skeleton, spleen, and kidneys. The inter-method differences are not significant for the high-contrast, large organs such as the skeleton, lungs, heart, and liver. For the low-contrast, smaller organs such as the spleen and kidneys, none of the atlas selection methods significantly outperforms random atlas selection. Conclusions: BMI is an effective and efficient atlas selection criterion for low-dose torso CT images. The spleen and kidneys are difficult to segment accurately, no matter which atlas selection method is used. It is important to develop more effective atlas selection methods for the spleen and kidneys.
Keywords: Multi-atlas, segmentation, selection, positron emission tomography/computed tomography
|How to cite this article:|
Wang H, Zhang N, Huo L, Zhang B. Evaluation of different atlas selection strategies for multi-atlas segmentation of low-dose computed tomographic images of whole-body positron emission tomography/computed tomography. Digit Med 2017;3:186-92
|How to cite this URL:|
Wang H, Zhang N, Huo L, Zhang B. Evaluation of different atlas selection strategies for multi-atlas segmentation of low-dose computed tomographic images of whole-body positron emission tomography/computed tomography. Digit Med [serial online] 2017 [cited 2018 Sep 19];3:186-92. Available from: http://www.digitmedicine.com/text.asp?2017/3/4/186/228664
Introduction
Whole-body positron emission tomography/computed tomography (PET/CT) imaging is now widely used for lesion diagnosis, radiotherapy assessment, and pharmacokinetic studies. As a dual-modality imaging approach, PET/CT scanning provides both anatomical information and radiotracer metabolism. For quantitative analysis of tracer metabolism in different torso structures, segmentation of multiple organs is a prerequisite step so that the radiotracer uptake can be measured within each segmented organ region. Because the CT image of a PET/CT scan provides the anatomical information of the patient body, the segmentation of multiple torso organs is usually performed on the CT image. However, under the routine clinical PET/CT scan protocol, the CT images are generally acquired with a low X-ray dose, leading to poor soft tissue contrast and high image noise, both of which pose challenges to accurate organ segmentation.
To tackle the challenges of imperfect image quality, domain knowledge of human torso anatomy is incorporated into the segmentation algorithm. Wang et al. developed the automatic anatomy recognition (AAR) methodology, which hierarchically registers fuzzy models of organ anatomy to the target PET/CT images and performs organ segmentation using the iterative relative fuzzy-connectedness method with intensity and texture information from both CT and PET images. Despite the effectiveness of their method, the AAR workflow is complicated to reimplement, and the PET texture is unstable across different radiotracers and across multicenter data with different PET acquisition protocols. Another category of methods using prior anatomical knowledge is the multi-atlas segmentation approach, which has been successful for medical images of the brain, heart, bones, and other structures. The multi-atlas segmentation method propagates the structure labels from different atlases into the same target image via image registration and fuses the labels from the different atlases to form the final segmentation. For torso CT images, multi-atlas segmentation has been applied to standard diagnostic CT images and contrast-enhanced CT images but, to our knowledge, has not been used for low-dose torso CT.
For multi-atlas segmentation, the choice of atlases inherently affects the algorithm performance. Using all the available atlases does not guarantee the best segmentation accuracy, and using more atlases increases the computation cost. Therefore, it is desirable to select a subset of atlases that yields the best segmentation accuracy within acceptable computation time. The importance of atlas selection has been emphasized by several previous studies on multi-atlas segmentation of brain magnetic resonance images. A commonly adopted strategy for atlas selection is as follows. First, a reference subject image I_R is selected, and all the N available atlases and the target image are registered to I_R. Within the image space of I_R, the intensity similarity between the target image and each atlas is measured using similarity metrics such as mutual information and cross-correlation (CC). The atlases are then ranked according to their similarity to the target image, and the K most similar atlases are selected for multi-atlas segmentation, where K << N to maintain acceptable computation time. Such a selection strategy favors the atlases similar to the target image in terms of global intensity appearance. Another type of selection strategy is based on physiological information that affects anatomical shape, such as patient age and body mass index (BMI).
To the best of our knowledge, studies on atlas selection strategies for multiple torso organ segmentation from whole-body low-dose CT images are rare. In this article, we compare atlas selection methods based on image similarity and on physiological information to find the selection methods that yield the best multi-atlas organ segmentation accuracy. The physiological information used in this study is the BMI, which is related to the amount of fat in the torso region. Since BMI is much more convenient to calculate than image similarity metrics, it is interesting to see whether this simple parameter can achieve segmentation performance comparable to that of the image similarity criteria. Our objective is to obtain effective selection rules for multi-organ segmentation from low-dose CT images, so as to support organ-level radiotracer quantification for whole-body PET/CT images. If no effective atlas selection method is found in this study, we hope that the evaluation results can guide the development of more advanced atlas selection algorithms in future research.
Methods
Atlas and test dataset
In this study, 78 whole-body PET/CT images (40 males and 38 females) were retrieved from the database of the Department of Nuclear Medicine, Peking Union Medical College Hospital. All the images were selected from Chinese subjects diagnosed as asymptomatic. We use asymptomatic subjects to eliminate the interference of various diseases with atlas selection and leave the study of disease-oriented atlas selection for future research. The ages of the male subjects range from 25 to 70 years with a mean of 41, and those of the female subjects range from 32 to 70 years with a mean of 52. The CT images were acquired under low X-ray dose (100-140 kV tube potential and 28-298 mA current), with 1.4 mm pixel size and 3.0 mm inter-slice spacing. The target organs include the skeleton, lungs, heart, liver, spleen, left kidney, and right kidney, which were segmented by trained students and verified by an experienced radiologist. For the evaluation, leave-one-out tests were conducted, in which each of the 78 images was used in turn as the target test image and the other 77 images were used as atlases.
Multi-atlas segmentation workflow
This study follows the typical multi-atlas segmentation workflow illustrated in [Figure 1]. The atlas dataset contains the CT images and their corresponding segmentation label images. In the first step, the CT images of K atlases are selected from the atlas dataset, and each is registered to the target CT image. The deformation field obtained from CT registration is used to map the atlas labels into the target image space; this process is called label propagation. Based on the propagated labels of the registered atlases, the label fusion step is performed to estimate the final label of each voxel. In this study, an empirical value of K = 10 is used.
Figure 1: The workflow of multi-atlas segmentation for low-dose torso computed tomography images
For atlas registration, the ANTS software package (jointly developed by the University of Pennsylvania, the University of Virginia, and the University of Iowa) is used to perform initial affine alignment followed by diffeomorphic registration. We use diffeomorphic registration because it compensates for the large nonlinear inter-subject anatomical deformation while preserving the organ topology. Both the affine and diffeomorphic registrations use the Mattes mutual information as the similarity metric, and the diffeomorphic transform model is the symmetric diffeomorphic normalization (SyN) with a gradient step length of 2, a deformation field update variance of 3 voxels, and a total field variance of 1 voxel. For label fusion, we use the generalized local weighting voting method, which calculates voxel-wise fusion weights according to the local similarity between the transformed atlases and the target image within the neighborhood of each voxel. The local neighborhood size is 5 voxels. In this study, the multi-atlas segmentation is performed in a gender-dependent manner, i.e., only the atlases of the same gender as the target subject are selected and used for the segmentation.
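The idea behind locally weighted label fusion can be illustrated with a minimal sketch. It is not the implementation used in this study: it works on 1-D lists of voxels instead of 3-D volumes, and the weight function (the inverse of the local mean squared intensity difference), the function name `local_weighted_voting`, and the `radius` and `eps` parameters are assumptions made for illustration only.

```python
from collections import defaultdict


def local_weighted_voting(target, atlas_images, atlas_labels, radius=2, eps=1e-6):
    """Fuse propagated atlas labels voxel by voxel (1-D toy version).

    target       : list of target image intensities
    atlas_images : list of registered atlas intensity lists (same length as target)
    atlas_labels : list of propagated label lists (same length as target)
    radius       : half-width of the local neighborhood (hypothetical default)
    eps          : small constant that avoids division by zero (assumption)
    """
    n = len(target)
    fused = []
    for v in range(n):
        lo, hi = max(0, v - radius), min(n, v + radius + 1)
        votes = defaultdict(float)
        for img, lab in zip(atlas_images, atlas_labels):
            # Local mean squared intensity difference around voxel v.
            msd = sum((img[i] - target[i]) ** 2 for i in range(lo, hi)) / (hi - lo)
            # Assumed weight: atlases that look locally similar vote more strongly.
            votes[lab[v]] += 1.0 / (msd + eps)
        # The label with the largest accumulated weight wins.
        fused.append(max(votes, key=votes.get))
    return fused


target = [0, 0, 0, 10, 10, 10]
atlas_images = [[0, 0, 0, 10, 10, 10], [0, 0, 0, 0, 0, 10], [5, 5, 5, 5, 5, 5]]
atlas_labels = [[0, 0, 0, 1, 1, 1], [0, 0, 0, 0, 0, 1], [1, 1, 1, 1, 1, 1]]
fused = local_weighted_voting(target, atlas_images, atlas_labels)
```

In this toy case the first atlas matches the target everywhere, so its labels dominate the vote at every voxel.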
Atlas selection methods
We evaluate two categories of atlas selection methods, i.e., image similarity criterion and physiological criterion.
To calculate the image similarity criterion, both the atlases and the target image are registered to the same reference subject image so that they can be compared in a normalized image space. For each gender, the reference subject image is the subject with the median body height and weight of that gender. We use the CC metric for similarity measurement because the images being compared are of the same modality (low-dose CT). The atlases are ranked according to their CC values in descending order, and the top K atlases are selected. We tested two types of spatial transforms for the registration with the reference image: the affine transform and the diffeomorphic transform. The parameters of both transform types are the same as those used for label propagation in the "Multi-atlas segmentation workflow" subsection. The affine transform uses only linear deformation, while the diffeomorphic transform uses smooth nonlinear warping. Both transform types have been used for brain atlas selection in previous studies. Since the affine transform does not introduce any nonlinear deformation, it is assumed to select atlases that are naturally similar to the target subject. However, it is necessary to validate whether this assumption also holds for torso images.
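The ranking step of the image similarity criterion can be sketched as follows. This is an illustrative sketch rather than the code used in the study: it assumes the images have already been resampled into the reference space and flattened into lists of intensities, and it uses the normalized (Pearson) form of cross-correlation, which is an assumption. The function names `cross_correlation` and `select_by_similarity` are hypothetical.

```python
import math


def cross_correlation(a, b):
    """Normalized cross-correlation between two equally sized intensity lists."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(
        sum((x - mean_a) ** 2 for x in a) * sum((y - mean_b) ** 2 for y in b)
    )
    # A constant image has zero variance; treat it as uncorrelated.
    return num / den if den > 0 else 0.0


def select_by_similarity(target, atlases, k):
    """Return the ids of the k atlases with the highest CC to the target.

    atlases: dict mapping atlas id -> intensity list in the reference space
    """
    ranked = sorted(
        atlases,
        key=lambda atlas_id: cross_correlation(target, atlases[atlas_id]),
        reverse=True,  # descending CC, most similar first
    )
    return ranked[:k]


target = [1, 2, 3, 4]
atlases = {"a": [2, 4, 6, 8], "b": [4, 3, 2, 1], "c": [1, 2, 3, 5]}
selected = select_by_similarity(target, atlases, k=2)
```

Here atlas "a" is a scaled copy of the target and atlas "b" is its reverse, so "a" ranks first and "b" last.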
For the physiological criterion, we use the BMI to select the most similar atlases in terms of body fat amount. The BMI is defined as the body mass (in kg) divided by the square of the body height (in m):

$$\mathrm{BMI} = \frac{m}{h^{2}}$$

where m is the body mass in kg and h is the body height in m.
Unlike the previous studies on brain atlas selection, we do not use age as a selection criterion because the effect of aging on torso anatomy is unclear. The anatomical reasoning for using BMI is that the accumulation of subcutaneous and visceral fat can noticeably alter the shapes and positions of torso organs; thus, subjects with similar BMI may have similar organ anatomy. To calculate the BMI of each subject, the body weight and height are retrieved from the DICOM header of the PET/CT images. For atlas selection, the K atlases with the smallest absolute BMI differences from the target subject are selected.
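The BMI criterion is simple enough to sketch in a few lines. The sketch below is illustrative and is not the code used in the study; it assumes the weight (kg), height (m), and gender of each subject have already been read from the image headers into plain dictionaries. The field names `weight_kg`, `height_m`, `sex`, and `id` and the function names are hypothetical. The gender filter reflects the gender-dependent segmentation described earlier.

```python
def bmi(weight_kg, height_m):
    """Body mass index: mass in kg divided by the squared height in m."""
    return weight_kg / height_m ** 2


def select_by_bmi(target, atlases, k):
    """Return the ids of the k same-gender atlases closest in BMI to the target."""
    target_bmi = bmi(target["weight_kg"], target["height_m"])
    candidates = [a for a in atlases if a["sex"] == target["sex"]]
    # Rank by absolute BMI difference, smallest first.
    ranked = sorted(
        candidates,
        key=lambda a: abs(bmi(a["weight_kg"], a["height_m"]) - target_bmi),
    )
    return [a["id"] for a in ranked[:k]]


target = {"weight_kg": 70.0, "height_m": 1.75, "sex": "M"}
atlases = [
    {"id": "A1", "weight_kg": 72.0, "height_m": 1.76, "sex": "M"},
    {"id": "A2", "weight_kg": 95.0, "height_m": 1.70, "sex": "M"},
    {"id": "A3", "weight_kg": 60.0, "height_m": 1.62, "sex": "F"},
    {"id": "A4", "weight_kg": 65.0, "height_m": 1.80, "sex": "M"},
]
selected = select_by_bmi(target, atlases, k=2)
```

Unlike the image similarity criterion, this selection needs no registration to a reference image, which is where the saving in computation time comes from.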
Results
As described above, this study compares atlas selection methods based on the image similarity criterion and the BMI criterion. The image similarity criterion is obtained via two different types of spatial transforms, i.e., the affine transform or the diffeomorphic transform. Since the two types of spatial transforms yield different atlas selection results, we also compare the results of the two transforms. In total, we therefore evaluate three atlas selection criteria: image similarity based on the affine transform, image similarity based on the diffeomorphic transform, and BMI. For short, we name the three criteria affine, diffeomorphic, and BMI, respectively. The evaluation was performed on a workstation with a 2.4 GHz Xeon E5-2630 CPU and 449G RAM. The atlas registration time was 180 s for affine registration and 2000 s for diffeomorphic registration per target-atlas pair, and the label fusion time was 1000 s for fusing K = 10 selected atlases.
For visual inspection of the experimental results, [Figure 2] shows the atlas selection and label fusion results of representative male and female subjects with normal and above-normal fat amounts. For each method, the most and least similar subjects among the 77 candidate subjects are shown. [Figure 2] clearly demonstrates that, whichever selection method is used and whether the subject has a normal or a high fat amount, the most similar atlas always has a torso fat percentage similar to that of the target subject, while the least similar atlas noticeably differs from the target subject in fat percentage. It can also be noticed that the most or least similar atlases selected by different methods are sometimes the same, leading to visually similar label fusion results.
Figure 2: Visual inspection of the atlas selection and label fusion results displayed via coronal slices. Groups (a-d) show the results of a normal male, fat male, normal female, and fat female, respectively. In each group, the left image is the target image, and the three columns on the right show the most similar atlas, the least similar atlas, and the label fusion result of the 10 selected atlases, respectively. The three rows in each group represent the different atlas selection methods, which are body mass index, affine transform, and diffeomorphic transform from top to bottom
Figure 3: Dice values of the segmentation results based on differently selected atlases. A crossed bridge with an asterisk means that the difference between two methods is significant
Quantitative comparison between the different selection methods is based on the segmentation accuracy of each organ, which is measured using the Dice coefficient:

$$\mathrm{Dice} = \frac{2\,|R_S \cap R_G|}{|R_S| + |R_G|}$$

where R_S and R_G represent the organ regions of the segmentation result and the ground truth segmentation, respectively, |•| denotes the region volume, and ∩ denotes the overlapping part of two regions. [Figure 3] shows the organ-wise mean ± standard deviation of the Dice coefficient resulting from the different atlas selection methods, obtained through the leave-one-out test of all 40 males and 38 females. As [Figure 3] reveals, all the methods perform best for the lungs, heart, and liver, with high Dice coefficients (Dice >0.9) and relatively small standard deviations. The skeleton, spleen, and kidneys have relatively lower Dice coefficients (Dice <0.85). We further analyze the reasons for the organ-wise accuracy in the Discussion section.
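The Dice coefficient is conventionally computed as twice the overlap volume divided by the sum of the two region volumes. A minimal sketch for label images stored as flat lists of voxel labels is given below. The function name `dice` and the convention of returning 1.0 when the organ is absent from both images are assumptions made for illustration.

```python
def dice(segmentation, ground_truth, label):
    """Dice coefficient of one organ label between two label images.

    segmentation, ground_truth: equally sized flat lists of voxel labels
    label: the organ label to evaluate
    """
    seg_volume = sum(1 for x in segmentation if x == label)
    gt_volume = sum(1 for x in ground_truth if x == label)
    overlap = sum(
        1 for x, y in zip(segmentation, ground_truth) if x == label and y == label
    )
    if seg_volume + gt_volume == 0:
        # Assumed convention: an organ absent from both images counts as agreement.
        return 1.0
    return 2.0 * overlap / (seg_volume + gt_volume)


seg = [0, 1, 1, 1, 0, 2]
truth = [0, 1, 1, 0, 0, 2]
dice_organ_1 = dice(seg, truth, label=1)  # overlap 2, volumes 3 and 2
dice_organ_2 = dice(seg, truth, label=2)  # identical regions
```

Because the denominator is the sum of both volumes, the same number of mislabeled boundary voxels lowers the Dice value of a small organ more than that of a large one.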
To determine whether the inter-method differences are significant, the significance level of the hypothesis tests is set at P = 0.05, and the significant results are marked with crossed bridges and asterisks in [Figure 3]. The P values are calculated using the t-test for normally distributed samples or the Wilcoxon signed-rank test if normality is not met. The false discovery rate correction for multiple tests was applied (at q = 0.05) to limit the accumulation of false-positive errors.
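The text does not name the false discovery rate procedure that was used. As an illustration only, the sketch below implements the widely used Benjamini-Hochberg step-up procedure, which is an assumption here, on a list of P values already obtained from the pairwise tests. The function name `benjamini_hochberg` is hypothetical.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans: True where the null hypothesis is rejected.

    Benjamini-Hochberg step-up procedure: sort the m P values, find the
    largest rank r with p_(r) <= q * r / m, and reject the r smallest.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= q * rank / m:
            max_rank = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected


# Example: four pairwise comparisons for one organ (made-up P values).
p_values = [0.04, 0.001, 0.03, 0.5]
significant = benjamini_hochberg(p_values, q=0.05)
```

In this made-up example, two comparisons have P < 0.05 but do not survive the correction, which shows why the number of asterisks can be smaller than the number of uncorrected significant tests.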
In addition to the three tested atlas selection methods, we also evaluated the Dice results of using all available atlases, the best 10, the worst 10, and 10 random atlases from the atlas set. The 10 random atlases are drawn from the whole atlas set with uniform probability. To select the best 10 and worst 10 atlases, all the available atlases of the same gender are registered to the target image. Since the ground truth segmentations of both the atlases and the target are known, the similarity between each registered atlas and the target subject is calculated as the average Dice of all organs. The best and worst atlases selected in this way are considered the gold standard for best and worst atlas selection, and the corresponding segmentation accuracies using the best 10 and worst 10 atlases are considered the upper and lower limits that an atlas selection method can achieve.
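The three reference selections can be sketched as follows, assuming the average Dice between each registered atlas and the target has already been computed. This is an illustrative sketch, not the code used in the study. The function name `reference_selections` and the fixed random seed are assumptions.

```python
import random


def reference_selections(mean_dice_by_atlas, k=10, seed=0):
    """Return the best-k, worst-k, and random-k atlas ids.

    mean_dice_by_atlas: dict mapping atlas id -> average Dice over all organs
                        between the registered atlas and the target
    seed: seed for the uniform random draw (assumption, for repeatability)
    """
    ranked = sorted(mean_dice_by_atlas, key=mean_dice_by_atlas.get, reverse=True)
    rng = random.Random(seed)
    return {
        "best": ranked[:k],              # highest average Dice
        "worst": ranked[-k:],            # lowest average Dice
        "random": rng.sample(ranked, k),  # uniform draw without replacement
    }


scores = {"a": 0.9, "b": 0.5, "c": 0.7, "d": 0.8}
selections = reference_selections(scores, k=2)
```

The best and worst selections require the ground truth of the target, so they serve only as reference bounds and cannot be used as a practical selection method.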
The selections of “all atlas,” “best 10,” “worst 10,” and “10 random” are compared with the “affine,” “diffeomorphic,” and “BMI” methods in [Figure 3]. Among the atlas selection methods, the best 10 selection has the highest Dice coefficients for most organs, and the worst 10 always has the lowest mean Dice values. This means that selecting good atlases is essential for improving segmentation accuracy. As the bridges and asterisks show, the differences between the seven atlas selection methods are not significant for the skeleton, lungs, heart, and liver. For these organs, the difference is not significant even between the best 10 and worst 10 selections, meaning that atlas selection is not necessary for them.
Discussion
This study compares the image similarity criterion with the BMI criterion for atlas selection. Since BMI can be computed conveniently from the DICOM header information, we prefer to use the BMI if it leads to segmentation accuracy comparable to that of the image similarity criterion. As revealed by [Figure 2], no matter which atlas selection method is used, the most similar atlas always has a body fat percentage similar to that of the target subject. This result implies that BMI is effective for torso atlas selection. The quantitative evaluation [Figure 3] further shows that the Dice coefficients resulting from the BMI criterion are not significantly different from those of the image similarity criterion. Therefore, we suggest using BMI as a more efficient alternative to the widely used image similarity criterion.
For the image similarity criterion, we compared two types of spatial transforms, the affine transform and the diffeomorphic transform. Some previous brain atlas selection studies prefer the affine transform to the diffeomorphic transform because, if the diffeomorphic transform is used for atlas selection, it is difficult to tell whether the image similarity comes from natural anatomical similarity or from the diffeomorphic warping. As shown in [Figure 3], the “affine” method is significantly better than the “worst 10” selection for the male left kidney and for the female spleen and left kidney, while the “diffeomorphic” method is not significantly different from the “worst 10” for most organs. In other words, “affine” outperforms the “worst 10” in more cases than “diffeomorphic” does. Considering that the affine transform is also more computationally efficient than the diffeomorphic transform, the affine transform should be preferred. This conclusion is consistent with the previous brain atlas selection studies.
[Figure 3] shows that different organs have different levels of Dice coefficients. The lungs are segmented well because they are large, high-contrast organs that are easy to register accurately in the atlas registration step. The heart and liver are two large organs adjacent to the lungs; therefore, good registration of the lungs helps to accurately align the liver and heart. The skeleton, spleen, and kidneys have lower Dice coefficients (Dice <0.85) because their elongated shapes (for the skeleton and spleen) and flexible anatomical positions (for the spleen and kidneys) make a perfect registration impossible. We also observe that significant differences between the seven methods exist only for the spleen and kidneys. Because the skeleton, lungs, heart, and liver are high-contrast, large organs, they tend to reach consistent segmentation accuracy regardless of which atlas selection method is used. In contrast, the spleen and kidneys have irregular shapes, small sizes, low boundary contrast, and variable anatomical positions, so their segmentation accuracy relies more on the atlas selection method. Unfortunately, none of “BMI,” “affine,” and “diffeomorphic” significantly outperforms random selection for the spleen and kidneys. This means that none of the evaluated atlas selection methods is effective for the spleen and kidneys, and better atlas selection methods still need to be developed for these abdominal organs. For future research, we may borrow ideas from several more advanced atlas selection algorithms and test their effectiveness on low-dose CT images.
Regarding the use of BMI for atlas selection, a limitation of this study is that we did not consider the effect of ethnic group differences on the BMI calculation. As some articles have suggested, different ethnic groups have different amounts of torso fat, which may cause more heterogeneity in the relationship between BMI and body fat. In future studies, we will need to carefully evaluate the ethnic effects on BMI-based atlas selection.
Conclusions
This study has revealed useful information for the future development of multi-atlas segmentation methods for low-dose torso CT images. The BMI criterion yields Dice coefficients comparable to those of the image similarity criterion but with much less computation time. All the compared methods performed almost equally for the skeleton, lungs, heart, and liver, meaning that atlas selection is not necessary for these organs. For the spleen and kidneys, none of the selection methods yields significantly better accuracy than random atlas selection. As the next step, we will develop a more effective atlas selection method. The new method should make use of the BMI information and should focus on improving the segmentation accuracy of abdominal organs such as the spleen and kidneys.
Financial support and sponsorship
This research is supported by the general program of National Natural Science Fund of China No. 61571076, the youth program of National Natural Science Fund of China No. 81401475, the general program of Liaoning Science and Technology Project No. 2015020040, the cultivating program of Major National Natural Science Fund of China No. 91546123, the National Key Research and Development Program No. 2016YFC0103101 and 2016YFC0103102, the Science and Technology Star Project Fund of Dalian City No. 2016RQ019, and the Basic Research Funding of Dalian University of Technology No. DUT15LN02.
Conflicts of interest
There are no conflicts of interest.
References
Wang H, Udupa JK, Odhner D, Tong Y, Zhao L, Torigian DA, et al. Automatic anatomy recognition in whole-body PET/CT images. Med Phys 2016;43:613.
Aljabar P, Heckemann R, Hammers A, Hajnal JV, Rueckert D. Classifier Selection Strategies for Label Fusion Using Large Atlas Databases[C]//International Conference on Medical Image Computing and Computer-Assisted Intervention. Berlin, Heidelberg: Springer; 2007. p. 523-31.
Tang X, Yoshida S, Hsu J, Huisman TA, Faria AV, Oishi K, et al. Multi-contrast multi-atlas parcellation of diffusion tensor imaging of the human brain. PLoS One 2014;9:e96985.
Park MT, Pipitone J, Baer LH, Winterburn JL, Shah Y, Chavez S, et al. Derivation of high-resolution MRI atlases of the human cerebellum at 3T and segmentation using multiple automatically generated templates. Neuroimage 2014;95:217-31.
Depa M, Sabuncu MR, Holmvang G, Nezafat R, Schmidt EJ, Golland P, et al. Robust atlas-based segmentation of highly variable anatomy: Left atrium segmentation. Stat Atlases Comput Models Heart 2010;6364:85-94.
van Rikxoort EM, Isgum I, Arzhaeva Y, Staring M, Klein S, Viergever MA, et al. Adaptive local multi-atlas segmentation: Application to the heart and the caudate nucleus. Med Image Anal 2010;14:39-49.
Wang L, Chen KC, Gao Y, Shi F, Liao S, Li G, et al. Automated bone segmentation from dental CBCT images using patch-based sparse representation and convex optimization. Med Phys 2014;41:043503.
Acosta O, Dowling J, Drean G, Simon A, Crevoisier RD, Haigron P. Multi-Atlas-Based Segmentation of Pelvic Structures from CT Scans for Planning in Prostate Cancer Radiotherapy[M]//Abdomen and Thoracic Imaging. Springer US; 2014. p. 623-56.
Iglesias JE, Sabuncu MR. Multi-atlas segmentation of biomedical images: A survey. Med Image Anal 2015;24:205-19.
Jimenez-Del-Toro O, Muller H, Krenn M, Gruenberg K, Taha AA, Winterstein M, et al. Cloud-based evaluation of anatomical structure segmentation and landmark detection algorithms: VISCERAL anatomy benchmarks. IEEE Trans Med Imaging 2016;35:2459-75.
Aljabar P, Heckemann RA, Hammers A, Hajnal JV, Rueckert D. Multi-atlas based segmentation of brain images: Atlas selection and its effect on accuracy. Neuroimage 2009;46:726-38.
Avants BB, Tustison NJ, Song G, Gee JC. Ants: Open-source tools for normalization and neuroanatomy. HeanetIe 2009;10:1-11.
Artaechevarria X, Munoz-Barrutia A, Ortiz-de-Solorzano C. Combination strategies in multi-atlas image segmentation: Application to brain MR data. IEEE Trans Med Imaging 2009;28:1266-77.
Avants BB, Yushkevich P, Pluta J, Minkoff D, Korczykowski M, Detre J, et al. The optimal template effect in hippocampus studies of diseased populations. Neuroimage 2010;49:2457-66.
Avants BB, Tustison NJ, Song G, Cook PA, Klein A, Gee JC, et al. A reproducible evaluation of ANTs similarity metric performance in brain image registration. Neuroimage 2011;54:2033-44.
Shen K, Bourgeat P, Dowson N, Meriaudeau F, Salvado O. Atlas Selection Strategy Using Least Angle Regression in Multi-Atlas Segmentation Propagation[C]//Biomedical Imaging: From Nano to Macro, 2011 IEEE International Symposium on. IEEE; 2011. p. 1746-9.
Xu Z, Burke RP, Lee CP, Baucom RB, Poulose BK, Abramson RG, et al. Efficient multi-atlas abdominal segmentation on clinically acquired CT with SIMPLE context learning. Med Image Anal 2015;24:18-27.
Deurenberg P, Yap M, van Staveren WA. Body mass index and percent body fat: A meta analysis among different ethnic groups. Int J Obes Relat Metab Disord 1998;22:1164-71.
Deurenberg P, Deurenberg-Yap M, Guricci S. Asians are different from Caucasians and from each other in their body mass index/body fat per cent relationship. Obes Rev 2002;3:141-6.