TY - JOUR
T1 - UNet++: Redesigning Skip Connections to Exploit Multiscale Features in Image Segmentation
AU - Zhou, Zongwei
AU - Siddiquee, Md Mahfuzur Rahman
AU - Tajbakhsh, Nima
AU - Liang, Jianming
N1 - Funding Information:
Manuscript received September 5, 2019; revised October 28, 2019; accepted November 2, 2019. Date of publication December 13, 2019; date of current version June 1, 2020. This work was supported in part by the ASU and Mayo Clinic through a Seed Grant and an Innovation Grant and in part by the NIH under Award R01HL128785. The content is solely the responsibility of the authors and does not necessarily represent the official views of NIH. (Corresponding author: Jianming Liang.) Z. Zhou, N. Tajbakhsh, and J. Liang are with the Department of Biomedical Informatics, Arizona State University, Scottsdale, AZ 85259 USA (e-mail: zongweiz@asu.edu; ntajbakh@asu.edu; jianming.liang@asu.edu).
Publisher Copyright:
© 1982-2012 IEEE.
PY - 2020/6
Y1 - 2020/6
N2 - The state-of-the-art models for medical image segmentation are variants of U-Net and fully convolutional networks (FCN). Despite their success, these models have two limitations: (1) their optimal depth is a priori unknown, requiring extensive architecture search or an inefficient ensemble of models of varying depths; and (2) their skip connections impose an unnecessarily restrictive fusion scheme, forcing aggregation only at the same-scale feature maps of the encoder and decoder sub-networks. To overcome these two limitations, we propose UNet++, a new neural architecture for semantic and instance segmentation, by (1) alleviating the unknown network depth with an efficient ensemble of U-Nets of varying depths, which partially share an encoder and co-learn simultaneously using deep supervision; (2) redesigning skip connections to aggregate features of varying semantic scales at the decoder sub-networks, leading to a highly flexible feature fusion scheme; and (3) devising a pruning scheme to accelerate the inference speed of UNet++. We have evaluated UNet++ using six different medical image segmentation datasets, covering multiple imaging modalities such as computed tomography (CT), magnetic resonance imaging (MRI), and electron microscopy (EM), and demonstrating that (1) UNet++ consistently outperforms the baseline models for the task of semantic segmentation across different datasets and backbone architectures; (2) UNet++ enhances segmentation quality of varying-size objects - an improvement over the fixed-depth U-Net; (3) Mask RCNN++ (Mask R-CNN with UNet++ design) outperforms the original Mask R-CNN for the task of instance segmentation; and (4) pruned UNet++ models achieve significant speedup while showing only modest performance degradation. Our implementation and pre-trained models are available at https://github.com/MrGiovanni/UNetPlusPlus.
AB - The state-of-the-art models for medical image segmentation are variants of U-Net and fully convolutional networks (FCN). Despite their success, these models have two limitations: (1) their optimal depth is a priori unknown, requiring extensive architecture search or an inefficient ensemble of models of varying depths; and (2) their skip connections impose an unnecessarily restrictive fusion scheme, forcing aggregation only at the same-scale feature maps of the encoder and decoder sub-networks. To overcome these two limitations, we propose UNet++, a new neural architecture for semantic and instance segmentation, by (1) alleviating the unknown network depth with an efficient ensemble of U-Nets of varying depths, which partially share an encoder and co-learn simultaneously using deep supervision; (2) redesigning skip connections to aggregate features of varying semantic scales at the decoder sub-networks, leading to a highly flexible feature fusion scheme; and (3) devising a pruning scheme to accelerate the inference speed of UNet++. We have evaluated UNet++ using six different medical image segmentation datasets, covering multiple imaging modalities such as computed tomography (CT), magnetic resonance imaging (MRI), and electron microscopy (EM), and demonstrating that (1) UNet++ consistently outperforms the baseline models for the task of semantic segmentation across different datasets and backbone architectures; (2) UNet++ enhances segmentation quality of varying-size objects - an improvement over the fixed-depth U-Net; (3) Mask RCNN++ (Mask R-CNN with UNet++ design) outperforms the original Mask R-CNN for the task of instance segmentation; and (4) pruned UNet++ models achieve significant speedup while showing only modest performance degradation. Our implementation and pre-trained models are available at https://github.com/MrGiovanni/UNetPlusPlus.
KW - Neuronal structure segmentation
KW - brain tumor segmentation
KW - cell segmentation
KW - deep supervision
KW - instance segmentation
KW - liver segmentation
KW - lung nodule segmentation
KW - medical image segmentation
KW - model pruning
KW - nuclei segmentation
KW - semantic segmentation
UR - http://www.scopus.com/inward/record.url?scp=85084466306&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85084466306&partnerID=8YFLogxK
U2 - 10.1109/TMI.2019.2959609
DO - 10.1109/TMI.2019.2959609
M3 - Article
C2 - 31841402
AN - SCOPUS:85084466306
SN - 0278-0062
VL - 39
SP - 1856
EP - 1867
JO - IEEE Transactions on Medical Imaging
JF - IEEE Transactions on Medical Imaging
IS - 6
M1 - 8932614
ER -