A Progressive Subnetwork Searching Framework for Dynamic Inference

Li Yang, Zhezhi He, Yu Cao, Deliang Fan

Research output: Contribution to journal › Article › peer-review


Abstract

Deep neural network (DNN) model compression is a popular and important optimization method for efficient and fast hardware acceleration. However, a compressed model is usually fixed, with no capability to tune its computing complexity (i.e., hardware latency) on-the-fly in response to dynamic latency requirements, workloads, and computing hardware resource allocation. To address this challenge, dynamic DNNs with run-time adaptation of their computing structures have been constructed by training with a cross-entropy objective function over multiple subnets sampled from a supernet. Our investigation in this work shows that the performance of dynamic inference relies heavily on the quality of subnet sampling. To construct a dynamic DNN with multiple high-quality subnets, we propose a progressive subnetwork searching framework that embeds several newly proposed techniques: trainable noise ranking, channel-group sampling, selective fine-tuning, and subnet filtering. The proposed framework empowers the target dynamic DNN with higher accuracy for all subnets than prior works on both the CIFAR-10 and ImageNet datasets. Specifically, compared with the universally slimmable network (US-Net), our method achieves average accuracy gains on ImageNet of 0.9% for AlexNet, 2.5% for ResNet18, 1.1% for VGG11, and 0.58% for MobileNetV1. Moreover, to demonstrate run-time tuning of the computing latency of a dynamic DNN in a real computing system, we have deployed the constructed dynamic networks on an Nvidia Titan graphics processing unit (GPU) and an Intel Xeon central processing unit (CPU), showing clear improvement over prior works. The code is available at https://github.com/ASU-ESIC-FAN-Lab/Dynamic-inference.
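The channel-group sampling and subnet filtering steps described in the abstract can be sketched roughly as follows. This is a hypothetical illustration only: the function names and the simple cost stand-in are our own assumptions, not the paper's actual implementation, which samples subnet widths at channel-group granularity from a supernet and discards candidates that miss a latency/complexity budget.

```python
import random

def channel_group_subnets(total_channels, group_size):
    """Enumerate candidate widths for one layer at channel-group granularity.

    Hypothetical sketch of channel-group sampling: instead of choosing an
    arbitrary per-channel width, subnet widths are restricted to multiples
    of a channel group, which shrinks the search space.
    """
    assert total_channels % group_size == 0
    return [group_size * g for g in range(1, total_channels // group_size + 1)]

def sample_subnet(widths_per_layer, rng=random):
    """Randomly pick one candidate width per layer to form a subnet config."""
    return [rng.choice(widths) for widths in widths_per_layer]

def filter_subnets(configs, cost_of, budget):
    """Keep only subnet configs whose estimated cost fits the budget.

    A stand-in for the paper's subnet-filtering step; `cost_of` would be a
    FLOPs or measured-latency estimator in a real system.
    """
    return [cfg for cfg in configs if cost_of(cfg) <= budget]

# Example: a two-layer supernet with 64 and 32 channels, groups of 16.
layer_widths = [channel_group_subnets(64, 16), channel_group_subnets(32, 16)]
candidates = [sample_subnet(layer_widths) for _ in range(8)]
# Toy cost model: total active channels must fit the "budget".
feasible = filter_subnets(candidates, cost_of=sum, budget=80)
```

In the actual framework, the surviving subnets would then be refined via selective fine-tuning; here the filter simply illustrates how the candidate pool is pruned before that stage.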

Original language: English (US)
Pages (from-to): 1-12
Number of pages: 12
Journal: IEEE Transactions on Neural Networks and Learning Systems
State: Accepted/In press - 2022
Externally published: Yes

Keywords

  • Computational modeling
  • Costs
  • Deep neural network (DNN)
  • dynamic inference
  • dynamic network
  • Dynamic scheduling
  • Graphics processing units
  • Hardware
  • Neural networks
  • subnet searching
  • Switches

ASJC Scopus subject areas

  • Software
  • Computer Science Applications
  • Computer Networks and Communications
  • Artificial Intelligence

