TY - GEN
T1 - Non-uniform DNN structured subnets sampling for dynamic inference
AU - Yang, Li
AU - He, Zhezhi
AU - Cao, Yu
AU - Fan, Deliang
N1 - Funding Information:
This work is supported in part by the National Science Foundation under Grant No. 2005209 and No. 1931871, and by the Semiconductor Research Corporation nCORE
Publisher Copyright:
© 2020 IEEE.
PY - 2020/7
Y1 - 2020/7
N2 - With the success of Deep Neural Networks (DNNs), many recent works have focused on developing hardware accelerators for power- and resource-limited systems via model compression techniques such as quantization, pruning, and low-rank approximation. However, almost all existing compressed DNNs are fixed after deployment and lack a run-time adaptive structure to adapt to dynamic hardware resource allocation, power budget, throughput requirements, and dynamic workload. As a countermeasure, to construct a run-time dynamic DNN structure, we propose a novel DNN sub-network sampling method that generates subnets via non-uniform channel selection. Thus, users can trade off power, speed, computing load, and accuracy on-the-fly after deployment, depending on the dynamic requirements or specifications of the given system. We verify the proposed model on both the CIFAR-10 and ImageNet datasets using ResNets, where it outperforms the same sub-nets trained individually as well as other related works. On ImageNet using ResNet18, our method achieves latency trade-offs among 13.4, 24.6, 41.3, and 62.1 ms on a GPU with batch size 128, and among 30.5, 38.7, 51, and 65.4 ms on a CPU.
AB - With the success of Deep Neural Networks (DNNs), many recent works have focused on developing hardware accelerators for power- and resource-limited systems via model compression techniques such as quantization, pruning, and low-rank approximation. However, almost all existing compressed DNNs are fixed after deployment and lack a run-time adaptive structure to adapt to dynamic hardware resource allocation, power budget, throughput requirements, and dynamic workload. As a countermeasure, to construct a run-time dynamic DNN structure, we propose a novel DNN sub-network sampling method that generates subnets via non-uniform channel selection. Thus, users can trade off power, speed, computing load, and accuracy on-the-fly after deployment, depending on the dynamic requirements or specifications of the given system. We verify the proposed model on both the CIFAR-10 and ImageNet datasets using ResNets, where it outperforms the same sub-nets trained individually as well as other related works. On ImageNet using ResNet18, our method achieves latency trade-offs among 13.4, 24.6, 41.3, and 62.1 ms on a GPU with batch size 128, and among 30.5, 38.7, 51, and 65.4 ms on a CPU.
UR - http://www.scopus.com/inward/record.url?scp=85093362080&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85093362080&partnerID=8YFLogxK
U2 - 10.1109/DAC18072.2020.9218736
DO - 10.1109/DAC18072.2020.9218736
M3 - Conference contribution
AN - SCOPUS:85093362080
T3 - Proceedings - Design Automation Conference
BT - 2020 57th ACM/IEEE Design Automation Conference, DAC 2020
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 57th ACM/IEEE Design Automation Conference, DAC 2020
Y2 - 20 July 2020 through 24 July 2020
ER -