Processing-in-Memory Accelerator for Dynamic Neural Network with Run-Time Tuning of Accuracy, Power and Latency

Li Yang; Zhezhi He; Shaahin Angizi; Deliang Fan

doi:10.1109/SOCC49529.2020.9524770

Ira A. Fulton Schools of Engineering (IAFSE)

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Scopus citation

Abstract

With the wide deployment of powerful deep neural networks (DNNs) into smart but resource-limited IoT devices, many prior works have proposed compressing DNNs in a hardware-aware manner to reduce computing complexity while maintaining accuracy, using techniques such as weight quantization, pruning and convolution decomposition. However, typical DNN compression methods generate a smaller but fixed network structure from a relatively large background model for deployment on a resource-limited hardware accelerator. Such optimization cannot tune the network structure on-the-fly to best fit dynamic hardware resource allocation and workloads. In this paper, we mainly review two of our prior works [1], [2] that address this issue, discussing how to construct a dynamic DNN structure through sub-network sampling based on either uniform or non-uniform channel selection. The constructed dynamic DNN can tune its computing path to involve different numbers of channels, providing the ability to trade off speed, power and accuracy on-the-fly after model deployment. Correspondingly, we also discuss an emerging Spin-Orbit Torque Magnetic Random-Access Memory (SOT-MRAM) based Processing-In-Memory (PIM) accelerator for such dynamic neural network structures.
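The abstract describes one trained network that can run with fewer channels at run time, either with the same fraction of channels kept in every layer (uniform) or with per-layer channel counts (non-uniform). The sketch below illustrates that idea under simplifying assumptions: fully-connected layers stand in for convolutions, a sub-network keeps the leading channels of each layer, and the multiply-accumulate (MAC) count serves as a rough proxy for latency and power. The function names (`slice_linear`, `forward`, `uniform_widths`, `macs`), the leading-channel slicing rule and the example widths are hypothetical illustrations, not the training or sampling procedure of [1], [2].

```python
from typing import List, Sequence

# Hypothetical layout: weight[out_channel][in_channel]
Matrix = List[List[float]]


def slice_linear(weight: Matrix, x: Sequence[float], out_ch: int) -> List[float]:
    """Compute W[:out_ch, :len(x)] @ x, i.e. only the leading channels are used."""
    if out_ch > len(weight) or len(x) > len(weight[0]):
        raise ValueError("requested width exceeds the full layer size")
    return [sum(weight[o][i] * x[i] for i in range(len(x))) for o in range(out_ch)]


def forward(weights: List[Matrix], x: Sequence[float], widths: Sequence[int]) -> List[float]:
    """Run one sampled sub-network; widths[k] is the output width kept in layer k.

    All sub-networks share the same stored weights; only the active slice changes.
    """
    h = list(x)
    for w, out_ch in zip(weights, widths):
        h = [max(0.0, v) for v in slice_linear(w, h, out_ch)]  # linear + ReLU
    return h


def uniform_widths(full_widths: Sequence[int], ratio: float) -> List[int]:
    """Uniform selection: every layer keeps the same fraction of its channels."""
    return [max(1, round(w * ratio)) for w in full_widths]


def macs(widths: Sequence[int], in_ch: int) -> int:
    """MAC count of a chain of layers with the given output widths."""
    total = 0
    for out_ch in widths:
        total += in_ch * out_ch
        in_ch = out_ch
    return total


if __name__ == "__main__":
    full = [8, 16]                              # hypothetical full layer widths
    print(macs(full, 4))                        # 160 (full network)
    print(macs(uniform_widths(full, 0.5), 4))   # 48 (uniform, half width)
    print(macs([6, 4], 4))                      # 48 (hypothetical non-uniform widths)
```

In this toy setting, the non-uniform choice `[6, 4]` costs the same 48 MACs as the uniform half-width network `[4, 8]` but distributes channels differently across layers, which is the kind of extra freedom a non-uniform selection scheme can exploit when trading accuracy against cost.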

Original language	English (US)
Title of host publication	Proceedings - 33rd IEEE International System on Chip Conference, SOCC 2020
Editors	Gang Qu, Jinjun Xiong, Danella Zhao, Venki Muthukumar, Md Farhadur Reza, Ramalingam Sridhar
Publisher	IEEE Computer Society
Pages	117-122
Number of pages	6
ISBN (Electronic)	9781728187457
DOI	https://doi.org/10.1109/SOCC49529.2020.9524770
State	Published - Sep 8 2020
Event	33rd IEEE International System on Chip Conference, SOCC 2020 - Virtual, Las Vegas, United States Duration: Sep 8 2020 → Sep 11 2020

Publication series

Name	International System on Chip Conference
Volume	2020-September
ISSN (Print)	2164-1676
ISSN (Electronic)	2164-1706

Conference

Conference	33rd IEEE International System on Chip Conference, SOCC 2020
Country/Territory	United States
City	Virtual, Las Vegas
Period	9/8/20 → 9/11/20

Keywords

Dynamic neural network
Processing-in-Memory

ASJC Scopus subject areas

Hardware and Architecture
Control and Systems Engineering
Electrical and Electronic Engineering

Access to Document

10.1109/SOCC49529.2020.9524770

Cite this

Yang, L., He, Z., Angizi, S., & Fan, D. (2020). Processing-in-Memory Accelerator for Dynamic Neural Network with Run-Time Tuning of Accuracy, Power and Latency. In G. Qu, J. Xiong, D. Zhao, V. Muthukumar, M. F. Reza, & R. Sridhar (Eds.), Proceedings - 33rd IEEE International System on Chip Conference, SOCC 2020 (pp. 117-122). (International System on Chip Conference; Vol. 2020-September). IEEE Computer Society. https://doi.org/10.1109/SOCC49529.2020.9524770

Processing-in-Memory Accelerator for Dynamic Neural Network with Run-Time Tuning of Accuracy, Power and Latency. / Yang, Li; He, Zhezhi; Angizi, Shaahin et al.
Proceedings - 33rd IEEE International System on Chip Conference, SOCC 2020. ed. / Gang Qu; Jinjun Xiong; Danella Zhao; Venki Muthukumar; Md Farhadur Reza; Ramalingam Sridhar. IEEE Computer Society, 2020. p. 117-122 (International System on Chip Conference; Vol. 2020-September).

Yang, L, He, Z, Angizi, S & Fan, D 2020, Processing-in-Memory Accelerator for Dynamic Neural Network with Run-Time Tuning of Accuracy, Power and Latency. in G Qu, J Xiong, D Zhao, V Muthukumar, MF Reza & R Sridhar (eds), Proceedings - 33rd IEEE International System on Chip Conference, SOCC 2020. International System on Chip Conference, vol. 2020-September, IEEE Computer Society, pp. 117-122, 33rd IEEE International System on Chip Conference, SOCC 2020, Virtual, Las Vegas, United States, 9/8/20. https://doi.org/10.1109/SOCC49529.2020.9524770

Yang L, He Z, Angizi S, Fan D. Processing-in-Memory Accelerator for Dynamic Neural Network with Run-Time Tuning of Accuracy, Power and Latency. In Qu G, Xiong J, Zhao D, Muthukumar V, Reza MF, Sridhar R, editors, Proceedings - 33rd IEEE International System on Chip Conference, SOCC 2020. IEEE Computer Society. 2020. p. 117-122. (International System on Chip Conference). doi: 10.1109/SOCC49529.2020.9524770

Yang, Li ; He, Zhezhi ; Angizi, Shaahin et al. / Processing-in-Memory Accelerator for Dynamic Neural Network with Run-Time Tuning of Accuracy, Power and Latency. Proceedings - 33rd IEEE International System on Chip Conference, SOCC 2020. editor / Gang Qu ; Jinjun Xiong ; Danella Zhao ; Venki Muthukumar ; Md Farhadur Reza ; Ramalingam Sridhar. IEEE Computer Society, 2020. pp. 117-122 (International System on Chip Conference).

@inproceedings{1d7b519637c74001a0ba0967e61baa39,

title = "Processing-in-Memory Accelerator for Dynamic Neural Network with Run-Time Tuning of Accuracy, Power and Latency",

abstract = "With the wide deployment of powerful deep neural networks (DNNs) into smart but resource-limited IoT devices, many prior works have proposed compressing DNNs in a hardware-aware manner to reduce computing complexity while maintaining accuracy, using techniques such as weight quantization, pruning and convolution decomposition. However, typical DNN compression methods generate a smaller but fixed network structure from a relatively large background model for deployment on a resource-limited hardware accelerator. Such optimization cannot tune the network structure on-the-fly to best fit dynamic hardware resource allocation and workloads. In this paper, we mainly review two of our prior works [1], [2] that address this issue, discussing how to construct a dynamic DNN structure through sub-network sampling based on either uniform or non-uniform channel selection. The constructed dynamic DNN can tune its computing path to involve different numbers of channels, providing the ability to trade off speed, power and accuracy on-the-fly after model deployment. Correspondingly, we also discuss an emerging Spin-Orbit Torque Magnetic Random-Access Memory (SOT-MRAM) based Processing-In-Memory (PIM) accelerator for such dynamic neural network structures.",

keywords = "Dynamic neural network, Processing-in-Memory",

author = "Li Yang and Zhezhi He and Shaahin Angizi and Deliang Fan",

note = "Funding Information: This work is supported in part by the National Science Foundation under Grant No. 2005209, No. 2003749, No. 1931871 and Semiconductor Research Corporation nCORE. Publisher Copyright: {\textcopyright} 2020 IEEE.; 33rd IEEE International System on Chip Conference, SOCC 2020 ; Conference date: 08-09-2020 Through 11-09-2020",

year = "2020",

month = sep,

day = "8",

doi = "10.1109/SOCC49529.2020.9524770",

language = "English (US)",

series = "International System on Chip Conference",

publisher = "IEEE Computer Society",

pages = "117--122",

editor = "Gang Qu and Jinjun Xiong and Danella Zhao and Venki Muthukumar and Reza, {Md Farhadur} and Ramalingam Sridhar",

booktitle = "Proceedings - 33rd IEEE International System on Chip Conference, SOCC 2020",

}

TY - GEN

T1 - Processing-in-Memory Accelerator for Dynamic Neural Network with Run-Time Tuning of Accuracy, Power and Latency

AU - Yang, Li

AU - He, Zhezhi

AU - Angizi, Shaahin

AU - Fan, Deliang

N1 - Funding Information: This work is supported in part by the National Science Foundation under Grant No. 2005209, No. 2003749, No. 1931871 and Semiconductor Research Corporation nCORE. Publisher Copyright: © 2020 IEEE.

PY - 2020/9/8

Y1 - 2020/9/8

AB - With the wide deployment of powerful deep neural networks (DNNs) into smart but resource-limited IoT devices, many prior works have proposed compressing DNNs in a hardware-aware manner to reduce computing complexity while maintaining accuracy, using techniques such as weight quantization, pruning and convolution decomposition. However, typical DNN compression methods generate a smaller but fixed network structure from a relatively large background model for deployment on a resource-limited hardware accelerator. Such optimization cannot tune the network structure on-the-fly to best fit dynamic hardware resource allocation and workloads. In this paper, we mainly review two of our prior works [1], [2] that address this issue, discussing how to construct a dynamic DNN structure through sub-network sampling based on either uniform or non-uniform channel selection. The constructed dynamic DNN can tune its computing path to involve different numbers of channels, providing the ability to trade off speed, power and accuracy on-the-fly after model deployment. Correspondingly, we also discuss an emerging Spin-Orbit Torque Magnetic Random-Access Memory (SOT-MRAM) based Processing-In-Memory (PIM) accelerator for such dynamic neural network structures.

KW - Dynamic neural network

KW - Processing-in-Memory

UR - http://www.scopus.com/inward/record.url?scp=85115327016&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85115327016&partnerID=8YFLogxK

U2 - 10.1109/SOCC49529.2020.9524770

DO - 10.1109/SOCC49529.2020.9524770

M3 - Conference contribution

AN - SCOPUS:85115327016

T3 - International System on Chip Conference

SP - 117

EP - 122

BT - Proceedings - 33rd IEEE International System on Chip Conference, SOCC 2020

A2 - Qu, Gang

A2 - Xiong, Jinjun

A2 - Zhao, Danella

A2 - Muthukumar, Venki

A2 - Reza, Md Farhadur

A2 - Sridhar, Ramalingam

PB - IEEE Computer Society

T2 - 33rd IEEE International System on Chip Conference, SOCC 2020

Y2 - 8 September 2020 through 11 September 2020

ER -
