TY - GEN
T1 - Small-world-based Structural Pruning for Efficient FPGA Inference of Deep Neural Networks
AU - Krishnan, Gokul
AU - Ma, Yufei
AU - Cao, Yu
N1 - Publisher Copyright:
© 2020 IEEE.
PY - 2020/11/3
Y1 - 2020/11/3
AB - DNN pruning approaches usually trim model parameters without exploiting the intrinsic graph properties of the network or the preferences of the target hardware. As a result, an FPGA accelerator may not directly benefit from such random pruning, and instead incurs additional cost in indexing and control modules. Inspired by the observation that the brain and many real-world networks follow a Small-World model, we propose a graph-based progressive structural pruning technique that integrates the local clustering and global sparsity of the Small-World graph with the data locality of the FPGA dataflow. The proposed technique hierarchically trims the DNN into a sparse graph before training, following both the Small-World property and FPGA dataflow preferences, such as grouping non-zero and zero parameters so that data loads and the corresponding computation can be skipped. The pruned model is then trained on a given dataset and fine-tuned to achieve the best accuracy. We evaluate the proposed technique on multiple DNNs across different datasets. It achieves state-of-the-art sparsity ratios of up to 76% on CIFAR-10, 84% on CIFAR-100, and 76% on SVHN. Moreover, the generated sparse DNN achieves up to a 4× throughput improvement on an output-stationary FPGA architecture across different DNNs, with marginal hardware overhead.
KW - Deep Neural Network
KW - Graph Efficiency
KW - Pruning
KW - Small-World graph
KW - Sparse FPGA Accelerator
UR - http://www.scopus.com/inward/record.url?scp=85099213253&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85099213253&partnerID=8YFLogxK
DO - 10.1109/ICSICT49897.2020.9278024
M3 - Conference contribution
AN - SCOPUS:85099213253
T3 - 2020 IEEE 15th International Conference on Solid-State and Integrated Circuit Technology, ICSICT 2020 - Proceedings
BT - 2020 IEEE 15th International Conference on Solid-State and Integrated Circuit Technology, ICSICT 2020 - Proceedings
A2 - Yu, Shaofeng
A2 - Zhu, Xiaona
A2 - Tang, Ting-Ao
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 15th IEEE International Conference on Solid-State and Integrated Circuit Technology, ICSICT 2020
Y2 - 3 November 2020 through 6 November 2020
ER -