A fully onchip binarized convolutional neural network FPGA impelmentation with accurate inference

Li Yang; Zhezhi He; Deliang Fan

doi:10.1145/3218603.3218615

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

33 Scopus citations

Abstract

Deep convolutional neural networks play an important role in machine learning and are widely used in computer vision tasks. However, their enormous model size and massive computation cost have become the main obstacle to deploying such powerful algorithms on low-power, resource-limited embedded systems such as FPGAs. Recent work has shown that binarized neural networks (BNNs), which use binarized (i.e., +1 and -1) convolution kernels and binary activation functions, can significantly reduce model size and computational complexity, which paves the way for energy-efficient FPGA implementations. In this work, we first propose a new BNN algorithm, called Parallel-Convolution BNN (PC-BNN), which replaces the original binary convolution layer in a conventional BNN with two parallel binary convolution layers. PC-BNN achieves ∼86% accuracy on the CIFAR-10 dataset with a parameter size of only 2.3 Mb. We then deploy the proposed PC-BNN on the Xilinx PYNQ Z1 FPGA board, which has only 4.9 Mb of on-chip RAM. Because the network parameters are so small, it is feasible to store all of them in on-chip RAM, which can greatly reduce the energy and delay overhead of loading network parameters from off-chip memory. In addition, a new data-streaming pipeline architecture is proposed in the PC-BNN FPGA implementation to further improve throughput. The experimental results show that our PC-BNN-based FPGA implementation achieves 930 frames per second (FPS), 387.5 FPS/Watt, and 396×10^-4 FPS/LUT, which are among the best throughput and energy-efficiency figures compared with most recent works.
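
As a rough sanity check on the figures quoted in the abstract, the following Python sketch derives the power draw and LUT count that those figures imply, checks that the parameters fit in on-chip RAM, and illustrates the XNOR-popcount identity commonly used to evaluate +1/-1 dot products in BNN hardware. The derived quantities are back-of-the-envelope values, not numbers reported by the authors; reading "Mb" as megabits is an assumption; and all function names are hypothetical. The abstract does not say whether PC-BNN uses XNOR-popcount, so that part only illustrates binarized arithmetic in general.

```python
# Figures quoted in the abstract.
FPS = 930.0             # frames per second
FPS_PER_WATT = 387.5    # energy efficiency
FPS_PER_LUT = 396e-4    # area efficiency
PARAM_MBIT = 2.3        # parameter size (assumed to be megabits)
ONCHIP_RAM_MBIT = 4.9   # PYNQ Z1 on-chip RAM (assumed to be megabits)


def implied_power_watts(fps, fps_per_watt):
    """Power implied by throughput and FPS/Watt (derived, not reported)."""
    return fps / fps_per_watt


def implied_lut_count(fps, fps_per_lut):
    """LUT usage implied by throughput and FPS/LUT (derived, not reported)."""
    return fps / fps_per_lut


def ram_fraction_used(param_mbit, ram_mbit):
    """Fraction of on-chip RAM taken by the parameters alone."""
    return param_mbit / ram_mbit


def encode_bits(vec):
    """Pack a {+1, -1} vector into an int: +1 -> bit 1, -1 -> bit 0."""
    return sum(1 << i for i, v in enumerate(vec) if v == 1)


def binary_dot(a, b):
    """Reference dot product of two {+1, -1} vectors."""
    return sum(x * y for x, y in zip(a, b))


def xnor_popcount_dot(a_bits, b_bits, n):
    """Same dot product on packed bits: 2 * popcount(XNOR) - n."""
    mask = (1 << n) - 1
    matches = bin(~(a_bits ^ b_bits) & mask).count("1")
    return 2 * matches - n


if __name__ == "__main__":
    print(f"implied power: {implied_power_watts(FPS, FPS_PER_WATT):.2f} W")
    print(f"implied LUTs:  {implied_lut_count(FPS, FPS_PER_LUT):.0f}")
    print(f"RAM used by parameters: "
          f"{ram_fraction_used(PARAM_MBIT, ONCHIP_RAM_MBIT):.0%}")
    a, b = [1, -1, 1, 1], [1, 1, -1, 1]
    print(binary_dot(a, b), xnor_popcount_dot(encode_bits(a), encode_bits(b), len(a)))
```

Under these assumptions the parameters occupy a little under half of the on-chip RAM, which is consistent with the abstract's claim that the whole network can be stored on chip. How much of the remaining RAM the design uses for activations and buffering is not stated in the abstract.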

Original language	English (US)
Title of host publication	ISLPED 2018 - Proceedings of the 2018 International Symposium on Low Power Electronics and Design
Publisher	Institute of Electrical and Electronics Engineers Inc.
ISBN (Print)	9781450357043
DOIs	https://doi.org/10.1145/3218603.3218615
State	Published - Jul 23 2018
Externally published	Yes
Event	23rd IEEE/ACM International Symposium on Low Power Electronics and Design, ISLPED 2018 - Bellevue, United States; Duration: Jul 23 2018 → Jul 25 2018

Publication series

Name	Proceedings of the International Symposium on Low Power Electronics and Design
ISSN (Print)	1533-4678

Conference

Conference	23rd IEEE/ACM International Symposium on Low Power Electronics and Design, ISLPED 2018
Country/Territory	United States
City	Bellevue
Period	7/23/18 → 7/25/18

Keywords

Binarized convolutional neural network (BNN)
Convolutional neural network (CNN)
Field-programmable gate array (FPGA)

ASJC Scopus subject areas

General Engineering

Access to Document

10.1145/3218603.3218615

Cite this

Yang, L., He, Z., & Fan, D. (2018). A fully onchip binarized convolutional neural network FPGA impelmentation with accurate inference. In ISLPED 2018 - Proceedings of the 2018 International Symposium on Low Power Electronics and Design, Article a50 (Proceedings of the International Symposium on Low Power Electronics and Design). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1145/3218603.3218615

@inproceedings{09da2432c84e40d495d36455ea2c327d,

title = "A fully onchip binarized convolutional neural network FPGA impelmentation with accurate inference",

abstract = "Deep convolutional neural networks play an important role in machine learning and are widely used in computer vision tasks. However, their enormous model size and massive computation cost have become the main obstacle to deploying such powerful algorithms on low-power, resource-limited embedded systems such as FPGAs. Recent work has shown that binarized neural networks (BNNs), which use binarized (i.e., +1 and -1) convolution kernels and binary activation functions, can significantly reduce model size and computational complexity, which paves the way for energy-efficient FPGA implementations. In this work, we first propose a new BNN algorithm, called Parallel-Convolution BNN (PC-BNN), which replaces the original binary convolution layer in a conventional BNN with two parallel binary convolution layers. PC-BNN achieves ∼86% accuracy on the CIFAR-10 dataset with a parameter size of only 2.3 Mb. We then deploy the proposed PC-BNN on the Xilinx PYNQ Z1 FPGA board, which has only 4.9 Mb of on-chip RAM. Because the network parameters are so small, it is feasible to store all of them in on-chip RAM, which can greatly reduce the energy and delay overhead of loading network parameters from off-chip memory. In addition, a new data-streaming pipeline architecture is proposed in the PC-BNN FPGA implementation to further improve throughput. The experimental results show that our PC-BNN-based FPGA implementation achieves 930 frames per second (FPS), 387.5 FPS/Watt, and 396×10^-4 FPS/LUT, which are among the best throughput and energy-efficiency figures compared with most recent works.",

keywords = "Binarized convolutional neural network (BNN), Convolutional neural network (CNN), Field-programmable gate array (FPGA)",

author = "Li Yang and Zhezhi He and Deliang Fan",

note = "Publisher Copyright: {\textcopyright} 2018 Association for Computing Machinery.; 23rd IEEE/ACM International Symposium on Low Power Electronics and Design, ISLPED 2018 ; Conference date: 23-07-2018 Through 25-07-2018",

year = "2018",

month = jul,

day = "23",

doi = "10.1145/3218603.3218615",

language = "English (US)",

isbn = "9781450357043",

series = "Proceedings of the International Symposium on Low Power Electronics and Design",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

booktitle = "ISLPED 2018 - Proceedings of the 2018 International Symposium on Low Power Electronics and Design",

}

TY - GEN

T1 - A fully onchip binarized convolutional neural network FPGA impelmentation with accurate inference

AU - Yang, Li

AU - He, Zhezhi

AU - Fan, Deliang

PY - 2018/7/23

Y1 - 2018/7/23

N2 - Deep convolutional neural networks play an important role in machine learning and are widely used in computer vision tasks. However, their enormous model size and massive computation cost have become the main obstacle to deploying such powerful algorithms on low-power, resource-limited embedded systems such as FPGAs. Recent work has shown that binarized neural networks (BNNs), which use binarized (i.e., +1 and -1) convolution kernels and binary activation functions, can significantly reduce model size and computational complexity, which paves the way for energy-efficient FPGA implementations. In this work, we first propose a new BNN algorithm, called Parallel-Convolution BNN (PC-BNN), which replaces the original binary convolution layer in a conventional BNN with two parallel binary convolution layers. PC-BNN achieves ∼86% accuracy on the CIFAR-10 dataset with a parameter size of only 2.3 Mb. We then deploy the proposed PC-BNN on the Xilinx PYNQ Z1 FPGA board, which has only 4.9 Mb of on-chip RAM. Because the network parameters are so small, it is feasible to store all of them in on-chip RAM, which can greatly reduce the energy and delay overhead of loading network parameters from off-chip memory. In addition, a new data-streaming pipeline architecture is proposed in the PC-BNN FPGA implementation to further improve throughput. The experimental results show that our PC-BNN-based FPGA implementation achieves 930 frames per second (FPS), 387.5 FPS/Watt, and 396×10^-4 FPS/LUT, which are among the best throughput and energy-efficiency figures compared with most recent works.

KW - Binarized convolutional neural network (BNN)

KW - Convolutional neural network (CNN)

KW - Field-programmable gate array (FPGA)

UR - http://www.scopus.com/inward/record.url?scp=85051514861&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85051514861&partnerID=8YFLogxK

U2 - 10.1145/3218603.3218615

DO - 10.1145/3218603.3218615

M3 - Conference contribution

AN - SCOPUS:85051514861

SN - 9781450357043

T3 - Proceedings of the International Symposium on Low Power Electronics and Design

BT - ISLPED 2018 - Proceedings of the 2018 International Symposium on Low Power Electronics and Design

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - 23rd IEEE/ACM International Symposium on Low Power Electronics and Design, ISLPED 2018

Y2 - 23 July 2018 through 25 July 2018

ER -
