Efficient Network Construction through Structural Plasticity

Xiaocong Du, Zheng Li, Yufei Ma, Yu Cao

Research output: Contribution to journal › Article

Abstract

Deep Neural Networks (DNNs) deployed on hardware face excessive computation cost due to their massive number of parameters. A typical training pipeline to mitigate over-parameterization is to pre-define a DNN structure with redundant learning units (filters and neurons) for high accuracy, and then to prune redundant learning units after training for efficient inference. We argue that it is sub-optimal to introduce redundancy into training only to remove it later for inference. Moreover, a fixed network structure adapts poorly to dynamic tasks, such as lifelong learning. In contrast, structural plasticity plays an indispensable role in mammalian brains in achieving compact and accurate learning: throughout the lifetime, active connections are continuously created while those that are no longer important degenerate. Inspired by this observation, we propose a training scheme, Continuous Growth and Pruning (CGaP), which starts training from a small network seed, continuously grows the network by adding important learning units, and finally prunes secondary ones for efficient inference. The inference model generated by CGaP has a sparse structure, largely decreasing inference power and latency when deployed on hardware platforms. With popular DNN structures on representative datasets, the efficacy of CGaP is benchmarked by both algorithmic simulation and architectural modeling on Field-Programmable Gate Arrays (FPGAs). For example, CGaP decreases the FLOPs, model size, DRAM access energy, and inference latency by 63.3%, 64.0%, 11.8%, and 40.2%, respectively, for ResNet-110 on CIFAR-10.
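
The abstract describes CGaP only at a high level: training starts from a small network seed, important learning units are added during growth, and secondary units are pruned for inference. As a rough illustration (not the authors' implementation), the Python sketch below shows such a generic grow-then-prune loop; the helper names (importance_score, grow_step, prune_step), the L1-norm saliency measure, and all sizes are assumptions made for this example.

import numpy as np

def importance_score(weights):
    # Hypothetical saliency: L1 norm of each unit's fan-in weights.
    return np.abs(weights).sum(axis=1)

def grow_step(weights, n_new, rng):
    # Growth: add n_new units seeded near the currently most important ones.
    top = np.argsort(importance_score(weights))[-n_new:]
    new_units = weights[top] + 0.01 * rng.standard_normal((n_new, weights.shape[1]))
    return np.vstack([weights, new_units])

def prune_step(weights, keep_ratio):
    # Pruning: keep only the most important units for efficient inference.
    scores = importance_score(weights)
    n_keep = max(1, int(keep_ratio * len(scores)))
    keep = np.sort(np.argsort(scores)[-n_keep:])
    return weights[keep]

def cgap_like_training(n_in=64, seed_units=8, grow_steps=5, keep_ratio=0.5):
    rng = np.random.default_rng(0)
    w = 0.1 * rng.standard_normal((seed_units, n_in))  # small network seed
    for _ in range(grow_steps):
        # ... train w on the task for a few epochs here (omitted) ...
        w = grow_step(w, n_new=4, rng=rng)             # continuous growth
    w = prune_step(w, keep_ratio)                      # final pruning for inference
    return w

if __name__ == "__main__":
    print("final layer shape:", cgap_like_training().shape)

In the paper itself, growth and pruning act on filters and neurons across full DNNs (e.g., ResNet-110) and are interleaved with gradient-based training; this sketch only conveys the control flow of the scheme.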

Original language: English (US)
Journal: IEEE Journal on Emerging and Selected Topics in Circuits and Systems
DOI: 10.1109/JETCAS.2019.2933233
State: Accepted/In press - Jan 1 2019

Keywords

  • algorithm-hardware co-design
  • Deep learning
  • hardware acceleration
  • model pruning
  • structural plasticity

ASJC Scopus subject areas

  • Electrical and Electronic Engineering

Cite this

Du, Xiaocong; Li, Zheng; Ma, Yufei; Cao, Yu. Efficient Network Construction through Structural Plasticity. In: IEEE Journal on Emerging and Selected Topics in Circuits and Systems (ISSN 2156-3357), IEEE Circuits and Systems Society, 2019. DOI: 10.1109/JETCAS.2019.2933233.
