4 Citations (Scopus)

Abstract

Memory array architectures have been proposed for on-chip acceleration of the weighted-sum and weight-update operations in neuro-inspired machine learning algorithms. Because these algorithms usually operate on large weight matrices, efficiently mapping a large weight matrix onto a hardware accelerator may require partitioning the matrix into multiple sub-arrays. In this work, we built a circuit-level macro simulator to evaluate the performance of partitioning a 512×512 weight matrix across SRAM- and RRAM-based accelerators. In general, more partitioning and finer granularity of the array architecture reduce the read/write latency and the dynamic read/write energy, owing to increased computation parallelism, at the expense of larger area and leakage power, as shown for the SRAM accelerator. The RRAM accelerator, however, does not improve read latency or read energy beyond a certain partition point, because the overhead of the multiple intermediate stages of adders and registers comes to dominate.
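The core idea in the abstract — splitting a large weight matrix into sub-arrays whose partial weighted sums are then merged by intermediate adder stages — can be sketched functionally in NumPy. This is a hypothetical illustration of the partitioning scheme, not the authors' circuit-level macro simulator; the function name and the 64×64 tile size are assumptions:

```python
import numpy as np

def partitioned_weighted_sum(W, x, sub):
    """Split an N x N weight matrix into sub x sub tiles, compute each
    tile's partial weighted sum independently (as parallel sub-arrays
    would), then accumulate the partials along each row of tiles
    (mimicking the intermediate adder stages)."""
    n = W.shape[0]
    assert W.shape[1] == n and n % sub == 0
    tiles = n // sub
    # partials[i, j] holds the contribution of tile (i, j) to output block i
    partials = np.zeros((tiles, tiles, sub))
    for i in range(tiles):          # tile-row index
        for j in range(tiles):      # tile-column index
            tile = W[i*sub:(i+1)*sub, j*sub:(j+1)*sub]
            partials[i, j] = tile @ x[j*sub:(j+1)*sub]
    # "adder stage": sum the partial results across tile columns,
    # then concatenate the output blocks
    return np.concatenate([partials[i].sum(axis=0) for i in range(tiles)])

W = np.random.rand(512, 512)
x = np.random.rand(512)
y = partitioned_weighted_sum(W, x, sub=64)   # an 8 x 8 grid of 64 x 64 sub-arrays
assert np.allclose(y, W @ x)
```

In hardware, each tile's multiply-accumulate runs in parallel, which is why finer granularity cuts latency; the `sum(axis=0)` step corresponds to the adder/register stages whose overhead eventually dominates in the RRAM case.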

Original language: English (US)
Title of host publication: ISCAS 2016 - IEEE International Symposium on Circuits and Systems
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 2310-2313
Number of pages: 4
Volume: 2016-July
ISBN (Electronic): 9781479953400
DOI: 10.1109/ISCAS.2016.7539046
State: Published - Jul 29, 2016
Event: 2016 IEEE International Symposium on Circuits and Systems, ISCAS 2016 - Montreal, Canada
Duration: May 22, 2016 - May 25, 2016



Keywords

  • granularity
  • hardware acceleration
  • neuromorphic computing
  • partition
  • RRAM
  • SRAM

ASJC Scopus subject areas

  • Electrical and Electronic Engineering

Cite this

Chen, P. Y., & Yu, S. (2016). Partition SRAM and RRAM based synaptic arrays for neuro-inspired computing. In ISCAS 2016 - IEEE International Symposium on Circuits and Systems (Vol. 2016-July, pp. 2310-2313). [7539046] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ISCAS.2016.7539046


Research output: Chapter in Book/Report/Conference proceeding › Conference contribution
