Abstract
Memory array architectures have been proposed for on-chip acceleration of the weighted sum and weight update operations in neuro-inspired machine learning algorithms. Because these algorithms usually operate on a large weight matrix, efficiently mapping such a matrix onto a hardware accelerator may require partitioning it into multiple sub-arrays. In this work, we built a circuit-level macro simulator to evaluate the performance of partitioning a 512×512 weight matrix onto SRAM- and RRAM-based accelerators. Generally, more partitioning and finer granularity of the array architecture reduce the read/write latency and the dynamic read/write energy, owing to increased computation parallelism, at the expense of larger area and leakage power, as shown in the case of the SRAM accelerator. However, the RRAM accelerator does not improve read latency or read energy beyond a certain partition point, because the overhead of the multiple intermediate stages of adders and registers comes to dominate.
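The partitioning the abstract describes can be illustrated with a minimal NumPy sketch: the weight matrix is tiled into square sub-arrays, each sub-array computes a local weighted sum in parallel, and an adder stage combines the partial sums. The function name, the equal square tile size, and the software accumulation loop standing in for the hardware adder/register stages are all illustrative assumptions, not the paper's circuit model.

```python
import numpy as np

def partitioned_weighted_sum(W, x, sub):
    """Compute y = W @ x by tiling W into sub x sub sub-arrays.

    Each sub-array would perform its weighted sum in parallel on the
    accelerator; here the inner loop models the adder stage that
    combines partial sums across a row of sub-arrays.
    """
    n, m = W.shape
    assert n % sub == 0 and m % sub == 0, "tile size must divide the matrix"
    y = np.zeros(n)
    for i in range(0, n, sub):
        partial = np.zeros(sub)  # accumulator register for this row block
        for j in range(0, m, sub):
            # one sub-array's weighted-sum contribution
            partial += W[i:i + sub, j:j + sub] @ x[j:j + sub]
        y[i:i + sub] = partial
    return y

# Sanity check against the unpartitioned weighted sum on a 512x512 matrix
rng = np.random.default_rng(0)
W = rng.standard_normal((512, 512))
x = rng.standard_normal(512)
assert np.allclose(partitioned_weighted_sum(W, x, 64), W @ x)
```

Note that finer partitioning (smaller `sub`) increases the number of partial sums to combine, which is the software analogue of the adder/register overhead that eventually dominates in the RRAM case.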
Original language | English (US) |
---|---|
Title of host publication | ISCAS 2016 - IEEE International Symposium on Circuits and Systems |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 2310-2313 |
Number of pages | 4 |
Volume | 2016-July |
ISBN (Electronic) | 9781479953400 |
DOIs | |
State | Published - Jul 29 2016 |
Event | 2016 IEEE International Symposium on Circuits and Systems, ISCAS 2016 - Montreal, Canada; Duration: May 22 2016 → May 25 2016 |
Other
Other | 2016 IEEE International Symposium on Circuits and Systems, ISCAS 2016 |
---|---|
Country/Territory | Canada |
City | Montreal |
Period | 5/22/16 → 5/25/16 |
Keywords
- granularity
- hardware acceleration
- neuromorphic computing
- partition
- RRAM
- SRAM
ASJC Scopus subject areas
- Electrical and Electronic Engineering