TY - JOUR
T1 - High throughput data mapping for coarse-grained reconfigurable architectures
AU - Kim, Yongjoo
AU - Lee, Jongeun
AU - Shrivastava, Aviral
AU - Yoon, Jonghee W.
AU - Cho, Doosan
AU - Paek, Yunheung
N1 - Funding Information:
Manuscript received April 28, 2011; accepted June 3, 2011. Date of current version October 19, 2011. This work was supported in part by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science, and Technology (MEST), under Grant 2010-0011534, in part by the Korea Science and Engineering Foundation (KOSEF) NRL Program grant funded by the MEST, under Grant 2011-0018609, in part by the Engineering Research Center of Excellence Program of MEST/KOSEF, under Grant 2011-0000975, in part by the Sunchon National University Research Fund in 2011, in part by the Basic Science Research Program, under Grant 2010-0024529, through the National Research Foundation of Korea (NRF) funded by the MEST, and the IDEC, and in part by funding from the National Science Foundation, under Grants CCF-0916652 and CCF-1055094 (CAREER), in part by the NSF I/UCRC for Embedded Systems, under Grant IIP-0856090, Raytheon, Intel, Microsoft Research, SFAz, and Stardust Foundation. This paper was recommended by Associate Editor M. Hutton.
PY - 2011/11
Y1 - 2011/11
AB - Coarse-grained reconfigurable arrays (CGRAs) are a very promising platform, offering both software programmability and power efficiency of up to 10-100 MOps/mW. However, this promise hinges critically on how effectively applications are mapped onto CGRA platforms. While previous solutions have greatly improved computation speed, they have largely ignored the impact of the local memory architecture on the achievable power and performance. This paper motivates the need for memory-aware application mapping for CGRAs and proposes an effective mapping solution that considers various memory architecture parameters, including the number of banks, the local memory size, and the communication bandwidth between the local memory and the external main memory. Further, we propose efficient methods to handle dependent data on a double-buffering local memory, which is necessary for recurrent loops. Our proposed solution achieves a 59% reduction in the energy-delay product, which factors into about 47% and 22% reductions in energy consumption and runtime, respectively, compared with memory-unaware mapping for realistic local memory architectures. We also show that our scheme scales across a range of applications and memory parameters, and that the runtime overhead of handling recurrent loops with our proposed methods can be less than 1%.
KW - Array mapping
KW - bank conflict
KW - coarse-grained reconfigurable architecture
KW - compilation
KW - multi-bank memory
UR - http://www.scopus.com/inward/record.url?scp=80054828144&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=80054828144&partnerID=8YFLogxK
U2 - 10.1109/TCAD.2011.2161217
DO - 10.1109/TCAD.2011.2161217
M3 - Article
AN - SCOPUS:80054828144
SN - 0278-0070
VL - 30
SP - 1599
EP - 1609
JO - IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
JF - IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
IS - 11
M1 - 6046176
ER -