TY - GEN
T1 - ID-cache
T2 - 2016 IEEE International Symposium on Workload Characterization, IISWC 2016
AU - Arunkumar, Akhil
AU - Lee, Shin Ying
AU - Wu, Carole-Jean
N1 - Funding Information:
The authors would like to thank the anonymous reviewers for their insightful feedback. This work is supported in part by the National Science Foundation (Grant #CCF-1618039) and by Science Foundation Arizona under the Bisgrove Early Career Scholarship.
Publisher Copyright:
© 2016 IEEE.
PY - 2016/10/3
Y1 - 2016/10/3
N2 - Modern graphics processing units (GPUs) are able not only to perform graphics rendering but also to perform general-purpose parallel computations (GPGPU). It has been shown that the GPU L1 data cache and the on-chip interconnect bandwidth are important sources of performance bottlenecks and inefficiencies in GPGPUs. Through this work, we aim to understand the sources of these inefficiencies and possible opportunities for more efficient cache and interconnect bandwidth management on GPUs. We do so by studying the predictability of the reuse behavior and spatial utilization of cache lines using program-level information, such as the instruction PC, and runtime behavior, such as the extent of memory divergence. Through our characterization results, we demonstrate that a) the PC and memory divergence can be used to efficiently bypass zero-reuse cache lines from the cache; and b) memory divergence information can further be used to dynamically insert cache lines of varying size granularities based on their spatial utilization. Finally, based on the insights derived through our characterization, we design a simple Instruction and memory Divergence cache management method that achieves an average performance improvement of 71% for a wide variety of cache- and interconnect-sensitive applications.
AB - Modern graphics processing units (GPUs) are able not only to perform graphics rendering but also to perform general-purpose parallel computations (GPGPU). It has been shown that the GPU L1 data cache and the on-chip interconnect bandwidth are important sources of performance bottlenecks and inefficiencies in GPGPUs. Through this work, we aim to understand the sources of these inefficiencies and possible opportunities for more efficient cache and interconnect bandwidth management on GPUs. We do so by studying the predictability of the reuse behavior and spatial utilization of cache lines using program-level information, such as the instruction PC, and runtime behavior, such as the extent of memory divergence. Through our characterization results, we demonstrate that a) the PC and memory divergence can be used to efficiently bypass zero-reuse cache lines from the cache; and b) memory divergence information can further be used to dynamically insert cache lines of varying size granularities based on their spatial utilization. Finally, based on the insights derived through our characterization, we design a simple Instruction and memory Divergence cache management method that achieves an average performance improvement of 71% for a wide variety of cache- and interconnect-sensitive applications.
UR - http://www.scopus.com/inward/record.url?scp=84994779312&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84994779312&partnerID=8YFLogxK
U2 - 10.1109/IISWC.2016.7581276
DO - 10.1109/IISWC.2016.7581276
M3 - Conference contribution
AN - SCOPUS:84994779312
T3 - Proceedings of the 2016 IEEE International Symposium on Workload Characterization, IISWC 2016
SP - 158
EP - 167
BT - Proceedings of the 2016 IEEE International Symposium on Workload Characterization, IISWC 2016
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 25 September 2016 through 27 September 2016
ER -