Enabling multi-threaded applications on hybrid shared memory manycore architectures

Tushar Rawat, Aviral Shrivastava

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

As the number of cores per chip increases, maintaining cache coherence becomes prohibitive in both power and performance. Non Coherent Cache (NCC) architectures do away with hardware-based cache coherence, but are difficult to program. Some existing architectures offer a middle ground by providing some shared memory in hardware. Specifically, the 48-core Intel Single-chip Cloud Computer (SCC) provides some off-chip (DRAM) shared memory and some on-chip (SRAM) shared memory. We call such architectures Hybrid Shared Memory (HSM) manycore architectures. However, how to efficiently execute multi-threaded programs on HSM architectures is an open problem. To execute a multi-threaded program correctly on HSM architectures, the compiler must: i) identify all the shared data and map it to the shared memory, and ii) map the frequently accessed shared data to the on-chip shared memory. In this paper, we present a source-to-source translator written using CETUS (Dave et al. [1]) that identifies a conservative superset of all the shared data in a multi-threaded application and maps it to the off-chip shared memory, which enables execution on HSM architectures. This improves the performance of our benchmarks by 32x. We then identify the frequently accessed shared data and map it to the on-chip shared memory. This further improves the performance of our benchmarks by 8x on average.
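The two compiler steps named in the abstract can be illustrated with a toy placement model. The sketch below is not the authors' CETUS-based translator: the `Var` record, the `place` function, the access counts, and the greedy accesses-per-byte heuristic are all assumptions made for illustration. It only shows the shape of the decision: every variable conservatively classified as shared defaults to off-chip shared memory, and the most frequently accessed shared variables are then promoted to on-chip shared memory while capacity remains.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    """Hypothetical description of one program variable."""
    name: str
    size_bytes: int
    accesses: int   # assumed to come from profiling or static estimation
    shared: bool    # True if conservatively classified as shared


def place(variables, on_chip_capacity):
    """Return {name: 'private' | 'off-chip' | 'on-chip'} for each variable."""
    placement = {}
    shared = []
    # Step i: all (conservatively) shared data goes to off-chip shared memory.
    for v in variables:
        if v.shared:
            placement[v.name] = "off-chip"
            shared.append(v)
        else:
            placement[v.name] = "private"
    # Step ii: promote the hottest shared data to on-chip shared memory.
    # Greedy by accesses per byte; this heuristic is an assumption, not the paper's.
    remaining = on_chip_capacity
    hottest_first = sorted(
        shared, key=lambda v: v.accesses / max(v.size_bytes, 1), reverse=True
    )
    for v in hottest_first:
        if v.size_bytes <= remaining:
            placement[v.name] = "on-chip"
            remaining -= v.size_bytes
    return placement


if __name__ == "__main__":
    example = [
        Var("counter", 8, 10000, True),
        Var("grid", 4096, 5000, True),
        Var("tmp", 64, 900, False),
        Var("flags", 64, 3200, True),
    ]
    print(place(example, on_chip_capacity=128))
```

With the made-up inputs above and a 128-byte on-chip budget, `counter` and `flags` are promoted on-chip, `grid` stays off-chip because it does not fit, and `tmp` remains private.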

Original language: English (US)
Title of host publication: Proceedings - Design, Automation and Test in Europe, DATE
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 742-747
Number of pages: 6
Volume: 2015-April
ISBN (Print): 9783981537048
State: Published - Apr 22 2015
Event: 2015 Design, Automation and Test in Europe Conference and Exhibition, DATE 2015 - Grenoble, France
Duration: Mar 9 2015 to Mar 13 2015

Other

Other: 2015 Design, Automation and Test in Europe Conference and Exhibition, DATE 2015
Country: France
City: Grenoble
Period: 3/9/15 to 3/13/15

Fingerprint

Memory architecture
Data storage equipment
Dynamic random access storage
Static random access storage
Computer hardware
Hardware

ASJC Scopus subject areas

  • Engineering(all)

Cite this

Rawat, T., & Shrivastava, A. (2015). Enabling multi-threaded applications on hybrid shared memory manycore architectures. In Proceedings - Design, Automation and Test in Europe, DATE (Vol. 2015-April, pp. 742-747). [7092485] Institute of Electrical and Electronics Engineers Inc.

Enabling multi-threaded applications on hybrid shared memory manycore architectures. / Rawat, Tushar; Shrivastava, Aviral.

Proceedings - Design, Automation and Test in Europe, DATE. Vol. 2015-April. Institute of Electrical and Electronics Engineers Inc., 2015. p. 742-747 7092485.

Rawat, T & Shrivastava, A 2015, Enabling multi-threaded applications on hybrid shared memory manycore architectures. in Proceedings - Design, Automation and Test in Europe, DATE. vol. 2015-April, 7092485, Institute of Electrical and Electronics Engineers Inc., pp. 742-747, 2015 Design, Automation and Test in Europe Conference and Exhibition, DATE 2015, Grenoble, France, 3/9/15.
Rawat T, Shrivastava A. Enabling multi-threaded applications on hybrid shared memory manycore architectures. In Proceedings - Design, Automation and Test in Europe, DATE. Vol. 2015-April. Institute of Electrical and Electronics Engineers Inc. 2015. p. 742-747. 7092485
Rawat, Tushar ; Shrivastava, Aviral. / Enabling multi-threaded applications on hybrid shared memory manycore architectures. Proceedings - Design, Automation and Test in Europe, DATE. Vol. 2015-April. Institute of Electrical and Electronics Engineers Inc., 2015. pp. 742-747
@inproceedings{ef0ec1d0f5f84d9898f6b81e9b3529b9,
title = "Enabling multi-threaded applications on hybrid shared memory manycore architectures",
abstract = "As the number of cores per chip increases, maintaining cache coherence becomes prohibitive for both power and performance. Non Coherent Cache (NCC) architectures do away with hardware-based cache coherence, but become difficult to program. Some existing architectures provide a middle ground by providing some shared memory in the hardware. Specifically, the 48-core Intel Single-chip Cloud Computer (SCC) provides some off-chip (DRAM) shared memory and some on-chip (SRAM) shared memory. We call such architectures Hybrid Shared Memory, or HSM, manycore architectures. However, how to efficiently execute multi-threaded programs on HSM architectures is an open problem. To be able to execute a multi-threaded program correctly on HSM architectures, the compiler must: i) identify all the shared data and map it to the shared memory, and ii) map the frequently accessed shared data to the on-chip shared memory. In this paper, we present a source-to-source translator written using CETUS (Dave et al. [1]) that identifies a conservative superset of all the shared data in a multi-threaded application, and maps it to the off-chip shared memory such that it enables execution on HSM architectures. This improves the performance of our benchmarks by 32x. Following, we identify and map the frequently accessed shared data to the on-chip shared memory. This further improves the performance of our benchmarks by 8x on average.",
author = "Tushar Rawat and Aviral Shrivastava",
year = "2015",
month = "4",
day = "22",
language = "English (US)",
isbn = "9783981537048",
volume = "2015-April",
pages = "742--747",
booktitle = "Proceedings - Design, Automation and Test in Europe, DATE",
publisher = "Institute of Electrical and Electronics Engineers Inc.",

}

TY - GEN

T1 - Enabling multi-threaded applications on hybrid shared memory manycore architectures

AU - Rawat, Tushar

AU - Shrivastava, Aviral

PY - 2015/4/22

Y1 - 2015/4/22

N2 - As the number of cores per chip increases, maintaining cache coherence becomes prohibitive for both power and performance. Non Coherent Cache (NCC) architectures do away with hardware-based cache coherence, but become difficult to program. Some existing architectures provide a middle ground by providing some shared memory in the hardware. Specifically, the 48-core Intel Single-chip Cloud Computer (SCC) provides some off-chip (DRAM) shared memory and some on-chip (SRAM) shared memory. We call such architectures Hybrid Shared Memory, or HSM, manycore architectures. However, how to efficiently execute multi-threaded programs on HSM architectures is an open problem. To be able to execute a multi-threaded program correctly on HSM architectures, the compiler must: i) identify all the shared data and map it to the shared memory, and ii) map the frequently accessed shared data to the on-chip shared memory. In this paper, we present a source-to-source translator written using CETUS (Dave et al. [1]) that identifies a conservative superset of all the shared data in a multi-threaded application, and maps it to the off-chip shared memory such that it enables execution on HSM architectures. This improves the performance of our benchmarks by 32x. Following, we identify and map the frequently accessed shared data to the on-chip shared memory. This further improves the performance of our benchmarks by 8x on average.

UR - http://www.scopus.com/inward/record.url?scp=84945939139&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84945939139&partnerID=8YFLogxK

M3 - Conference contribution

SN - 9783981537048

VL - 2015-April

SP - 742

EP - 747

BT - Proceedings - Design, Automation and Test in Europe, DATE

PB - Institute of Electrical and Electronics Engineers Inc.

ER -