Remote Atomic Extension (RAE) for scalable high performance computing

Xi Wang; Brody Williams; John D. Leidel; Alan Ehret; Michel Kinsy; Yong Chen

doi:10.1109/DAC18072.2020.9218589

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

5 Scopus citations

Abstract

Emerging data-intensive applications such as graph analytics, machine learning, and data-driven scientific computing are driving the evolution of high-performance computing (HPC) systems from monolithic to scaled-out, heterogeneous, and complex architectures. In these systems, enormous data sets are mapped to discrete nodes to improve the performance of the system by using distributed storage and computing resources. As such, these data distributions induce frequent cross-node data transactions which challenge the performance of large-scale systems. Global atomic operations are one emerging class of remote data operations that enable lock-free remote shared data operations. However, the cross-node read-modify-write operations consist of multiple distinct data operations and specific atomicity management, which induces a large amount of overhead. As such, these global atomic operations require an efficient communication methodology. Existing advanced components, such as network interface controllers, network fabrics, and network-on-chip (NoC) interconnects, are architected together to improve the system performance. However, complex software infrastructures are needed to provide integration between each discrete component. As a result, the redundant software routines across distinct devices induce a large amount of overhead that causes performance degradation. In this paper, we propose a remote atomic extension (RAE) design that provides inherent ISA-level instructions and micro-architecture support for remote atomic operations based on the RISC-V instruction set architecture (ISA). We design a toolchain and evaluate the RAE infrastructure via simulation. Our experimental results show that RAE eliminates 89.71% of the redundant software instructions used for remote atomic accesses and improves the performance by 17.61% on average (up to 23.35%), compared with OpenSHMEM.
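
As a rough illustration of the read-modify-write pattern the abstract refers to, the following Python sketch models a fetch-and-add on a single toy node, guarded by a lock. It is not RAE, RISC-V, or OpenSHMEM code: `ToyNode`, `fetch_add`, and `run_workers` are hypothetical names introduced here, and the lock only stands in for the atomicity management that a real system would provide in hardware or in the communication library.

```python
import threading


class ToyNode:
    """Hypothetical stand-in for one node's memory (an assumption, not RAE)."""

    def __init__(self):
        self.memory = {}
        # The lock models the "atomicity management" around the update.
        self._lock = threading.Lock()

    def fetch_add(self, addr, value):
        """Atomically add `value` at `addr` and return the previous contents."""
        with self._lock:
            old = self.memory.get(addr, 0)   # read
            self.memory[addr] = old + value  # modify and write
            return old


def run_workers(node, addr, n_threads, n_incs):
    """Have several threads increment the same address; return the final value."""

    def work():
        for _ in range(n_incs):
            node.fetch_add(addr, 1)

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return node.memory[addr]
```

Because the read, the modification, and the write happen under one lock, concurrent callers cannot interleave between the steps, so no increment is lost. The sketch says nothing about network cost, which is the overhead the paper targets.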

Original language	English (US)
Title of host publication	2020 57th ACM/IEEE Design Automation Conference, DAC 2020
Publisher	Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic)	9781450367257
DOIs	https://doi.org/10.1109/DAC18072.2020.9218589
State	Published - Jul 2020
Externally published	Yes
Event	57th ACM/IEEE Design Automation Conference, DAC 2020 - Virtual, San Francisco, United States
Duration	Jul 20 2020 → Jul 24 2020

Publication series

Name	Proceedings - Design Automation Conference
Volume	2020-July
ISSN (Print)	0738-100X

Conference

Conference	57th ACM/IEEE Design Automation Conference, DAC 2020
Country/Territory	United States
City	Virtual, San Francisco
Period	7/20/20 → 7/24/20

ASJC Scopus subject areas

Computer Science Applications
Control and Systems Engineering
Electrical and Electronic Engineering
Modeling and Simulation

Access to Document

10.1109/DAC18072.2020.9218589

Cite this

Wang, X., Williams, B., Leidel, J. D., Ehret, A., Kinsy, M., & Chen, Y. (2020). Remote Atomic Extension (RAE) for scalable high performance computing. In 2020 57th ACM/IEEE Design Automation Conference, DAC 2020 Article 9218589 (Proceedings - Design Automation Conference; Vol. 2020-July). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/DAC18072.2020.9218589

Remote Atomic Extension (RAE) for scalable high performance computing. / Wang, Xi; Williams, Brody; Leidel, John D. et al.
2020 57th ACM/IEEE Design Automation Conference, DAC 2020. Institute of Electrical and Electronics Engineers Inc., 2020. 9218589 (Proceedings - Design Automation Conference; Vol. 2020-July).

Wang, X, Williams, B, Leidel, JD, Ehret, A, Kinsy, M & Chen, Y 2020, Remote Atomic Extension (RAE) for scalable high performance computing. in 2020 57th ACM/IEEE Design Automation Conference, DAC 2020, 9218589, Proceedings - Design Automation Conference, vol. 2020-July, Institute of Electrical and Electronics Engineers Inc., 57th ACM/IEEE Design Automation Conference, DAC 2020, Virtual, San Francisco, United States, 7/20/20. https://doi.org/10.1109/DAC18072.2020.9218589

@inproceedings{2dfa6b0b06de47f59800c4d006ad8ddd,

title = "Remote Atomic Extension (RAE) for scalable high performance computing",

abstract = "Emerging data-intensive applications such as graph analytics, machine learning, and data-driven scientific computing are driving the evolution of high-performance computing (HPC) systems from monolithic to scaled-out, heterogeneous, and complex architectures. In these systems, enormous data sets are mapped to discrete nodes to improve the performance of the system by using distributed storage and computing resources. As such, these data distributions induce frequent cross-node data transactions which challenge the performance of large-scale systems. Global atomic operations are one emerging class of remote data operations that enable lock-free remote shared data operations. However, the cross-node read-modify-write operations consist of multiple distinct data operations and specific atomicity management, which induces a large amount of overhead. As such, these global atomic operations require an efficient communication methodology. Existing advanced components, such as network interface controllers, network fabrics, and network-on-chip (NoC) interconnects, are architected together to improve the system performance. However, complex software infrastructures are needed to provide integration between each discrete component. As a result, the redundant software routines across distinct devices induce a large amount of overhead that causes performance degradation. In this paper, we propose a remote atomic extension (RAE) design that provides inherent ISA-level instructions and micro-architecture support for remote atomic operations based on the RISC-V instruction set architecture (ISA). We design a toolchain and evaluate the RAE infrastructure via simulation. Our experimental results show that RAE eliminates 89.71% of the redundant software instructions used for remote atomic accesses and improves the performance by 17.61% on average (up to 23.35%), compared with OpenSHMEM.",

author = "Xi Wang and Brody Williams and Leidel, {John D.} and Alan Ehret and Michel Kinsy and Yong Chen",

note = "Funding Information: ACKNOWLEDGMENT We are thankful to the anonymous reviewers for their valuable feedback. This research is supported in part by the National Science Foundation under grant CNS-1338078, CNS-1362134, CCF-1409946, CCF-1718336, OAC-1835892, and CNS-1817094. Publisher Copyright: {\textcopyright} 2020 IEEE.; 57th ACM/IEEE Design Automation Conference, DAC 2020 ; Conference date: 20-07-2020 Through 24-07-2020",

year = "2020",

month = jul,

doi = "10.1109/DAC18072.2020.9218589",

language = "English (US)",

series = "Proceedings - Design Automation Conference",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

booktitle = "2020 57th ACM/IEEE Design Automation Conference, DAC 2020",

}

TY - GEN

T1 - Remote Atomic Extension (RAE) for scalable high performance computing

AU - Wang, Xi

AU - Williams, Brody

AU - Leidel, John D.

AU - Ehret, Alan

AU - Kinsy, Michel

AU - Chen, Yong

N1 - Funding Information: ACKNOWLEDGMENT We are thankful to the anonymous reviewers for their valuable feedback. This research is supported in part by the National Science Foundation under grant CNS-1338078, CNS-1362134, CCF-1409946, CCF-1718336, OAC-1835892, and CNS-1817094. Publisher Copyright: © 2020 IEEE.

PY - 2020/7

Y1 - 2020/7

N2 - Emerging data-intensive applications such as graph analytics, machine learning, and data-driven scientific computing are driving the evolution of high-performance computing (HPC) systems from monolithic to scaled-out, heterogeneous, and complex architectures. In these systems, enormous data sets are mapped to discrete nodes to improve the performance of the system by using distributed storage and computing resources. As such, these data distributions induce frequent cross-node data transactions which challenge the performance of large-scale systems. Global atomic operations are one emerging class of remote data operations that enable lock-free remote shared data operations. However, the cross-node read-modify-write operations consist of multiple distinct data operations and specific atomicity management, which induces a large amount of overhead. As such, these global atomic operations require an efficient communication methodology. Existing advanced components, such as network interface controllers, network fabrics, and network-on-chip (NoC) interconnects, are architected together to improve the system performance. However, complex software infrastructures are needed to provide integration between each discrete component. As a result, the redundant software routines across distinct devices induce a large amount of overhead that causes performance degradation. In this paper, we propose a remote atomic extension (RAE) design that provides inherent ISA-level instructions and micro-architecture support for remote atomic operations based on the RISC-V instruction set architecture (ISA). We design a toolchain and evaluate the RAE infrastructure via simulation. Our experimental results show that RAE eliminates 89.71% of the redundant software instructions used for remote atomic accesses and improves the performance by 17.61% on average (up to 23.35%), compared with OpenSHMEM.

AB - Emerging data-intensive applications such as graph analytics, machine learning, and data-driven scientific computing are driving the evolution of high-performance computing (HPC) systems from monolithic to scaled-out, heterogeneous, and complex architectures. In these systems, enormous data sets are mapped to discrete nodes to improve the performance of the system by using distributed storage and computing resources. As such, these data distributions induce frequent cross-node data transactions which challenge the performance of large-scale systems. Global atomic operations are one emerging class of remote data operations that enable lock-free remote shared data operations. However, the cross-node read-modify-write operations consist of multiple distinct data operations and specific atomicity management, which induces a large amount of overhead. As such, these global atomic operations require an efficient communication methodology. Existing advanced components, such as network interface controllers, network fabrics, and network-on-chip (NoC) interconnects, are architected together to improve the system performance. However, complex software infrastructures are needed to provide integration between each discrete component. As a result, the redundant software routines across distinct devices induce a large amount of overhead that causes performance degradation. In this paper, we propose a remote atomic extension (RAE) design that provides inherent ISA-level instructions and micro-architecture support for remote atomic operations based on the RISC-V instruction set architecture (ISA). We design a toolchain and evaluate the RAE infrastructure via simulation. Our experimental results show that RAE eliminates 89.71% of the redundant software instructions used for remote atomic accesses and improves the performance by 17.61% on average (up to 23.35%), compared with OpenSHMEM.

UR - http://www.scopus.com/inward/record.url?scp=85093984399&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85093984399&partnerID=8YFLogxK

U2 - 10.1109/DAC18072.2020.9218589

DO - 10.1109/DAC18072.2020.9218589

M3 - Conference contribution

AN - SCOPUS:85093984399

T3 - Proceedings - Design Automation Conference

BT - 2020 57th ACM/IEEE Design Automation Conference, DAC 2020

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - 57th ACM/IEEE Design Automation Conference, DAC 2020

Y2 - 20 July 2020 through 24 July 2020

ER -
