Remote Atomic Extension (RAE) for scalable high performance computing

Xi Wang, Brody Williams, John D. Leidel, Alan Ehret, Michel Kinsy, Yong Chen

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Scopus citation

Abstract

Emerging data-intensive applications such as graph analytics, machine learning, and data-driven scientific computing are driving the evolution of high-performance computing (HPC) systems from monolithic designs to scaled-out, heterogeneous, and complex architectures. In these systems, enormous data sets are distributed across discrete nodes so that distributed storage and computing resources can improve overall performance. These data distributions induce frequent cross-node data transactions, which challenge the performance of large-scale systems. Global atomic operations are an emerging class of remote data operations that enable lock-free access to remote shared data. However, a cross-node read-modify-write operation consists of multiple distinct data operations plus specific atomicity management, which induces a large amount of overhead. Global atomic operations therefore require an efficient communication methodology. Existing advanced components, such as network interface controllers, network fabrics, and network-on-chip (NoC) interconnects, are architected together to improve system performance. However, complex software infrastructures are needed to integrate these discrete components. As a result, redundant software routines across distinct devices induce a large amount of overhead that degrades performance. In this paper, we propose a remote atomic extension (RAE) design, based on the RISC-V instruction set architecture (ISA), that provides inherent ISA-level instructions and micro-architecture support for remote atomic operations. We design a toolchain and evaluate the RAE infrastructure via simulation. Our experimental results show that RAE eliminates 89.71% of the redundant software instructions used for remote atomic accesses and improves performance by 17.61% on average (up to 23.35%) compared with OpenSHMEM.

Original language: English (US)
Title of host publication: 2020 57th ACM/IEEE Design Automation Conference, DAC 2020
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781450367257
State: Published - Jul 2020
Externally published: Yes
Event: 57th ACM/IEEE Design Automation Conference, DAC 2020 - Virtual, San Francisco, United States
Duration: Jul 20 2020 to Jul 24 2020

Publication series

Name: Proceedings - Design Automation Conference
Volume: 2020-July
ISSN (Print): 0738-100X

Conference

Conference: 57th ACM/IEEE Design Automation Conference, DAC 2020
Country/Territory: United States
City: Virtual, San Francisco
Period: 7/20/20 to 7/24/20

ASJC Scopus subject areas

  • Computer Science Applications
  • Control and Systems Engineering
  • Electrical and Electronic Engineering
  • Modeling and Simulation
