Performance characterization, prediction, and optimization for heterogeneous systems with multi-level memory interference

Shin Ying Lee, Carole-Jean Wu

Research output: Chapter in Book/Report/Conference proceedingConference contribution

11 Scopus citations

Abstract

Modern computer systems are accelerator-rich, equipped with many types of hardware accelerators to speed up computation. For example, graphics processing units (GPUs) are a type of accelerators that are widely employed to accelerate parallel workloads. In order to well utilize different accelerators to gain better execution time speedup or reduce total energy consumption, many scheduling algorithms have been proposed to select the optimal target device to process an OpenCL kernel according to the kernel's individual characteristics. However, in a real computer system, there are a lot of workloads co-located together on a single machine and would be processed on different devices simultaneously. The CPU cores and accelerators may contend shared resources, such as the host main memory and shared last-level cache. Thus, it is not robust to schedule an OpenCL kernel execution by simply considering the characteristics of the kernel. To maximize the system throughput, it is important to consider the execution behavior of all co-located applications when performing OpenCL kernel execution scheduling. In this paper, we provide a detailed characterization study demonstrating that scheduling an OpenCL kernel to run on different devices can introduce varying performance impact to itself and the other co-located applications due to memory interference. Based on the characterization results, we then develop a light-weight, scalable performance degradation predictor specifically for heterogeneous computer systems, called HeteroPDP. HeteroPDP aims to dynamically predict and balance the execution time slowdown of all co-located applications in a heterogeneous computation environment. Our real system evaluation results show that comparing with always running an OpenCL kernel on the host CPU, HeteroPDP is able to achieve 3X execution time speedup when an OpenCL kernel runs alone and improve the system fairness from 24% to 65% when an OpenCL kernel is co-located with other applications.

Original languageEnglish (US)
Title of host publicationProceedings of the 2017 IEEE International Symposium on Workload Characterization, IISWC 2017
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages43-53
Number of pages11
ISBN (Electronic)9781538612323
DOIs
StatePublished - Dec 5 2017
Event2017 IEEE International Symposium on Workload Characterization, IISWC 2017 - Seattle, United States
Duration: Oct 1 2017Oct 3 2017

Publication series

NameProceedings of the 2017 IEEE International Symposium on Workload Characterization, IISWC 2017
Volume2017-January

Other

Other2017 IEEE International Symposium on Workload Characterization, IISWC 2017
Country/TerritoryUnited States
CitySeattle
Period10/1/1710/3/17

ASJC Scopus subject areas

  • Hardware and Architecture
  • Information Systems and Management

Fingerprint

Dive into the research topics of 'Performance characterization, prediction, and optimization for heterogeneous systems with multi-level memory interference'. Together they form a unique fingerprint.

Cite this