Integrating cooling awareness with thermal aware workload placement for HPC data centers

Ayan Banerjee, Tridib Mukherjee, Georgios Varsamopoulos, Sandeep Gupta

Research output: Contribution to journal › Article › peer-review

47 Scopus citations

Abstract

High Performance Computing (HPC) data centers are becoming increasingly dense, and their power density and energy consumption are rising accordingly. Up to half of the total energy is attributed to cooling the data center, so greening data center operations to reduce both computing and cooling energy is imperative. To this end, this paper integrates awareness of the dynamic behavior of the cooling unit with thermal awareness while performing spatial workload scheduling (i.e., workload placement) in HPC data centers. The paper first proposes a coordinated cooling-aware job placement and cooling management algorithm, Highest Thermostat Setting (HTS). HTS is aware of the dynamic behavior of the Computer Room Air Conditioner (CRAC) units and places jobs to reduce the cooling demand on the CRACs. HTS also dynamically updates the CRAC thermostat set point to reduce cooling energy consumption. Further, the Energy Inefficiency Ratio of SPatial job scheduling (a.k.a. job placement) algorithms, also referred to as SP-EIR, is analyzed by comparing the total (computing + cooling) energy consumption incurred by the algorithms with the minimum possible energy consumption, assuming that the job start times are already decided to meet the Service Level Agreements (SLAs). This analysis is performed for two cooling models, constant and dynamic, to show how the constant cooling model assumed in previous research misses opportunities to save energy. Simulation results based on power measurements and job traces from the ASU HPC data center show that: (i) HTS has a 15% lower SP-EIR than LRH, a thermal-aware spatial scheduling algorithm; and (ii) in conjunction with FCFS-Backfill, HTS increases the throughput per unit energy by 6.89% and 5.56% over LRH and MTDP (an energy-efficient spatial scheduling algorithm with server consolidation), respectively.
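
The abstract describes SP-EIR only as a comparison between the total (computing + cooling) energy of a placement algorithm and the minimum possible energy. A minimal sketch of that comparison, assuming the metric is a simple ratio (the paper's exact formulation may differ), is given below. The function names and all numeric values are hypothetical and chosen only for illustration; they are not results from the paper.

```python
def total_energy(computing_kwh: float, cooling_kwh: float) -> float:
    """Total energy as the sum of computing and cooling energy."""
    return computing_kwh + cooling_kwh


def sp_eir(algorithm_energy_kwh: float, minimum_energy_kwh: float) -> float:
    """Assumed form of SP-EIR: energy incurred by a placement algorithm
    divided by the minimum possible energy for the same jobs.
    A value of 1.0 would mean the placement is energy-optimal."""
    if minimum_energy_kwh <= 0:
        raise ValueError("minimum energy must be positive")
    return algorithm_energy_kwh / minimum_energy_kwh


# Hypothetical numbers, not taken from the paper.
incurred = total_energy(computing_kwh=60.0, cooling_kwh=40.0)
ratio = sp_eir(incurred, minimum_energy_kwh=80.0)
print(f"SP-EIR = {ratio:.2f}")
```

Under this assumed form, a lower ratio indicates a placement closer to the energy-optimal one, which is the sense in which the abstract reports HTS having a lower SP-EIR than LRH.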

Original language: English (US)
Pages (from-to): 134-150
Number of pages: 17
Journal: Sustainable Computing: Informatics and Systems
Volume: 1
Issue number: 2
State: Published - Jun 2011

Keywords

  • Cooling aware
  • Dynamic cooling model
  • Energy efficiency
  • Energy efficiency metrics and bounds
  • Spatial job scheduling

ASJC Scopus subject areas

  • General Computer Science
  • Electrical and Electronic Engineering
