TY - JOUR
T1 - Integrating cooling awareness with thermal aware workload placement for HPC data centers
AU - Banerjee, Ayan
AU - Mukherjee, Tridib
AU - Varsamopoulos, Georgios
AU - Gupta, Sandeep
N1 - Funding Information:
Further, the results presented in this paper are simulation-based, since actual experiments in physical data centers may require shutting down servers, which in turn may affect SLAs. In an ongoing project named BlueTool [35] (http://impact.asu.edu/BlueTool/wiki/index.php/Main_Page), funded by an NSF infrastructure grant (CNS#0855277), Impact Lab is developing a data center testbed. Such a testbed can be used to evaluate the proposed algorithms in real scenarios and is considered important future work.
PY - 2011/6
Y1 - 2011/6
N2 - High Performance Computing (HPC) data centers are becoming increasingly dense; the associated power density and energy consumption of their operation are increasing. Up to half of the total energy is attributed to cooling the data center; greening the data center operations to reduce both computing and cooling energy is imperative. To this effect, this paper integrates awareness of the dynamic behavior of the cooling unit with thermal awareness while performing spatial workload scheduling (i.e., workload placement) in HPC data centers. The paper first proposes a coordinated cooling-aware job placement and cooling management algorithm, Highest Thermostat Setting (HTS). HTS is aware of the dynamic behavior of the Computer Room Air Conditioner (CRAC) units and places jobs to reduce cooling demands from the CRACs. HTS also dynamically updates the CRAC thermostat set point to reduce cooling energy consumption. Further, the Energy Inefficiency Ratio of SPatial job scheduling (a.k.a. job placement) algorithms, also referred to as SP-EIR, is analyzed by comparing the total (computing + cooling) energy consumption incurred by the algorithms with the minimum possible energy consumption, while assuming that the job start times are already decided to meet the Service Level Agreements (SLAs). This analysis is performed for two cooling models, constant and dynamic, to show how the constant cooling model assumption in previous research misses out on opportunities to save energy. Simulation results based on power measurements and job traces from the ASU HPC data center show that: (i) HTS has a 15% lower SP-EIR than LRH, a thermal-aware spatial scheduling algorithm; and (ii) in conjunction with FCFS-Backfill, HTS increases the throughput per unit energy by 6.89% and 5.56%, respectively, over LRH and MTDP (an energy-efficient spatial scheduling algorithm with server consolidation).
AB - High Performance Computing (HPC) data centers are becoming increasingly dense; the associated power density and energy consumption of their operation are increasing. Up to half of the total energy is attributed to cooling the data center; greening the data center operations to reduce both computing and cooling energy is imperative. To this effect, this paper integrates awareness of the dynamic behavior of the cooling unit with thermal awareness while performing spatial workload scheduling (i.e., workload placement) in HPC data centers. The paper first proposes a coordinated cooling-aware job placement and cooling management algorithm, Highest Thermostat Setting (HTS). HTS is aware of the dynamic behavior of the Computer Room Air Conditioner (CRAC) units and places jobs to reduce cooling demands from the CRACs. HTS also dynamically updates the CRAC thermostat set point to reduce cooling energy consumption. Further, the Energy Inefficiency Ratio of SPatial job scheduling (a.k.a. job placement) algorithms, also referred to as SP-EIR, is analyzed by comparing the total (computing + cooling) energy consumption incurred by the algorithms with the minimum possible energy consumption, while assuming that the job start times are already decided to meet the Service Level Agreements (SLAs). This analysis is performed for two cooling models, constant and dynamic, to show how the constant cooling model assumption in previous research misses out on opportunities to save energy. Simulation results based on power measurements and job traces from the ASU HPC data center show that: (i) HTS has a 15% lower SP-EIR than LRH, a thermal-aware spatial scheduling algorithm; and (ii) in conjunction with FCFS-Backfill, HTS increases the throughput per unit energy by 6.89% and 5.56%, respectively, over LRH and MTDP (an energy-efficient spatial scheduling algorithm with server consolidation).
KW - Cooling aware
KW - Dynamic cooling model
KW - Energy efficiency
KW - Energy efficiency metrics and bounds
KW - Spatial job scheduling
UR - http://www.scopus.com/inward/record.url?scp=79955576534&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=79955576534&partnerID=8YFLogxK
U2 - 10.1016/j.suscom.2011.02.003
DO - 10.1016/j.suscom.2011.02.003
M3 - Article
AN - SCOPUS:79955576534
SN - 2210-5379
VL - 1
SP - 134
EP - 150
JO - Sustainable Computing: Informatics and Systems
JF - Sustainable Computing: Informatics and Systems
IS - 2
ER -